Activity Feed

AI & ML interests

None defined yet.

Recent Activity

seyf1elislam posted an update 8 days ago
# 🚀 Run Qwen3-TTS on Colab GPU or Locally

Run **Qwen3-TTS (Text-to-Speech & Voice Cloning)** with minimal effort. This setup is based on the official HF Space.

### 🔗 Links
* **Official Space:** Qwen/Qwen3-TTS
* **GitHub Repo:** https://github.com/seyf1elislam/qwen-tts-webui-notebook
* **Colab:** https://github.com/seyf1elislam/qwen-tts-webui-notebook/blob/main/Qwen_TTS_(TTS_%26_Voice_Cloning)_Colab.ipynb

---

### 📓 Method 1: Google Colab (Fastest)
1. Open the Colab notebook: https://github.com/seyf1elislam/qwen-tts-webui-notebook/blob/main/Qwen_TTS_(TTS_%26_Voice_Cloning)_Colab.ipynb
2. Add your HF_TOKEN to Google Colab Secrets (see the snippet after this list).
3. Ensure you are on a **T4 GPU** runtime.
4. Run all cells. Use the gradio.live link to open the UI.
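
If you need the token in your own cells (the notebook may already handle this for you), this minimal sketch uses the standard Colab Secrets and huggingface_hub APIs:

```python
# Read the HF_TOKEN secret in Colab and authenticate with the Hugging Face Hub.
from google.colab import userdata
from huggingface_hub import login

hf_token = userdata.get("HF_TOKEN")  # raises if the secret is missing or not shared
login(token=hf_token)                # authenticates downloads for this runtime
```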

---

### 💻 Method 2: Local Installation
Requires a GPU. Uses uv for faster setup.

```bash
# 1. Install uv & clone the Space
pip install uv
git clone https://huggingface.co/spaces/Qwen/Qwen3-TTS && cd Qwen3-TTS

# 2. Set up the environment
uv venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
uv pip install -r requirements.txt

# 3. Authenticate & run
uvx hf auth login
python app.py
# UI available at: http://localhost:7860/
```
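
Once the app is running, you can also drive it from Python with gradio_client. This is only a sketch: the endpoint name and argument list below are assumptions, so check the "Use via API" link in the running UI for the real signature.

```python
# Hypothetical client call -- api_name and parameters are placeholders.
from gradio_client import Client

client = Client("http://localhost:7860/")
result = client.predict(
    "Hello from Qwen3-TTS!",  # text to synthesize (assumed first parameter)
    api_name="/generate",     # hypothetical endpoint name
)
print(result)  # typically a file path to the generated audio
```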


Bils posted an update about 1 month ago
We just published a workflow that automates trend-style celebrity selfie videos, from image generation to cinematic transitions.

This template is a mini creative factory:
→ Generate realistic “celebrity selfie” images
→ Produce clean, cinematic transitions ready for Shorts/Reels
→ Clear structure, easy to customize for your brand

📌 Template link:
https://n8n.io/workflows/12119-create-celebrity-selfie-images-and-transition-videos-with-gpt-4-seeddream-and-kling/
Aurelien-Morgan posted an update about 2 months ago
vikhyatk posted an update 3 months ago
Announcing RefCOCO-M, a refreshed RefCOCO with pixel-accurate masks and the problematic prompts removed.

moondream/refcoco-m
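
A minimal sketch for loading it with the 🤗 datasets library; the split name is an assumption, so check the dataset card:

```python
# Load RefCOCO-M from the Hub (split name assumed -- see the dataset card).
from datasets import load_dataset

ds = load_dataset("moondream/refcoco-m", split="train")
print(ds[0])  # expect an image, a referring expression, and a pixel-accurate mask
```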
Bils posted an update 3 months ago
DmitryRyumin posted an update 3 months ago
🚀👁️🌟 New Research Alert - ICCV 2025 (Poster)! 🌟👁️🚀
📄 Title: Is Less More? Exploring Token Condensation as Training-Free Test-Time Adaptation 🔍

📝 Description: Token Condensation as Adaptation (TCA) improves the performance and efficiency of Vision Language Models in zero-shot inference by introducing domain anchor tokens.

👥 Authors: Zixin Wang, Dong Gong, Sen Wang, Zi Huang, Yadan Luo

📅 Conference: ICCV, 19–23 Oct 2025 | Honolulu, Hawai'i, USA 🇺🇸

📄 Paper: Is Less More? Exploring Token Condensation as Training-free Test-time Adaptation (2410.14729)

📁 Repository: https://github.com/Jo-wang/TCA

🚀 ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers

🚀 Added to Session 1: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/session-1.md

📚 More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers collection, curated by @DmitryRyumin

🔍 Keywords: #TestTimeAdaptation #TokenCondensation #VisionLanguageModels #TrainingFreeAdaptation #ZeroShotLearning #EfficientAI #AI #ICCV2025 #ResearchHighlight
DmitryRyumin posted an update 3 months ago
🚀👁️🌟 New Research Alert - ICCV 2025 (Oral)! 🌟👁️🚀
📄 Title: Diving into the Fusion of Monocular Priors for Generalized Stereo Matching 🔍

📝 Description: The proposed method enhances stereo matching by efficiently combining unbiased monocular priors from vision foundation models. It addresses misalignment and local optima issues using a binary local ordering map and pixel-wise linear regression.

👥 Authors: Chengtang Yao, Lidong Yu, Zhidan Liu, Jiaxi Zeng, Yuwei Wu, and Yunde Jia

📅 Conference: ICCV, 19–23 Oct 2025 | Honolulu, Hawai'i, USA 🇺🇸

📄 Paper: Diving into the Fusion of Monocular Priors for Generalized Stereo Matching (2505.14414)

📁 Repository: https://github.com/YaoChengTang/Diving-into-the-Fusion-of-Monocular-Priors-for-Generalized-Stereo-Matching

🚀 ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers

🚀 Added to the 3D Pose Understanding section: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/3d-pose-understanding.md

📚 More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers collection, curated by @DmitryRyumin

🔍 Keywords: #StereoMatching #MonocularDepth #VisionFoundationModels #3DReconstruction #Generalization #AI #ICCV2025 #ResearchHighlight
DmitryRyumin posted an update 3 months ago
🚀👌🌟 New Research Alert - ICCV 2025 (Oral)! 🌟🤌🚀
📄 Title: Understanding Co-speech Gestures in-the-wild 🔍

📝 Description: JEGAL is a tri-modal model that learns from gestures, speech and text simultaneously, enabling devices to interpret co-speech gestures in the wild.

👥 Authors: @sindhuhegde, K R Prajwal, Taein Kwon, and Andrew Zisserman

📅 Conference: ICCV, 19–23 Oct 2025 | Honolulu, Hawai'i, USA 🇺🇸

📄 Paper: Understanding Co-speech Gestures in-the-wild (2503.22668)

🌐 Web Page: https://www.robots.ox.ac.uk/~vgg/research/jegal
📁 Repository: https://github.com/Sindhu-Hegde/jegal
📺 Video: https://www.youtube.com/watch?v=TYFOLKfM-rM

🚀 ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers

🚀 Added to the Human Modeling section: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/human-modeling.md

📚 More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers collection, curated by @DmitryRyumin

🔍 Keywords: #CoSpeechGestures #GestureUnderstanding #TriModalRepresentation #MultimodalLearning #AI #ICCV2025 #ResearchHighlight
meg posted an update 3 months ago
🤖 Did you know your voice might be cloned without your consent from just *one sentence* of audio?
That's not great. So with @frimelle, we brainstormed a new idea for developers who want to curb malicious use: ✨The Voice Consent Gate.✨
Details, code, here: https://huggingface.co/blog/voice-consent-gate
DmitryRyumin posted an update 3 months ago
🚀💡🌟 New Research Alert - ICCV 2025 (Oral)! 🌟🪄🚀
📄 Title: LoftUp: Learning a Coordinate-based Feature Upsampler for Vision Foundation Models 🔍

📝 Description: LoftUp is a coordinate-based transformer that upscales the low-resolution features of VFMs (e.g. DINOv2 and CLIP) using cross-attention and self-distilled pseudo-ground truth (pseudo-GT) from SAM.

👥 Authors: Haiwen Huang, Anpei Chen, Volodymyr Havrylov, Andreas Geiger, and Dan Zhang

📅 Conference: ICCV, 19–23 Oct 2025 | Honolulu, Hawai'i, USA 🇺🇸

📄 Paper: LoftUp: Learning a Coordinate-Based Feature Upsampler for Vision Foundation Models (2504.14032)

🌐 GitHub Page: https://andrehuang.github.io/loftup-site
📁 Repository: https://github.com/andrehuang/loftup

🚀 ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers

🚀 Added to the Foundation Models and Representation Learning section: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/foundation-models-and-representation-learning.md

📚 More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers collection, curated by @DmitryRyumin

🔍 Keywords: #LoftUp #VisionFoundationModels #FeatureUpsampling #CrossAttentionTransformer #CoordinateBasedLearning #SelfDistillation #PseudoGroundTruth #RepresentationLearning #AI #ICCV2025 #ResearchHighlight
DmitryRyumin posted an update 3 months ago
🚀🏷️🌟 New Research Alert - ICCV 2025 (Oral)! 🌟🧩🚀
📄 Title: Heavy Labels Out! Dataset Distillation with Label Space Lightening 🔍

📝 Description: The HeLlO framework is a dataset distillation method that removes the need for storing large soft labels. It uses a lightweight online image-to-label projector built on CLIP, adapted with LoRA-style parameter-efficient tuning and initialized from text embeddings.

👥 Authors: @roseannelexie, @Huage001, Zigeng Chen, Jingwen Ye, and Xinchao Wang

📅 Conference: ICCV, 19–23 Oct 2025 | Honolulu, Hawai'i, USA 🇺🇸

📄 Paper: Heavy Labels Out! Dataset Distillation with Label Space Lightening (2408.08201)

📺 Video: https://www.youtube.com/watch?v=kAyK_3wskgA

🚀 ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers

🚀 Added to the Efficient Learning section: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/efficient-learning.md

📚 More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers collection, curated by @DmitryRyumin

🔍 Keywords: #DatasetDistillation #LabelCompression #CLIP #LoRA #EfficientAI #FoundationModels #AI #ICCV2025 #ResearchHighlight
DmitryRyumin posted an update 3 months ago
🚀🤖🌟 New Research Alert - ICCV 2025 (Oral)! 🌟🤖🚀
📄 Title: Variance-based Pruning for Accelerating and Compressing Trained Networks 🔍

📝 Description: This one-shot pruning method efficiently compresses trained networks, reducing computation and memory usage while retaining almost full performance and requiring minimal fine-tuning.

👥 Authors: Uranik Berisha, Jens Mehnert, and Alexandru Paul Condurache

📅 Conference: ICCV, 19–23 Oct 2025 | Honolulu, Hawai'i, USA 🇺🇸

📄 Paper: Variance-Based Pruning for Accelerating and Compressing Trained Networks (2507.12988)

🚀 ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers

🚀 Added to the Efficient Learning section: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/efficient-learning.md

📚 More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers collection, curated by @DmitryRyumin

🔍 Keywords: #VarianceBasedPruning #NetworkCompression #ModelAcceleration #EfficientDeepLearning #VisionTransformers #AI #ICCV2025 #ResearchHighlight
DmitryRyumin posted an update 3 months ago
🚀👁️🌟 New Research Alert - ICCV 2025 (Oral)! 🌟👁️🚀
📄 Title: Token Activation Map to Visually Explain Multimodal LLMs 🔍

📝 Description: The Token Activation Map (TAM) is an advanced explainability method for multimodal LLMs. Using causal inference and a Rank Gaussian Filter, TAM reveals token-level interactions and eliminates redundant activations. The result is clearer, high-quality visualizations that enhance understanding of object localization, reasoning and multimodal alignment across models.

👥 Authors: Yi Li, Hualiang Wang, Xinpeng Ding, Haonan Wang, and Xiaomeng Li

📅 Conference: ICCV, 19–23 Oct 2025 | Honolulu, Hawai'i, USA 🇺🇸

📄 Paper: Token Activation Map to Visually Explain Multimodal LLMs (2506.23270)

📁 Repository: https://github.com/xmed-lab/TAM

🚀 ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers

🚀 Added to the Multi-Modal Learning section: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/multi-modal-learning.md

📚 More Papers: more cutting-edge research presented at other conferences in the DmitryRyumin/NewEraAI-Papers collection, curated by @DmitryRyumin

🔍 Keywords: #TokenActivationMap #TAM #CausalInference #VisualReasoning #Multimodal #Explainability #VisionLanguage #LLM #XAI #AI #ICCV2025 #ResearchHighlight
s3nh posted an update 4 months ago
EduHelp with more empathy, based on a model fine-tuned on psychotherapeutic preferences, just landed on the Hub.

Beck-8B as the base model, 13,000 steps on an educational dataset.
Time to go further and build more 🥰
s3nh/EduHelp_Beck_8B
Thanks to @basilic_ai for computations <3
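
A minimal sketch for trying it with transformers (assuming standard causal-LM weights and a chat template; check the model card):

```python
# Hypothetical quick start -- the prompt and generation settings are examples.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "s3nh/EduHelp_Beck_8B"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Explain fractions to a nervous ten-year-old."}]
inputs = tok.apply_chat_template(messages, add_generation_prompt=True,
                                 return_tensors="pt").to(model.device)
out = model.generate(inputs, max_new_tokens=200)
print(tok.decode(out[0], skip_special_tokens=True))
```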