All HF Hub posts

aiqtech posted an update 2 days ago
🔥 HuggingFace Heatmap Leaderboard
Visualizing AI ecosystem activity at a glance

aiqtech/Heatmap-Leaderboard

🎯 Introduction
A leaderboard that visualizes the vibrant HuggingFace community activity through heatmaps.

✨ Key Features
📊 Real-time Tracking - Model/dataset/app releases from AI labs and developers
🏆 Auto Ranking - Rankings based on activity over the past year
🎨 Responsive UI - Unique colors per organization, mobile optimized
⚡ Auto Updates - Hourly data refresh for latest information

🌍 Major Participants
Big Tech: OpenAI, Google, Meta, Microsoft, Apple, NVIDIA
AI Startups: Anthropic, Mistral, Stability AI, Cohere, DeepSeek
Chinese Companies: Tencent, Baidu, ByteDance, Qwen
HuggingFace Official: HuggingFaceH4, HuggingFaceM4, lerobot, etc.
Active Developers: prithivMLmods, lllyasviel, multimodalart and many more

🚀 Value
Trend Analysis 📈 Real-time open source contribution insights
Inspiration 💪 Learn from other developers' activity patterns
Ecosystem Growth 🌱 Visualize AI community development
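
For a sense of how this kind of activity data can be gathered, here is a minimal sketch (not the leaderboard's actual code) that counts recent model releases per organization via the public huggingface_hub API; the organization subset and the 500-model cap are illustrative assumptions.

```python
# Rough sketch (not the Space's actual code): count recent model releases per org with
# the public huggingface_hub API, as the raw data behind a per-day activity heatmap.
from collections import Counter
from datetime import datetime, timedelta, timezone

from huggingface_hub import HfApi

api = HfApi()
orgs = ["openai", "google", "meta-llama", "Qwen"]          # illustrative subset
cutoff = datetime.now(timezone.utc) - timedelta(days=365)  # past year, as in the ranking

daily_counts = {}
for org in orgs:
    counts = Counter()
    # created_at may be missing on older huggingface_hub versions, hence the guard.
    for model in api.list_models(author=org, limit=500):
        created = getattr(model, "created_at", None)
        if created and created >= cutoff:
            counts[created.date().isoformat()] += 1
    daily_counts[org] = counts

print(daily_counts["Qwen"].most_common(5))
```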

@John6666 @Nymbo @MaziyarPanahi @prithivMLmods @fffiloni @gokaygokay @enzostvs @black-forest-labs @lllyasviel @briaai @multimodalart @unsloth @Xenova @mistralai @meta-llama @facebook @openai @Anthropic @google @allenai @apple @microsoft @nvidia @CohereLabs @ibm-granite @stabilityai @huggingface @OpenEvals @HuggingFaceTB @HuggingFaceH4 @HuggingFaceM4 @HuggingFaceFW @HuggingFaceFV @open-r1 @parler-tts @nanotron @lerobot @distilbert @kakaobrain @NCSOFT @upstage @moreh @LGAI-EXAONE @naver-hyperclovax @OnomaAIResearch @kakaocorp @Baidu @PaddlePaddle @tencent @BAAI @OpenGVLab @InternLM @Skywork @MiniMaxAI @stepfun-ai @ByteDance @Bytedance Seed @bytedance-research @openbmb @THUDM @rednote-hilab @deepseek-ai @Qwen @wan-ai @XiaomiMiMo @IndexTeam @agents-course
@Agents-MCP-Hackathon @akhaliq @alexnasa @Alibaba-NLP
@ArtificialAnalysis @bartowski @bibibi12345 @calcuis
@ChenDY @city96 @Comfy-Org @fancyfeast @fal @google
tomaarsen posted an update 2 days ago
‼️ Sentence Transformers v5.0 is out! The biggest update yet introduces Sparse Embedding models, encode method improvements, a Router module for asymmetric models & much more. Sparse + Dense = 🔥 hybrid search performance! Details:

1️⃣ Sparse Encoder Models
Brand new support for sparse embedding models that generate high-dimensional embeddings (30,000+ dims) where <1% are non-zero:

- Full SPLADE, Inference-free SPLADE, and CSR architecture support
- 4 new modules, 12 new losses, 9 new evaluators
- Integration with @elastic-co, @opensearch-project, @NAVER LABS Europe, @qdrant, @IBM, etc.
- Decode interpretable embeddings to understand token importance
- Hybrid search integration to get the best of both worlds
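
A minimal sketch of what using one of these sparse models can look like; the SPLADE checkpoint name and the decode() call are assumptions to verify against the v5 docs:

```python
# Minimal sketch of the new SparseEncoder class; checkpoint name and decode() usage
# are assumptions, not confirmed against the v5 release.
from sentence_transformers import SparseEncoder

model = SparseEncoder("naver/splade-cocondenser-ensembledistil")  # assumed checkpoint

queries = ["what is hybrid search?"]
docs = ["Hybrid search combines sparse lexical signals with dense semantic vectors."]

# Sparse embeddings: vocabulary-sized vectors (30,000+ dims) with <1% non-zero entries.
query_emb = model.encode_query(queries)
doc_emb = model.encode_document(docs)

print(model.similarity(query_emb, doc_emb))  # relevance scores
print(model.decode(doc_emb, top_k=10))       # interpretable (token, weight) pairs; assumed API
```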

2️⃣ Enhanced Encode Methods & Multi-Processing
- New encode_query & encode_document methods automatically use predefined prompts
- No more manual pool management - just pass device list directly to encode()
- Much cleaner and easier to use than the old multi-process approach
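
A rough sketch of the new encode conveniences on a dense model, following the release notes above; the checkpoint is only an example, and the two-GPU device list assumes that hardware is available:

```python
# Sketch of the v5 encode additions on a dense model; the checkpoint is an example and
# whether it defines "query"/"document" prompts is an assumption.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# encode_query / encode_document pick up the model's predefined prompts (when present)
# and route inputs through the query/document path of asymmetric models.
q = model.encode_query("How do I enable hybrid search?")
d = model.encode_document(["Hybrid search combines sparse and dense retrieval."])
print(model.similarity(q, d))

# Multi-process encoding without manual pool management: per the release notes, a list
# of devices can be passed straight to encode().
embeddings = model.encode(
    ["first sentence", "second sentence"] * 1000,
    device=["cuda:0", "cuda:1"],  # assumes two GPUs are available
    batch_size=64,
)
```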

3️⃣ Router Module & Advanced Training
- Router module with different processing paths for queries vs documents
- Custom learning rates for different parameter groups
- Composite loss logging - see individual loss components
- Perfect for two-tower architectures
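
A heavily hedged sketch of what a two-tower setup built around the Router module might look like; Router.for_query_document and the exact module wiring here are assumptions, so check the v5 API reference before relying on them:

```python
# Hedged sketch of an asymmetric (two-tower) model using the new Router module.
# Router.for_query_document and this module wiring are assumptions, not confirmed API.
from sentence_transformers import SentenceTransformer
from sentence_transformers.models import Pooling, Router, Transformer

query_encoder = Transformer("prajjwal1/bert-tiny")  # tiny example backbones
doc_encoder = Transformer("prajjwal1/bert-tiny")

router = Router.for_query_document(
    query_modules=[query_encoder, Pooling(query_encoder.get_word_embedding_dimension(), "mean")],
    document_modules=[doc_encoder, Pooling(doc_encoder.get_word_embedding_dimension(), "mean")],
)
model = SentenceTransformer(modules=[router])

# encode_query / encode_document route inputs through the matching tower.
q = model.encode_query("capital of France")
d = model.encode_document(["Paris is the capital of France."])
print(model.similarity(q, d))
```

During training, per the notes above, the two towers can then be treated as separate parameter groups with their own learning rates.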

4️⃣ Comprehensive Documentation & Training
- New Training Overview, Loss Overview, API Reference docs
- 6 new training example documentation pages
- Full integration examples with major search engines
- Extensive blogpost on training sparse models

Read the comprehensive blogpost about training sparse embedding models: https://huggingface.co/blog/train-sparse-encoder

See the full release notes here: https://github.com/UKPLab/sentence-transformers/releases/v5.0.0

What's next? We would love to hear from the community! What sparse encoder models would you like to see? And what new capabilities should Sentence Transformers handle - multimodal embeddings, late interaction models, or something else? Your feedback shapes our roadmap!
andito posted an update about 12 hours ago
🧠👁️ Can AI visualize solutions?

Humans often solve visual problems by sketching ideas in our minds. What if Vision-Language Models (VLMs) could do something similar, not by generating full images, but by using internal "mental sketches"?

That's the idea behind Mirage, a new framework that empowers VLMs to reason using latent visual tokens. Instead of just thinking in words, Mirage mixes in abstract visual representations that help the model solve complex tasks.

These aren't photorealistic images. They're compact, internal representations optimized purely to support reasoning.

🔧 Mirage is trained in two phases:

1) Grounding: It learns to produce latent tokens anchored in real images.
2) Refinement: The model drops the images and learns to generate visual tokens on its own.
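
To make the idea concrete, here is a toy illustration only (in no way the paper's implementation; every shape and module below is made up) of a backbone reasoning over a few compact latent "visual" vectors appended to the text embeddings:

```python
# Toy illustration of reasoning over latent visual tokens; NOT the Mirage implementation.
import torch
import torch.nn as nn

hidden, text_len, n_latent = 256, 32, 4

text_embeds = torch.randn(1, text_len, hidden)                   # stand-in for token embeddings
latent_sketch = nn.Parameter(torch.randn(1, n_latent, hidden))   # compact "mental sketch" vectors

# A shared transformer reasons over text and latent visual tokens together; no pixels are
# generated. Phase 1 ("grounding") would supervise the latents against real image features;
# phase 2 ("refinement") drops the images entirely.
backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True),
    num_layers=2,
)
sequence = torch.cat([text_embeds, latent_sketch], dim=1)        # (1, 36, 256)
print(backbone(sequence).shape)
```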

📈 And yes, it works!
On challenging benchmarks like Visual Spatial Planning, Jigsaw puzzles, and Spatial Attention Tasks, Mirage clearly outperforms GPT-4o and other strong baselines.
Smart sketches > empty words.

By mimicking the way humans visualize solutions, Mirage gives AI a new kind of imagination, one that's faster, more efficient, and more human-like.
Kudos to the teams at UMass Amherst and MIT behind this exciting work.
Check the paper: Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens (2506.17218)
m-ric posted an update 1 day ago
If you're using any HF libraries, you should enable the Hub MCP in your agentic coding tool!

The brand new Docs Semantic Search tool is an intravenous caffeine supply for Cursor: it lets you correct API errors in a few seconds. gj @mishig ⚡️⚡️

👉 To enable Hub MCP, head to your account settings, under MCP, and it will give you everything you need!
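
For reference, a hedged sketch of what the resulting configuration can look like for a Cursor-style mcp.json; the server URL, file location, and schema are assumptions, so prefer the snippet your settings page generates:

```python
# Hedged sketch: writing a Cursor-style mcp.json pointing at the Hugging Face MCP server.
# URL, file location, and schema are assumptions; use the snippet from your Hub account
# settings (under MCP) as the source of truth.
import json
import os
from pathlib import Path

config = {
    "mcpServers": {
        "hf-mcp-server": {
            "url": "https://huggingface.co/mcp",  # assumed Hub MCP endpoint
            "headers": {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"},
        }
    }
}

path = Path.home() / ".cursor" / "mcp.json"  # assumed Cursor config location
path.parent.mkdir(parents=True, exist_ok=True)
path.write_text(json.dumps(config, indent=2))
print(f"Wrote MCP config to {path}")
```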
Abhaykoul posted an update 2 days ago
🎉 Dhanishtha 2.0 Preview is Now Open Source!

The world's first Intermediate Thinking Model is now available to everyone!

Dhanishtha 2.0 Preview brings revolutionary intermediate thinking capabilities to the open-source community. Unlike traditional reasoning models that think once, Dhanishtha can think, answer, rethink, answer again, and continue rethinking as needed, using multiple thinking blocks between responses.

🚀 Key Features
- Intermediate thinking: Think → Answer → Rethink → Answer → Rethink if needed...
- Token efficient: Uses up to 79% fewer tokens than DeepSeek R1 on similar queries
- Transparent thinking: See the model's reasoning process in real-time
- Open source: Freely available for research and development


HelpingAI/Dhanishtha-2.0-preview
https://helpingai.co/chat
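
A minimal sketch of trying the preview with transformers; the prompt and generation settings are illustrative, and whether the checkpoint needs extra options (e.g. trust_remote_code) is something to check on the model card:

```python
# Minimal sketch: load the preview with transformers and inspect its intermediate
# thinking blocks in the decoded continuation. Settings are illustrative only.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HelpingAI/Dhanishtha-2.0-preview"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "How many r's are in 'strawberry'?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Expect alternating thinking and answer segments in the generated text.
output = model.generate(inputs, max_new_tokens=1024)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=False))
```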
AdinaY posted an update 1 day ago
🔥 June highlights from China's open source ecosystem.

zh-ai-community/june-2025-open-works-from-the-chinese-community-683d66c188f782dc5570ba15

✨ Baidu & MiniMax both launched open foundation models
- Baidu: Ernie 4.5 (from 0.3B to 424B) 🤯
- MiniMax: MiniMax-M1 (hybrid MoE reasoning model)

✨ Multimodal AI is moving from fusion to full-stack reasoning: unified any-to-any pipelines across text, vision, audio, and 3D
- Baidu: ERNIE-4.5-VL-424B
- Moonshot AI: Kimi-VL-A3B
- Alibaba: Ovis-U1
- BAAI: Video-XL-2/OmniGen2
- AntGroup: Ming-Lite-Omni
- Chinese Academy of Sciences: Stream-Omni
- Bytedance: SeedVR2-3B
- Tencent: Hunyuan 3D 2.1 / SongGeneration
- FishAudio: Openaudio-s1-mini

✨ Domain-specific models are rapidly emerging
- Alibaba DAMO: Lingshu-7B (medical MLLM)
- BAAI: RoboBrain (Robotics)

✨ So many small models!
- OpenBMB: MiniCPM4 (on-device)
- Qwen: Embedding/Reranker (0.6B)
- Alibaba: Ovis-U1-3B
- Moonshot AI: Kimi-VL-A3B
- Bytedance: SeedVR2-3B
Jaward posted an update 2 days ago
I played around with the new RXTX paper (XX^T) and was able to train nanogpt with 4x4 RXTX matmuls in both the attention layer and the optimizer 🤕
It just works (well, I had to add some guardrails) and still saves 5% of memory usage:
The Patch:
- Computes attention scores with 4x4 blockwise RXTX matmuls (no PyTorch dot product)
- Handles arbitrary sequence lengths by padding to the nearest multiple of 4.
- An RXTX variant of Shampoo with params reshaped into 4x4 blocks during each optimizer step.
- Uses ~5% fewer ops
Code: https://github.com/Jaykef/ai-algorithms/blob/main/nanogpt-rxtx.ipynb
Paper: https://arxiv.org/pdf/2505.09814
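
For anyone curious about the padding guardrail, here is a hedged sketch of the blocking logic; torch.matmul stands in for the paper's RXTX kernel, so only the 4x4 blocking and padding are illustrated, not the reduced-multiplication scheme itself:

```python
# Hedged sketch of the padding trick: pad the row dimension to a multiple of 4 and
# compute X @ X.T over a 4x4 grid of blocks. Plain matmul stands in for the RXTX kernel.
import torch
import torch.nn.functional as F

def blockwise_xxt(x: torch.Tensor) -> torch.Tensor:
    """Compute X @ X.T for x of shape (n, d), padding n up to a multiple of 4."""
    n, d = x.shape
    pad = (-n) % 4
    x = F.pad(x, (0, 0, 0, pad))                    # zero-pad rows to a multiple of 4
    m = x.shape[0] // 4
    blocks = x.reshape(4, m, d)                     # 4 row-blocks of size m x d
    out = torch.empty(4 * m, 4 * m, dtype=x.dtype, device=x.device)
    for i in range(4):
        for j in range(4):
            # An RXTX implementation would replace these 16 products with the paper's
            # reduced multiplication scheme; plain matmul keeps the sketch runnable.
            out[i * m:(i + 1) * m, j * m:(j + 1) * m] = blocks[i] @ blocks[j].T
    return out[:n, :n]                              # drop the padding again

x = torch.randn(10, 8)
assert torch.allclose(blockwise_xxt(x), x @ x.T, atol=1e-5)
```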
burtenshaw posted an update 3 days ago
Inference for generative AI models looks like a minefield, but there's a simple protocol for picking the best inference option:

🌍 95% of users >> If you're using open (large) models and need fast online inference, use Inference Providers in auto mode and let it choose the best provider for the model (see the sketch after this list). https://huggingface.co/docs/inference-providers/index

👷 Fine-tuners / bespoke >> If you've got custom setups, use Inference Endpoints to define a configuration on AWS, Azure, or GCP. https://endpoints.huggingface.co/

🦫 Locals >> If you're trying to stretch everything you can out of a server or local machine, use llama.cpp, Jan, LM Studio, or vLLM. https://huggingface.co/settings/local-apps#local-apps

🪟 Browsers >> If you need open models running right here in the browser, use transformers.js. https://github.com/huggingface/transformers.js
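
A minimal sketch for the first option above, assuming you are logged in (or HF_TOKEN is set) and the example model is served by at least one provider:

```python
# Chat completion through Inference Providers with provider="auto": the best available
# provider for the model is picked automatically. The model id is only an example.
from huggingface_hub import InferenceClient

client = InferenceClient(provider="auto")  # uses your saved token or HF_TOKEN

response = client.chat_completion(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Summarize hybrid search in one sentence."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```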

Let me know what you're using, and if you think it's more complex than this.
prithivMLmods posted an update 2 days ago
A bunch of comparable demos for multimodal VLMs (excelling in OCR, cinematography understanding, spatial reasoning, etc.) are now up on the Hub 🤗, covering the most recent models as of Jun '25.

✦ Demo Spaces:

> [Nanonets-OCR-s, MonkeyOCR, Typhoon-OCR-7B, SmolDocling] : prithivMLmods/Multimodal-OCR2
> [GLM-4.1v, docscopeOCR-7B, MonkeyOCR, coreOCR-7B] : prithivMLmods/core-OCR
> [Camel-Doc-OCR, ViLaSR-7B, OCRFlux-3B, ShotVL-7B] : prithivMLmods/Doc-VLMs-v2-Localization
> [SkyCaptioner-V1, SpaceThinker-3B, coreOCR-7B, SpaceOm-3B] : prithivMLmods/VisionScope-R2
> [RolmOCR-7B, Qwen2-VL-OCR-2B, Aya-Vision-8B, Nanonets-OCR-s] : prithivMLmods/Multimodal-OCR
> [DREX-062225-7B, Typhoon-OCR-3B, olmOCR-7B-0225, VIREX-062225-7B] : prithivMLmods/Doc-VLMs-OCR
> [Cosmos-Reason1-7B, docscopeOCR-7B, Captioner-7B, visionOCR-3B] : prithivMLmods/DocScope-R1

✦ Space Collection: prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0

To learn more, visit the model card of the respective model.