We just shipped a blog post on all the latest in vision language models, including GUI agents, agentic VLMs, omni models, multimodal RAG, video LMs, smol models... and more! https://huggingface.co/blog/vlms-2025
ByteDance's Seed team released Seed-Coder, a family of models designed for coding tasks with base, instruct, and reasoning variants at the 8B parameter scale. Unlike traditional open-source LLMs that rely on human-crafted rules or annotated data to curate code pretraining datasets, Seed-Coder introduces a model-centric data pipeline. The pipeline processes raw data from GitHub and web archives into four categories: file-level code, repository-level code, GitHub commits, and code-related web data. A quality-filter LLM evaluates code for readability, modularity, clarity, and reusability, removing the lowest-scoring 10% to create a 6-trillion-token dataset covering 89 programming languages.
Models: ByteDance-Seed/seed-coder-680de32c15ead6555c75b0e4
GitHub: https://github.com/ByteDance-Seed/Seed-Coder/tree/master
Paper: https://github.com/ByteDance-Seed/Seed-Coder/blob/master/Seed-Coder.pdf
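For intuition, here is a minimal sketch of what such an LLM-based quality filter could look like; the judge model, prompt wording, and scoring scheme are assumptions for illustration, not the official Seed-Coder pipeline (see the paper for that):

```python
# Illustrative sketch of an LLM-based code-quality filter in the spirit of the
# Seed-Coder pipeline; the judge model, prompt, and threshold are assumptions,
# not the official implementation.
from transformers import pipeline

# Hypothetical judge: any instruction-tuned model could play this role.
judge = pipeline("text-generation", model="Qwen/Qwen2.5-7B-Instruct")

PROMPT = (
    "Rate the following code from 0 to 10 for readability, modularity, "
    "clarity, and reusability. Reply with a single number.\n\n{code}\n\nScore:"
)

def score(code: str) -> float:
    out = judge(PROMPT.format(code=code), max_new_tokens=5, return_full_text=False)
    first = out[0]["generated_text"].strip().split()
    try:
        return float(first[0])
    except (IndexError, ValueError):
        return 0.0  # unparseable judge output counts as lowest quality

def drop_lowest_decile(files: list[str]) -> list[str]:
    ranked = sorted(files, key=score)
    return ranked[len(ranked) // 10 :]  # keep everything above the bottom 10%
```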
Microsoft released their new Phi-4 models fine-tuned on reasoning data yesterday. They rival or outperform much larger models. Check them out if you haven't yet.
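If you want to give them a quick spin, here's a minimal transformers sketch; the repo id (microsoft/Phi-4-mini-reasoning) and generation settings are assumptions, so check the model cards:

```python
# Minimal sketch for trying one of the reasoning-tuned Phi-4 checkpoints;
# repo id and generation length are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-4-mini-reasoning"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "If 3x + 5 = 20, what is x?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=512)
# Print only the newly generated tokens (the model's reasoning and answer).
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```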
The meta-llama org just crossed 40,000 followers on Hugging Face. Grateful for all their impact on the field: sharing the Llama weights openly and much more!
We need more of this from the rest of big tech to make AI more open, collaborative, and beneficial to all!
• Pre-trained on 119 languages and dialects (36 trillion tokens) with strong translation and instruction-following abilities. (Qwen2.5 was pre-trained on 18 trillion tokens.)
• Qwen3 dense models match the performance of larger Qwen2.5 models. For example, Qwen3-1.7B/4B/8B/14B/32B perform like Qwen2.5-3B/7B/14B/32B/72B.
• Three-stage pretraining:
  - Stage 1: General language learning and knowledge building.
  - Stage 2: Reasoning boost with STEM, coding, and logic skills.
  - Stage 3: Long-context training.
• Supports MCP in the model.
• Strong agent skills.
• Supports seamless switching between thinking mode (for hard tasks like math and coding) and non-thinking mode (for fast chatting) inside the chat template; see the sketch below.
• Better human alignment for creative writing, roleplay, multi-turn conversations, and following detailed instructions.
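Here is a minimal sketch of the thinking/non-thinking toggle through the chat template, following the pattern on the Qwen3 model cards; double-check the exact flag and recommended sampling settings against the card:

```python
# Sketch of toggling Qwen3's thinking vs. non-thinking mode via the chat
# template, based on the pattern shown on the Qwen3 model cards.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Prove that the sum of two even numbers is even."}]

# Thinking mode: the model emits a <think>...</think> reasoning block first.
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)
# Non-thinking mode (fast chat, no reasoning block): set enable_thinking=False.

inputs = tokenizer([text], return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(output[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```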
FlowReasoner is a new system that builds a custom set of small AI agents for every user question. Unlike search-based methods, it uses reasoning-driven optimization with external execution feedback.
• First, it distills reasoning data from DeepSeek-R1-671B for building multi-agent systems.
• Then, that reasoning data is used to fine-tune DeepSeek-R1-Distill-Qwen-7B via supervised fine-tuning for basic reasoning skills.
• Finally, RL with GRPO (which optimizes by comparing groups of responses to the same query/task) is applied to improve reasoning; see the sketch below.
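To make the GRPO step concrete, here is a toy sketch of the group-relative advantage it optimizes; the rewards are made-up numbers, and this is not FlowReasoner's actual training code:

```python
# Toy sketch of the core GRPO idea: advantages are computed relative to a
# group of responses sampled for the same query, with no learned value model.
import numpy as np

def group_relative_advantages(rewards: np.ndarray) -> np.ndarray:
    """rewards: shape (group_size,), scores for responses to one query."""
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

# Example: 4 candidate multi-agent systems generated for one user query,
# scored by external execution feedback (e.g. how well the built system runs).
rewards = np.array([0.2, 0.9, 0.5, 0.4])
print(group_relative_advantages(rewards))
# Above-average responses get positive advantages and are reinforced;
# below-average ones get negative advantages.
```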
Energy is a massive constraint for AI, but do you even know how much energy your ChatGPT conversations use?
We're trying to change this by releasing ChatUI-energy, the first interface where you can see in real time how much energy your AI conversations consume. Great work from @jdelavande, powered by Spaces & TGI and available for a dozen open-source models like Llama, Mistral, Qwen, Gemma, and more.
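For a rough sense of the numbers, here is a back-of-envelope sketch of per-response energy; the throughput and GPU power figures are illustrative assumptions, not how ChatUI-energy measures it:

```python
# Back-of-envelope estimate of generation energy; NOT ChatUI-energy's method.
# Throughput and GPU power draw below are illustrative assumptions.
def response_energy_wh(new_tokens: int,
                       tokens_per_second: float = 50.0,   # assumed throughput
                       gpu_power_watts: float = 300.0) -> float:
    """Energy in watt-hours for generating `new_tokens` tokens."""
    seconds = new_tokens / tokens_per_second
    return gpu_power_watts * seconds / 3600.0

print(f"{response_energy_wh(500):.2f} Wh")  # ~0.83 Wh for a 500-token reply
```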