
Mert Erbak PRO

merterbak

AI & ML interests

NLP and Image Processing

Organizations

Open-Source AI Meetup, MLX Community, Social Post Explorers, Hugging Face Discord Community, open/ acc, AI Starter Pack

merterbak's activity

reacted to their post with πŸ”₯ about 6 hours ago
posted an update about 6 hours ago
reacted to merve's post with πŸ”₯ 1 day ago
VLMS 2025 UPDATE πŸ”₯

We just shipped a blog covering all the latest on vision language models, including
πŸ€– GUI agents, agentic VLMs, omni models
πŸ“‘ multimodal RAG
⏯️ video LMs
🀏🏻 smol models
..and more! https://huggingface.co/blog/vlms-2025
reacted to their post with πŸš€πŸ”₯ 2 days ago
posted an update 2 days ago
ByteDance's Seed team released Seed-Coder, designed for coding tasks and featuring base, instruct, and reasoning variants at the 8B parameter scale. Unlike traditional open-source LLMs that rely on hand-crafted rules or annotated data to curate code pretraining datasets, Seed-Coder introduces a model-centric data pipeline. The pipeline processes raw data from GitHub and web archives into four categories: file-level code, repository-level code, GitHub commits, and code-related web data. A quality-filter LLM evaluates code for readability, modularity, clarity, and reusability, removing the lowest-scoring 10% to create a 6-trillion-token dataset spanning 89 programming languages.
Models: ByteDance-Seed/seed-coder-680de32c15ead6555c75b0e4
Github: https://github.com/ByteDance-Seed/Seed-Coder/tree/master
Paper: https://github.com/ByteDance-Seed/Seed-Coder/blob/master/Seed-Coder.pdf
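To make the filtering step concrete, here is a minimal sketch of the idea, assuming a stand-in judge: `score_with_llm` below is a hypothetical placeholder heuristic, not ByteDance's actual filter model.

```python
# Hypothetical sketch of a model-centric filtering step: an LLM judge
# scores each file (readability, modularity, clarity, reusability) and
# the lowest-scoring 10% of the corpus is dropped.

import statistics

def score_with_llm(code: str) -> float:
    """Placeholder for the LLM quality judge (returns a 0-10 score)."""
    # Crude proxy so the sketch runs: reward comments and short lines.
    lines = code.splitlines() or [""]
    comment_ratio = sum(l.strip().startswith("#") for l in lines) / len(lines)
    avg_len = statistics.mean(len(l) for l in lines)
    return 10 * comment_ratio + max(0.0, 10 - avg_len / 12)

def filter_corpus(files: list[str], drop_fraction: float = 0.10) -> list[str]:
    """Keep all but the lowest-scoring `drop_fraction` of files."""
    scored = sorted(files, key=score_with_llm)
    cutoff = int(len(scored) * drop_fraction)
    return scored[cutoff:]

corpus = [
    "def add(a, b):\n    # sum two numbers\n    return a + b",
    "x=1;y=2;print(x+y)",
]
print(filter_corpus(corpus))
```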
reacted to clem's post with πŸ”₯ 5 days ago
reacted to their post with πŸš€πŸ”₯ 12 days ago
posted an update 12 days ago
Microsoft released their new Phi-4 models fine-tuned on reasoning data yesterday. They rival or outperform much larger models. Check them out if you haven't yet. 🚀

Phi-4 mini reasoning (SFT): microsoft/Phi-4-mini-reasoning
Phi-4 reasoning (SFT): microsoft/Phi-4-reasoning
Phi-4 reasoning plus (SFT + RL): microsoft/Phi-4-reasoning-plus
Demo: https://github.com/marketplace/models/azureml/Phi-4-reasoning/playground
Articles: https://arxiv.org/pdf/2504.21318
https://arxiv.org/pdf/2504.21233
Blog: https://azure.microsoft.com/en-us/blog/one-year-of-phi-small-language-models-making-big-leaps-in-ai/
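A minimal transformers sketch for trying one of these, assuming a single-GPU setup; the prompt and generation settings are illustrative, so check the model card for the recommended system prompt and sampling parameters.

```python
# Minimal sketch: running microsoft/Phi-4-reasoning with transformers.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-4-reasoning"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "What is 37 * 43? Think step by step."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```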

reacted to clem's post with πŸ€— 12 days ago
The meta-llama org just crossed 40,000 followers on Hugging Face. Grateful for all their impact on the field, from sharing the Llama weights openly to much more!

We need more of this from all the other big tech companies to make AI more open, collaborative, and beneficial to all!
reacted to AdinaY's post with πŸ”₯ 13 days ago
Xiaomi just entered open source as a new player 🔥 and dropped MiMo, a 7B model trained from scratch for reasoning.

XiaomiMiMo/MiMo-7B-RL

✨ 7B: Base / RL / SFT / RL-Zero variants
✨ Surpasses 32B models in math & code
✨ Apache 2.0 licensed
reacted to their post with πŸš€πŸ”₯ 15 days ago
posted an update 15 days ago
Qwen 3 models released 🔥
It offers 2 MoE and 6 dense models with the following parameter sizes: 0.6B, 1.7B, 4B, 8B, 14B, 30B (MoE), 32B, and 235B (MoE).
Models: Qwen/qwen3-67dd247413f0e2e4f653967f
Blog: https://qwenlm.github.io/blog/qwen3/
Demo: Qwen/Qwen3-Demo
GitHub: https://github.com/QwenLM/Qwen3

βœ… Pre-trained 119 languages(36 trillion tokens) and dialects with strong translation and instruction following abilities. (Qwen2.5 was pre-trained on 18 trillion tokens.)
βœ…Qwen3 dense models match the performance of larger Qwen2.5 models. For example, Qwen3-1.7B/4B/8B/14B/32B perform like Qwen2.5-3B/7B/14B/32B/72B.
βœ… Three stage done while pretraining:
β€’ Stage 1: General language learning and knowledge building.
β€’ Stage 2: Reasoning boost with STEM, coding, and logic skills.
β€’ Stage 3: Long context training
βœ… It supports MCP in the model
βœ… Strong agent skills
βœ… Supports seamless between thinking mode (for hard tasks like math and coding) and non-thinking mode (for fast chatting) inside chat template.
βœ… Better human alignment for creative writing, roleplay, multi-turn conversations, and following detailed instructions.
reacted to julien-c's post with πŸ”₯ 18 days ago
BOOOOM: Today I'm dropping TINY AGENTS

the 50 lines of code Agent in Javascript πŸ”₯

I spent the last few weeks working on this, so I hope you will like it.

I've been diving into MCP (Model Context Protocol) to understand what the hype was all about.

It is fairly simple, but still quite powerful: MCP is a standard API to expose sets of Tools that can be hooked to LLMs.

But while doing that, I had my second realization:

Once you have a MCP Client, an Agent is literally just a while loop on top of it. 🀯

➑️ read it exclusively on the official HF blog: https://huggingface.co/blog/tiny-agents
posted an update 18 days ago
FlowReasoner is a new system that builds a custom set of small AI agents for every user question. Unlike search-based methods, it uses reasoning-driven optimization with external execution feedback.

βœ… First, it distills reasoning data using DeepSeek R1-671B to build multi agent systems. πŸ€–
βœ… Then, reasoning data used for DeepSeek-R1-Distill-Qwen-7B via supervised fine tuning for basic reasoning skills. πŸ’‘
βœ… Finally, RL with GRPO (optimizes by comparing response groups from queries/tasks) to improve reasoning.

FlowReasoner: Reinforcing Query-Level Meta-Agents (2504.15257)
Code: https://github.com/sail-sg/flowreasoner
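A minimal sketch of GRPO's core idea as described above: sample a group of responses per query, score them, and use the group-normalized reward as each response's advantage, with no learned value function. The reward values in the example are illustrative.

```python
# Group-relative advantage, the heart of GRPO.

import statistics

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Advantage of each response relative to its sampling group."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # avoid division by zero
    return [(r - mean) / std for r in rewards]

# Example: 4 responses to one query, scored by execution feedback.
print(group_relative_advantages([1.0, 0.0, 0.5, 1.0]))
```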
reacted to fdaudens's post with πŸ”₯ 19 days ago
reacted to meg's post with πŸ”₯ 21 days ago
reacted to clem's post with πŸ”₯ 21 days ago
Energy is a massive constraint for AI, but do you even know how much energy your ChatGPT convos are using?

We're trying to change this by releasing ChatUI-energy, the first interface where you see in real time what energy your AI conversations consume. Great work from @jdelavande, powered by Spaces & TGI, and available for a dozen open-source models like Llama, Mistral, Qwen, Gemma, and more.

jdelavande/chat-ui-energy

Should all chat interfaces have this? Just like ingredients have to be shown on products you buy, we need more transparency in AI for users!
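For intuition, a back-of-envelope sketch of the kind of estimate such an interface can surface: energy is roughly average accelerator power draw times generation time. The 300 W / 4 s figures below are assumed for illustration, not measurements from ChatUI-energy.

```python
# Rough energy estimate for one chat response on one GPU.

def response_energy_wh(gpu_power_watts: float, seconds: float) -> float:
    """Energy in watt-hours: power (W) x time (s) / 3600."""
    return gpu_power_watts * seconds / 3600.0

# Assumed: a 300 W accelerator busy for 4 seconds of generation.
print(f"{response_energy_wh(300.0, 4.0):.3f} Wh")  # ~0.333 Wh
```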