merterbak (Mert Erbak)

reacted to clem's post with 🤗 about 22 hours ago

Post

1467

Llama 4 is in transformers!

Fun example using the instruction-tuned Maverick model responding about two images, using tensor parallel for maximum speed.

From https://huggingface.co/blog/llama4-release

liked a model 1 day ago

meta-llama/Llama-4-Scout-17B-16E

Image-Text-to-Text • Updated about 16 hours ago • 3.85k • 114

upvoted an article 1 day ago

Article

Welcome Llama 4 Maverick & Scout on Hugging Face!

2 days ago

• 96

liked a model 1 day ago

meta-llama/Llama-4-Scout-17B-16E-Instruct

Image-Text-to-Text • Updated about 16 hours ago • 16k • • 484

reacted to their post with 🔥 1 day ago

Post

1895

Meta has unveiled its Llama 4 🦙 family of models, featuring native multimodality and mixture-of-experts architecture. Two model families are available now:
Models🤗: meta-llama/llama-4-67f0c30d9fe03840bc9d0164
Blog Post: https://ai.meta.com/blog/llama-4-multimodal-intelligence/
HF's Blog Post: https://huggingface.co/blog/llama4-release

- 🧠 Native Multimodality - Process text and images in a unified architecture
- 🔍 Mixture-of-Experts - First Llama models using MoE for incredible efficiency
- 📏 Super Long Context - Up to 10M tokens
- 🌐 Multilingual Power - Trained on 200 languages with 10x more multilingual tokens than Llama 3 (including over 100 languages with over 1 billion tokens each)

🔹 Llama 4 Scout
- 17B active parameters (109B total)
- 16 experts architecture
- 10M context window
- Fits on a single H100 GPU
- Beats Gemma 3, Gemini 2.0 Flash-Lite, and Mistral 3.1

🔹 Llama 4 Maverick
- 17B active parameters (400B total)
- 128 experts architecture
- It can fit perfectly on DGX H100(8x H100)
- 1M context window
- Outperforms GPT-4o and Gemini 2.0 Flash
- ELO score of 1417 on LMArena currently second best model on arena

🔹 Llama 4 Behemoth (Coming Soon)
- 288B active parameters (2T total)
- 16 experts architecture
- Teacher model for Scout and Maverick
- Outperforms GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro on STEM benchmarks

upvoted a collection 1 day ago

Llama 4

Collection

Llama 4 release • 10 items • Updated 1 day ago • 356

posted an update 1 day ago

Post

1895

Meta has unveiled its Llama 4 🦙 family of models, featuring native multimodality and mixture-of-experts architecture. Two model families are available now:
Models🤗: meta-llama/llama-4-67f0c30d9fe03840bc9d0164
Blog Post: https://ai.meta.com/blog/llama-4-multimodal-intelligence/
HF's Blog Post: https://huggingface.co/blog/llama4-release

- 🧠 Native Multimodality - Process text and images in a unified architecture
- 🔍 Mixture-of-Experts - First Llama models using MoE for incredible efficiency
- 📏 Super Long Context - Up to 10M tokens
- 🌐 Multilingual Power - Trained on 200 languages with 10x more multilingual tokens than Llama 3 (including over 100 languages with over 1 billion tokens each)

🔹 Llama 4 Scout
- 17B active parameters (109B total)
- 16 experts architecture
- 10M context window
- Fits on a single H100 GPU
- Beats Gemma 3, Gemini 2.0 Flash-Lite, and Mistral 3.1

🔹 Llama 4 Maverick
- 17B active parameters (400B total)
- 128 experts architecture
- It can fit perfectly on DGX H100(8x H100)
- 1M context window
- Outperforms GPT-4o and Gemini 2.0 Flash
- ELO score of 1417 on LMArena currently second best model on arena

🔹 Llama 4 Behemoth (Coming Soon)
- 288B active parameters (2T total)
- 16 experts architecture
- Teacher model for Scout and Maverick
- Outperforms GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro on STEM benchmarks

liked a model 1 day ago

meta-llama/Llama-4-Maverick-17B-128E-Instruct

Image-Text-to-Text • Updated 1 day ago • 1.91k • 202

liked 2 models 6 days ago

Qwen/Qwen2.5-VL-32B-Instruct

Image-Text-to-Text • Updated about 14 hours ago • 240k • 319

google/txgemma-9b-chat

Text Generation • Updated 9 days ago • 2.3k • 24

liked a model 9 days ago

deepseek-ai/DeepSeek-V3-0324

Text Generation • Updated 11 days ago • 158k • • 2.39k

updated a Space 9 days ago

10

Phi 4

🐨

Chat with Microsoft's phi-4 or phi-4-mini models

updated a Space 17 days ago

7

Gemma 3

💎

Chat with multimodal gemma-3-12b-it or gemma-3-4b-it models

liked a Space 18 days ago

7

Gemma 3

💎

Chat with multimodal gemma-3-12b-it or gemma-3-4b-it models

upvoted a paper 18 days ago

DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning

Paper • 2503.15265 • Published 19 days ago • 45

updated a collection 18 days ago

Papers

Collection

6 items • Updated 18 days ago

reacted to clem's post with 🔥 18 days ago

Post

2569

Nice new space to see how fast your personal or organization followers are growing on HF:
julien-c/follow-history

As you can see, I still have more followers than @julien-c even if he's trying to change this by building such cool spaces 😝😝😝