Hugging Face Smol Models Research

Enterprise

community

AI & ML interests

Exploring smol models (for text, vision and video) and high quality web and synthetic datasets

Recent Activity

thomwolf authored a paper 29 days ago

SmolVLM: Redefining small and efficient multimodal models

lewtun authored a paper 29 days ago

SmolVLM: Redefining small and efficient multimodal models

nouamanetazi authored a paper 29 days ago

SmolVLM: Redefining small and efficient multimodal models

View all activity

Organization Card

Community About org cards

Hugging Face Smol Models Research

This is the home for smol models (SmolLM & SmolVLM) and high quality pre-training datasets. We released:

FineWeb-Edu: a filtered version of FineWeb dataset for educational content, paper available here.
Cosmopedia: the largest open synthetic dataset, with 25B tokens and 30M samples. It contains synthetic textbooks, blog posts, and stories, posts generated by Mixtral. Blog post available here.
Smollm-Corpus: the pre-training corpus of SmolLM: Cosmopedia v0.2, FineWeb-Edu dedup and Python-Edu. Blog post available here.
FineMath: the best public math pretraining dataset with 50B tokens of mathematical and problem solving data.
Stack-Edu: the best open code pretraining dataset with educational code in 15 programming languages.
SmolLM2 models: a series of strong small models in three sizes: 135M, 360M and 1.7B
SmolVLM2: a family of small Video and Vision models in three sizes: 2.2B, 500M and 256M. Blog post available here.

News 🗞️

HuggingSnap: turn your iPhone into a visual assistant usig SmolVLM2. App Store - Source code
Stack-Edu: 125B tokens of educational code in 15 programming languages. Dataset

Collections 13

spaces 13

SmolLM2 1.7B Instruct WebGPU

A blazingly fast & powerful AI chatbot that runs in-browser!

SmolVLM 256M Instruct WebGPU

Generate descriptions for images using WebGPU technology

Smolvlm Web Benchmarking

SmolVLM2 IPhone Waitlist

sign in to receive news on the iPhone app

SmolVLM2 XSPFGenerator (VLC prototype)

Generate video highlights and playlist

SmolVLM2 HighlightGenerator

Generate video highlights from uploaded video

models 74

HuggingFaceTB/SmolLM2-360M-Instruct

Text Generation • Updated 15 days ago • 845k • 116

HuggingFaceTB/SmolLM2-135M-Instruct

Text Generation • Updated 15 days ago • 367k • 188

HuggingFaceTB/SmolLM2-1.7B-Instruct

Text Generation • Updated 15 days ago • 81.1k • 611

HuggingFaceTB/SmolVLM2-2.2B-Base

Image-Text-to-Text • Updated 23 days ago • 181 • 3

HuggingFaceTB/SmolVLM-256M-Instruct

Image-Text-to-Text • Updated 28 days ago • 486k • 221

HuggingFaceTB/SmolVLM-Instruct

Image-Text-to-Text • Updated 29 days ago • 74.6k • 436

HuggingFaceTB/SmolVLM-500M-Instruct

Image-Text-to-Text • Updated 29 days ago • 32k • 119

HuggingFaceTB/SmolVLM2-256M-Video-Instruct

Image-Text-to-Text • Updated 29 days ago • 27.4k • 55

HuggingFaceTB/SmolVLM2-500M-Video-Instruct

Image-Text-to-Text • Updated 29 days ago • 18.6k • 57

HuggingFaceTB/SmolVLM2-2.2B-Instruct

Image-Text-to-Text • Updated 29 days ago • 82.4k • 170

datasets 39

HuggingFaceTB/stack-edu

Viewer • Updated Mar 20 • 167M • 1.96k • 33

HuggingFaceTB/issues-kaggle-notebooks

Viewer • Updated Mar 19 • 16.1M • 917 • 8

HuggingFaceTB/dclm-edu

Viewer • Updated Mar 7 • 1B • 14.6k • 26

HuggingFaceTB/SmolLM2-intermediate-evals

Viewer • Updated Mar 3 • 582 • 58

HuggingFaceTB/smoltalk

Viewer • Updated Feb 10 • 2.2M • 7.3k • 333

HuggingFaceTB/smol-smoltalk

Viewer • Updated Feb 6 • 485k • 2.79k • 41

HuggingFaceTB/finemath

Viewer • Updated Feb 6 • 48.3M • 16.6k • 308

HuggingFaceTB/everyday-conversations-llama3.1-2k

Viewer • Updated Jan 29 • 2.38k • 567 • 99

HuggingFaceTB/MagPie-Pro-300k-MT

Viewer • Updated Jan 29 • 300k • 69 • 2

HuggingFaceTB/finemath_contamination_report

Viewer • Updated Jan 7 • 5.33k • 104 • 1