Gabriel Martín Blázquez

gabrielmbmb

AI & ML interests

ML Engineer

Organizations

Hugging Face, Spaces-explorers, SomosNLP, Hugging Face H4, Argilla, Blog-explorers, Hugging Face TB Research, Argilla Explorers, distilabel-internal-testing, Data Is Better Together, Social Post Explorers, Hugging Face Discord Community, LLHF, SLLHF, Argilla Warehouse, IOPO Experiments, Hugging Face FineVideo, rg-preview, Data Is Better Together Contributor

gabrielmbmb's activity

reacted to anton-l's post with 🚀 5 days ago
Introducing 📐 FineMath: the best public math pre-training dataset with 50B+ tokens!
HuggingFaceTB/finemath

Math remains challenging for LLMs and by training on FineMath we see considerable gains over other math datasets, especially on GSM8K and MATH.

We build the dataset by:
🛠️ carefully extracting math data from Common Crawl;
🔎 iteratively filtering and recalling high quality math pages using a classifier trained on synthetic annotations to identify math reasoning and deduction.

We conducted a series of ablations comparing the performance of Llama-3.2-3B-Base after continued pre-training on FineMath and observe notable gains compared to the baseline model and other public math datasets.

We hope this helps advance the performance of LLMs on math and reasoning! 🚀
We're also releasing all the ablation models as well as the evaluation code.

HuggingFaceTB/finemath-6763fb8f71b6439b653482c2
reacted to burtenshaw's post with 🤗❤️ 7 days ago
People are flexing their end of year stats, so I made this app to show hub stats in a tidy design!

Thanks @Ameeeee and @jfcalvo for the feature from Argilla!
burtenshaw/recap
updated a Space 10 days ago