Tom K.

ToKrCZ

AI & ML interests

None yet

Recent Activity

liked a model about 5 hours ago
unsloth/DeepSeek-R1-Distill-Qwen-32B-GGUF
liked a model 3 days ago
MiniMaxAI/MiniMax-VL-01
liked a model 9 days ago
NovaSky-AI/Sky-T1-32B-Preview

Organizations

None yet

ToKrCZ's activity

reacted to thomwolf's post with 🚀 about 1 month ago
Post
4907
We are proud to announce HuggingFaceFW/fineweb-2: A sparkling update to HuggingFaceFW/fineweb with 1000s of 🗣️ languages.

We applied the same data-driven approach that led to SOTA English performance in 🍷 FineWeb to thousands of languages.

🥂 FineWeb2 has 8TB of compressed text data and outperforms other multilingual datasets in our experiments.

The dataset is released under the permissive 📜 ODC-By 1.0 license, and the 💻 code to reproduce it and our evaluations is public.

We will very soon announce a big community project, and are working on a 📝 blogpost walking you through the entire dataset creation process. Stay tuned!

In the meantime, come ask us questions on our chat place: HuggingFaceFW/discussion

H/t @guipenedo, @hynky, @lvwerra, as well as @vsabolcec, Bettina Messmer, @negar-foroutan, and @mjaggi
  • 2 replies