Article Introducing smolagents: simple agents that write actions in code. Dec 31, 2024 • 667
SmolLM2 Collection State-of-the-art compact LLMs for on-device applications: 1.7B, 360M, 135M • 16 items • Updated 10 days ago • 234
The Big Benchmarks Collection Collection Gathering benchmark spaces on the hub (beyond the Open LLM Leaderboard) • 13 items • Updated Nov 18, 2024 • 196
Open LLM Leaderboard best models ❤️🔥 Collection A daily uploaded list of models with best evaluations on the LLM leaderboard • 64 items • Updated about 18 hours ago • 537
Article Democratization of AI, Open Source, and AI Auditing: Thoughts from the DisinfoCon Panel in Berlin By frimelle • Oct 8, 2024 • 6
Manual Configuration Collection 5 datasets showcase YAML configuration on HuggingFace. See docs: https://huggingface.co/docs/hub/datasets-manual-configuration. • 5 items • Updated Nov 23, 2023 • 5
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale Paper • 2406.17557 • Published Jun 25, 2024 • 91
🍃 MINT-1T Collection Data for "MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens" • 13 items • Updated Jul 24, 2024 • 58
Article Announcing Finance Commons and the Bad Data Toolbox: Pioneering Open Data and Advanced Document Processing By Pclanglais • Jul 19, 2024 • 20
Article Experimenting with Automatic PII Detection on the Hub using Presidio Jul 10, 2024 • 24
ParaNames 1.0: Creating an Entity Name Corpus for 400+ Languages using Wikidata Paper • 2405.09496 • Published May 15, 2024 • 3