guipenedo (Guilherme Penedo)

liked a Space 2 days ago

18

The Distill Template

🌌

Craft Beautiful Blogs

liked a Space 24 days ago

2.25k

The Ultra-Scale Playbook

🌌

The ultimate guide to training LLMs on large GPU clusters

liked a model 27 days ago

deepseek-ai/DeepSeek-R1

Text Generation • Updated 20 days ago • 2.29M • 11.3k

liked a dataset about 1 month ago

open-r1/OpenThoughts-114k-math

Viewer • Updated Jan 30 • 89.1k • 1.75k • 73

liked a dataset 3 months ago

HuggingFaceFW/fineweb-2

Viewer • Updated Jan 8 • 12.5B • 74.3k • 446

liked a Space 4 months ago

35

Discussion Forum

💬

liked a model 5 months ago

HuggingFaceTB/SmolLM2-1.7B-Instruct

Text Generation • Updated 9 days ago • 667k • 574

liked 2 Spaces 5 months ago

60

Scaling FineWeb to 1000+ languages: Step 1: finding signal in 100s of evaluation tasks

📝

Evaluate multilingual models using FineTasks

106

TxT360: Trillion Extracted Text

📖

Create a large, deduplicated dataset for LLM pre-training

liked a model 5 months ago

cis-lmu/glotlid

Text Classification • Updated Oct 26, 2024 • 29.3k • 60

liked a dataset 6 months ago

tiiuae/falcon-refinedweb

Viewer • Updated Jun 20, 2023 • 968M • 53.5k • 841

liked a Space 7 months ago

393

Finegrain Object Eraser

🧽

Erase any object just by naming it!

liked 3 models 8 months ago

liked a dataset 9 months ago

HuggingFaceFW/fineweb-edu

Viewer • Updated Jan 31 • 3.3B • 502k • 652

liked a Space 10 months ago

876

FineWeb: decanting the web for the finest text data at scale

🍷

Generate high-quality web text data for LLM training

liked a dataset 11 months ago

HuggingFaceFW/fineweb

Viewer • Updated Jan 31 • 25B • 301k • 2.03k

liked a Space over 1 year ago

208

GPT Baker

🚀

Create customized chatbots using simple prompts

guipenedo's activity