Join the conversation

Join the community of Machine Learners and AI enthusiasts.

Sign Up
burtenshaw 
posted an update 14 days ago
Post
2770
NEW UNIT in the Hugging Face Reasoning course. We dive deep into the algorithm behind DeepSeek R1 with an advanced and hands-on guide to interpreting GRPO.

🔗 reasoning-course

This unit is super useful if you’re tuning models with reinforcement learning. It will help with:

- interpreting loss and reward progression during training runs
- selecting effective parameters for training
- reviewing and defining effective reward functions

This unit also works up smoothly toward the existing practical exercises form @mlabonne and Unsloth.

📣 Shout out to @ShirinYamani who wrote the unit. Follow for more great content.

realy?