@chansung on Hugging Face: "simple guide on the recipe for GRPO on Open-R1 which is built on top of TRL…"

Hugging Face

Join the conversation

Join the community of Machine Learners and AI enthusiasts.

Back to feed

chansung

posted an update 14 days ago

Post

3374

simple guide on the recipe for GRPO on Open-R1 which is built on top of TRL

I think FastAPI wrapper of vLLM with WeightSyncWorker is pretty cool feature. Also, we have many predefined reward functions out of the box!

takarajordan

14 days ago

Very cool

chansung

14 days ago

Thanks!

smirki

14 days ago

Question! Can you explain if the vram usage increases if you increase the max # of generations per prompt, if so, why does that happen?

chansung

14 days ago

Because more tokens has to be stored in vram?

xinnn63

13 days ago

Cool frens!

In this post

chansung chansung park
takarajordan Jordan Legg
smirki Manav
xinnn63 Natalie H