chansung 
posted an update 14 days ago
A simple guide to the recipe for GRPO on Open-R1, which is built on top of TRL.
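The core idea of GRPO is that each prompt is sampled several times, and every completion's reward is normalized against the other completions for the same prompt instead of against a learned value model. The following is a minimal pure-Python sketch of that group-relative normalization, assuming rewards are laid out prompt-major in a flat list. The function name, the population standard deviation, and the `eps` value are assumptions of this sketch, not a copy of the TRL implementation.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, num_generations, eps=1e-4):
    """Normalize each reward against the other completions for the same prompt.

    `rewards` is a flat list laid out prompt-major: the first `num_generations`
    entries belong to prompt 0, the next `num_generations` to prompt 1, etc.
    """
    if len(rewards) % num_generations != 0:
        raise ValueError("len(rewards) must be a multiple of num_generations")
    advantages = []
    for start in range(0, len(rewards), num_generations):
        group = rewards[start:start + num_generations]
        mu = mean(group)
        sigma = pstdev(group)  # population std; an assumption of this sketch
        advantages.extend((r - mu) / (sigma + eps) for r in group)
    return advantages


# Two prompts, four completions each (hypothetical rewards).
rewards = [1.0, 0.0, 0.0, 1.0, 0.5, 0.5, 0.5, 0.5]
adv = group_relative_advantages(rewards, num_generations=4)
```

In the second group every completion received the same reward, so all of its advantages are zero and that prompt contributes no reward-driven policy-gradient signal. Sampling more generations per prompt makes such ties less likely.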

I think the FastAPI wrapper of vLLM with WeightSyncWorker is a pretty cool feature. Also, we have many predefined reward functions out of the box!
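Writing your own reward function alongside the predefined ones is mostly a matter of returning one float per completion. The following is a hypothetical sketch of a format-style reward, assuming plain-text completions and the callable shape TRL's GRPOTrainer accepts (completions passed as a keyword argument, a list of floats returned). The function name and the `<think>`/`<answer>` tag layout are illustrative assumptions, not Open-R1's exact code.

```python
import re

# Hypothetical format check: reasoning inside <think> tags, then an <answer> block.
FORMAT_PATTERN = re.compile(r"^<think>.*?</think>\s*<answer>.*?</answer>$", re.DOTALL)


def format_reward(completions, **kwargs):
    """Return 1.0 for completions matching the expected layout, else 0.0.

    Extra keyword arguments (e.g. prompts or dataset columns) are accepted
    and ignored, so the function tolerates being called with more context.
    """
    return [1.0 if FORMAT_PATTERN.match(text) else 0.0 for text in completions]


samples = [
    "<think>2 + 2 is 4</think>\n<answer>4</answer>",
    "The answer is 4.",
]
scores = format_reward(completions=samples)
```

Binary rewards like this are cheap to compute and combine well with a separate accuracy reward, since GRPO only needs the relative ordering within each group.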

Very cool

Thanks!

Question: does VRAM usage increase if you increase the maximum number of generations per prompt? If so, why does that happen?

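Because more tokens have to be stored in VRAM? Each prompt is expanded into that many completions, so both the KV cache during generation and the activations kept for the loss's backward pass grow roughly linearly with the number of generations.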
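A back-of-the-envelope KV-cache estimate shows the linear scaling: every generated sequence stores one key and one value vector per layer per KV head for each token. The sketch below assumes hypothetical model dimensions (28 layers, 4 KV heads, head size 128, 16-bit values); they are illustrative and not taken from the post.

```python
def kv_cache_bytes(num_prompts, num_generations, seq_len,
                   num_layers, num_kv_heads, head_dim, bytes_per_value=2):
    """Estimate KV-cache size for all sequences held at once.

    The factor of 2 accounts for storing both keys and values.
    """
    sequences = num_prompts * num_generations
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
    return sequences * seq_len * per_token


# Hypothetical small grouped-query-attention model, 16-bit cache.
dims = dict(num_layers=28, num_kv_heads=4, head_dim=128, bytes_per_value=2)

eight = kv_cache_bytes(num_prompts=8, num_generations=8, seq_len=2048, **dims)
sixteen = kv_cache_bytes(num_prompts=8, num_generations=16, seq_len=2048, **dims)
print(eight / 1024**3, sixteen / 1024**3)  # GiB for 8 vs 16 generations
```

Doubling the generations per prompt doubles the estimate. In practice vLLM reserves its KV-cache pool up front (sized by `gpu_memory_utilization`), so on the generation side more completions mean fewer sequences fit in one pass, while on the training side the extra completions still add activation memory.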

Cool frens!