I've found the GRPO implementation in TRL to be very memory-hungry. There are already several alternative implementations out there that appear much faster and more lightweight; Unsloth is advertising a factor of 10 less memory usage, which is remarkable. Can we expect something similar for the TRL implementation in the near future?
I combined the RL gym library with GRPO here to see whether you can teach a small model to drive a taxi. This already took around 70 GB of GPU memory for a 1.5B model.
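For context on how env rewards feed into GRPO: the trainer doesn't use raw returns directly, but normalizes each completion's reward against its sampled group. A minimal sketch of that group-relative advantage step (a hypothetical helper, not TRL's actual API; the exact normalization details in TRL may differ):

```python
import statistics

def group_relative_advantages(rewards):
    """GRPO-style advantages for one group of rollouts:
    A_i = (r_i - mean(r)) / std(r).

    Hypothetical illustration of the idea, not TRL's implementation.
    """
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)  # population std; choice of std is an assumption
    if std == 0:
        # all rollouts scored the same -> no learning signal from this group
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]

# e.g. total episode returns from a group of Taxi rollouts (made-up numbers)
returns = [-200.0, -50.0, 8.0, 20.0]
advs = group_relative_advantages(returns)
```

The memory cost comes less from this step than from holding the group of sampled completions (and their logits) in memory at once, which is where the alternative implementations save.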
BTW: the RL gym library could potentially be helpful for building new/better reasoning models (and new benchmarks)?