nice clean GRPO implementation:
- no transformers
- no vllm
- has improved grpo (DAPO)
- under 300 lines
- runs on a 24GB GPU (RTX 4090)
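
For anyone wondering what "improved grpo (DAPO)" means in practice, here is a rough plain-Python sketch of the core ideas: group-relative advantages (GRPO), plus DAPO's asymmetric "clip-higher" range, token-level loss averaging, and dropping groups where every reward is identical. This is not the code from the implementation above. All function names are hypothetical, and the `eps_low` / `eps_high` defaults are illustrative assumptions.

```python
import math

def group_advantages(rewards, eps=1e-6):
    """GRPO: normalize each reward against its own group's mean and std."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]

def keep_group(rewards):
    """DAPO-style dynamic sampling: a group whose rewards are all equal
    yields zero advantages, so it carries no learning signal."""
    return len(set(rewards)) > 1

def clipped_token_objective(logp_new, logp_old, advantage,
                            eps_low=0.2, eps_high=0.28):
    """PPO-style clipped surrogate with separate lower/upper clip bounds
    (clip-higher). The default values here are assumptions."""
    ratio = math.exp(logp_new - logp_old)
    clipped = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
    return min(ratio * advantage, clipped * advantage)

def dapo_loss(samples, eps_low=0.2, eps_high=0.28):
    """Token-level loss: average over all tokens in the batch rather than
    per response. `samples` is a list of (advantage, tokens) pairs, where
    tokens is a list of (logp_new, logp_old) tuples."""
    total, n_tokens = 0.0, 0
    for advantage, tokens in samples:
        for logp_new, logp_old in tokens:
            total += clipped_token_objective(
                logp_new, logp_old, advantage, eps_low, eps_high)
            n_tokens += 1
    return -total / n_tokens

# Usage: one prompt, four sampled responses scored 1 (correct) or 0 (wrong).
rewards = [1.0, 0.0, 1.0, 0.0]
if keep_group(rewards):
    advantages = group_advantages(rewards)
```

A real trainer would do the same thing on tensors with autograd; the sketch only shows the arithmetic.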