I've been working on something cool: a GRPO implementation with an LLM evaluator that can also perform SFT on the feedback data, if you want. Check it out!
Any ⭐ are more than welcome 🤗
https://github.com/mkurman/grpo-llm-evaluator
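For anyone new to GRPO, here is a rough sketch of the core idea and not code from the repo: several completions are sampled for the same prompt, each gets a score (here assumed to come from an LLM evaluator), and the scores are normalized within the group to produce advantages. The function name, the example scores, and the choice of population standard deviation are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(scores, eps=1e-6):
    """Normalize scores within one group of completions for the same prompt.

    `scores` stands in for numeric ratings from an LLM evaluator (hypothetical).
    Population standard deviation is an assumption; implementations differ.
    """
    mu = mean(scores)
    sigma = pstdev(scores)
    # eps avoids division by zero when every completion gets the same score
    return [(s - mu) / (sigma + eps) for s in scores]


# Four completions for one prompt, scored by the (hypothetical) evaluator
scores = [2.0, 4.0, 6.0, 8.0]
advantages = group_relative_advantages(scores)
print(advantages)
```

Completions scored above the group mean get a positive advantage and are reinforced; those below the mean get a negative one.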