Smol-reason Collection My first use of GRPO fine-tuning techniques. Lessons learned from this model will be used in future Andy models. • 7 items • Updated 7 days ago