Elrich akira

elrich666
AI & ML interests

None yet

Recent Activity

liked a Space about 1 month ago
ggml-org/gguf-my-repo
liked a model about 2 months ago
Novaciano/Llama-3.2-3b-NSFW_Aesir_Uncensored-GGUF

Organizations

None yet

elrich666's activity

reacted to Jaward's post with 👍🔥 about 2 months ago
Post
The beauty of GRPO is that it doesn't care whether the rewards are rule-based or learned. The trick is to let the data self-normalize: trajectories in a batch compete against their group mean, so there's no value model and no extra parameters, just clean, efficient RL that cuts memory usage by about 50% while maintaining SOTA performance. btw it was introduced 9 months prior to R1: arxiv.org/pdf/2402.03300
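
As a minimal sketch of the group-relative normalization the post describes (assuming a PyTorch setting where several completions are sampled per prompt; function name and shapes are illustrative, not from the post):

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Group-relative advantages: each trajectory's reward is normalized
    against the mean and std of its own sampling group, so no learned
    value model is needed as a baseline."""
    # rewards: shape (num_prompts, group_size); one row per prompt,
    # one column per sampled completion for that prompt
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Example: 2 prompts, 4 completions each, rule-based 0/1 rewards
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.0, 0.0, 1.0, 0.0]])
print(grpo_advantages(rewards))
```

Because the baseline is just the group mean, the same computation works whether the rewards come from a rule (as above) or from a learned reward model, which is the flexibility the post points to.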