Elrich akira

elrich666
AI & ML interests

None yet

Recent Activity

liked a Space about 1 month ago
ggml-org/gguf-my-repo
liked a model about 2 months ago
Novaciano/Llama-3.2-3b-NSFW_Aesir_Uncensored-GGUF

Organizations

None yet

elrich666's activity

reacted to Jaward's post with 👍🔥 about 2 months ago
Post
The beauty of GRPO is that it doesn't care whether the rewards are rule-based or learned. The trick is to let the data self-normalize: trajectories in a batch compete against their group mean, so there's no value model and no extra parameters, just clean, efficient RL that cuts memory usage by about 50% while maintaining SOTA performance. btw it was introduced 9 months prior to R1: arxiv.org/pdf/2402.03300
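
As a minimal sketch of the group-relative normalization the post describes (assuming a PyTorch setting where several completions are sampled per prompt; function name and shapes are illustrative, not from the post):

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Group-relative advantages: each trajectory's reward is normalized
    against the mean and std of its own sampling group, so no learned
    value model is needed as a baseline."""
    # rewards: shape (num_prompts, group_size); one row per prompt,
    # one column per sampled completion for that prompt
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Example: 2 prompts, 4 completions each, rule-based 0/1 rewards
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.0, 0.0, 1.0, 0.0]])
print(grpo_advantages(rewards))
```

Because the baseline is just the group mean, the same computation works whether the rewards come from a rule (as above) or from a learned reward model, which is the flexibility the post points to.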