Falcon-Edge: A series of powerful, universal, fine-tunable 1.58bit language models. By tiiuae and 9 others • 3 days ago • 31
Highlights from the First ICLR 2025 Watermarking Workshop By hadyelsahar and 4 others • 4 days ago • 9
Good answers are not necessarily factual answers: an analysis of hallucination in leading LLMs By davidberenstein1957 and 1 other • 11 days ago • 25
DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge By NormalUhr • Feb 7 • 136
Navigating the RLHF Landscape: From Policy Gradients to PPO, GAE, and DPO for LLM Alignment By NormalUhr • Feb 11 • 34
A Deep Dive into Alibaba’s ZeroSearch: Why It Changes LLM-Centric Search Workflows By lynn-mikami • 6 days ago • 3