Transformers Are Getting Old: Variants and Alternatives Exist!
By
Kimina-Prover: Applying Test-time RL Search on Large Formal Reasoning Models
By
and 17 others
FineWeb-C: A Community-Driven Dataset for Educational Quality Annotations in 122 Languages
By
and 5 others
cocogold: training Marigold for text-grounded segmentation
By
LLM Hallucinations: bug or feature? The US Supreme Court 2025 cases experiment
By
We're open-sourcing "The Amazing Hand", a fully 3D printed robotic hand for less than $200 ✌️✌️✌️
By
and 2 others
Should We Still Pretrain Encoders with Masked Language Modeling?
By
and 3 others
How to Train Your LLM Web Agent: A Statistical Diagnosis
By
Understanding Gemma 3n: How MatFormer Gives You Many Models in One
By
🦸🏻#14: What Is MCP, and Why Is Everyone – Suddenly! – Talking About It?
By
Introduction to MedVideoCap-55K: A New, Large-Scale, High-Quality Medical Video-Caption Pair Dataset
By
Bringing Fusion Down to Earth: ML for Stellarator Optimization
By
DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge
By
Uncensor any LLM with abliteration
By
Code a simple RAG from scratch
By
Why We Built the OpenMDW License: A Comprehensive License for ML Models
By
Can AI Be Consentful? Rethinking Permission in the Age of Synthetic Everything
By
ColPali: Efficient Document Retrieval with Vision Language Models 👀
By
Everything You Need to Know about Knowledge Distillation
By
and 1 other
Announcing NeurIPS 2025 E2LM Competition: Early Training Evaluation of Language Models
By
and 8 others