Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models Paper • 2503.09573 • Published 1 day ago • 31
Article Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM 2 days ago • 206
Q-Filters Collection Pre-computed Q-Filters for efficient KV cache compression. • 15 items • Updated 10 days ago • 6
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution Paper • 2502.18449 • Published 16 days ago • 68
Text2World: Benchmarking Large Language Models for Symbolic World Model Generation Paper • 2502.13092 • Published 23 days ago • 12
MM-RLHF: The Next Step Forward in Multimodal LLM Alignment Paper • 2502.10391 • Published 27 days ago • 32
TransMLA: Multi-head Latent Attention Is All You Need Paper • 2502.07864 • Published about 1 month ago • 47
Teuken-7B-v0.4 Collection OpenGPT-X Teuken 7B models trained on 4 trillion tokens • 4 items • Updated Dec 6, 2024 • 3
MetaChain: A Fully-Automated and Zero-Code Framework for LLM Agents Paper • 2502.05957 • Published Feb 9 • 16
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling Paper • 2502.06703 • Published Feb 10 • 142
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking Paper • 2501.04519 • Published Jan 8 • 263
Article π0 and π0-FAST: Vision-Language-Action Models for General Robot Control Feb 4 • 112
HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems Paper • 2411.02959 • Published Nov 5, 2024 • 68
Article Model2Vec: Distill a Small Fast Model from any Sentence Transformer By Pringled and 1 other • Oct 14, 2024 • 77
RevisEval: Improving LLM-as-a-Judge via Response-Adapted References Paper • 2410.05193 • Published Oct 7, 2024 • 13