Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought Paper • 2504.05599 • Published 15 days ago • 81
OmniSVG: A Unified Scalable Vector Graphics Generation Model Paper • 2504.06263 • Published 15 days ago • 149
APIGen-MT: Agentic Pipeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay Paper • 2504.03601 • Published 19 days ago • 16 • 4
RWKV-7 "Goose" with Expressive Dynamic State Evolution Paper • 2503.14456 • Published Mar 18 • 141
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning Paper • 2503.07572 • Published Mar 10 • 42
Forgetting Transformer: Softmax Attention with a Forget Gate Paper • 2503.02130 • Published Mar 3 • 31
EuroBERT: Scaling Multilingual Encoders for European Languages Paper • 2503.05500 • Published Mar 7 • 77
SurveyX: Academic Survey Automation via Large Language Models Paper • 2502.14776 • Published Feb 20 • 100
MoBA: Mixture of Block Attention for Long-Context LLMs Paper • 2502.13189 • Published Feb 18 • 17
The Ultra-Scale Playbook 🌌 The ultimate guide to training LLMs on large GPU clusters • Running • 2.5k
Dria-Agent-a Collection • Powerful agentic models built for Pythonic function calling • 4 items • Updated Feb 14 • 4
Tiny-Agent-a Collection • Fast and powerful agentic models designed to run on edge devices • 6 items • Updated Feb 12 • 7
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach Paper • 2502.05171 • Published Feb 7 • 138 • 12