Self-rewarding correction for mathematical reasoning Paper • 2502.19613 • Published 15 days ago • 77
Magma: A Foundation Model for Multimodal AI Agents Paper • 2502.13130 • Published 23 days ago • 56
Jakiro: Boosting Speculative Decoding with Decoupled Multi-Head via MoE Paper • 2502.06282 • Published Feb 10 • 5
FAST: Efficient Action Tokenization for Vision-Language-Action Models Paper • 2501.09747 • Published Jan 16 • 23
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling Paper • 2412.05271 • Published Dec 6, 2024 • 138
Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis Paper • 2412.04431 • Published Dec 5, 2024 • 18
GRAPE: Generalizing Robot Policy via Preference Alignment Paper • 2411.19309 • Published Nov 28, 2024 • 44
Kolors Virtual Try-On 👕 Space • Running on CPU Upgrade • 7.86k • Upload images to try on clothes virtually
Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published Aug 22, 2024 • 126
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model Paper • 2408.11039 • Published Aug 20, 2024 • 59
Achieving Human Level Competitive Robot Table Tennis Paper • 2408.03906 • Published Aug 7, 2024 • 27
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD Paper • 2404.06512 • Published Apr 9, 2024 • 30
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection Paper • 2403.03507 • Published Mar 6, 2024 • 186