MambaVision: A Hybrid Mamba-Transformer Vision Backbone Paper • 2407.08083 • Published Jul 10, 2024 • 29
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model Paper • 2408.11039 • Published Aug 20, 2024 • 59
The Mamba in the Llama: Distilling and Accelerating Hybrid Models Paper • 2408.15237 • Published Aug 27, 2024 • 39
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think Paper • 2409.11355 • Published Sep 17, 2024 • 29
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning Paper • 2409.12183 • Published Sep 18, 2024 • 37
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning Paper • 2409.12568 • Published Sep 19, 2024 • 48
Imagine yourself: Tuning-Free Personalized Image Generation Paper • 2409.13346 • Published Sep 20, 2024 • 69
Training Language Models to Self-Correct via Reinforcement Learning Paper • 2409.12917 • Published Sep 19, 2024 • 137
MaskBit: Embedding-free Image Generation via Bit Tokens Paper • 2409.16211 • Published Sep 24, 2024 • 17
FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion Paper • 2412.09626 • Published Dec 12, 2024 • 20
Byte Latent Transformer: Patches Scale Better Than Tokens Paper • 2412.09871 • Published Dec 13, 2024 • 92
ColorFlow: Retrieval-Augmented Image Sequence Colorization Paper • 2412.11815 • Published Dec 16, 2024 • 26
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search Paper • 2412.18319 • Published Dec 24, 2024 • 37
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs Paper • 2501.06186 • Published Jan 10, 2025 • 61
MiniMax-01: Scaling Foundation Models with Lightning Attention Paper • 2501.08313 • Published Jan 14, 2025 • 273
Padding Tone: A Mechanistic Analysis of Padding Tokens in T2I Models Paper • 2501.06751 • Published Jan 12, 2025 • 31
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published Jan 22, 2025 • 332
Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation Paper • 2501.12202 • Published Jan 21, 2025 • 34
ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference Paper • 2502.00299 • Published 26 days ago • 3
ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation Paper • 2502.18364 • Published 1 day ago • 21