DMM: Building a Versatile Image Generation Model via Distillation-Based Model Merging Paper • 2504.12364 • Published 7 days ago • 18
Iterative Self-Training for Code Generation via Reinforced Re-Ranking Paper • 2504.09643 • Published 10 days ago • 34
InstantCharacter: Personalize Any Characters with a Scalable Diffusion Transformer Framework Paper • 2504.12395 • Published 7 days ago • 16
Efficient Generative Model Training via Embedded Representation Warmup Paper • 2504.10188 • Published 9 days ago • 12
ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness Paper • 2504.10514 • Published 13 days ago • 45
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs Paper • 2504.11536 • Published 8 days ago • 56
D^2iT: Dynamic Diffusion Transformer for Accurate Image Generation Paper • 2504.09454 • Published 10 days ago • 11
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations Paper • 2504.10481 • Published 9 days ago • 83
In-2-4D: Inbetweening from Two Single-View Images to 4D Generation Paper • 2504.08366 • Published 12 days ago • 9
SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning Paper • 2504.08600 • Published 12 days ago • 26
ModernBERT or DeBERTaV3? Examining Architecture and Data Influence on Transformer Encoder Models Performance Paper • 2504.08716 • Published 12 days ago • 10
FlexIP: Dynamic Control of Preservation and Personality for Customized Image Generation Paper • 2504.07405 • Published 13 days ago • 12
MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft Paper • 2504.08388 • Published 12 days ago • 39
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation Paper • 2504.08736 • Published 12 days ago • 47
Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model Paper • 2504.08685 • Published 12 days ago • 120
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer Paper • 1910.10683 • Published Oct 23, 2019 • 12
V-MAGE: A Game Evaluation Framework for Assessing Visual-Centric Capabilities in Multimodal Large Language Models Paper • 2504.06148 • Published 15 days ago • 13