StyleStudio: Text-Driven Style Transfer with Selective Control of Style Elements Paper • 2412.08503 • Published 11 days ago
StyleMaster: Stylize Your Video with Artistic Generation and Translation Paper • 2412.07744 • Published 12 days ago
Learning Flow Fields in Attention for Controllable Person Image Generation Paper • 2412.08486 • Published 11 days ago
LAION-SG: An Enhanced Large-Scale Dataset for Training Complex Image-Text Models with Structural Annotations Paper • 2412.08580 • Published 11 days ago
UniReal: Universal Image Generation and Editing via Learning Real-world Dynamics Paper • 2412.07774 • Published 12 days ago
DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation Paper • 2412.07589 • Published 12 days ago
Maya: An Instruction Finetuned Multilingual Multimodal Model Paper • 2412.07112 • Published 13 days ago
Training Large Language Models to Reason in a Continuous Latent Space Paper • 2412.06769 • Published 13 days ago
Multimodal Music Generation with Explicit Bridges and Retrieval Augmentation Paper • 2412.09428 • Published 10 days ago
BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks Paper • 2412.04626 • Published 17 days ago
CompCap: Improving Multimodal Large Language Models with Composite Captions Paper • 2412.05243 • Published 16 days ago
MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale Paper • 2412.05237 • Published 16 days ago
ZipAR: Accelerating Autoregressive Image Generation through Spatial Locality Paper • 2412.04062 • Published 17 days ago
OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows Paper • 2412.01169 • Published 20 days ago
Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis Paper • 2412.04431 • Published 17 days ago
AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents Paper • 2410.24024 • Published Oct 31
Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning Paper • 2405.18386 • Published May 28