Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers Paper • 2405.05945 • Published May 9 • 2
Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models Paper • 2406.09416 • Published 4 days ago • 25
Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling Paper • 2406.07522 • Published 6 days ago • 26
MLCM: Multistep Consistency Distillation of Latent Diffusion Model Paper • 2406.05768 • Published 8 days ago • 8
Yuan 2.0-M32: Mixture of Experts with Attention Router Paper • 2405.17976 • Published 20 days ago • 18
Preference Learning Algorithms Do Not Learn Preference Rankings Paper • 2405.19534 • Published 18 days ago • 1
Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis Paper • 2404.13686 • Published Apr 21 • 26
UniFL: Improve Stable Diffusion via Unified Feedback Learning Paper • 2404.05595 • Published Apr 8 • 22
Configurable Safety Tuning of Language Models with Synthetic Preference Data Paper • 2404.00495 • Published Mar 30 • 2
Scalable Pre-training of Large Autoregressive Image Models Paper • 2401.08541 • Published Jan 16 • 35
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction Paper • 2404.02905 • Published Apr 3 • 60
GiT: Towards Generalist Vision Transformer through Universal Language Interface Paper • 2403.09394 • Published Mar 14 • 25
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention Paper • 2404.07143 • Published Apr 10 • 97
Insights into Alignment: Evaluating DPO and its Variants Across Multiple Tasks Paper • 2404.14723 • Published Apr 23 • 9
Aligning Diffusion Models by Optimizing Human Utility Paper • 2404.04465 • Published Apr 6 • 12
TransformerFAM: Feedback attention is working memory Paper • 2404.09173 • Published Apr 14 • 42
Evaluating Text-to-Visual Generation with Image-to-Text Generation Paper • 2404.01291 • Published Apr 1 • 5
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments Paper • 2404.07972 • Published Apr 11 • 41
ReNoise: Real Image Inversion Through Iterative Noising Paper • 2403.14602 • Published Mar 21 • 19
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences Paper • 2404.03715 • Published Apr 4 • 58
Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation Paper • 2403.16990 • Published Mar 25 • 24
Efficient Streaming Language Models with Attention Sinks Paper • 2309.17453 • Published Sep 29, 2023 • 13
Transparent Image Layer Diffusion using Latent Transparency Paper • 2402.17113 • Published Feb 27 • 5
MoAI: Mixture of All Intelligence for Large Language and Vision Models Paper • 2403.07508 • Published Mar 12 • 72
Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation Paper • 2403.12015 • Published Mar 18 • 60
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM Paper • 2403.07816 • Published Mar 12 • 37
SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis Paper • 2307.01952 • Published Jul 4, 2023 • 74
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis Paper • 2403.03206 • Published Mar 5 • 46
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits Paper • 2402.17764 • Published Feb 27 • 571