-
A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions
Paper • 2312.08578 • Published • 20 -
ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks
Paper • 2312.08583 • Published • 12 -
Vision-Language Models as a Source of Rewards
Paper • 2312.09187 • Published • 14 -
StemGen: A music generation model that listens
Paper • 2312.08723 • Published • 49
Collections
Discover the best community collections!
Collections including paper arxiv:2401.08541
-
aMUSEd: An Open MUSE Reproduction
Paper • 2401.01808 • Published • 30 -
PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models
Paper • 2401.05252 • Published • 49 -
Scalable Pre-training of Large Autoregressive Image Models
Paper • 2401.08541 • Published • 38 -
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Paper • 2401.09417 • Published • 61
-
StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation
Paper • 2312.12491 • Published • 70 -
Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs
Paper • 2401.11708 • Published • 30 -
Training-Free Consistent Text-to-Image Generation
Paper • 2402.03286 • Published • 67 -
PALP: Prompt Aligned Personalization of Text-to-Image Models
Paper • 2401.06105 • Published • 49
-
VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence
Paper • 2312.02087 • Published • 23 -
FaceStudio: Put Your Face Everywhere in Seconds
Paper • 2312.02663 • Published • 33 -
Orthogonal Adaptation for Modular Customization of Diffusion Models
Paper • 2312.02432 • Published • 15 -
ReconFusion: 3D Reconstruction with Diffusion Priors
Paper • 2312.02981 • Published • 11