Collections
Collections including paper arxiv:2409.02634
-
Controllable Text Generation for Large Language Models: A Survey
Paper • 2408.12599 • Published • 62
xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations
Paper • 2408.12590 • Published • 33
Real-Time Video Generation with Pyramid Attention Broadcast
Paper • 2408.12588 • Published • 14
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
Paper • 2408.11039 • Published • 56
-
MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training
Paper • 2311.17049 • Published
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
Paper • 2405.04434 • Published • 13
A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision
Paper • 2303.17376 • Published
Sigmoid Loss for Language Image Pre-Training
Paper • 2303.15343 • Published • 4
-
Adapting Large Language Models via Reading Comprehension
Paper • 2309.09530 • Published • 77
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
Paper • 2404.03715 • Published • 60
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
Paper • 2404.05719 • Published • 80
Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency
Paper • 2409.02634 • Published • 89
-
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
Paper • 2402.17485 • Published • 188
VividTalk: One-Shot Audio-Driven Talking Head Generation Based on 3D Hybrid Prior
Paper • 2312.01841 • Published • 1
MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model
Paper • 2311.16498 • Published • 1
GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians
Paper • 2312.02134 • Published • 2
-
WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens
Paper • 2401.09985 • Published • 15
CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects
Paper • 2401.09962 • Published • 8
Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution
Paper • 2401.10404 • Published • 10
ActAnywhere: Subject-Aware Video Background Generation
Paper • 2401.10822 • Published • 13