4Diffusion: Multi-view Video Diffusion Model for 4D Generation Paper • 2405.20674 • Published 3 days ago
T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback Paper • 2405.18750 • Published 5 days ago
Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning Paper • 2405.18386 • Published 6 days ago
Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control Paper • 2405.17414 • Published 7 days ago
Denoising LM: Pushing the Limits of Error Correction Models for Speech Recognition Paper • 2405.15216 • Published 10 days ago
Part123: Part-aware 3D Reconstruction from a Single-view Image Paper • 2405.16888 • Published 7 days ago
Human4DiT: Free-view Human Video Generation with 4D Diffusion Transformer Paper • 2405.17405 • Published 7 days ago
Vidu4D: Single Generated Video to High-Fidelity 4D Reconstruction with Dynamic Gaussian Surfels Paper • 2405.16822 • Published 7 days ago
I2VEdit: First-Frame-Guided Video Editing via Image-to-Video Diffusion Models Paper • 2405.16537 • Published 8 days ago
Looking Backward: Streaming Video-to-Video Translation with Feature Banks Paper • 2405.15757 • Published 10 days ago
Look Once to Hear: Target Speech Hearing with Noisy Examples Paper • 2405.06289 • Published 24 days ago
Neural Directional Encoding for Efficient and Accurate View-Dependent Appearance Modeling Paper • 2405.14847 • Published 11 days ago
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data Paper • 2405.14333 • Published 11 days ago
Tele-Aloha: A Low-budget and High-authenticity Telepresence System Using Sparse RGB Cameras Paper • 2405.14866 • Published 11 days ago
ReVideo: Remake a Video with Motion and Content Control Paper • 2405.13865 • Published 12 days ago
Slicedit: Zero-Shot Video Editing With Text-to-Image Diffusion Models Using Spatio-Temporal Slices Paper • 2405.12211 • Published 14 days ago
Dreamer XL: Towards High-Resolution Text-to-3D Generation via Trajectory Score Matching Paper • 2405.11252 • Published 16 days ago
Imp: Highly Capable Large Multimodal Models for Mobile Devices Paper • 2405.12107 • Published 14 days ago
FIFO-Diffusion: Generating Infinite Videos from Text without Training Paper • 2405.11473 • Published 15 days ago
INDUS: Effective and Efficient Language Models for Scientific Applications Paper • 2405.10725 • Published 17 days ago
Chameleon: Mixed-Modal Early-Fusion Foundation Models Paper • 2405.09818 • Published 18 days ago
TRANSIC: Sim-to-Real Policy Transfer by Learning from Online Correction Paper • 2405.10315 • Published 18 days ago
Dual3D: Efficient and Consistent Text-to-3D Generation with Dual-mode Multi-view Latent Diffusion Paper • 2405.09874 • Published 18 days ago
CAT3D: Create Anything in 3D with Multi-View Diffusion Models Paper • 2405.10314 • Published 18 days ago
Compositional Text-to-Image Generation with Dense Blob Representations Paper • 2405.08246 • Published 21 days ago
Coin3D: Controllable and Interactive 3D Assets Generation with Proxy-Guided Conditioning Paper • 2405.08054 • Published 21 days ago
Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots Paper • 2405.07990 • Published 21 days ago
SUTRA: Scalable Multilingual Language Model Architecture Paper • 2405.06694 • Published 27 days ago
LogoMotion: Visually Grounded Code Generation for Content-Aware Animation Paper • 2405.07065 • Published 23 days ago
StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation Paper • 2405.01434 • Published May 2
Invisible Stitch: Generating Smooth 3D Scenes with Depth Inpainting Paper • 2404.19758 • Published Apr 30
DressCode: Autoregressively Sewing and Generating Garments from Text Guidance Paper • 2401.16465 • Published Jan 29
InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation Paper • 2404.19427 • Published Apr 30
Interactive3D: Create What You Want by Interactive 3D Generation Paper • 2404.16510 • Published Apr 25
Align Your Steps: Optimizing Sampling Schedules in Diffusion Models Paper • 2404.14507 • Published Apr 22
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework Paper • 2404.14619 • Published Apr 22
SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation Paper • 2404.14396 • Published Apr 22
Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis Paper • 2404.13686 • Published Apr 21
MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation Paper • 2404.11565 • Published Apr 17
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization Paper • 2404.09956 • Published Apr 15
Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model Paper • 2404.09967 • Published Apr 15
Video2Game: Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video Paper • 2404.09833 • Published Apr 15
Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies Paper • 2404.08197 • Published Apr 12
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models Paper • 2404.07973 • Published Apr 11
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments Paper • 2404.07972 • Published Apr 11
ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback Paper • 2404.07987 • Published Apr 11
RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion Paper • 2404.07199 • Published Apr 10
DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting Paper • 2404.06903 • Published Apr 10