Real-Time Video Generation with Pyramid Attention Broadcast Paper • 2408.12588 • Published 2 days ago • 12
xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations Paper • 2408.12590 • Published 2 days ago • 28
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model Paper • 2408.11039 • Published 4 days ago • 35
Photorealistic Object Insertion with Diffusion-Guided Inverse Rendering Paper • 2408.09702 • Published 5 days ago • 9
TraDiffusion: Trajectory-Based Training-Free Image Generation Paper • 2408.09739 • Published 5 days ago • 6
SpaRP: Fast 3D Object Reconstruction and Pose Estimation from Sparse Views Paper • 2408.10195 • Published 5 days ago • 9
JPEG-LM: LLMs as Image Generators with Canonical Codec Representations Paper • 2408.08459 • Published 9 days ago • 40
Body Transformer: Leveraging Robot Embodiment for Policy Learning Paper • 2408.06316 • Published 12 days ago • 8
FruitNeRF: A Unified Neural Radiance Field based Fruit Counting Framework Paper • 2408.06190 • Published 12 days ago • 15
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer Paper • 2408.06072 • Published 12 days ago • 31
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery Paper • 2408.06292 • Published 12 days ago • 93
RayGauss: Volumetric Gaussian-Based Ray Casting for Photorealistic Novel View Synthesis Paper • 2408.03356 • Published 18 days ago • 8
Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks Paper • 2408.03615 • Published 17 days ago • 30
Openstory++: A Large-scale Dataset and Benchmark for Instance-aware Open-domain Visual Storytelling Paper • 2408.03695 • Published 17 days ago • 11
IPAdapter-Instruct: Resolving Ambiguity in Image-based Conditioning using Instruct Prompts Paper • 2408.03209 • Published 18 days ago • 20
ProCreate, Dont Reproduce! Propulsive Energy Diffusion for Creative Generation Paper • 2408.02226 • Published 20 days ago • 10
MeshAnything V2: Artist-Created Mesh Generation With Adjacent Mesh Tokenization Paper • 2408.02555 • Published 19 days ago • 26
ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools Paper • 2406.12793 • Published Jun 18 • 31
TexGen: Text-Guided 3D Texture Generation with Multi-view Sampling and Resampling Paper • 2408.01291 • Published 22 days ago • 11
Medical SAM 2: Segment medical images as video via Segment Anything Model 2 Paper • 2408.00874 • Published 23 days ago • 39
Berkeley Humanoid: A Research Platform for Learning-based Control Paper • 2407.21781 • Published 24 days ago • 7
Tora: Trajectory-oriented Diffusion Transformer for Video Generation Paper • 2407.21705 • Published 24 days ago • 24
WalkTheDog: Cross-Morphology Motion Alignment via Phase Manifolds Paper • 2407.18946 • Published Jul 11 • 12
SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency Paper • 2407.17470 • Published about 1 month ago • 13
MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence Paper • 2407.16655 • Published Jul 23 • 28
A Simulation Benchmark for Autonomous Racing with Large-Scale Human Data Paper • 2407.16680 • Published Jul 23 • 11
MusiConGen: Rhythm and Chord Control for Transformer-Based Text-to-Music Generation Paper • 2407.15060 • Published Jul 21 • 9
DreamCatalyst: Fast and High-Quality 3D Editing via Controlling Editability and Identity Preservation Paper • 2407.11394 • Published Jul 16 • 11
Generalizable Implicit Motion Modeling for Video Frame Interpolation Paper • 2407.08680 • Published Jul 11 • 8
OmniNOCS: A unified NOCS dataset and model for 3D lifting of 2D objects Paper • 2407.08711 • Published Jul 11 • 5
VEnhancer: Generative Space-Time Enhancement for Video Generation Paper • 2407.07667 • Published Jul 10 • 9
Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively Paper • 2401.02955 • Published Jan 5 • 19
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos Paper • 2406.08407 • Published Jun 12 • 24
SEE-2-SOUND: Zero-Shot Spatial Environment-to-Spatial Sound Paper • 2406.06612 • Published Jun 6 • 14
MagicPose4D: Crafting Articulated Models with Appearance and Motion Control Paper • 2405.14017 • Published May 22 • 2
Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion Paper • 2406.03184 • Published Jun 5 • 18
V-Express: Conditional Dropout for Progressive Training of Portrait Video Generation Paper • 2406.02511 • Published Jun 4 • 8
Learning Temporally Consistent Video Depth from Video Diffusion Priors Paper • 2406.01493 • Published Jun 3 • 17
4Diffusion: Multi-view Video Diffusion Model for 4D Generation Paper • 2405.20674 • Published May 31 • 11
T2V-Turbo: Breaking the Quality Bottleneck of Video Consistency Model with Mixed Reward Feedback Paper • 2405.18750 • Published May 29 • 20
Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning Paper • 2405.18386 • Published May 28 • 19
Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control Paper • 2405.17414 • Published May 27 • 10
Denoising LM: Pushing the Limits of Error Correction Models for Speech Recognition Paper • 2405.15216 • Published May 24 • 11