CustomVideoX: 3D Reference Attention Driven Dynamic Adaptation for Zero-Shot Customized Video Diffusion Transformers Paper • 2502.06527 • Published about 20 hours ago • 4
view article Article The SOTA Text-to-speech and Zero Shot Voice cloning model that no one knows about... By srinivasbilla • 21 days ago • 60
MotionCanvas: Cinematic Shot Design with Controllable Image-to-Video Generation Paper • 2502.04299 • Published 5 days ago • 14
Qwen2.5-VL Collection Vision-language model series based on Qwen2.5 • 3 items • Updated 15 days ago • 334
Qwen2-VL Collection Vision-language model series based on Qwen2 • 16 items • Updated Dec 6, 2024 • 203
DextrAH-G: Pixels-to-Action Dexterous Arm-Hand Grasping with Geometric Fabrics Paper • 2407.02274 • Published Jul 2, 2024 • 1
view article Article FineWeb2-C: Help Build Better Language Models in Your Language By davanstrien and 5 others • Dec 23, 2024 • 18
Eagle 2 Collection Eagle 2 is a family of frontier vision-language models with vision-centric design. The model supports 4K HD input, long-context video, and grounding. • 9 items • Updated 19 days ago • 31
UI-TARS: Pioneering Automated GUI Interaction with Native Agents Paper • 2501.12326 • Published 21 days ago • 49
CaPa: Carve-n-Paint Synthesis for Efficient 4K Textured Mesh Generation Paper • 2501.09433 • Published 26 days ago • 18
SynthLight: Portrait Relighting with Diffusion Model by Learning to Re-render Synthetic Faces Paper • 2501.09756 • Published 26 days ago • 19
RepVideo: Rethinking Cross-Layer Representation for Video Generation Paper • 2501.08994 • Published 27 days ago • 15
VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling Paper • 2501.00574 • Published Dec 31, 2024 • 6
TACO Models Collection This collection contains the best-performing TACO models based on LLaMA-3/Qwen2 and SigLIP/CLIP. • 3 items • Updated Dec 20, 2024 • 8
Agent Laboratory: Using LLM Agents as Research Assistants Paper • 2501.04227 • Published Jan 8 • 84
TransPixar: Advancing Text-to-Video Generation with Transparency Paper • 2501.03006 • Published Jan 6 • 23
VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction Paper • 2501.01957 • Published Jan 3 • 42