Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published 14 days ago • 131
Adaptive Caching for Faster Video Generation with Diffusion Transformers Paper • 2411.02397 • Published Nov 4 • 23
view article Article Advanced Flux Dreambooth LoRA Training with 🧨 diffusers By linoyts • Oct 21 • 32
Llama 3.2 Collection This collection hosts the transformers and original repos of the Llama 3.2 and Llama Guard 3 • 15 items • Updated 21 days ago • 548
CommonCanvas Collection Collection of models trained on the CommonCatalogue datasets • 8 items • Updated May 16 • 8
LVCD: Reference-based Lineart Video Colorization with Diffusion Models Paper • 2409.12960 • Published Sep 19 • 23
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution Paper • 2409.12191 • Published Sep 18 • 74
OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model Paper • 2409.01199 • Published Sep 2 • 13
VQ4DiT: Efficient Post-Training Vector Quantization for Diffusion Transformers Paper • 2408.17131 • Published Aug 30 • 12
Phi-3 Collection Phi-3 family of small language and multi-modal models. Language models are available in short- and long-context lengths. • 26 items • Updated Nov 14 • 536
Generative Inbetweening: Adapting Image-to-Video Models for Keyframe Interpolation Paper • 2408.15239 • Published Aug 27 • 29
xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations Paper • 2408.12590 • Published Aug 22 • 35
Accelerating High-Fidelity Waveform Generation via Adversarial Flow Matching Optimization Paper • 2408.08019 • Published Aug 15 • 10