microsoft/Phi-4-multimodal-instruct • Automatic Speech Recognition • Updated 15 days ago • 622k • 1.32k
Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1 • Paper • 2503.24376 • Published 23 days ago • 38
Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation • Paper • 2412.04432 • Published Dec 5, 2024 • 16