Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context Paper • 2403.05530 • Published Mar 8 • 61
MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation Paper • 2410.11779 • Published Oct 15 • 24
Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention Paper • 2410.10774 • Published Oct 14 • 25
DisCoRD: Discrete Tokens to Continuous Motion via Rectified Flow Decoding Paper • 2411.19527 • Published 26 days ago • 10