- ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
  Paper • 2406.04325 • Published • 73
- MoE-LLaVA: Mixture of Experts for Large Vision-Language Models
  Paper • 2401.15947 • Published • 49
- Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
  Paper • 2311.10122 • Published • 26
- Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models
  Paper • 2311.16103 • Published • 1
Mikhail Dremin
Recent Activity
- liked a dataset about 2 months ago: q-future/VQA-stage3
- liked a model 3 months ago: genmo/mochi-1-preview
- updated a collection 4 months ago: VLM
Collections
1