VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning
VideoMind is a multi-modal agent framework that improves video reasoning by emulating human-like reasoning steps: breaking the task down, localizing relevant moments in the video, verifying those moments, and synthesizing an answer.
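Going by the title, "Chain-of-LoRA" refers to running the steps above as different roles on one shared model, each role using its own lightweight LoRA adapter, rather than loading a separate model per step. The toy sketch below shows only that general idea in pure Python: a frozen base weight matrix `W` shared by all roles, per-role low-rank updates `(alpha / r) * B @ A`, and a loop that switches the active adapter for each role in turn. The class and function names, the role labels (`"ground"`, `"verify"`) and all numbers are hypothetical. This is not the VideoMind implementation; see the GitHub repository for that.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


@dataclass
class LoRAAdapter:
    """Hypothetical low-rank update delta_W = (alpha / r) * B @ A (A: r x d_in, B: d_out x r)."""
    A: Matrix
    B: Matrix
    alpha: float = 1.0

    def delta(self, x: Vector) -> Vector:
        scale = self.alpha / len(self.A)  # len(A) is the rank r
        return [scale * y for y in matvec(self.B, matvec(self.A, x))]


@dataclass
class SharedLinear:
    """One frozen base matrix W shared by every role; only the active adapter changes."""
    W: Matrix
    adapters: Dict[str, LoRAAdapter] = field(default_factory=dict)
    active: Optional[str] = None

    def forward(self, x: Vector) -> Vector:
        out = matvec(self.W, x)
        if self.active is not None:
            out = [o + d for o, d in zip(out, self.adapters[self.active].delta(x))]
        return out


def run_chain(layer: SharedLinear, roles: List[str], x: Vector) -> Dict[str, Vector]:
    """Run roles in order, switching adapters and feeding each output to the next role.

    In a real agent each role would emit text or timestamps; a vector stands in here.
    """
    outputs: Dict[str, Vector] = {}
    for role in roles:
        layer.active = role  # swap adapters; W is never copied or modified
        x = layer.forward(x)
        outputs[role] = x
    layer.active = None  # back to the plain base model
    return outputs


# Usage with hypothetical 2-d weights and rank-1 adapters.
layer = SharedLinear(W=[[1.0, 0.0], [0.0, 1.0]])
layer.adapters["ground"] = LoRAAdapter(A=[[1.0, 1.0]], B=[[1.0], [0.0]])
layer.adapters["verify"] = LoRAAdapter(A=[[0.0, 1.0]], B=[[0.0], [2.0]])
outs = run_chain(layer, ["ground", "verify"], [1.0, 2.0])
print(outs)  # {'ground': [4.0, 2.0], 'verify': [4.0, 6.0]}
```

The design point being illustrated is that adding a role costs only one small `A`/`B` pair, while the large base weights are stored once and reused by every role.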
Please refer to our GitHub repository for more details about this model.
If you find this project helpful, please cite our paper:
@article{liu2025videomind,
title={VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning},
author={Liu, Ye and Lin, Kevin Qinghong and Chen, Chang Wen and Shou, Mike Zheng},
journal={arXiv preprint arXiv:2503.13444},
year={2025}
}