---
license: apache-2.0
---

# AskVideos-7B-Instruct-v0.1

## Model details

**Model type:**

AskVideos-7B-Instruct-v0.1 is an open-source chatbot trained by fine-tuning a Video-LLaMA variant on additional video Q&A data.

It answers video-text queries using a frozen Vicuna 7B v1.1 LLM and a frozen BLIP-style image encoder.

A video Q-Former derives a video feature from the encoded frames, and the result is projected into the LLM's embedding space.
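The data flow described above can be traced as a sequence of tensor shapes. This is only a minimal sketch: the function names are hypothetical and do not come from the AskVideos or Video-LLaMA code, and all sizes except the frame count and the LLM width are assumed for illustration.

```python
# Shape walk-through of the video-to-LLM path (no real model is loaded).
# NUM_FRAMES comes from the model card and LLM_DIM is the standard hidden size
# of 7B LLaMA-family models; every other size is an assumed, illustrative value.
NUM_FRAMES = 16        # frames sampled per clip
FRAME_TOKENS = 32      # assumed tokens per frame from the frozen image encoder
FEATURE_DIM = 768      # assumed width of each frame token
VIDEO_QUERIES = 32     # assumed number of learned video Q-Former queries
LLM_DIM = 4096         # hidden size of a 7B LLaMA-family LLM such as Vicuna 7B


def image_encoder(num_frames):
    """Frozen BLIP-style encoder: every frame becomes a set of feature tokens."""
    return (num_frames, FRAME_TOKENS, FEATURE_DIM)


def video_qformer(frame_features):
    """Video Q-Former: a fixed set of learned queries attends over all frame
    tokens, so the output length does not depend on the number of frames."""
    _num_frames, _tokens, dim = frame_features
    return (VIDEO_QUERIES, dim)


def project_to_llm(video_tokens):
    """Linear projection from the Q-Former width to the LLM embedding width."""
    num_tokens, _dim = video_tokens
    return (num_tokens, LLM_DIM)


def video_prefix_shape(num_frames=NUM_FRAMES):
    """Shape of the video embeddings that are handed to the frozen LLM."""
    return project_to_llm(video_qformer(image_encoder(num_frames)))
```

Under these assumptions, the LLM always receives the same number of video tokens regardless of how many frames were sampled, which is the main reason for placing a query-based module between the image encoder and the LLM.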

**GitHub repo for demo:**

https://github.com/AskYoutubeAI/AskVideos-Instruct

**Acknowledgement**

This model is based on Video-LLaMA. Check out the original work here: https://github.com/DAMO-NLP-SG/Video-LLaMA

## License

AskVideos-7B-Instruct-v0.1 code and models are distributed under the Apache License 2.0.

## Training dataset

- Fine-tuned on 50K synthetic video Q&A pairs mined from videos.

- For each Q&A pair, 16 frames are sampled from a 30s video.

- The base model for fine-tuning is Video-LLaMA Vicuna 7B.
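The frame sampling step above can be sketched as follows. The model card does not say how the 16 frames are spaced, so this example assumes uniform spacing and picks the midpoint of each of 16 equal segments; `sample_frame_indices` is a hypothetical helper, not a function from the AskVideos repository.

```python
def sample_frame_indices(total_frames, num_samples=16):
    """Return num_samples frame indices spread evenly across a clip.

    Assumption: uniform spacing. The clip is split into num_samples equal
    segments and the frame nearest each segment's midpoint is selected.
    """
    if total_frames <= 0 or num_samples <= 0:
        raise ValueError("total_frames and num_samples must be positive")
    segment = total_frames / num_samples
    return [
        # Clamp so the index never runs past the last frame.
        min(total_frames - 1, int((i + 0.5) * segment))
        for i in range(num_samples)
    ]


# Example: a 30s clip decoded at an assumed 30 fps has 900 frames.
indices = sample_frame_indices(30 * 30)
```

If a clip has fewer frames than `num_samples`, this sketch repeats indices rather than failing, so the model still receives a fixed-length frame sequence.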