---
license: bsd-3-clause
language:
- en
- zh
pipeline_tag: visual-question-answering
---

# Video-LLaMA: An Instruction-Finetuned Visual Language Model for Video Understanding

This is the Hugging Face repo for storing pre-trained & fine-tuned checkpoints of our [Video-LLaMA](https://github.com/DAMO-NLP-SG/Video-LLaMA), a multi-modal conversational large language model with video understanding capability.

| Checkpoint | Link | Note |
|:------------|-------------|-------------|
| pretrain-vicuna7b.pth | [link](https://huggingface.co/DAMO-NLP-SG/Video-LLaMA-Series/tree/main) | Vicuna-7B as language decoder, pre-trained on WebVid (2.5M video-caption pairs) and LLaVA-CC3M (595k image-caption pairs) |
| finetune-vicuna7b-v2.pth | [link](https://huggingface.co/DAMO-NLP-SG/Video-LLaMA-Series/tree/main) | Fine-tuned on the [VideoChat](https://github.com/OpenGVLab/Ask-Anything) instruction-following dataset |
| pretrain-vicuna13b.pth | [link](https://huggingface.co/DAMO-NLP-SG/Video-LLaMA-Series/resolve/main/pretrain-vicuna13b.pth) | Vicuna-13B as language decoder, pre-trained on WebVid (2.5M video-caption pairs) and LLaVA-CC3M (595k image-caption pairs) |
| finetune-vicuna13b-v2.pth (**recommended**) | [link](https://huggingface.co/DAMO-NLP-SG/Video-LLaMA-Series/resolve/main/finetune-vicuna13b-v2.pth) | Fine-tuned on the [VideoChat](https://github.com/OpenGVLab/Ask-Anything) instruction-following dataset |
| pretrain-ziya13b-zh.pth | [link](https://huggingface.co/DAMO-NLP-SG/Video-LLaMA-Series/resolve/main/pretrain-ziya13b-zh.pth) | Pre-trained with the Chinese LLM [Ziya-13B](https://huggingface.co/IDEA-CCNL/Ziya-LLaMA-13B-v1) |
| finetune-ziya13b-zh.pth | [link](https://huggingface.co/DAMO-NLP-SG/Video-LLaMA-Series/resolve/main/finetune-ziya13b-zh.pth) | Fine-tuned on the machine-translated [VideoChat](https://github.com/OpenGVLab/Ask-Anything) instruction-following dataset (in Chinese) |
| pretrain-billa7b-zh.pth | [link](https://huggingface.co/DAMO-NLP-SG/Video-LLaMA-Series/resolve/main/pretrain-billa7b-zh.pth) | Pre-trained with the Chinese LLM [BiLLa-7B](https://huggingface.co/Neutralzz/BiLLa-7B-SFT) |
| finetune-billa7b-zh.pth | [link](https://huggingface.co/DAMO-NLP-SG/Video-LLaMA-Series/resolve/main/finetune-billa7b-zh.pth) | Fine-tuned on the machine-translated [VideoChat](https://github.com/OpenGVLab/Ask-Anything) instruction-following dataset (in Chinese) |
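
The `[link]` entries above resolve to direct file URLs following the Hugging Face `resolve/main` pattern. As a minimal sketch of fetching a checkpoint programmatically (the `checkpoint_url` helper is illustrative, not part of the Video-LLaMA codebase):

```python
# Illustrative helper (not part of the official Video-LLaMA codebase): build the
# direct-download URL for a checkpoint file in this repo, matching the
# "resolve/main" links in the table above.
def checkpoint_url(filename: str, repo_id: str = "DAMO-NLP-SG/Video-LLaMA-Series") -> str:
    return f"https://huggingface.co/{repo_id}/resolve/main/{filename}"

print(checkpoint_url("finetune-vicuna13b-v2.pth"))

# In practice, the huggingface_hub library handles caching and authentication:
#   from huggingface_hub import hf_hub_download
#   path = hf_hub_download(repo_id="DAMO-NLP-SG/Video-LLaMA-Series",
#                          filename="finetune-vicuna13b-v2.pth")
```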