---
base_model:
- mistralai/Mistral-7B-Instruct-v0.2
library_name: transformers
license: mit
pipeline_tag: video-text-to-text
---
# VideoChat2-TPO
This model is based on the paper [Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment](https://huggingface.co/papers/2412.19326).
## 🏃 Installation
```bash
pip install -r requirements.txt
python app.py
```
## 🔧 Usage
```python
from transformers import AutoModel, AutoTokenizer
# The repo ships a custom tokenizer class; importing it makes it available
# to the remote code loaded via trust_remote_code.
from tokenizer import MultimodalLlamaTokenizer

model_path = "OpenGVLab/VideoChat-TPO"

tokenizer = AutoTokenizer.from_pretrained(
    model_path,
    trust_remote_code=True,
    use_fast=False,
)
model = AutoModel.from_pretrained(
    model_path,
    trust_remote_code=True,
    _tokenizer=tokenizer,
).eval()
```
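
As a minimal follow-up sketch (not part of the original card), the snippet below shows one common way to move the loaded model onto a GPU in bfloat16 before inference. It assumes a CUDA device is available, continues from the `model` and `tokenizer` objects created above, and assumes the custom tokenizer follows the usual `PreTrainedTokenizer` calling convention; the model's actual video chat interface is provided by its remote code and is not reproduced here.

```python
import torch

# Assumption: a CUDA GPU is available; bfloat16 keeps memory use manageable
# for the 7B-parameter backbone.
model = model.to(torch.bfloat16).cuda()

# Standard tokenizer call for a text prompt (assumes the custom tokenizer
# behaves like a regular PreTrainedTokenizer). How text and video frames are
# combined for generation is defined by the model's remote code.
inputs = tokenizer("Describe the video.", return_tensors="pt").to("cuda")
```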