---
base_model:
- mistralai/Mistral-7B-Instruct-v0.2
library_name: transformers
license: mit
pipeline_tag: video-text-to-text
---

# VideoChat2-TPO

This model is based on the paper [Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment](https://huggingface.co/papers/2412.19326).

## 🏃 Installation

```
pip install -r requirements.txt
python app.py
```

## 🔧 Usage

```
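from transformers import AutoModel, AutoTokenizer
from tokenizer import MultimodalLlamaTokenizer  # presumably the repo's local tokenizer.py

model_path = "OpenGVLab/VideoChat-TPO"

# trust_remote_code=True lets transformers run the custom code in the model repo.
tokenizer = AutoTokenizer.from_pretrained(
    model_path,
    trust_remote_code=True,
    use_fast=False,
)
# Pass the tokenizer loaded above (the original snippet referenced an undefined `self`).
model = AutoModel.from_pretrained(
    model_path, trust_remote_code=True, _tokenizer=tokenizer
).eval()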
```