# Model Card
Parrot is a multilingual, multimodal large language model built with multilingual visual instruction tuning.
For a comprehensive introduction, please refer to the [Parrot paper](https://arxiv.org/abs/2406.02539) and the [Parrot GitHub repository](https://github.com/AIDC-AI/Parrot).

# Model Details
![Parrot overview](https://github.com/AIDC-AI/Parrot/images/teaser.png)

# Performance
![Parrot performance](https://github.com/AIDC-AI/Parrot/images/teaser.png)

# Usage

Below is a code snippet for running Parrot with multimodal inputs. For additional usage instructions, including the inference wrapper and the Gradio UI, please refer to the [Parrot GitHub](https://github.com/AIDC-AI/Parrot).
```bash
pip install torch==2.1.2 transformers==4.43.2 pillow==10.3.0
```
```python
# Core dependencies: torch for tensors and device handling, PIL for image loading,
# and transformers for the model itself.
import torch
from PIL import Image
from transformers import AutoModelForCausalLM
```
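
As a rough sketch of how these pieces fit together, the snippet below loads the checkpoint and prepares an image input. The repository id `AIDC-AI/Parrot-7B`, the `trust_remote_code` flag, and the dtype choice are assumptions made for illustration only; the exact prompt formatting and image pre-processing should follow the official inference wrapper in the [Parrot GitHub](https://github.com/AIDC-AI/Parrot).

```python
# Minimal loading sketch (assumed repository id and settings; not the official wrapper).
device = "cuda" if torch.cuda.is_available() else "cpu"

model = AutoModelForCausalLM.from_pretrained(
    "AIDC-AI/Parrot-7B",         # assumed repository id for illustration
    torch_dtype=torch.bfloat16,  # half precision to reduce memory use
    trust_remote_code=True,      # assumed: load custom modeling code shipped with the checkpoint
).to(device).eval()

# Load an example image; replace with your own multimodal input.
image = Image.open("example.jpg").convert("RGB")

# Prompt construction and image pre-processing are model-specific;
# see the Parrot GitHub inference wrapper for the exact chat template.
```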

# Citation
If you find Parrot useful, please cite the paper:

```bibtex
@article{sun2024parrot,
  title={Parrot: Multilingual Visual Instruction Tuning},
  author={Sun, Hai-Long and Zhou, Da-Wei and Li, Yang and Lu, Shiyin and Yi, Chao and Chen, Qing-Guo and Xu, Zhao and Luo, Weihua and Zhang, Kaifu and Zhan, De-Chuan and others},
  journal={arXiv preprint arXiv:2406.02539},
  year={2024}
}
```

# License
The project is licensed under the Apache License, Version 2.0, and is restricted to uses that comply with the license agreements of Qwen and CLIP.