Update README.md
README.md (CHANGED):

@@ -1,12 +1,19 @@
 ---
-
+base_model:
+- Qwen/Qwen2-VL-2B-Instruct
 datasets:
 - rp-yu/VPT_Datasets
 language:
 - en
+library_name: transformers
+license: apache-2.0
 metrics:
 - accuracy
-
-
-
-
+pipeline_tag: image-text-to-text
+---
+
+# Introducing Visual Perception Token into Multimodal Large Language Model
+
+This repository contains models based on the paper [Introducing Visual Perception Token into Multimodal Large Language Model](https://arxiv.org/abs/2502.17425). These models utilize Visual Perception Tokens to enhance the visual perception capabilities of multimodal large language models (MLLMs).
+
+Code: https://github.com/yu-rp/VisualPerceptionToken