Introducing Visual Perception Token into Multimodal Large Language Model

This repository contains a model based on the paper Introducing Visual Perception Token into Multimodal Large Language Model. The model uses Visual Perception Tokens to enhance the visual perception capabilities of multimodal large language models (MLLMs).

Code: https://github.com/yu-rp/VisualPerceptionToken
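Since the checkpoint is built on Qwen2-VL-2B, it can plausibly be loaded through the standard Qwen2-VL interface in Hugging Face `transformers`. The sketch below is a minimal, hedged example: the model ID comes from this card, but the exact prompting needed to trigger Visual Perception Tokens is not documented here, so treat this as a generic Qwen2-VL inference skeleton and consult the linked repository for the intended usage.

```python
# Minimal sketch of loading rp-yu/Qwen2-VL-2b-VPT-Seg via transformers.
# Assumption: the checkpoint follows the standard Qwen2-VL chat interface;
# the Visual-Perception-Token-specific workflow may differ (see the repo).

def build_messages(image_path: str, question: str) -> list:
    """Build a Qwen2-VL style chat message with one image and a text question."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_path},
                {"type": "text", "text": question},
            ],
        }
    ]

if __name__ == "__main__":
    # Requires `transformers` >= 4.45 and Pillow; downloads ~2B-parameter weights.
    from PIL import Image
    from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

    model_id = "rp-yu/Qwen2-VL-2b-VPT-Seg"
    model = Qwen2VLForConditionalGeneration.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )
    processor = AutoProcessor.from_pretrained(model_id)

    image_path = "example.jpg"  # hypothetical local image
    messages = build_messages(image_path, "What objects are in this image?")
    text = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = processor(
        text=[text], images=[Image.open(image_path)], return_tensors="pt"
    ).to(model.device)

    generated = model.generate(**inputs, max_new_tokens=128)
    # Strip the prompt tokens before decoding the answer.
    answer_ids = generated[:, inputs["input_ids"].shape[1]:]
    print(processor.batch_decode(answer_ids, skip_special_tokens=True)[0])
```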


Model tree for rp-yu/Qwen2-VL-2b-VPT-Seg

Base model: Qwen/Qwen2-VL-2B (this model is a finetune of it)

Dataset used to train rp-yu/Qwen2-VL-2b-VPT-Seg

Collection including rp-yu/Qwen2-VL-2b-VPT-Seg