Qwen2.5 VL 72B Instruct
Interact with Qwen2.5-VL-Chat model using text and files
VLMEvalKit Evaluation Results Collection
Generate responses using images and text input
Detect objects in images and get bounding boxes
Space for Qwen2.5-VL-3B and 7B image + text demo.
Highlight described objects in images
Engage in multi-modal conversations with images and videos
Compare any two VLMs, side-by-side.
A VLM-based message decoder that is trained via GRPO
Chat with images and text using Qwen-VL-Plus
Interact with images and text using Qwen-VL-Max
Convert images to grayscale
Generate high-resolution images from text prompts
Analyze images and answer questions about them
Generate responses from text and images
Generate text from images and videos
Generate text by combining an image and a question
Generate text based on an image or video
Analyze images and describe their contents using AI models
Fixed fork of the original audio sr!
Generate captions for images
Qwen2-VL is a vision-language model that performs OCR
Describe an image based on a question
Generate text responses based on images and input text