---
pipeline_tag: image-text-to-text
tags:
- NPU
---
# Qwen3-VL-4B-Thinking
Run **Qwen3-VL-4B-Thinking** optimized for **Qualcomm NPUs** with [nexaSDK](https://sdk.nexa.ai).

## Quickstart

1. **Install NexaSDK** and create a free account at [sdk.nexa.ai](https://sdk.nexa.ai).
2. **Activate your device** with your access token:

   ```bash
   nexa config set license '<access_token>'
   ```
3. **Run the model** on the Qualcomm NPU in one line:

   ```bash
   nexa infer NexaAI/Qwen3-VL-4B-Thinking-NPU
   ```

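The first `nexa infer` call typically downloads the model weights before the chat starts. If you prefer to fetch them ahead of time, nexaSDK also exposes a pull command; treat the exact subcommand below as an assumption and confirm it with `nexa --help` on your install:

```bash
# Pre-fetch the NPU build (assumed subcommand; confirm via `nexa --help`)
nexa pull NexaAI/Qwen3-VL-4B-Thinking-NPU

# Then launch the interactive session on the Qualcomm NPU
nexa infer NexaAI/Qwen3-VL-4B-Thinking-NPU
```
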
## Model Description
**Qwen3-VL-4B-Thinking** is a 4-billion-parameter multimodal large language model from the Qwen team at Alibaba Cloud.
Part of the **Qwen3-VL** (Vision-Language) family, it is designed for advanced visual reasoning and chain-of-thought generation across image, text, and video inputs.

Compared to the *Instruct* variant, the **Thinking** model emphasizes deeper multi-step reasoning, analysis, and planning. It produces detailed, structured outputs that reflect intermediate reasoning steps, making it well-suited for research, multimodal understanding, and agentic workflows.

## Features
- **Vision-Language Understanding**: Processes images, text, and video for joint reasoning tasks.
- **Structured Thinking Mode**: Generates intermediate reasoning traces for better transparency and interpretability.
- **High Accuracy on Visual QA**: Performs strongly on visual question answering, chart reasoning, and document analysis benchmarks.
- **Multilingual Support**: Understands and responds in multiple languages.
- **Optimized for Efficiency**: Delivers strong performance at the 4B scale for on-device or edge deployment.

## Use Cases
- Multimodal reasoning and visual question answering
- Scientific and analytical reasoning tasks involving charts, tables, and documents
- Step-by-step visual explanation or tutoring
- Research on interpretability and chain-of-thought modeling
- Integration into agent systems that require structured reasoning (see the serving sketch below)

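For agent integrations, a local OpenAI-compatible endpoint is often the simplest glue. The sketch below assumes your nexaSDK build ships a `nexa serve` command exposing such an API; the command name, port, and route are all assumptions here, so check the SDK docs before relying on them:

```bash
# Start a local server (assumed command; confirm with `nexa --help`)
nexa serve

# Query it with an OpenAI-style multimodal chat request
# (port and route are assumptions; <BASE64_IMAGE> is a placeholder)
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "NexaAI/Qwen3-VL-4B-Thinking-NPU",
        "messages": [
          {"role": "user", "content": [
            {"type": "image_url", "image_url": {"url": "data:image/png;base64,<BASE64_IMAGE>"}},
            {"type": "text", "text": "What trend does this chart show?"}
          ]}
        ]
      }'
```
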
## Inputs and Outputs
**Input:**
- Text, images, or combined multimodal prompts (e.g., image + question)

**Output:**
- Generated text, reasoning traces, or structured responses
- May include explicit thought steps or structured JSON reasoning sequences (illustrated below)

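As an illustration only: thinking-mode replies from Qwen3-series models typically separate the reasoning trace from the final answer, conventionally with `<think>...</think>` delimiters. The exchange below is invented for this card (the prompt, trace, and answer are not real model output), and the exact delimiters depend on the chat template your runtime applies:

```text
User: What trend does this chart show?  [chart.png attached]

<think>
The x-axis covers 2019-2024; the bars rise each year, roughly linearly...
</think>

The chart shows steady year-over-year growth from 2019 through 2024.
```
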
## License
Check the [official Qwen license](https://huggingface.co/Qwen) for terms of use and redistribution.