---
pipeline_tag: image-text-to-text
tags:
- NPU
---
# Qwen3-VL-4B-Thinking
Run **Qwen3-VL-4B-Thinking** optimized for **Qualcomm NPUs** with [nexaSDK](https://sdk.nexa.ai).

## Quickstart

1. **Install NexaSDK** and create a free account at [sdk.nexa.ai](https://sdk.nexa.ai).
2. **Activate your device** with your access token:

   ```bash
   nexa config set license '<access_token>'
   ```
3. **Run the model** on the Qualcomm NPU in one line:

   ```bash
   nexa infer NexaAI/Qwen3-VL-4B-Thinking-NPU
   ```

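The first `nexa infer` call typically downloads the model weights before the chat starts. If you prefer to fetch them ahead of time, nexaSDK also exposes a pull command; treat the exact subcommand below as an assumption and confirm it with `nexa --help` on your install:

```bash
# Pre-fetch the NPU build (assumed subcommand; confirm via `nexa --help`)
nexa pull NexaAI/Qwen3-VL-4B-Thinking-NPU

# Then launch the interactive session on the Qualcomm NPU
nexa infer NexaAI/Qwen3-VL-4B-Thinking-NPU
```
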
## Model Description
**Qwen3-VL-4B-Thinking** is a 4-billion-parameter multimodal large language model from the Qwen team at Alibaba Cloud.
Part of the **Qwen3-VL** (Vision-Language) family, it is designed for advanced visual reasoning and chain-of-thought generation across image, text, and video inputs.

Compared to the *Instruct* variant, the **Thinking** model emphasizes deeper multi-step reasoning, analysis, and planning. It produces detailed, structured outputs that reflect intermediate reasoning steps, making it well-suited for research, multimodal understanding, and agentic workflows.

## Features
- **Vision-Language Understanding**: Processes images, text, and video for joint reasoning tasks.
- **Structured Thinking Mode**: Generates intermediate reasoning traces for better transparency and interpretability.
- **High Accuracy on Visual QA**: Performs strongly on visual question answering, chart reasoning, and document analysis benchmarks.
- **Multilingual Support**: Understands and responds in multiple languages.
- **Optimized for Efficiency**: Delivers strong performance at the 4B scale for on-device or edge deployment.

## Use Cases
- Multimodal reasoning and visual question answering
- Scientific and analytical reasoning tasks involving charts, tables, and documents
- Step-by-step visual explanation or tutoring
- Research on interpretability and chain-of-thought modeling
- Integration into agent systems that require structured reasoning (see the serving sketch below)

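For agent integrations, a local OpenAI-compatible endpoint is often the simplest glue. The sketch below assumes your nexaSDK build ships a `nexa serve` command exposing such an API; the command name, port, and route are all assumptions here, so check the SDK docs before relying on them:

```bash
# Start a local server (assumed command; confirm with `nexa --help`)
nexa serve

# Query it with an OpenAI-style multimodal chat request
# (port and route are assumptions; <BASE64_IMAGE> is a placeholder)
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "NexaAI/Qwen3-VL-4B-Thinking-NPU",
        "messages": [
          {"role": "user", "content": [
            {"type": "image_url", "image_url": {"url": "data:image/png;base64,<BASE64_IMAGE>"}},
            {"type": "text", "text": "What trend does this chart show?"}
          ]}
        ]
      }'
```
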
## Inputs and Outputs
**Input:**
- Text, images, or combined multimodal prompts (e.g., image + question)

**Output:**
- Generated text, reasoning traces, or structured responses
- May include explicit thought steps or structured JSON reasoning sequences (illustrated below)

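As an illustration only: thinking-mode replies from Qwen3-series models typically separate the reasoning trace from the final answer, conventionally with `<think>...</think>` delimiters. The exchange below is invented for this card (the prompt, trace, and answer are not real model output), and the exact delimiters depend on the chat template your runtime applies:

```text
User: What trend does this chart show?  [chart.png attached]

<think>
The x-axis covers 2019-2024; the bars rise each year, roughly linearly...
</think>

The chart shows steady year-over-year growth from 2019 through 2024.
```
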
## License
Check the [official Qwen license](https://huggingface.co/Qwen) for terms of use and redistribution.