Instructions to use Qwen/Qwen3-VL-32B-Instruct with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use Qwen/Qwen3-VL-32B-Instruct with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Qwen/Qwen3-VL-32B-Instruct")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("Qwen/Qwen3-VL-32B-Instruct")
model = AutoModelForMultimodalLM.from_pretrained("Qwen/Qwen3-VL-32B-Instruct")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use Qwen/Qwen3-VL-32B-Instruct with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Qwen/Qwen3-VL-32B-Instruct"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Qwen/Qwen3-VL-32B-Instruct",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
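
The same request can be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `send_chat_request` are hypothetical, and it assumes the server started above is reachable at `http://localhost:8000`.

```python
import json
import urllib.request


def build_chat_request(base_url, model, prompt, image_url):
    """Build an OpenAI-style chat completion request with one text and one image part."""
    payload = {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=120):
    """Send the request and return the parsed JSON response (needs a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `send_chat_request(build_chat_request("http://localhost:8000", "Qwen/Qwen3-VL-32B-Instruct", "Describe this image in one sentence.", image_url))` mirrors the curl call above. Because the payload is the OpenAI-compatible chat format, changing only the base URL should be enough to target another server that exposes the same endpoint.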

Use Docker

docker model run hf.co/Qwen/Qwen3-VL-32B-Instruct

SGLang

How to use Qwen/Qwen3-VL-32B-Instruct with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Qwen/Qwen3-VL-32B-Instruct" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Qwen/Qwen3-VL-32B-Instruct",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Qwen/Qwen3-VL-32B-Instruct" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Qwen/Qwen3-VL-32B-Instruct",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Docker Model Runner

How to use Qwen/Qwen3-VL-32B-Instruct with Docker Model Runner:
```
docker model run hf.co/Qwen/Qwen3-VL-32B-Instruct
```

Configuration issue with tie_word_embeddings when using trl GRPOTrainer with vLLM

by DimensionSTP - opened Feb 4

base: refs/heads/main ← from: refs/pr/9

Loading Qwen/Qwen3-VL-32B-Instruct with vLLM (for example, through TRL’s GRPOTrainer with use_vllm=True) can fail with an error of the form AttributeError: 'Qwen3VLTextConfig' object has no attribute 'tie_word_embeddings'.

This issue is caused by a mismatch in the configuration structure. In the config.json for Qwen3-VL-32B-Instruct, the field tie_word_embeddings is defined at the root level but is missing from the text_config section. In contrast, Qwen3-VL-4B-Instruct explicitly defines tie_word_embeddings inside text_config. The vLLM implementation for Qwen3-VL directly accesses text_config.tie_word_embeddings, and when this field is absent, an AttributeError is raised. This is a configuration compatibility issue rather than a problem with the model architecture or training procedure.

A straightforward and recommended fix is to add "tie_word_embeddings": false to the text_config section of config.json. This is consistent with the intended design of the 32B model, which uses untied input and output embeddings, and it does not change model behavior or performance. It only restores compatibility with vLLM.
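
For a locally downloaded copy of the model, the edit described above could be scripted roughly as follows. This is a sketch, not part of the PR: the helper name `add_tie_word_embeddings` is hypothetical, and it assumes `config.json` is a plain JSON file with a `text_config` object.

```python
import json
from pathlib import Path


def add_tie_word_embeddings(config_path, value=False):
    """Add tie_word_embeddings to text_config in a local config.json if it is missing.

    Returns True if the file was changed, False if the field was already present.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    # Assumption: create text_config if the file somehow lacks it entirely.
    text_config = config.setdefault("text_config", {})
    if "tie_word_embeddings" in text_config:
        return False  # leave an existing value untouched
    text_config["tie_word_embeddings"] = value
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return True
```

The function is idempotent: a second call on the same file finds the field and returns `False` without rewriting anything. Note that it re-serializes the whole file, so key order is kept but whitespace may differ from the original.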

If editing config.json is not practical, the issue can also be resolved at runtime by injecting the missing attribute after loading the model, for example by setting model.config.text_config.tie_word_embeddings to the value of the root-level tie_word_embeddings field, defaulting to false if necessary.
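
A minimal sketch of that runtime workaround is shown below. The helper name `inject_tie_word_embeddings` is hypothetical, and the `SimpleNamespace` objects are stand-ins that only mimic the attribute layout described in this post; with a real model the call would be `inject_tie_word_embeddings(model.config)`. Whether the patched object is the one vLLM actually reads depends on how vLLM is launched, which is not verified here.

```python
from types import SimpleNamespace


def inject_tie_word_embeddings(config, default=False):
    """Copy the root-level tie_word_embeddings onto config.text_config if absent."""
    text_config = config.text_config
    if not hasattr(text_config, "tie_word_embeddings"):
        # Fall back to `default` when the root level lacks the field as well.
        root_value = getattr(config, "tie_word_embeddings", default)
        text_config.tie_word_embeddings = root_value
    return config


# Stand-in config: field present at the root, missing from text_config.
demo_config = SimpleNamespace(tie_word_embeddings=False, text_config=SimpleNamespace())
inject_tie_word_embeddings(demo_config)
print(demo_config.text_config.tie_word_embeddings)  # prints False
```

An existing `text_config.tie_word_embeddings` value is left untouched, so the helper is safe to call on configurations that already define the field.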

For background, tie_word_embeddings controls whether the input token embedding matrix and the output language modeling head share the same weights. For Qwen3-VL-32B-Instruct it appears to be intentionally set to false, so the error arises solely from the field being absent in text_config, not from an incorrect value.
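
To illustrate what tying means, here is a toy sketch in plain Python. The `TinyLM` class is hypothetical and uses nested lists in place of real weight tensors; it only shows the sharing relationship and is not how any library implements it.

```python
class TinyLM:
    """Toy model holding an input embedding table and an output projection."""

    def __init__(self, vocab_size, hidden_size, tie_word_embeddings):
        # One row of hidden_size floats per vocabulary entry.
        self.embed_tokens = [[0.0] * hidden_size for _ in range(vocab_size)]
        if tie_word_embeddings:
            # Tied: the output head reuses the very same matrix object.
            self.lm_head = self.embed_tokens
        else:
            # Untied: the output head has its own, independently stored matrix.
            self.lm_head = [[0.0] * hidden_size for _ in range(vocab_size)]


tied = TinyLM(vocab_size=4, hidden_size=2, tie_word_embeddings=True)
untied = TinyLM(vocab_size=4, hidden_size=2, tie_word_embeddings=False)

# Update the input embeddings of both models.
tied.embed_tokens[0][0] = 1.0
untied.embed_tokens[0][0] = 1.0

print(tied.lm_head[0][0])    # 1.0: the tied head sees the change
print(untied.lm_head[0][0])  # 0.0: the untied head is unaffected
```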
