Configuration issue with tie_word_embeddings when using trl GRPOTrainer with vLLM
When loading Qwen/Qwen3-VL-32B-Instruct with vLLM, for example when using TRL’s GRPOTrainer with use_vllm=True, an error of the form AttributeError: 'Qwen3VLTextConfig' object has no attribute 'tie_word_embeddings' may occur.
This issue is caused by a mismatch in the configuration structure. In the config.json for Qwen3-VL-32B-Instruct, the field tie_word_embeddings is defined at the root level but is missing from the text_config section. In contrast, Qwen3-VL-4B-Instruct explicitly defines tie_word_embeddings inside text_config. The vLLM implementation for Qwen3-VL directly accesses text_config.tie_word_embeddings, and when this field is absent, an AttributeError is raised. This is a configuration compatibility issue rather than a problem with the model architecture or training procedure.
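To check which layout a given checkpoint uses, a small helper along these lines may be enough. This is a minimal sketch: the helper name `locate_tie_word_embeddings` is hypothetical, and the two JSON fragments are trimmed-down illustrations of the layouts described above (with placeholder values), not copies of the real config.json files.

```python
import json

MISSING = "<missing>"


def locate_tie_word_embeddings(config: dict) -> dict:
    """Report where tie_word_embeddings is defined in a parsed config.json."""
    text_config = config.get("text_config") or {}
    return {
        "root": config.get("tie_word_embeddings", MISSING),
        "text_config": text_config.get("tie_word_embeddings", MISSING),
    }


# Illustrative fragments only (assumptions, not the real files):
# the 32B-style layout has the field at the root but not in text_config,
# the 4B-style layout has it inside text_config.
layout_32b = json.loads('{"tie_word_embeddings": false, "text_config": {}}')
layout_4b = json.loads('{"text_config": {"tie_word_embeddings": true}}')

for name, cfg in [("32B-style", layout_32b), ("4B-style", layout_4b)]:
    print(name, locate_tie_word_embeddings(cfg))
```

For a real checkpoint, the same function can be applied to the result of `json.load` on the downloaded config.json.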
A straightforward and recommended fix is to add "tie_word_embeddings": false to the text_config section of config.json. This is consistent with the intended design of the 32B model, which uses untied input and output embeddings, and it does not change model behavior or performance. It only restores compatibility with vLLM.
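The edit can also be scripted. The sketch below assumes a local copy of config.json at a path you control. The function name `add_tie_word_embeddings` is hypothetical, and the demo runs against a temporary stand-in file rather than the real checkpoint. It copies the root-level value into text_config and falls back to false when the root-level field is absent.

```python
import json
import tempfile
from pathlib import Path


def add_tie_word_embeddings(config_path) -> bool:
    """Add text_config.tie_word_embeddings if it is missing.

    Returns True if the file was modified, False if nothing needed to change.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    text_config = config.setdefault("text_config", {})
    if "tie_word_embeddings" in text_config:
        return False
    # Mirror the root-level value; fall back to False if it is absent.
    text_config["tie_word_embeddings"] = config.get("tie_word_embeddings", False)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return True


# Demo on a temporary stand-in file (not the real config.json).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "config.json"
    demo_path.write_text(
        json.dumps({"tie_word_embeddings": False, "text_config": {}}),
        encoding="utf-8",
    )
    first_run = add_tie_word_embeddings(demo_path)   # modifies the file
    second_run = add_tie_word_embeddings(demo_path)  # no-op, already patched
    patched = json.loads(demo_path.read_text(encoding="utf-8"))

print(first_run, second_run, patched["text_config"])
```

The function is idempotent, so running it again on an already patched file leaves the file untouched.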
If editing config.json is not practical, the issue can also be resolved at runtime by injecting the missing attribute after loading the model: set model.config.text_config.tie_word_embeddings to the value of the root-level tie_word_embeddings field, falling back to false if the root-level field is also absent.
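A minimal sketch of the runtime workaround follows. The helper name `ensure_text_config_tie_flag` is hypothetical, and a `SimpleNamespace` stands in for the loaded configuration so the example runs without downloading the model. With a real model, the same call would be made on `model.config`. Whether the patched object is the one vLLM actually reads depends on how the engine is launched, so treat this as an assumption to verify in your setup.

```python
from types import SimpleNamespace


def ensure_text_config_tie_flag(config) -> bool:
    """Copy tie_word_embeddings from the root config into text_config if missing.

    Returns True if the attribute was injected, False otherwise.
    """
    text_config = getattr(config, "text_config", None)
    if text_config is None or hasattr(text_config, "tie_word_embeddings"):
        return False
    # Use the root-level value; fall back to False if it is absent too.
    root_value = getattr(config, "tie_word_embeddings", False)
    setattr(text_config, "tie_word_embeddings", root_value)
    return True


# Stand-in for a loaded config with the 32B-style layout (assumption):
# the flag exists at the root but not on text_config.
config = SimpleNamespace(tie_word_embeddings=False, text_config=SimpleNamespace())

changed = ensure_text_config_tie_flag(config)
print(changed, config.text_config.tie_word_embeddings)
```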
The tie_word_embeddings option controls whether the input token embedding matrix and the output language modeling head share the same weights. For Qwen3-VL-32B-Instruct, this option appears to be intentionally set to false. The error arises solely from the missing field in text_config, not from an incorrect configuration value.
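To make the meaning of the option concrete, here is a toy illustration in plain Python. It is not how any real framework stores weights: `build_toy_lm` is a hypothetical function, and nested lists stand in for the embedding matrix and the output head. With tying enabled, both names refer to the same object, so an update to one is visible through the other.

```python
def build_toy_lm(vocab_size: int, hidden_size: int, tie_word_embeddings: bool):
    """Return (embedding_matrix, lm_head) as nested lists of zeros."""
    embed = [[0.0] * hidden_size for _ in range(vocab_size)]
    if tie_word_embeddings:
        lm_head = embed  # tied: one shared weight matrix
    else:
        lm_head = [[0.0] * hidden_size for _ in range(vocab_size)]  # untied: separate copy
    return embed, lm_head


tied_embed, tied_head = build_toy_lm(4, 2, tie_word_embeddings=True)
untied_embed, untied_head = build_toy_lm(4, 2, tie_word_embeddings=False)

# Simulate a weight update on the input embedding only.
tied_embed[0][0] = 1.0
untied_embed[0][0] = 1.0

print(tied_head[0][0])    # the tied head sees the update
print(untied_head[0][0])  # the untied head keeps its own value
```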