If you are getting undefined symbol: _ZN3c1013MessageLoggerC1EPKciib (or other errors) when following the vLLM instructions
If you are unable to load a Qwen2.5 model with vLLM and are getting errors about missing libraries and the like, the following version combination worked for me when loading the Qwen/Qwen2.5-7B-Instruct and Qwen/Qwen2.5-14B-Instruct models:
torch==2.8.0
torchvision==0.23.0
torchaudio==2.8.0
vllm==0.10.2
However, I installed the CUDA-specific builds, like so, because as I understand it vLLM needs a particular version of torch, and the flash-attn build (see further down) needs to match that same torch version:
uv pip install torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0 --index-url https://download.pytorch.org/whl/cu129 --force-reinstall
Download the matching vllm wheel from https://github.com/vllm-project/vllm/releases/tag/v0.10.2 (Assets section) and install it:
uv pip install vllm-0.10.2+cu129-cp38-abi3-manylinux1_x86_64.whl --no-build-isolation
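To confirm the environment really ended up on the combination listed above, a small check along these lines can help. It is only a sketch: `PINS` and `check_pins` are names made up for this example, and the comparison deliberately ignores local suffixes such as `+cu129`, which is an assumption about how strict you want the check to be.

```python
from importlib.metadata import PackageNotFoundError, version

# The combination reported as working in this post.
PINS = {
    "torch": "2.8.0",
    "torchvision": "0.23.0",
    "torchaudio": "2.8.0",
    "vllm": "0.10.2",
}


def base_version(v: str) -> str:
    """Drop a local suffix such as '+cu129' so '2.8.0+cu129' compares as '2.8.0'."""
    return v.split("+", 1)[0]


def check_pins(pins, lookup=version):
    """Return {package: installed version or None} for every package off its pin."""
    problems = {}
    for name, wanted in pins.items():
        try:
            found = lookup(name)
        except PackageNotFoundError:
            found = None  # package is not installed at all
        if found is None or base_version(found) != wanted:
            problems[name] = found
    return problems


if __name__ == "__main__":
    mismatches = check_pins(PINS)
    for name, found in mismatches.items():
        print(f"{name}: wanted {PINS[name]}, found {found}")
    if not mismatches:
        print("all pinned packages match")
```

Run it with the same interpreter that launches vLLM, otherwise it reports on a different environment.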
Extra: flash-attn wasn't loading for me either, so I installed flash_attn==2.8.3 from a prebuilt wheel, like so:
uv pip install flash_attn-2.8.3+cu12torch2.8cxx11abiFALSE-cp312-cp312-linux_x86_64.whl
I downloaded the wheel from https://github.com/Dao-AILab/flash-attention/releases/tag/v2.8.3 (Assets section).
After that, I had to reinstall flashinfer-python and flashinfer-cubin:
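Picking the right file from an Assets list is easy to get wrong, so here is a rough sketch that splits a wheel filename into its tags and checks them against the running interpreter. The helper names (`wheel_tags`, `built_for_torch`, `matches_interpreter`) are invented for this example, and reading `torch2.8` out of the local version part relies on the naming used in the flash-attn filename above, not on any official scheme.

```python
import re
import sys


def wheel_tags(filename: str) -> dict:
    """Split 'name-version[-build]-python-abi-platform.whl' into its parts."""
    parts = filename.removesuffix(".whl").split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename: {filename}")
    public, _, local = parts[1].partition("+")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": public,
        "local": local,  # e.g. 'cu12torch2.8cxx11abiFALSE' or 'cu129'
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


def built_for_torch(local: str):
    """Return the torch version embedded in the local tag, if there is one."""
    match = re.search(r"torch(\d+\.\d+)", local)
    return match.group(1) if match else None


def matches_interpreter(tags: dict, version_info=sys.version_info) -> bool:
    """Check a CPython tag such as 'cp312' (assumes a single-digit major version)."""
    py = tags["python"]
    if not py.startswith("cp"):
        return False
    tagged = (int(py[2]), int(py[3:]))
    current = tuple(version_info[:2])
    if tags["abi"] == "abi3":
        return current >= tagged  # stable ABI: the tag is a minimum version
    return current == tagged


if __name__ == "__main__":
    name = "flash_attn-2.8.3+cu12torch2.8cxx11abiFALSE-cp312-cp312-linux_x86_64.whl"
    tags = wheel_tags(name)
    print(tags)
    print("built for torch:", built_for_torch(tags["local"]))
    print("matches this interpreter:", matches_interpreter(tags))
```

For the wheel above this reports a build for torch 2.8 and CPython 3.12, which is why it pairs with torch==2.8.0 in a Python 3.12 environment.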
uv pip install -U --pre flashinfer-python --index-url https://flashinfer.ai/whl/nightly/ --no-deps --force-reinstall
uv pip install -U --pre flashinfer-cubin --index-url https://flashinfer.ai/whl/nightly/ --force-reinstall
And finally, install flashinfer-jit-cache
uv pip install -U --pre flashinfer-jit-cache --index-url https://flashinfer.ai/whl/nightly/cu129
Thanks to @mwalol for pointing me in the right direction in this post:
https://huggingface.co/mistralai/Voxtral-4B-TTS-2603/discussions/12
The undefined symbol error suggests a mismatch between the library versions the code was compiled against and the ones present at runtime. Have you verified that all dependencies are up to date and correctly linked? Also check whether any environment variables or configuration files differ in a way that might affect dynamic linking. This is more likely related to the specific build configuration than to the model itself.
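To see which library the missing symbol belongs to, the mangled name can be decoded by hand. The sketch below handles only the simple nested-name case of the Itanium C++ ABI (length-prefixed identifiers plus a constructor marker); `nested_name` is a made-up helper for this example, and a full tool such as c++filt covers far more.

```python
def nested_name(symbol: str) -> list:
    """Decode the '_ZN ... E' nested name of an Itanium-mangled C++ symbol."""
    if not symbol.startswith("_ZN"):
        raise ValueError("not a mangled nested name")
    pos, parts = 3, []
    while pos < len(symbol) and symbol[pos] != "E":
        if symbol[pos].isdigit():
            # Each identifier is stored as <length><characters>.
            end = pos
            while end < len(symbol) and symbol[end].isdigit():
                end += 1
            length = int(symbol[pos:end])
            parts.append(symbol[end:end + length])
            pos = end + length
        elif symbol[pos:pos + 2] in ("C1", "C2", "C3") and parts:
            # Constructor marker: the function is named after its class.
            parts.append(parts[-1])
            pos += 2
        else:
            raise ValueError(f"unsupported construct at offset {pos}")
    return parts


print("::".join(nested_name("_ZN3c1013MessageLoggerC1EPKciib")))
# prints: c10::MessageLogger::MessageLogger
```

c10 is PyTorch's core library, so the missing symbol is a constructor that a compiled extension expected to find in torch. That is consistent with an extension built against a different torch version than the one installed.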