Benasd
/

Qwen2.5-VL-72B-Instruct-AWQ

Image-Text-to-Text

Inference Endpoints

4-bit precision

Model card Files Files and versions Community

Benasd commited on 15 days ago

Commit

fe85be3

·

verified ·

1 Parent(s): 642244b

Update README.md

Files changed (1) hide show

README.md +5 -0

README.md CHANGED Viewed

@@ -13,6 +13,11 @@ base_model:
 - Qwen/Qwen2.5-VL-72B-Instruct
 ---
 # Qwen2.5-VL-72B-Instruct
 <a href="https://chat.qwenlm.ai/" target="_blank" style="margin: 2px;">
     <img alt="Chat" src="https://img.shields.io/badge/%F0%9F%92%9C%EF%B8%8F%20Qwen%20Chat%20-536af5" style="display: inline-block; vertical-align: middle;"/>

 - Qwen/Qwen2.5-VL-72B-Instruct
 ---
+# Multi-GPU inference with vLLM
+```
+docker run -it --name iddt-ben-qwen25vl72 --gpus '"device=0,1"' -v huggingface:/root/.cache/huggingface --shm-size=32g -p 30000:8000 --ipc=host benasd/vllm:latest --model Benasd/Qwen2.5-VL-72B-Instruct-AWQ  --dtype float16 --quantization awq -tp 2
+```
 # Qwen2.5-VL-72B-Instruct
 <a href="https://chat.qwenlm.ai/" target="_blank" style="margin: 2px;">
     <img alt="Chat" src="https://img.shields.io/badge/%F0%9F%92%9C%EF%B8%8F%20Qwen%20Chat%20-536af5" style="display: inline-block; vertical-align: middle;"/>