vLLM online serve

#3
by chaurAr - opened

Just wanted to post these commands if someone is having trouble with serving the model with vLLM.
Cuda version: 13.0

uv venv speechllm
source speechllm/bin/activate
uv pip install -U --pre vllm
--torch-backend=auto
--extra-index-url https://wheels.vllm.ai/nightly/cu130

uv pip install "vllm[audio]"
uv pip install --upgrade transformers

vllm serve nvidia/audio-flamingo-3-hf
--port 8092
--gpu-memory-utilization 0.7
--host 0.0.0.0
--trust-remote-code

Sign up or log in to comment