vLLM online serve

by chaurAr - opened Apr 1

Apr 1

Just wanted to post these commands if someone is having trouble with serving the model with vLLM.
Cuda version: 13.0

uv venv speechllm
source speechllm/bin/activate
uv pip install -U --pre vllm
--torch-backend=auto
--extra-index-url https://wheels.vllm.ai/nightly/cu130

uv pip install "vllm[audio]"
uv pip install --upgrade transformers

vllm serve nvidia/audio-flamingo-3-hf
--port 8092
--gpu-memory-utilization 0.7
--host 0.0.0.0
--trust-remote-code

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment