Audio-Text-to-Text
Transformers
Safetensors
English
audioflamingo3
text2text-generation
audio
reasoning
audio understanding
ASR
Instructions to use nvidia/audio-flamingo-3-hf with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use nvidia/audio-flamingo-3-hf with Transformers:
# Load model directly from transformers import AutoProcessor, AutoModelForSeq2SeqLM processor = AutoProcessor.from_pretrained("nvidia/audio-flamingo-3-hf") model = AutoModelForSeq2SeqLM.from_pretrained("nvidia/audio-flamingo-3-hf") - Notebooks
- Google Colab
- Kaggle
vLLM online serve
#3
by chaurAr - opened
Just wanted to post these commands if someone is having trouble with serving the model with vLLM.
Cuda version: 13.0
uv venv speechllm
source speechllm/bin/activate
uv pip install -U --pre vllm
--torch-backend=auto
--extra-index-url https://wheels.vllm.ai/nightly/cu130
uv pip install "vllm[audio]"
uv pip install --upgrade transformers
vllm serve nvidia/audio-flamingo-3-hf
--port 8092
--gpu-memory-utilization 0.7
--host 0.0.0.0
--trust-remote-code