VoxCPM2: GGUF weights for llama.cpp-omni

GGUF-converted weights of openbmb/VoxCPM2 for the C++/ggml inference engine llama.cpp-omni (tools/omni/voxcpm2).

These let you run VoxCPM2 text-to-speech and zero-shot voice cloning natively on CPU / Metal / CUDA / Vulkan via ggml, with no PyTorch runtime required.

Files

| File | Format | Size | Component |
|---|---|---|---|
| VoxCPM2-BaseLM-F16.gguf | F16 | ~3.0 GB | Base language model (28-layer, n_embd=2048) |
| VoxCPM2-BaseLM-Q8_0.gguf | Q8_0 | ~1.6 GB | Base language model, 8-bit quantized (recommended) |
| VoxCPM2-Acoustic-F16.gguf | F16 | ~1.7 GB | Acoustic stack (ResidualLM + FSQ + LocEnc/LocDiT CFM + AudioVAE) |

VoxCPM2 outputs 48 kHz mono audio. You need one BaseLM file (F16 or Q8_0) plus the Acoustic file. Q8_0 roughly halves the BaseLM download (~3.0 GB down to ~1.6 GB) with negligible quality loss and a slight speed boost.
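
To make the total download for each pairing concrete, the sketch below adds the approximate sizes listed in the file table. The helper name total_gb is hypothetical, and the inputs are the rounded figures from the table, so the results are only approximate.

```shell
# total_gb is a hypothetical helper: sums two sizes given in GB.
total_gb() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a + b }'
}

# Approximate sizes taken from the file table.
echo "Q8_0 BaseLM + Acoustic: ~$(total_gb 1.6 1.7) GB"
echo "F16 BaseLM + Acoustic:  ~$(total_gb 3.0 1.7) GB"
```

With the listed sizes this works out to roughly 3.3 GB for the Q8_0 pairing and 4.7 GB for the F16 pairing.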

Getting Started

1. Clone

git clone https://github.com/tc-mb/llama.cpp-omni.git
cd llama.cpp-omni

2. Build

# macOS (Metal, auto-detected):
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build --target voxcpm2-cli -j

# Linux / Windows (CUDA):
cmake -B build -DCMAKE_BUILD_TYPE=Release -DGGML_CUDA=ON
cmake --build build --target voxcpm2-cli -j

# Linux / Windows (Vulkan):
cmake -B build -DCMAKE_BUILD_TYPE=Release -DGGML_VULKAN=ON
cmake --build build --target voxcpm2-cli -j

# CPU only:
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build --target voxcpm2-cli -j
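
The four variants above differ only in one optional CMake flag, so they can be folded into a small helper. This is a sketch, not part of llama.cpp-omni: backend_flag and build_voxcpm2 are hypothetical names, and the mapping simply mirrors the commands listed above.

```shell
# backend_flag is a hypothetical helper mapping a backend name to the
# CMake option used in the build commands above.
backend_flag() {
    case "$1" in
        cuda)      printf '%s\n' "-DGGML_CUDA=ON" ;;
        vulkan)    printf '%s\n' "-DGGML_VULKAN=ON" ;;
        metal|cpu) printf '\n' ;;   # no extra flag in the commands above
        *)         echo "unknown backend: $1" >&2; return 1 ;;
    esac
}

# build_voxcpm2 BACKEND runs the configure and build steps shown above.
build_voxcpm2() {
    flag=$(backend_flag "$1") || return 1
    # $flag is intentionally unquoted so an empty value adds no argument.
    cmake -B build -DCMAKE_BUILD_TYPE=Release $flag &&
    cmake --build build --target voxcpm2-cli -j
}
```

For example, build_voxcpm2 cuda would run the same two commands as the CUDA variant.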

3. Download GGUF Weights

You need one BaseLM (F16 or Q8_0) + the Acoustic file.

# Download Q8_0 (recommended)
huggingface-cli download DennisHuang648/VoxCPM2-GGUF \
    VoxCPM2-BaseLM-Q8_0.gguf VoxCPM2-Acoustic-F16.gguf \
    --local-dir ./models

# Or download F16 (unquantized, 16-bit)
huggingface-cli download DennisHuang648/VoxCPM2-GGUF \
    VoxCPM2-BaseLM-F16.gguf VoxCPM2-Acoustic-F16.gguf \
    --local-dir ./models
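
After downloading, it can be worth checking that both files arrived intact before running inference. The sketch below relies on the fact that GGUF files begin with the four ASCII bytes "GGUF". The helpers is_gguf and check_models are hypothetical names for this example, and the check only catches missing or obviously wrong files, not partial downloads.

```shell
# is_gguf is a hypothetical helper: succeeds if the file exists and starts
# with the 4-byte ASCII magic "GGUF".
is_gguf() {
    [ -f "$1" ] && [ "$(head -c 4 "$1")" = "GGUF" ]
}

# check_models DIR BASE_QUANT checks the BaseLM file for the chosen format
# (F16 or Q8_0) plus the Acoustic file. Returns non-zero if either file is
# missing or does not look like GGUF.
check_models() {
    dir=$1; quant=$2; rc=0
    for f in "VoxCPM2-BaseLM-$quant.gguf" "VoxCPM2-Acoustic-F16.gguf"; do
        if is_gguf "$dir/$f"; then
            echo "ok:  $dir/$f"
        else
            echo "bad: $dir/$f" >&2
            rc=1
        fi
    done
    return $rc
}
```

Running check_models ./models Q8_0 after the first download command should print two "ok" lines.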

Usage

Build voxcpm2-cli from llama.cpp-omni, then:

# Basic TTS (GPU by default; add --cpu to force CPU)
./voxcpm2-cli \
    -t "Hello, this is VoxCPM2 running through llama.cpp-omni." \
    -o output.wav \
    VoxCPM2-BaseLM-F16.gguf \
    VoxCPM2-Acoustic-F16.gguf

# Voice cloning (reference audio)
./voxcpm2-cli -t "Cloned voice." -r speaker.wav -o clone.wav \
    VoxCPM2-BaseLM-F16.gguf VoxCPM2-Acoustic-F16.gguf

# Reference-transcript ("ultimate") cloning
./voxcpm2-cli -t "Target text." --prompt-wav speaker.wav --prompt-text "transcript of speaker.wav" \
    -o clone.wav VoxCPM2-BaseLM-F16.gguf VoxCPM2-Acoustic-F16.gguf
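
Since VoxCPM2 is stated to output 48 kHz mono audio, a quick header check on the generated file can catch a wrong or truncated output. This sketch assumes the canonical WAV layout, in which the "fmt " chunk directly follows the 12-byte RIFF header. Whether voxcpm2-cli writes exactly this layout is an assumption, and wav_info and is_48k_mono are hypothetical helper names.

```shell
# wav_info FILE prints "<channels> <sample_rate>".
# Assumes the canonical layout: bytes 22-23 hold the channel count and
# bytes 24-27 the sample rate, both little-endian.
wav_info() {
    # Read the six bytes as unsigned decimals, independent of host endianness.
    set -- $(od -An -t u1 -j 22 -N 6 "$1")
    [ $# -eq 6 ] || return 1
    channels=$(( $1 + $2 * 256 ))
    rate=$(( $3 + $4 * 256 + $5 * 65536 + $6 * 16777216 ))
    printf '%s %s\n' "$channels" "$rate"
}

# is_48k_mono succeeds only for one channel at 48000 Hz.
is_48k_mono() {
    [ "$(wav_info "$1")" = "1 48000" ]
}
```

For instance, is_48k_mono output.wav && echo "format ok" could follow the basic TTS command.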

Key flags: --cfg (guidance scale, default 2.0), --timesteps (CFM steps, default 10), --seed, --temperature, --stream. Voice design: prefix the text with (a calm female voice)….
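
To show how the flags combine with the positional model arguments, here is a minimal wrapper sketch. VOXCPM2_CLI, MODEL_DIR, and DRY_RUN are names invented for this example, and it assumes --cfg and --timesteps take a space-separated value the same way -t and -o do; check the CLI's own help output before relying on that.

```shell
# voxcpm2_tts TEXT OUT.wav [CFG] [TIMESTEPS]
# Hypothetical wrapper. VOXCPM2_CLI, MODEL_DIR, and DRY_RUN are invented for
# this sketch; the flags themselves come from the list above.
voxcpm2_tts() {
    text=$1; out=$2
    cfg=${3:-2.0}      # guidance scale, documented default
    steps=${4:-10}     # CFM steps, documented default
    dir=${MODEL_DIR:-./models}
    # Assumption: flag values are passed space-separated.
    set -- "${VOXCPM2_CLI:-./voxcpm2-cli}" \
        -t "$text" -o "$out" --cfg "$cfg" --timesteps "$steps" \
        "$dir/VoxCPM2-BaseLM-Q8_0.gguf" "$dir/VoxCPM2-Acoustic-F16.gguf"
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$@"   # print one argument per line instead of running
    else
        "$@"
    fi
}
```

Setting DRY_RUN=1 prints the assembled command one argument per line, which is a cheap way to inspect quoting before spending time on synthesis.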

Conversion

Produced with the official converter against the upstream PyTorch weights:

python tools/omni/voxcpm2/convert_voxcpm2_to_gguf.py \
    --model model.safetensors \
    --vae audiovae.pth \
    --config config.json \
    --output ./out

License & attribution

Weights derive from openbmb/VoxCPM2; their original license/terms apply. Conversion tooling and inference engine: llama.cpp-omni.
