VoxCPM2 — GGUF weights for llama.cpp-omni

GGUF-converted weights of openbmb/VoxCPM2 for the C++/ggml inference engine llama.cpp-omni (tools/omni/voxcpm2).

These let you run VoxCPM2 text-to-speech and zero-shot voice cloning natively on CPU / Metal / CUDA / Vulkan via ggml — no PyTorch runtime required.

Files

File	Format	Size	Component
`VoxCPM2-BaseLM-F16.gguf`	F16	~3.0 GB	Base language model (28-layer, n_embd=2048)
`VoxCPM2-BaseLM-Q8_0.gguf`	Q8_0	~1.6 GB	Base language model, 8-bit quantized (recommended)
`VoxCPM2-Acoustic-F16.gguf`	F16	~1.7 GB	Acoustic stack (ResidualLM + FSQ + LocEnc/LocDiT CFM + AudioVAE)

VoxCPM2 outputs 48 kHz mono audio. You need one BaseLM file (F16 or Q8_0) plus the Acoustic file. Q8_0 roughly halves the BaseLM download (~1.6 GB vs ~3.0 GB) with negligible quality loss and a slight speed boost.
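
As a rough planning aid, the approximate sizes from the Files table can be summed per BaseLM choice. This is a minimal sketch: `total_gb` is a made-up helper name, and the figures are the rounded table values, not exact byte counts.

```shell
# Sum the approximate download size (GB) for one BaseLM variant plus the Acoustic file.
# Sizes are the rounded values from the Files table, not exact byte counts.
total_gb() {
    case "$1" in
        F16)  base=3.0 ;;
        Q8_0) base=1.6 ;;
        *) echo "unknown BaseLM variant: $1" >&2; return 1 ;;
    esac
    acoustic=1.7
    awk -v b="$base" -v a="$acoustic" 'BEGIN { printf "%.1f\n", b + a }'
}
```

Under these figures, `total_gb Q8_0` prints 3.3 and `total_gb F16` prints 4.7.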

Getting Started

1. Clone

git clone https://github.com/tc-mb/llama.cpp-omni.git
cd llama.cpp-omni

2. Build

# macOS (Metal — auto-detected):
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build --target voxcpm2-cli -j

# Linux / Windows (CUDA):
cmake -B build -DCMAKE_BUILD_TYPE=Release -DGGML_CUDA=ON
cmake --build build --target voxcpm2-cli -j

# Linux / Windows (Vulkan):
cmake -B build -DCMAKE_BUILD_TYPE=Release -DGGML_VULKAN=ON
cmake --build build --target voxcpm2-cli -j

# CPU only:
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build --target voxcpm2-cli -j
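
The four variants above differ only in one optional CMake flag, so the choice can be captured in a small helper. The sketch below is illustrative: the function names and the backend keywords (`cuda`, `vulkan`, `metal`, `cpu`) are assumptions and not part of llama.cpp-omni; only the CMake flags themselves come from the commands above.

```shell
# Map a backend keyword to the extra CMake flag used in the build commands above.
# The keywords are hypothetical; only the -D flags are taken from this README.
backend_flags() {
    case "$1" in
        cuda)      echo "-DGGML_CUDA=ON" ;;
        vulkan)    echo "-DGGML_VULKAN=ON" ;;
        metal|cpu) echo "" ;;   # the commands above pass no extra flag here
        *) echo "unknown backend: $1" >&2; return 1 ;;
    esac
}

# Print (not run) the configure command for the chosen backend.
configure_cmd() {
    local flags
    flags="$(backend_flags "$1")" || return 1
    echo "cmake -B build -DCMAKE_BUILD_TYPE=Release${flags:+ $flags}"
}
```

For example, `configure_cmd cuda` prints the CUDA configure line, which can then be followed by the same `cmake --build build --target voxcpm2-cli -j` step.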

3. Download GGUF Weights

You need one BaseLM (F16 or Q8_0) + the Acoustic file.

# Download Q8_0 (recommended)
huggingface-cli download DennisHuang648/VoxCPM2-GGUF \
    VoxCPM2-BaseLM-Q8_0.gguf VoxCPM2-Acoustic-F16.gguf \
    --local-dir ./models

# Or download F16 (unquantized, 16-bit)
huggingface-cli download DennisHuang648/VoxCPM2-GGUF \
    VoxCPM2-BaseLM-F16.gguf VoxCPM2-Acoustic-F16.gguf \
    --local-dir ./models
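
After downloading, a quick sanity check can catch truncated or mislabeled files. GGUF files begin with the four ASCII bytes `GGUF`, so the sketch below tests for that magic. The helper names `is_gguf` and `check_models` are hypothetical, and the check covers the recommended Q8_0 pair under `./models`; it does not validate the rest of the file.

```shell
# Return success if the file exists and starts with the 4-byte GGUF magic.
is_gguf() {
    [ -f "$1" ] && [ "$(head -c 4 "$1")" = "GGUF" ]
}

# Check the recommended pair in a directory (default ./models, as in the download commands).
check_models() {
    local dir="${1:-./models}" f status=0
    for f in VoxCPM2-BaseLM-Q8_0.gguf VoxCPM2-Acoustic-F16.gguf; do
        if is_gguf "$dir/$f"; then
            echo "ok: $f"
        else
            echo "missing or not GGUF: $f" >&2
            status=1
        fi
    done
    return "$status"
}
```

Running `check_models ./models` returns a non-zero status if either file is absent or does not start with the magic bytes.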

Usage

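Build voxcpm2-cli from llama.cpp-omni as described above. The commands below assume the binary and the .gguf files are in the current directory; adjust the paths to match your build output and download directory (for example ./models):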

# Basic TTS (GPU by default; add --cpu to force CPU)
./voxcpm2-cli \
    -t "Hello, this is VoxCPM2 running through llama.cpp-omni." \
    -o output.wav \
    VoxCPM2-BaseLM-F16.gguf \
    VoxCPM2-Acoustic-F16.gguf

# Voice cloning (reference audio)
./voxcpm2-cli -t "Cloned voice." -r speaker.wav -o clone.wav \
    VoxCPM2-BaseLM-F16.gguf VoxCPM2-Acoustic-F16.gguf

# Reference-transcript ("ultimate") cloning
./voxcpm2-cli -t "Target text." --prompt-wav speaker.wav --prompt-text "transcript of speaker.wav" \
    -o clone.wav VoxCPM2-BaseLM-F16.gguf VoxCPM2-Acoustic-F16.gguf
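
To confirm that a generated file matches the stated 48 kHz mono format, the channel count and sample rate can be read from the WAV header. This sketch assumes a canonical RIFF/WAVE layout with the `fmt ` chunk first (channels at byte offset 22, sample rate at offset 24) and a little-endian host. Whether voxcpm2-cli writes exactly that layout is an assumption, and `wav_info` is a hypothetical helper name.

```shell
# Print "<channels> <sample_rate>" read from a canonical WAV header.
# Assumes the fmt chunk starts at byte 12 and the host is little-endian,
# because od decodes multi-byte integers in host byte order.
wav_info() {
    channels="$(od -An -t u2 -j 22 -N 2 "$1" | tr -d ' \n')"
    rate="$(od -An -t u4 -j 24 -N 4 "$1" | tr -d ' \n')"
    echo "$channels $rate"
}
```

Under those assumptions, `wav_info output.wav` should print `1 48000` for a 48 kHz mono file.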

Key flags: --cfg (guidance scale, default 2.0), --timesteps (CFM steps, default 10), --seed, --temperature, --stream. Voice design: prefix the text with a parenthesized voice description, e.g. "(a calm female voice)…".
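
The flags above can be combined in a small wrapper so that settings are changed in one place. This is a sketch under stated assumptions: that value-taking flags accept a space-separated argument (for example `--cfg 2.0`), which this README does not confirm, and that the Q8_0 BaseLM is the default. The names `tts`, `VOXCPM2_CLI`, `CFG`, `TIMESTEPS`, `SEED`, `BASE_LM`, `ACOUSTIC`, and `DRY_RUN` are invented for illustration.

```shell
# Hypothetical wrapper: assemble a voxcpm2-cli invocation from environment variables.
# Assumption: value-taking flags accept a space-separated argument ("--cfg 2.0").
VOXCPM2_CLI="${VOXCPM2_CLI:-./voxcpm2-cli}"

tts() {
    text="$1"; out="$2"
    # Defaults mirror the documented ones: --cfg 2.0, --timesteps 10.
    set -- -t "$text" -o "$out" --cfg "${CFG:-2.0}" --timesteps "${TIMESTEPS:-10}"
    # Pass --seed only when SEED is set.
    if [ -n "${SEED:-}" ]; then
        set -- "$@" --seed "$SEED"
    fi
    set -- "$@" "${BASE_LM:-VoxCPM2-BaseLM-Q8_0.gguf}" "${ACOUSTIC:-VoxCPM2-Acoustic-F16.gguf}"
    # With DRY_RUN set, print the command instead of running it.
    ${DRY_RUN:+echo} "$VOXCPM2_CLI" "$@"
}
```

Setting `DRY_RUN=1` before calling `tts "Hello" output.wav` prints the assembled command without running it, which is a cheap way to check the argument order first.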

Conversion

These files were produced with the official converter from the upstream PyTorch weights:

python tools/omni/voxcpm2/convert_voxcpm2_to_gguf.py \
    --model model.safetensors \
    --vae audiovae.pth \
    --config config.json \
    --output ./out
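
Before running the converter, it can help to confirm that the three input files exist. The helper below is a hypothetical convenience and not part of the converter; the file names it would be given follow the command above.

```shell
# Fail early if any of the given input files is missing.
require_files() {
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "missing input: $f" >&2
            return 1
        fi
    done
}
```

A typical use would be to call `require_files model.safetensors audiovae.pth config.json` and only invoke the converter if it succeeds.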

License & attribution

Weights derive from openbmb/VoxCPM2; their original license/terms apply. Conversion tooling and inference engine: llama.cpp-omni.
