VoxCPM2: GGUF weights for llama.cpp-omni
GGUF-converted weights of openbmb/VoxCPM2 for the C++/ggml inference engine llama.cpp-omni (tools/omni/voxcpm2).
These let you run VoxCPM2 text-to-speech and zero-shot voice cloning natively on CPU, Metal, CUDA, or Vulkan via ggml; no PyTorch runtime is required.
Files
| File | Format | Size | Component |
|---|---|---|---|
| VoxCPM2-BaseLM-F16.gguf | F16 | ~3.0 GB | Base language model (28-layer, n_embd=2048) |
| VoxCPM2-BaseLM-Q8_0.gguf | Q8_0 | ~1.6 GB | Base language model, 8-bit quantized (recommended) |
| VoxCPM2-Acoustic-F16.gguf | F16 | ~1.7 GB | Acoustic stack (ResidualLM + FSQ + LocEnc/LocDiT CFM + AudioVAE) |
VoxCPM2 outputs 48 kHz mono audio. You need one BaseLM file (F16 or Q8_0) plus the Acoustic file. Q8_0 roughly halves the BaseLM download (~1.6 GB instead of ~3.0 GB) with negligible quality loss and a slight speed boost.
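Because the output format is fixed (48 kHz, mono), a quick way to sanity-check a generated file is Python's standard-library wave module. This is a minimal sketch: the helper name check_voxcpm2_wav is hypothetical, and it assumes voxcpm2-cli writes integer PCM WAV. If the file turns out to be 32-bit float, wave will refuse to open it and a library such as soundfile is needed instead.

```python
import wave

EXPECTED_RATE = 48_000  # VoxCPM2 output sample rate, per this card
EXPECTED_CHANNELS = 1   # mono


def check_voxcpm2_wav(path):
    """Return (sample_rate, channels, duration_s); raise ValueError on a format mismatch.

    Hypothetical helper; assumes a PCM WAV that the stdlib `wave` module can read.
    """
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        channels = wf.getnchannels()
        frames = wf.getnframes()
    if rate != EXPECTED_RATE or channels != EXPECTED_CHANNELS:
        raise ValueError(f"expected {EXPECTED_RATE} Hz mono, got {rate} Hz x {channels} ch")
    return rate, channels, frames / rate


if __name__ == "__main__":
    import sys
    print(check_voxcpm2_wav(sys.argv[1] if len(sys.argv) > 1 else "output.wav"))
```

A file resampled or written by another tool at a different rate (for example 16 kHz) is rejected rather than silently accepted.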
Getting Started
1. Clone
git clone https://github.com/tc-mb/llama.cpp-omni.git
cd llama.cpp-omni
2. Build
# macOS (Metal, auto-detected):
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build --target voxcpm2-cli -j
# Linux / Windows (CUDA):
cmake -B build -DCMAKE_BUILD_TYPE=Release -DGGML_CUDA=ON
cmake --build build --target voxcpm2-cli -j
# Linux / Windows (Vulkan):
cmake -B build -DCMAKE_BUILD_TYPE=Release -DGGML_VULKAN=ON
cmake --build build --target voxcpm2-cli -j
# CPU only (no GPU flags; on macOS Metal is still auto-detected, so pass --cpu at run time to force CPU):
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build --target voxcpm2-cli -j
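The four build variants differ only in the extra configure flag, so they can be scripted from one table. A minimal Python sketch, assuming it runs from the llama.cpp-omni checkout with cmake on PATH; BACKEND_FLAGS, build_commands and run_build are hypothetical names, and the flags are exactly those in the commands above.

```python
import subprocess

# Extra configure flags per backend, copied from the build commands above.
# Metal and CPU-only use the plain configure step (Metal is auto-detected on macOS).
BACKEND_FLAGS = {
    "metal": [],
    "cuda": ["-DGGML_CUDA=ON"],
    "vulkan": ["-DGGML_VULKAN=ON"],
    "cpu": [],
}


def build_commands(backend, build_dir="build", jobs=None):
    """Return the (configure, build) argv lists for the voxcpm2-cli target."""
    if backend not in BACKEND_FLAGS:
        raise ValueError(f"unknown backend {backend!r}; choose from {sorted(BACKEND_FLAGS)}")
    configure = ["cmake", "-B", build_dir, "-DCMAKE_BUILD_TYPE=Release", *BACKEND_FLAGS[backend]]
    build = ["cmake", "--build", build_dir, "--target", "voxcpm2-cli", "-j"]
    if jobs is not None:
        build.append(str(jobs))  # optional explicit parallelism for -j
    return configure, build


def run_build(backend, build_dir="build"):
    """Run configure then build; check=True stops at the first failing step."""
    for cmd in build_commands(backend, build_dir):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    import sys
    run_build(sys.argv[1] if len(sys.argv) > 1 else "cpu")
```

Passing argv lists (rather than a shell string) keeps the commands identical across macOS, Linux, and Windows shells.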
3. Download GGUF Weights
You need one BaseLM (F16 or Q8_0) + the Acoustic file.
# Download Q8_0 (recommended)
huggingface-cli download DennisHuang648/VoxCPM2-GGUF \
VoxCPM2-BaseLM-Q8_0.gguf VoxCPM2-Acoustic-F16.gguf \
--local-dir ./models
# Or download F16 (full precision)
huggingface-cli download DennisHuang648/VoxCPM2-GGUF \
VoxCPM2-BaseLM-F16.gguf VoxCPM2-Acoustic-F16.gguf \
--local-dir ./models
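The same download can be scripted from Python with the huggingface_hub library's hf_hub_download. A minimal sketch: the file names and approximate sizes are copied from the Files table, while REPO_ID, required_files and download are illustrative names. huggingface_hub is imported inside download so the file-selection logic works even when the library is not installed.

```python
# BaseLM choices and the shared Acoustic file, from the Files table (sizes approximate, GB).
BASELM = {
    "Q8_0": ("VoxCPM2-BaseLM-Q8_0.gguf", 1.6),
    "F16": ("VoxCPM2-BaseLM-F16.gguf", 3.0),
}
ACOUSTIC = ("VoxCPM2-Acoustic-F16.gguf", 1.7)
REPO_ID = "DennisHuang648/VoxCPM2-GGUF"


def required_files(precision="Q8_0"):
    """Return ([BaseLM, Acoustic] file names, approximate total size in GB)."""
    if precision not in BASELM:
        raise ValueError(f"precision must be one of {sorted(BASELM)}")
    base_name, base_gb = BASELM[precision]
    return [base_name, ACOUSTIC[0]], round(base_gb + ACOUSTIC[1], 1)


def download(precision="Q8_0", local_dir="./models"):
    """Download both GGUF files into local_dir and return their local paths."""
    from huggingface_hub import hf_hub_download  # pip install huggingface_hub
    names, _ = required_files(precision)
    return [hf_hub_download(repo_id=REPO_ID, filename=n, local_dir=local_dir) for n in names]


if __name__ == "__main__":
    print(download("Q8_0"))
```

With Q8_0 the pair is about 1.6 + 1.7 = 3.3 GB, versus about 4.7 GB with the F16 BaseLM.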
Usage
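Build voxcpm2-cli from llama.cpp-omni, then run it as shown below. The examples assume the binary and the .gguf files are in the current directory; adjust the paths to match your build output and the --local-dir used above (e.g. ./models).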
# Basic TTS (GPU by default; add --cpu to force CPU)
./voxcpm2-cli \
-t "Hello, this is VoxCPM2 running through llama.cpp-omni." \
-o output.wav \
VoxCPM2-BaseLM-F16.gguf \
VoxCPM2-Acoustic-F16.gguf
# Voice cloning (reference audio)
./voxcpm2-cli -t "Cloned voice." -r speaker.wav -o clone.wav \
VoxCPM2-BaseLM-F16.gguf VoxCPM2-Acoustic-F16.gguf
# Reference-transcript ("ultimate") cloning
./voxcpm2-cli -t "Target text." --prompt-wav speaker.wav --prompt-text "transcript of speaker.wav" \
-o clone.wav VoxCPM2-BaseLM-F16.gguf VoxCPM2-Acoustic-F16.gguf
Key flags: --cfg (guidance scale, default 2.0), --timesteps (CFM steps, default 10), --seed, --temperature, and --stream. Voice design: prefix the text with a description in parentheses, for example "(a calm female voice)" followed by the text to speak.
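To drive voxcpm2-cli from a script, the documented flags can be assembled into an argument list and passed to subprocess. This is a sketch, not part of llama.cpp-omni: voxcpm2_argv is a hypothetical helper that uses only the flags shown on this card, and it assumes --stream and --cpu are switches that take no value and that the voice-design description is prepended directly to the text.

```python
import subprocess


def voxcpm2_argv(text, base_lm, acoustic, out="output.wav", *,
                 ref_wav=None, prompt_wav=None, prompt_text=None,
                 cfg=None, timesteps=None, seed=None, temperature=None,
                 stream=False, cpu=False, voice=None, exe="./voxcpm2-cli"):
    """Build a voxcpm2-cli argument list from the flags documented on this card."""
    if voice:
        # Voice design (assumed format): "(description)" directly before the text.
        text = f"({voice}){text}"
    argv = [exe, "-t", text, "-o", out]
    if ref_wav:
        argv += ["-r", ref_wav]                 # reference-audio cloning
    if prompt_wav:
        argv += ["--prompt-wav", prompt_wav]    # reference-transcript cloning
    if prompt_text:
        argv += ["--prompt-text", prompt_text]
    for flag, value in (("--cfg", cfg), ("--timesteps", timesteps),
                        ("--seed", seed), ("--temperature", temperature)):
        if value is not None:  # omitted flags keep the CLI defaults
            argv += [flag, str(value)]
    if stream:
        argv.append("--stream")
    if cpu:
        argv.append("--cpu")
    return argv + [base_lm, acoustic]  # model files last, as in the examples above


if __name__ == "__main__":
    cmd = voxcpm2_argv("Hello from VoxCPM2.", "models/VoxCPM2-BaseLM-Q8_0.gguf",
                       "models/VoxCPM2-Acoustic-F16.gguf", voice="a calm female voice")
    subprocess.run(cmd, check=True)
```

Leaving a numeric argument as None omits the flag so the CLI default applies (cfg 2.0, timesteps 10), and passing an argv list to subprocess avoids shell-quoting problems with the input text.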
Conversion
Produced with the official converter against the upstream PyTorch weights:
python tools/omni/voxcpm2/convert_voxcpm2_to_gguf.py \
--model model.safetensors \
--vae audiovae.pth \
--config config.json \
--output ./out
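A cheap sanity check on the converter's output is to read each file's fixed-size GGUF header with Python's struct module. This follows the public GGUF layout (v2 and later: magic b"GGUF", uint32 version, uint64 tensor count, uint64 metadata key/value count, all little-endian), not anything specific to convert_voxcpm2_to_gguf.py; read_gguf_header is a hypothetical name, the *.gguf glob over ./out is an assumption about the output file names, and the check does not validate tensor data.

```python
import struct


def read_gguf_header(path):
    """Return (version, tensor_count, metadata_kv_count) from a GGUF header.

    Assumes the GGUF v2/v3 layout: b"GGUF", uint32 version, uint64 tensor count,
    uint64 metadata key/value count, all little-endian (24 bytes total).
    """
    with open(path, "rb") as f:
        head = f.read(24)
    if len(head) < 24 or head[:4] != b"GGUF":
        raise ValueError(f"{path} is not a GGUF file")
    (version,) = struct.unpack_from("<I", head, 4)
    if version < 2:
        raise ValueError(f"GGUF v{version} uses 32-bit counts; not handled here")
    n_tensors, n_kv = struct.unpack_from("<QQ", head, 8)
    return version, n_tensors, n_kv


if __name__ == "__main__":
    import glob
    for p in sorted(glob.glob("./out/*.gguf")):
        print(p, read_gguf_header(p))
```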
License & attribution
Weights derive from openbmb/VoxCPM2; their original license/terms apply. Conversion tooling and inference engine: llama.cpp-omni.