X-Voice GGUF

Self-contained GGUF bundles for the X-Voice GGML C++ runtime.

Runtime repository:

https://github.com/bluryar/X-Voice.cpp

Upstream model source:

https://huggingface.co/XRXRX/X-Voice
https://github.com/sunnyxrxrx/X-Voice

Files

| file | approximate size | sha256 |
| --- | --- | --- |
| x-voice-f32.gguf | 1.6 GB | 077d8dec4d94ab4562ae31cab4c0e000a9bb63a831f9ee3a5da7c8e7587df347 |
| x-voice-f16.gguf | 1.1 GB | 82b8bf7604da1f64c4d98e8e6afc11b3afda57058627a01054e8fc50b44505b3 |
| x-voice-q8_0.gguf | 848 MB | 8cb3452ad7aa1730047572bd38f22a92f87256a7dc803124b93a955b1f8ae5a5 |
| x-voice-q6_k.gguf | 790 MB | 5c4eb825d890f69fc706c4e06b5dd5ddec8b4ff3311cde0fc12bfa43fbcc4cf9 |
| x-voice-q4_k.gguf | 729 MB | 776f05d88d7bc252298272159ab91176029f4238b7eb7902d982e83bce8746a5 |
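
A downloaded bundle can be checked against the digests listed above before it is loaded. The following is a minimal sketch using only the Python standard library; the helper names `sha256_of` and `verify` are illustrative and not part of the X-Voice runtime.

```python
import hashlib
from pathlib import Path

# Digests copied from the file list in this README.
EXPECTED_SHA256 = {
    "x-voice-f32.gguf": "077d8dec4d94ab4562ae31cab4c0e000a9bb63a831f9ee3a5da7c8e7587df347",
    "x-voice-f16.gguf": "82b8bf7604da1f64c4d98e8e6afc11b3afda57058627a01054e8fc50b44505b3",
    "x-voice-q8_0.gguf": "8cb3452ad7aa1730047572bd38f22a92f87256a7dc803124b93a955b1f8ae5a5",
    "x-voice-q6_k.gguf": "5c4eb825d890f69fc706c4e06b5dd5ddec8b4ff3311cde0fc12bfa43fbcc4cf9",
    "x-voice-q4_k.gguf": "776f05d88d7bc252298272159ab91176029f4238b7eb7902d982e83bce8746a5",
}


def sha256_of(path, chunk_size=1 << 20):
    """Hash the file in 1 MiB chunks so large bundles are never fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path):
    """Return True when the file's digest matches the published one for its name."""
    path = Path(path)
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published digest for {path.name}")
    return sha256_of(path) == expected
```

For example, `verify("/path/to/x-voice-q8_0.gguf")` returns `False` for a truncated or corrupted download.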

The quantized files use a conservative X-Voice policy: large GGML mul_mat matrix weights are converted, while conv, norm, bias, embedding, positional, and small tensors remain in their source type.
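
One way to picture this policy is as a per-tensor predicate. The sketch below is an illustration only: the name keywords, the `min_elements` threshold, the `used_in_mul_mat` flag, and the function `should_quantize` are all assumptions made for the example and are not taken from the X-Voice.cpp converter.

```python
# Hypothetical name fragments for tensor families that keep their source type.
KEEP_KEYWORDS = ("conv", "norm", "bias", "embed", "pos")


def should_quantize(name, shape, used_in_mul_mat, min_elements=1 << 16):
    """Return True only for large 2-D weights that feed a mul_mat operation."""
    # Anything that is not a 2-D mul_mat operand stays in its source type.
    if not used_in_mul_mat or len(shape) != 2:
        return False
    # Conv, norm, bias, embedding, and positional tensors are excluded by name.
    if any(key in name for key in KEEP_KEYWORDS):
        return False
    # Small matrices are left alone (threshold is an assumed value).
    return shape[0] * shape[1] >= min_elements
```

Under these assumptions a 1024 x 1024 attention projection would be quantized, while a bias vector, an embedding table, or a 16 x 16 matrix would not.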

Quantization Benchmark

Single-run local benchmark on an RTX 4060 Ti with NVIDIA_TF32_OVERRIDE=0, --preset product, cfg_nonlayered, 32 sampler steps, the default zh sample text, and /root/code/ggbond/models/test.wav:

| model | size | wall | sampler | load | mel max abs vs f32 | mel mean abs vs f32 | wav max abs vs f32 | wav mean abs vs f32 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| f32 | 1544.1 MiB | 10.74s | 9.222s | 0.515s | 0 | 0 | 0 | 0 |
| f16 | 1060.5 MiB | 8.89s | 7.316s | 0.513s | 1.40578 | 0.01232 | 0.89212 | 0.01086 |
| q8_0 | 847.4 MiB | 8.52s | 7.268s | 0.303s | 0.78026 | 0.02397 | 1.01920 | 0.01708 |
| q6_k | 790.0 MiB | 8.66s | 7.407s | 0.301s | 0.89031 | 0.04256 | 0.68326 | 0.02387 |
| q4_k | 728.8 MiB | 8.61s | 7.277s | 0.319s | 5.72519 | 0.28411 | 1.22644 | 0.07608 |

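Interpretation: f16, q8_0, and q6_k are the practical release candidates; each stays below 0.05 mean absolute mel error against f32. q4_k is size-first: it saves about 61 MiB over q6_k, but its mean absolute mel error (0.28411) is roughly 6.7 times that of q6_k (0.04256), so it should be auditioned carefully before use as a quality default.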
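
The error columns in the benchmark read as element-wise absolute differences against the f32 output. A minimal sketch of that reading, assuming both outputs are equal-length flat sequences of floats (mel values or wav samples); the function name `abs_error_stats` is illustrative, and the actual benchmark script may compute these differently.

```python
def abs_error_stats(reference, candidate):
    """Return (max abs difference, mean abs difference) between two sequences."""
    if len(reference) != len(candidate):
        raise ValueError("outputs must have the same length")
    if not reference:
        raise ValueError("outputs must not be empty")
    diffs = [abs(r - c) for r, c in zip(reference, candidate)]
    return max(diffs), sum(diffs) / len(diffs)
```

Comparing an output with itself gives `(0.0, 0.0)`, which corresponds to the all-zero f32 row.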

Quick Start

git clone --recursive https://github.com/bluryar/X-Voice.cpp
cd X-Voice.cpp
bash scripts/dev/build_xvoice_cuda.sh

NVIDIA_TF32_OVERRIDE=0 build-cuda/x-voice-cli \
  --model /path/to/x-voice-f32.gguf \
  --load-tensors \
  --synthesize \
  --text '发帖人(弟弟)在详细描述其五口之家的现状后,寻求处理家庭问题的建议,并提出了自己的初步计划。' \
  --text-kind plain \
  --language zh \
  --ref-wav /path/to/reference.wav \
  --preset product \
  --output-wav /tmp/xvoice.wav \
  --metadata-json /tmp/xvoice.json \
  --progress \
  -b cuda -t 8

License

The C++ runtime is Apache-2.0. These model artifacts are converted from upstream X-Voice weights; please follow the upstream model license and terms for any redistribution or commercial use.
