X-Voice GGUF
Self-contained GGUF bundles for the X-Voice GGML C++ runtime.
Runtime repository:
https://github.com/bluryar/X-Voice.cpp
Upstream model source:
https://huggingface.co/XRXRX/X-Voice
https://github.com/sunnyxrxrx/X-Voice
Files
| file | approximate size | sha256 |
|---|---|---|
| x-voice-f32.gguf | 1.6 GB | 077d8dec4d94ab4562ae31cab4c0e000a9bb63a831f9ee3a5da7c8e7587df347 |
| x-voice-f16.gguf | 1.1 GB | 82b8bf7604da1f64c4d98e8e6afc11b3afda57058627a01054e8fc50b44505b3 |
| x-voice-q8_0.gguf | 848 MB | 8cb3452ad7aa1730047572bd38f22a92f87256a7dc803124b93a955b1f8ae5a5 |
| x-voice-q6_k.gguf | 790 MB | 5c4eb825d890f69fc706c4e06b5dd5ddec8b4ff3311cde0fc12bfa43fbcc4cf9 |
| x-voice-q4_k.gguf | 729 MB | 776f05d88d7bc252298272159ab91176029f4238b7eb7902d982e83bce8746a5 |
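The checksums above can be used to confirm that a download is intact. The following is a minimal sketch using only the Python standard library; the helper names `sha256_of_file` and `verify` are illustrative and not part of the runtime.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so a multi-GB GGUF never sits fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Return True when the file's SHA-256 matches the published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `verify("x-voice-q8_0.gguf", "8cb3452ad7aa1730047572bd38f22a92f87256a7dc803124b93a955b1f8ae5a5")` should return `True` for an uncorrupted copy of that file.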
The quantized files use a conservative X-Voice policy: only the large matrix
weights consumed by GGML mul_mat are converted to the target type, while conv,
norm, bias, embedding, positional, and small tensors remain in their source type.
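As a rough illustration of such a policy, the sketch below decides per tensor whether to convert it. The name fragments, the element-count cutoff, and the function name are all assumptions made for this example; the actual converter in the runtime repository may select tensors differently.

```python
# Hypothetical name fragments marking tensors that stay in their source type.
KEEP_SOURCE_FRAGMENTS = ("conv", "norm", "bias", "embed", "pos")

# Assumed cutoff for what counts as a "small" tensor (not taken from the source).
MIN_ELEMENTS = 1 << 16


def should_convert(name, shape):
    """Return True only for large 2-D matrices that are not on the keep list."""
    if len(shape) != 2:
        return False  # mul_mat weights are matrices; skip vectors and conv kernels
    lowered = name.lower()
    if any(fragment in lowered for fragment in KEEP_SOURCE_FRAGMENTS):
        return False
    return shape[0] * shape[1] >= MIN_ELEMENTS
```

A policy like this trades some file size for safety: the tensors it leaves alone are small relative to the matrix weights, so keeping them in their source type costs little.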
Quantization Benchmark
Single-run local benchmark on an RTX 4060 Ti with NVIDIA_TF32_OVERRIDE=0,
--preset product, cfg_nonlayered, 32 sampler steps, the default zh sample
text, and /root/code/ggbond/models/test.wav:
| model | size | wall | sampler | load | mel max abs vs f32 | mel mean abs vs f32 | wav max abs vs f32 | wav mean abs vs f32 |
|---|---|---|---|---|---|---|---|---|
| f32 | 1544.1 MiB | 10.74s | 9.222s | 0.515s | 0 | 0 | 0 | 0 |
| f16 | 1060.5 MiB | 8.89s | 7.316s | 0.513s | 1.40578 | 0.01232 | 0.89212 | 0.01086 |
| q8_0 | 847.4 MiB | 8.52s | 7.268s | 0.303s | 0.78026 | 0.02397 | 1.01920 | 0.01708 |
| q6_k | 790.0 MiB | 8.66s | 7.407s | 0.301s | 0.89031 | 0.04256 | 0.68326 | 0.02387 |
| q4_k | 728.8 MiB | 8.61s | 7.277s | 0.319s | 5.72519 | 0.28411 | 1.22644 | 0.07608 |
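The error columns are read here as the maximum and mean absolute element-wise difference against the f32 output. That reading is an assumption based on the column names; the sketch below shows the two metrics on plain Python sequences.

```python
def max_abs_diff(reference, candidate):
    """Largest absolute element-wise difference between two equal-length sequences."""
    if len(reference) != len(candidate):
        raise ValueError("sequences must have the same length")
    return max(abs(r - c) for r, c in zip(reference, candidate))


def mean_abs_diff(reference, candidate):
    """Average absolute element-wise difference between two equal-length sequences."""
    if len(reference) != len(candidate):
        raise ValueError("sequences must have the same length")
    return sum(abs(r - c) for r, c in zip(reference, candidate)) / len(reference)
```

The maximum is sensitive to a single outlier frame or sample, while the mean reflects overall drift, so the two columns can rank models differently.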
Interpretation: f16/q8_0/q6_k are the practical release candidates. q4_k is size-first and should be auditioned carefully before use as a quality default.
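To put the size and speed trade-off in relative terms, the sketch below normalizes the size and sampler columns against f32. The numbers are copied from the table above; the function and variable names are illustrative, and a single-run benchmark should be treated as indicative only.

```python
# (size in MiB, sampler time in seconds), copied from the benchmark table.
BENCHMARK = {
    "f32": (1544.1, 9.222),
    "f16": (1060.5, 7.316),
    "q8_0": (847.4, 7.268),
    "q6_k": (790.0, 7.407),
    "q4_k": (728.8, 7.277),
}


def relative_to_f32(rows):
    """Map each model to (size ratio vs f32, sampler speedup vs f32)."""
    base_size, base_sampler = rows["f32"]
    return {
        name: (size / base_size, base_sampler / sampler)
        for name, (size, sampler) in rows.items()
    }


if __name__ == "__main__":
    for name, (size_ratio, speedup) in relative_to_f32(BENCHMARK).items():
        print(f"{name}: {size_ratio:.2f}x size, {speedup:.2f}x sampler speed")
```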
Quick Start
git clone --recursive https://github.com/bluryar/X-Voice.cpp
cd X-Voice.cpp
bash scripts/dev/build_xvoice_cuda.sh
NVIDIA_TF32_OVERRIDE=0 build-cuda/x-voice-cli \
--model /path/to/x-voice-f32.gguf \
--load-tensors \
--synthesize \
--text '发帖人(弟弟)在详细描述其五口之家的现状后,寻求处理家庭问题的建议,并提出了自己的初步计划。' \
--text-kind plain \
--language zh \
--ref-wav /path/to/reference.wav \
--preset product \
--output-wav /tmp/xvoice.wav \
--metadata-json /tmp/xvoice.json \
--progress \
-b cuda -t 8
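To run the same command over several of the GGUF files, a small Python wrapper can assemble the argument list. This is a sketch that reuses only the flags shown above; the function names and defaults are illustrative, and it assumes the CLI has been built at `build-cuda/x-voice-cli`.

```python
import os
import subprocess


def build_xvoice_argv(model, text, ref_wav, output_wav, metadata_json=None,
                      language="zh", threads=8, cli="build-cuda/x-voice-cli"):
    """Assemble the Quick Start command line as an argv list (no shell quoting needed)."""
    argv = [
        cli,
        "--model", model,
        "--load-tensors",
        "--synthesize",
        "--text", text,
        "--text-kind", "plain",
        "--language", language,
        "--ref-wav", ref_wav,
        "--preset", "product",
        "--output-wav", output_wav,
    ]
    if metadata_json is not None:
        argv += ["--metadata-json", metadata_json]
    argv += ["--progress", "-b", "cuda", "-t", str(threads)]
    return argv


def run_xvoice(argv):
    """Run the CLI with NVIDIA_TF32_OVERRIDE=0, as in the Quick Start command."""
    env = dict(os.environ, NVIDIA_TF32_OVERRIDE="0")
    return subprocess.run(argv, env=env, check=True)
```

Passing an argv list rather than a shell string avoids quoting problems with the non-ASCII `--text` value.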
License
The C++ runtime is Apache-2.0. These model artifacts are converted from upstream X-Voice weights; please follow the upstream model license and terms for any redistribution or commercial use.