X-Voice GGUF

Self-contained GGUF bundles for the X-Voice GGML C++ runtime.

Runtime repository:

https://github.com/bluryar/X-Voice.cpp

Upstream model source:

https://huggingface.co/XRXRX/X-Voice
https://github.com/sunnyxrxrx/X-Voice

Files

file	approximate size	sha256
`x-voice-f32.gguf`	1.6 GB	`077d8dec4d94ab4562ae31cab4c0e000a9bb63a831f9ee3a5da7c8e7587df347`
`x-voice-f16.gguf`	1.1 GB	`82b8bf7604da1f64c4d98e8e6afc11b3afda57058627a01054e8fc50b44505b3`
`x-voice-q8_0.gguf`	848 MB	`8cb3452ad7aa1730047572bd38f22a92f87256a7dc803124b93a955b1f8ae5a5`
`x-voice-q6_k.gguf`	790 MB	`5c4eb825d890f69fc706c4e06b5dd5ddec8b4ff3311cde0fc12bfa43fbcc4cf9`
`x-voice-q4_k.gguf`	729 MB	`776f05d88d7bc252298272159ab91176029f4238b7eb7902d982e83bce8746a5`
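
To confirm that a download matches the table above, the digest can be recomputed locally. The following is a minimal sketch using only the Python standard library; the helper names `sha256_of_file` and `verify_download` are illustrative and are not part of the X-Voice tooling.

```python
import hashlib
from pathlib import Path

# Expected digests copied from the table above.
EXPECTED_SHA256 = {
    "x-voice-f32.gguf": "077d8dec4d94ab4562ae31cab4c0e000a9bb63a831f9ee3a5da7c8e7587df347",
    "x-voice-f16.gguf": "82b8bf7604da1f64c4d98e8e6afc11b3afda57058627a01054e8fc50b44505b3",
    "x-voice-q8_0.gguf": "8cb3452ad7aa1730047572bd38f22a92f87256a7dc803124b93a955b1f8ae5a5",
    "x-voice-q6_k.gguf": "5c4eb825d890f69fc706c4e06b5dd5ddec8b4ff3311cde0fc12bfa43fbcc4cf9",
    "x-voice-q4_k.gguf": "776f05d88d7bc252298272159ab91176029f4238b7eb7902d982e83bce8746a5",
}


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream the file in chunks so large bundles need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path):
    """Return True when the file's digest matches the table entry for its name."""
    path = Path(path)
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published checksum for {path.name}")
    return sha256_of_file(path) == expected
```

Calling `verify_download("/path/to/x-voice-q8_0.gguf")` returns `False` for a truncated or corrupted download, in which case the file should be fetched again.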

The quantized files follow a conservative, X-Voice-specific policy: only the large weight matrices consumed by GGML mul_mat are converted to the target type, while convolution, normalization, bias, embedding, positional, and small tensors remain in their source type.
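
The policy can be pictured as a per-tensor predicate. The sketch below is hypothetical: the name fragments, the two-dimensional check, and the size cutoff are assumptions chosen for illustration, and the actual converter may select tensors differently.

```python
# Hypothetical name fragments for tensors that stay in their source type.
# These substrings are illustrative only; real tensor names may differ.
KEEP_SOURCE_TYPE_HINTS = ("conv", "norm", "bias", "embed", "pos")

# Hypothetical cutoff below which a tensor counts as "small".
MIN_ELEMENTS_TO_QUANTIZE = 1 << 16


def should_quantize(name, shape):
    """Return True only for large 2-D weight matrices not on the keep list."""
    if len(shape) != 2:
        # Treat only plain matrices as mul_mat weights in this sketch.
        return False
    if any(hint in name for hint in KEEP_SOURCE_TYPE_HINTS):
        return False
    return shape[0] * shape[1] >= MIN_ELEMENTS_TO_QUANTIZE
```

Under these assumed rules, a 1024 x 1024 attention projection would be quantized, while a normalization vector or an embedding table would keep its source type.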

Quantization Benchmark

Single-run local benchmark on an RTX 4060 Ti with NVIDIA_TF32_OVERRIDE=0, --preset product, cfg_nonlayered, 32 sampler steps, the default zh sample text, and /root/code/ggbond/models/test.wav:

model	size	wall time	sampler time	load time	mel max abs diff vs f32	mel mean abs diff vs f32	wav max abs diff vs f32	wav mean abs diff vs f32
f32	1544.1 MiB	10.74s	9.222s	0.515s	0	0	0	0
f16	1060.5 MiB	8.89s	7.316s	0.513s	1.40578	0.01232	0.89212	0.01086
q8_0	847.4 MiB	8.52s	7.268s	0.303s	0.78026	0.02397	1.01920	0.01708
q6_k	790.0 MiB	8.66s	7.407s	0.301s	0.89031	0.04256	0.68326	0.02387
q4_k	728.8 MiB	8.61s	7.277s	0.319s	5.72519	0.28411	1.22644	0.07608
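
The deviation columns read as element-wise comparisons against the f32 output. A minimal sketch of such a metric follows, assuming both outputs have been flattened to equal-length sequences of floats; how the benchmark actually aligns or normalizes the mel and waveform data is not stated here, and `abs_error_stats` is an illustrative name.

```python
def abs_error_stats(reference, candidate):
    """Return (max abs difference, mean abs difference) between two sequences."""
    if len(reference) != len(candidate):
        raise ValueError("sequences must have the same length")
    if not reference:
        raise ValueError("sequences must not be empty")
    diffs = [abs(r - c) for r, c in zip(reference, candidate)]
    return max(diffs), sum(diffs) / len(diffs)


# Toy values only; these are not X-Voice outputs.
max_abs, mean_abs = abs_error_stats([0.0, 1.0, -2.0], [0.5, 1.0, -1.0])
print(max_abs, mean_abs)
```

The maximum is sensitive to a single outlier frame or sample, while the mean reflects overall drift, which is why the two columns can rank the models differently.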

Interpretation: f16, q8_0, and q6_k are the practical release candidates. q4_k prioritizes size: its mean mel deviation from f32 (0.28411) is more than six times that of q6_k (0.04256), so audition it carefully before using it as a quality default.
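
To put the trade-off in relative terms, the size and wall-time columns can be normalized against f32. The sketch below only reuses figures from the table above; since they come from a single run, small timing differences between the quantized variants should not be over-interpreted.

```python
# (size in MiB, wall time in seconds), copied from the benchmark table.
BENCHMARK = {
    "f32": (1544.1, 10.74),
    "f16": (1060.5, 8.89),
    "q8_0": (847.4, 8.52),
    "q6_k": (790.0, 8.66),
    "q4_k": (728.8, 8.61),
}


def relative_to_f32(model):
    """Return (size as a fraction of f32, wall-time speedup over f32)."""
    size, wall = BENCHMARK[model]
    base_size, base_wall = BENCHMARK["f32"]
    return size / base_size, base_wall / wall


for name in BENCHMARK:
    size_ratio, speedup = relative_to_f32(name)
    print(f"{name:5s} size {size_ratio:6.1%} of f32, wall-time speedup {speedup:.2f}x")
```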

Quick Start

git clone --recursive https://github.com/bluryar/X-Voice.cpp
cd X-Voice.cpp
bash scripts/dev/build_xvoice_cuda.sh

NVIDIA_TF32_OVERRIDE=0 build-cuda/x-voice-cli \
  --model /path/to/x-voice-f32.gguf \
  --load-tensors \
  --synthesize \
  --text '发帖人（弟弟）在详细描述其五口之家的现状后，寻求处理家庭问题的建议，并提出了自己的初步计划。' \
  --text-kind plain \
  --language zh \
  --ref-wav /path/to/reference.wav \
  --preset product \
  --output-wav /tmp/xvoice.wav \
  --metadata-json /tmp/xvoice.json \
  --progress \
  -b cuda -t 8
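
When scripting several runs (for example, one per quantization level), the same invocation can be assembled programmatically. The sketch below builds the argument list from the flags shown above and nothing else; the function names are illustrative, and only the `zh` language and `cuda` backend values appear in this README, so other values are untested assumptions.

```python
import os
import shlex


def build_synthesis_command(model_path, text, ref_wav, output_wav,
                            metadata_json=None,
                            cli="build-cuda/x-voice-cli",
                            language="zh", backend="cuda", threads=8):
    """Assemble the x-voice-cli argument list using the flags from Quick Start."""
    command = [
        cli,
        "--model", model_path,
        "--load-tensors",
        "--synthesize",
        "--text", text,
        "--text-kind", "plain",
        "--language", language,
        "--ref-wav", ref_wav,
        "--preset", "product",
        "--output-wav", output_wav,
    ]
    if metadata_json is not None:
        command += ["--metadata-json", metadata_json]
    command += ["--progress", "-b", backend, "-t", str(threads)]
    return command


def build_environment():
    """Copy the current environment and disable TF32, as in Quick Start."""
    env = dict(os.environ)
    env["NVIDIA_TF32_OVERRIDE"] = "0"
    return env


command = build_synthesis_command(
    "/path/to/x-voice-q8_0.gguf", "你好", "/path/to/reference.wav", "/tmp/xvoice.wav"
)
print(shlex.join(command))
```

Passing the list to `subprocess.run(command, env=build_environment(), check=True)` would execute it; building the list rather than a shell string avoids quoting problems with non-ASCII or punctuated input text.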

License

The C++ runtime is Apache-2.0. These model artifacts are converted from upstream X-Voice weights; please follow the upstream model license and terms for any redistribution or commercial use.

GGUF

Model size: 0.4B params
Architecture: xvoice
Hardware compatibility: 6-bit, 8-bit, 16-bit, 32-bit