TigreGotico/vconnx-facodec

ONNX artifacts for the FACodec (NaturalSpeech 3) voice-conversion engine, part of vconnx.

License

Upstream weights: Apache-2.0 — amphion/naturalspeech3_facodec.
Code: Amphion (open-mmlab/Amphion) — Apache-2.0.
ONNX exports: Apache-2.0 (same upstream license, stated here per vconnx publish-all policy).

Architecture

FACodec (Ju et al., NaturalSpeech 3, ICML 2024) factorizes speech into content, prosody, timbre, and acoustic-detail subspaces. Voice conversion is zero-shot: encode the source, quantize it, swap in the timbre embedding extracted from the reference, and decode.
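The conversion flow can be sketched as a function over four stage callables. This is a minimal illustration, not the vconnx API: `convert_voice` and its parameter names are hypothetical, each callable is assumed to wrap one exported ONNX graph (for example through an ONNX Runtime session), and the source `mel_20` features are assumed to be computed by the caller.

```python
from typing import Any, Callable


def convert_voice(
    source_wav: Any,
    reference_wav: Any,
    source_mel_20: Any,
    encoder: Callable[[Any], Any],
    timbre: Callable[[Any], Any],
    quantize: Callable[[Any, Any], Any],
    decoder: Callable[[Any, Any], Any],
) -> Any:
    """Hypothetical sketch: keep the source codes, swap in the reference timbre."""
    # Both utterances go through the same convolutional encoder.
    source_feats = encoder(source_wav)
    reference_feats = encoder(reference_wav)

    # Only the source is quantized; its codes carry prosody, content, and residual detail.
    vq_ids = quantize(source_feats, source_mel_20)

    # The timbre embedding comes from the reference speaker, not the source.
    reference_spk_embs = timbre(reference_feats)

    # Decoding source codes with the reference embedding yields the converted waveform.
    return decoder(vq_ids, reference_spk_embs)
```

Passing the stages in as callables keeps the sketch independent of file names and tensor names, which are not listed here.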

Component	Input → Output	Description
encoder	wav(1,1,N) → enc_feats(1,256,T)	Convolutional encoder (hop=200)
timbre	enc_feats(1,256,T) → spk_embs(1,256)	TransformerEncoder timbre extractor
quantize	(enc_feats, mel_20(1,20,T)) → vq_ids(6,1,T)	Factorised VQ: prosody(1) + content(2) + residual(3) codebooks
decoder	(vq_ids, spk_embs) → wav(1,1,N)	vq2emb + AdaIN + conv decoder
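The shapes in the table above are tied together by the frame count T. The helper below is a small sketch of that bookkeeping. It assumes T = N / hop exactly, with N already a multiple of the hop size. The exported graphs may pad or truncate differently, and `expected_shapes` is a hypothetical name.

```python
HOP = 200          # encoder hop size, from the table
FEATURE_DIM = 256  # channel width of enc_feats and spk_embs
MEL_BINS = 20      # mel_20 feature bins
NUM_CODEBOOKS = 1 + 2 + 3  # prosody + content + residual


def expected_shapes(n_samples: int, hop: int = HOP) -> dict:
    """Return the tensor shapes implied by the table for a waveform of n_samples."""
    # Assumption: the waveform length is a positive multiple of the hop size.
    if n_samples <= 0 or n_samples % hop != 0:
        raise ValueError("n_samples must be a positive multiple of hop")
    frames = n_samples // hop
    return {
        "wav": (1, 1, n_samples),
        "enc_feats": (1, FEATURE_DIM, frames),
        "spk_embs": (1, FEATURE_DIM),
        "mel_20": (1, MEL_BINS, frames),
        "vq_ids": (NUM_CODEBOOKS, 1, frames),
    }
```

A check like this is useful before feeding arrays to the graphs, since a mismatch between the `mel_20` frame count and the encoder frame count would otherwise surface as a shape error at inference time.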

Parity (fp32 torch vs ORT)

Component	max_abs Δ	mean_abs Δ	Verdict
encoder	1.62e-05	2.36e-06	PASS
timbre	1.43e-06	6.40e-08	PASS
quantize	exact int64 match	—	PASS
decoder	7.50e-09	1.46e-09	PASS
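The two metrics in the parity table are the maximum and mean absolute element-wise difference between the PyTorch and ONNX Runtime outputs. A minimal sketch over flat sequences of floats follows. The function name is hypothetical, and the tolerance behind the PASS verdict is not stated here, so none is encoded.

```python
from typing import Sequence, Tuple


def parity_deltas(reference: Sequence[float], candidate: Sequence[float]) -> Tuple[float, float]:
    """Return (max_abs, mean_abs) of the element-wise difference."""
    if len(reference) != len(candidate) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Absolute difference per element between the two backends' outputs.
    diffs = [abs(r - c) for r, c in zip(reference, candidate)]
    return max(diffs), sum(diffs) / len(diffs)
```

For the quantizer, whose outputs are integer code indices, an exact equality check is the natural comparison, which matches the "exact int64 match" entry.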

Model sizes

File	Size
(fp32)	16.5 MB
(INT8)	4.7 MB
(fp32)	33.0 MB
(INT8)	12.1 MB
(fp32)	33.4 MB
(INT8)	12.5 MB
(fp32)	66.2 MB
(INT8)	36.8 MB
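The size reduction from quantization can be worked out directly from the table. The sketch below assumes each fp32 row and the INT8 row after it describe the same graph, which the row order suggests but the table does not state. The variable names are illustrative.

```python
# (fp32 MB, INT8 MB) pairs, in table order.
# Assumption: consecutive rows belong to the same graph.
SIZES_MB = [(16.5, 4.7), (33.0, 12.1), (33.4, 12.5), (66.2, 36.8)]


def compression_ratio(fp32_mb: float, int8_mb: float) -> float:
    """How many times smaller the INT8 file is than its fp32 counterpart."""
    return fp32_mb / int8_mb


ratios = [round(compression_ratio(fp32, int8), 2) for fp32, int8 in SIZES_MB]
total_fp32_mb = round(sum(fp32 for fp32, _ in SIZES_MB), 1)
total_int8_mb = round(sum(int8 for _, int8 in SIZES_MB), 1)

print(ratios, total_fp32_mb, total_int8_mb)
```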

Intelligibility (WER gate ≤ 25%)

Source speech was generated with edge-tts (en-US-GuyNeural) and converted to two reference voices (en-US-AriaNeural, en-GB-SoniaNeural). The converted audio was transcribed with Whisper base.en and scored against the source text.

Reference voice	WER	Gate
en-US-AriaNeural	0%	✓ PASS
en-GB-SoniaNeural	0%	✓ PASS
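Word error rate is the word-level edit distance between the reference text and the transcript, divided by the number of reference words. The sketch below shows one common way to compute it and apply the 25% gate. The text normalization (lowercasing and whitespace splitting) is an assumption, and the evaluation reported above may normalize differently. Both function names are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    ref_words = reference.lower().split()
    hyp_words = hypothesis.lower().split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # previous[j] = edit distance between the processed reference prefix
    # and the first j hypothesis words.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = previous[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref_words)


def passes_gate(wer: float, max_wer: float = 0.25) -> bool:
    """Apply the WER gate (25% by default, as in the section heading)."""
    return wer <= max_wer
```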

References

Ju et al. NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models. arXiv:2403.03100, published Mar 5, 2024.