metadata
license: apache-2.0
language:
- vi
VieNeu-Codec: The Heart of VieNeu-TTS v2
VieNeu-Codec is the high-performance audio engine built specifically for the upcoming VieNeu-TTS v2. It is a neural audio codec trained on over 20,000 hours of diverse Vietnamese and English speech data, ensuring state-of-the-art robustness, natural prosody, and crystal-clear audio reconstruction.
This repository provides the optimized ONNX versions of the VieNeu-Codec for production use.
Key Features
- 24 kHz High-Fidelity: Crystal clear audio reconstruction optimized for the Vietnamese language.
- Zero-Shot Voice Cloning: Clone any voice with just 5 seconds of reference audio.
- Optimized for VieNeu-TTS v2: Seamlessly integrates with the next-generation LLM backbone of VieNeu-TTS.
- Two Deployment Modes: Includes both FP32 (High Quality) and INT8 (High Speed) decoders.
Model Components
- vieneu_decoder.onnx: (FP32) High-fidelity audio decoder for maximum quality.
- vieneu_decoder_int8.onnx: (INT8) Quantized decoder for fast CPU inference.
Usage
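Choosing between the two decoder files listed above can be wrapped in a small helper. The following is a minimal sketch, not part of the official API: the names `DECODERS`, `decoder_file`, and `load_decoder`, the `model_dir` parameter, and the choice of the CPU execution provider are all assumptions made for illustration.

```python
import os

# File names as listed in this README; the "fp32"/"int8" keys are hypothetical labels.
DECODERS = {
    "fp32": "vieneu_decoder.onnx",       # high-fidelity decoder
    "int8": "vieneu_decoder_int8.onnx",  # quantized decoder for fast CPU inference
}


def decoder_file(precision="fp32"):
    """Return the decoder file name for a precision label (case-insensitive)."""
    try:
        return DECODERS[precision.lower()]
    except KeyError:
        raise ValueError(
            f"unknown precision {precision!r}; expected one of {sorted(DECODERS)}"
        ) from None


def load_decoder(precision="fp32", model_dir="."):
    """Open an ONNX Runtime session for the chosen decoder (assumes CPU inference)."""
    import onnxruntime as ort  # imported lazily so decoder_file() works without it

    path = os.path.join(model_dir, decoder_file(precision))
    return ort.InferenceSession(path, providers=["CPUExecutionProvider"])
```

After loading, iterating over `sess.get_inputs()` and printing each entry's `name`, `shape`, and `type` is a quick way to confirm the input names and dtypes the decoder expects before feeding it data.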
Synthesize Speech
Combine the speaker embedding with content tokens from your LLM (VieNeu-TTS v2):
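import onnxruntime as ort

# `ids`: content tokens from the VieNeu-TTS v2 LLM; `embedding`: speaker embedding of the reference voice.
sess_dec = ort.InferenceSession("vieneu_decoder.onnx")
audio = sess_dec.run(None, {
    "content_ids": ids,
    "voice": embedding,
})[0]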
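To listen to the result, the decoded waveform can be written to a WAV file. The sketch below uses only the Python standard library and rests on assumptions not stated in this README: that the decoder output is a mono float waveform with values in [-1.0, 1.0], and that it is sampled at the 24 kHz rate listed under Key Features. The helper name `save_wav` is hypothetical.

```python
import struct
import wave

SAMPLE_RATE = 24_000  # taken from the 24 kHz figure in Key Features


def save_wav(samples, path, sample_rate=SAMPLE_RATE):
    """Write mono float samples (assumed to lie in [-1.0, 1.0]) as 16-bit PCM.

    Returns the number of frames written.
    """
    pcm = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))  # clip so the int16 conversion cannot overflow
        pcm += struct.pack("<h", int(round(s * 32767)))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(pcm))
    return len(pcm) // 2
```

Since ONNX Runtime returns NumPy arrays, a call such as `save_wav(audio.reshape(-1), "output.wav")` flattens any batch or channel dimensions first; verify the actual output shape of the decoder before relying on this.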
License & Attribution
Author: Pham Nguyen Ngoc Bao
Project: VieNeu-Codec (for VieNeu-TTS v2)
Version: 2.0