license: apache-2.0
language:
  - vi

VieNeu-Codec: The Heart of VieNeu-TTS v2

VieNeu-Codec is the neural audio codec built specifically for the upcoming VieNeu-TTS v2. It was trained on over 20,000 hours of diverse Vietnamese and English speech, with the goals of robustness, natural prosody, and clean audio reconstruction.

This repository provides optimized ONNX exports of the VieNeu-Codec decoder for production use.

🚀 Key Features

24 kHz High-Fidelity: Clear 24 kHz audio reconstruction optimized for the Vietnamese language.
Zero-Shot Voice Cloning: Clone any voice with just 5 seconds of reference audio.
Optimized for VieNeu-TTS v2: Seamlessly integrates with the next-generation LLM backbone of VieNeu-TTS.
Two Deployment Modes: Includes both FP32 (High Quality) and INT8 (High Speed) decoders.

📦 Model Components

vieneu_decoder.onnx: (FP32) High-fidelity audio decoder for maximum quality.
vieneu_decoder_int8.onnx: (INT8) Quantized decoder for fast CPU inference.
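Choosing between the two decoders can be wrapped in a small helper. The following is a minimal sketch, not part of the released code: the names `DECODERS`, `decoder_path`, and `load_decoder` are hypothetical, and the choice of the CPU execution provider is an assumption based on the INT8 model being described as a CPU-oriented build.

```python
# Hypothetical helper: map a precision label to the file names listed above.
DECODERS = {
    "fp32": "vieneu_decoder.onnx",       # maximum quality
    "int8": "vieneu_decoder_int8.onnx",  # quantized, faster on CPU
}


def decoder_path(precision="fp32"):
    """Return the decoder file name for the requested precision."""
    try:
        return DECODERS[precision]
    except KeyError:
        raise ValueError(
            f"unknown precision {precision!r}; expected one of {sorted(DECODERS)}"
        ) from None


def load_decoder(precision="fp32"):
    """Create an ONNX Runtime session for the chosen decoder (CPU assumed)."""
    # Imported lazily so that decoder_path() works without onnxruntime installed.
    import onnxruntime as ort

    return ort.InferenceSession(
        decoder_path(precision), providers=["CPUExecutionProvider"]
    )
```

With both files downloaded next to the script, `load_decoder("int8")` would return a session for the quantized decoder, and `load_decoder()` one for the FP32 decoder.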

🛠️ Usage

Synthesize Speech

Pass the speaker embedding and the content tokens produced by your LLM (VieNeu-TTS v2) to the decoder:

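import onnxruntime as ort

# ids: content tokens from VieNeu-TTS v2; embedding: speaker embedding of the target voice
sess_dec = ort.InferenceSession("vieneu_decoder.onnx")
# run(None, ...) returns every model output as a list; the first one is the audio
audio = sess_dec.run(None, {
    "content_ids": ids,
    "voice": embedding
})[0]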
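To listen to the result, the decoded samples can be written to a WAV file. The sketch below uses only the Python standard library and makes two assumptions that this card does not confirm: the decoder output is a mono floating-point waveform in the range [-1.0, 1.0], and it is sampled at the 24 kHz rate named in the feature list. `save_wav` is a hypothetical helper, and a synthetic tone stands in for real decoder output.

```python
import math
import struct
import wave

SAMPLE_RATE = 24_000  # 24 kHz, taken from the feature list


def save_wav(samples, path, sample_rate=SAMPLE_RATE):
    """Write mono float samples (assumed in [-1.0, 1.0]) as 16-bit PCM.

    Returns the number of samples written.
    """
    pcm = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, float(s)))  # clip so the int16 conversion cannot overflow
        pcm += struct.pack("<h", int(round(s * 32767)))  # little-endian int16
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(bytes(pcm))
    return len(pcm) // 2


# Stand-in for decoder output: one second of a 440 Hz tone.
tone = [
    0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)
    for n in range(SAMPLE_RATE)
]
n_written = save_wav(tone, "vieneu_demo.wav")
```

ONNX Runtime returns NumPy arrays, so for a single utterance something like `save_wav(audio.reshape(-1), "output.wav")` should work, assuming the output holds one mono waveform.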

📄 License & Attribution

Author: Pham Nguyen Ngoc Bao
Project: VieNeu-Codec (for VieNeu-TTS v2)
Version: 2.0