Zenox TTS Multilingual

A fine-tuned XTTS-v2 model for Chinese and English text-to-speech synthesis with voice cloning capability. Built by ZENOX (Zayyan Waheed).

About

This model is based on Coqui XTTS-v2 and has been fine-tuned on Chinese (Traditional/Simplified) audio data. It supports multilingual synthesis and real-time voice cloning from a short audio sample.

The model was fine-tuned on 94 audio clips for 5 epochs and reached a final training loss of 0.020, which suggests it fits the training voice closely.

Features

🇨🇳 Chinese (zh-cn) text to speech
🇬🇧 English text to speech
🎙️ Voice cloning — clone any voice from 6-30 seconds of audio
🚀 REST API ready with FastAPI
🔑 API key authentication support
💻 CPU and GPU compatible
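
Since voice cloning expects a reference clip of 6-30 seconds, it can help to check the clip length before synthesis. The following is a minimal sketch using only the Python standard library. The helper names are illustrative and not part of this model or of Coqui TTS, and the sketch assumes an uncompressed PCM WAV file.

```python
import wave

MIN_SECONDS = 6.0   # lower bound quoted in the feature list
MAX_SECONDS = 30.0  # upper bound quoted in the feature list


def clip_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def is_usable_reference(path: str) -> bool:
    """Return True if the clip length falls inside the 6-30 second window."""
    return MIN_SECONDS <= clip_duration_seconds(path) <= MAX_SECONDS
```

Calling is_usable_reference("your_voice.wav") before synthesis catches clips that are too short or too long early.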

Supported Languages

XTTS-v2 supports 17 languages including Chinese, English, Spanish, French, German, Japanese, Korean and more.
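
For reference, the 17 language codes can be kept in a small lookup table and validated before they are passed to the model. This sketch lists the codes as published for the upstream XTTS-v2 base model. The check_language helper is a hypothetical convenience, not a Coqui TTS function.

```python
# Language codes as listed for the upstream XTTS-v2 base model.
XTTS_V2_LANGUAGES = {
    "en": "English", "es": "Spanish", "fr": "French", "de": "German",
    "it": "Italian", "pt": "Portuguese", "pl": "Polish", "tr": "Turkish",
    "ru": "Russian", "nl": "Dutch", "cs": "Czech", "ar": "Arabic",
    "zh-cn": "Chinese", "ja": "Japanese", "hu": "Hungarian", "ko": "Korean",
    "hi": "Hindi",
}


def check_language(code: str) -> str:
    """Normalize a language code and raise ValueError if it is unknown."""
    normalized = code.strip().lower()
    if normalized not in XTTS_V2_LANGUAGES:
        known = ", ".join(sorted(XTTS_V2_LANGUAGES))
        raise ValueError(f"Unsupported language {code!r}; expected one of: {known}")
    return normalized
```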

Usage

from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

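config = XttsConfig()
config.load_json("config.json")  # model configuration shipped with this repo
model = Xtts.init_from_config(config)
model.load_checkpoint(
    config,
    checkpoint_dir="./",  # directory containing model.pth
    vocab_path="vocab.json",
    eval=True,  # switch the model to evaluation mode
)

outputs = model.synthesize(
    text="大家好，歡迎回來參加學習課程。",
    config=config,
    speaker_wav="your_voice.wav",  # reference clip of the voice to clone
    language="zh-cn",
)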
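
The call above returns a dictionary. In the upstream Coqui implementation its "wav" entry holds the generated waveform as float samples at 24 kHz. The sketch below writes such samples to a WAV file using only the standard library. save_wav is an illustrative helper, not part of Coqui TTS, and the 24000 Hz default is an assumption based on the base model's output rate.

```python
import struct
import wave


def save_wav(samples, path, sample_rate=24000):
    """Write float samples in [-1.0, 1.0] to a mono 16-bit PCM WAV file."""
    pcm_values = []
    for sample in samples:
        clipped = max(-1.0, min(1.0, float(sample)))  # guard against overshoot
        pcm_values.append(int(clipped * 32767))
    # "<h" is little-endian signed 16-bit, the layout WAV files expect.
    frames = struct.pack(f"<{len(pcm_values)}h", *pcm_values)
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(frames)
```

With the usage example above, this would be called as save_wav(outputs["wav"], "output.wav").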

REST API

A full FastAPI wrapper with API key authentication is available at: 👉 https://github.com/zayyanwaheed/zenox-tts-api
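
As a rough illustration of how a client might call such a wrapper, the sketch below builds an authenticated POST request with the standard library. The /tts route, the X-API-Key header name, and the JSON field names are all assumptions made for this example. Check the linked repository for the actual routes and authentication scheme.

```python
import json
import urllib.request


def build_tts_request(base_url, api_key, text, language="zh-cn"):
    """Build (but do not send) a JSON POST request for a TTS endpoint."""
    payload = json.dumps({"text": text, "language": language}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/tts",  # hypothetical route
        data=payload,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-API-Key": api_key,  # hypothetical header name
        },
    )
```

Passing the returned request to urllib.request.urlopen would send it. Building the request separately keeps the API key handling easy to inspect.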

Model Files

model.pth — fine-tuned model weights
config.json — model configuration
vocab.json — tokenizer vocabulary
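
Before loading the checkpoint, it can be useful to confirm that all three files are present. This is a small sketch with a hypothetical helper name. It only checks that the files exist and does not validate their contents.

```python
from pathlib import Path

# File names taken from the list above.
REQUIRED_FILES = ("model.pth", "config.json", "vocab.json")


def missing_model_files(checkpoint_dir="."):
    """Return the required file names that are absent from checkpoint_dir."""
    root = Path(checkpoint_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

An empty list means the directory is ready to be passed as checkpoint_dir.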

Training Details

Base model: XTTS-v2
Training samples: 94 audio clips
Epochs: 5
Final loss: 0.020
Language: Chinese (zh-cn)

Credits

Built by ZENOX (Zayyan Waheed)

License: MIT
Languages: zh, en
Tags: text-to-speech, voice-cloning, xtts, chinese, english