Zenox TTS Multilingual

A fine-tuned XTTS-v2 model for Chinese and English text-to-speech synthesis with voice cloning capability. Built by ZENOX (Zayyan Waheed).

About

This model is based on Coqui XTTS-v2 and has been fine-tuned on Chinese (Traditional/Simplified) audio data. It supports multilingual synthesis and real-time voice cloning from a short audio sample.

The model was fine-tuned on 94 audio clips for 5 epochs and reached a final loss of 0.020, which suggests it reproduces the training voice closely.

Features

  • 🇨🇳 Chinese (zh-cn) text to speech
  • 🇬🇧 English text to speech
  • 🎙️ Voice cloning: clone any voice from 6-30 seconds of audio
  • 🚀 REST API ready with FastAPI
  • 🔑 API key authentication support
  • 💻 CPU and GPU compatible
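
Because voice cloning expects a reference clip of roughly 6 to 30 seconds, it can help to check the clip length before synthesis. The following is a minimal sketch using only the Python standard library; the helper names `clip_duration_seconds` and `is_usable_reference` are hypothetical and not part of this model or of Coqui TTS, and the check assumes an uncompressed PCM WAV file.

```python
import wave

MIN_SECONDS = 6.0   # lower bound quoted in the feature list
MAX_SECONDS = 30.0  # upper bound quoted in the feature list


def clip_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def is_usable_reference(path: str) -> bool:
    """Return True if the clip length falls inside the 6-30 second range."""
    return MIN_SECONDS <= clip_duration_seconds(path) <= MAX_SECONDS
```

A call such as `is_usable_reference("your_voice.wav")` could then gate the synthesis step.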

Supported Languages

The base XTTS-v2 model supports 17 languages, including Chinese, English, Spanish, French, German, Japanese, and Korean. This fine-tune was trained on Chinese (zh-cn) data.
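
The `language` argument takes a short code rather than a language name. The sketch below lists the 17 codes as I understand them from the upstream XTTS-v2 model card (treat the list as an assumption and check it against the Coqui documentation); `normalize_language` is a hypothetical helper, not a library function.

```python
# Language codes listed for the base XTTS-v2 model (assumed from the upstream card).
XTTS_V2_LANGUAGES = {
    "en": "English", "es": "Spanish", "fr": "French", "de": "German",
    "it": "Italian", "pt": "Portuguese", "pl": "Polish", "tr": "Turkish",
    "ru": "Russian", "nl": "Dutch", "cs": "Czech", "ar": "Arabic",
    "zh-cn": "Chinese", "ja": "Japanese", "hu": "Hungarian",
    "ko": "Korean", "hi": "Hindi",
}


def normalize_language(code: str) -> str:
    """Lower-case and trim a language code, then check it against the list."""
    normalized = code.strip().lower()
    if normalized not in XTTS_V2_LANGUAGES:
        raise ValueError(f"Unsupported language code: {code!r}")
    return normalized
```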

Usage

from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

# Load the model configuration shipped with this repository.
config = XttsConfig()
config.load_json("config.json")

# Build the model and load the fine-tuned weights from the current directory.
model = Xtts.init_from_config(config)
model.load_checkpoint(
    config,
    checkpoint_dir="./",
    vocab_path="vocab.json",
    eval=True,
)

# Synthesize Chinese speech in the voice of the reference clip.
outputs = model.synthesize(
    text="大家好，歡迎回來參加學習課程。",
    config=config,
    speaker_wav="your_voice.wav",
    language="zh-cn",
)
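
The snippet above stops at `outputs` and does not write audio to disk. As a rough sketch, assuming `outputs["wav"]` holds mono float samples in the range [-1, 1] at 24 kHz (the usual XTTS-v2 output rate; verify against your `config.json`), the samples could be saved as 16-bit PCM with the standard library. `save_wav` is a hypothetical helper.

```python
import struct
import wave
from typing import Iterable


def save_wav(samples: Iterable[float], path: str, sample_rate: int = 24000) -> int:
    """Write mono float samples in [-1, 1] as 16-bit PCM; return the frame count."""
    frames = bytearray()
    count = 0
    for sample in samples:
        # Clip to the valid range so out-of-range values do not overflow int16.
        clipped = max(-1.0, min(1.0, float(sample)))
        frames += struct.pack("<h", int(clipped * 32767))
        count += 1
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)        # mono
        wav_file.setsampwidth(2)        # 16-bit samples
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))
    return count
```

Under those assumptions, `save_wav(outputs["wav"], "output.wav")` would produce a playable file.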

REST API

A full FastAPI wrapper with API key authentication is available at: 👉 https://github.com/zayyanwaheed/zenox-tts-api
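
A client for an API-key-protected endpoint might look like the sketch below. Everything specific here is an assumption for illustration: the `/tts` path, the `X-API-Key` header name, and the JSON fields `text` and `language` are hypothetical and may not match the linked repository, so check its documentation for the real route and schema.

```python
import json
import urllib.request


def build_tts_request(base_url: str, api_key: str, text: str,
                      language: str = "zh-cn") -> urllib.request.Request:
    """Build (but do not send) a POST request for a hypothetical /tts endpoint."""
    payload = json.dumps({"text": text, "language": language}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/tts",   # hypothetical route
        data=payload,
        headers={
            "Content-Type": "application/json",
            "X-API-Key": api_key,            # hypothetical header name
        },
        method="POST",
    )
```

The request can be sent with `urllib.request.urlopen(request)` once the route and header names are confirmed.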

Model Files

  • model.pth: fine-tuned model weights
  • config.json: model configuration
  • vocab.json: tokenizer vocabulary
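
Loading fails if any of these three files is missing from the checkpoint directory, so a quick pre-flight check can save time. This is a minimal sketch; `missing_model_files` is a hypothetical helper and the default directory simply mirrors the `checkpoint_dir="./"` used in the usage example.

```python
from pathlib import Path
from typing import List

REQUIRED_FILES = ("model.pth", "config.json", "vocab.json")


def missing_model_files(checkpoint_dir: str = "./") -> List[str]:
    """Return the required file names that are not present in checkpoint_dir."""
    base = Path(checkpoint_dir)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]
```

An empty list means all three files were found.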

Training Details

  • Base model: XTTS-v2
  • Training samples: 94 audio clips
  • Epochs: 5
  • Final loss: 0.020
  • Language: Chinese (zh-cn)

Credits

Built by ZENOX (Zayyan Waheed).

License: MIT. Languages: zh, en. Tags: text-to-speech, voice-cloning, xtts, chinese, english.

