Zenox TTS Multilingual
A fine-tuned XTTS-v2 model for Chinese and English text-to-speech synthesis with voice cloning capability. Built by ZENOX (Zayyan Waheed).
About
This model is based on Coqui XTTS-v2 and has been fine-tuned on Chinese speech data with Traditional and Simplified Chinese text. It supports multilingual synthesis and real-time voice cloning from a short audio sample.
The model was fine-tuned on 94 audio clips and reached a final loss of 0.020. A low loss reflects how closely the model fits the data it was computed on. On its own, it is not a measure of perceived voice quality.
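Whether synthesis runs in real time on a given machine can be checked with the real-time factor (RTF). RTF is wall-clock synthesis time divided by the duration of the audio produced. A value below 1.0 means audio is generated faster than it plays back.

The following is a minimal sketch that assumes you time the synthesis call yourself. `real_time_factor` is a hypothetical helper, not part of the TTS library. The 24 kHz default is an assumption taken from the XTTS-v2 output sample rate.

```python
def real_time_factor(num_samples: int, elapsed_seconds: float, sample_rate: int = 24_000) -> float:
    """Return synthesis time divided by audio duration (hypothetical helper).

    num_samples: length of the generated waveform in samples
    elapsed_seconds: wall-clock time the synthesis call took
    sample_rate: output sample rate; 24 kHz is assumed (XTTS-v2 default)
    """
    if num_samples <= 0 or sample_rate <= 0:
        raise ValueError("num_samples and sample_rate must be positive")
    audio_seconds = num_samples / sample_rate
    return elapsed_seconds / audio_seconds


if __name__ == "__main__":
    # Example: 3 seconds of audio (72,000 samples at 24 kHz) produced in 1.5 s
    rtf = real_time_factor(num_samples=72_000, elapsed_seconds=1.5)
    print(f"RTF = {rtf:.2f}")  # prints: RTF = 0.50 (below 1.0 is faster than real time)
```

In practice, wrap the `model.synthesize(...)` call with `time.perf_counter()` and pass `len(outputs["wav"])` as `num_samples`.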
Features
- Chinese (zh-cn) text-to-speech
- English text-to-speech
- Voice cloning: clone a voice from 6 to 30 seconds of reference audio
- REST API available through a FastAPI wrapper
- API key authentication support
- CPU and GPU compatible
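A reference clip outside the 6 to 30 second range stated above can be caught before synthesis by reading its length.

This is a minimal sketch using only Python's standard `wave` module, so it assumes an uncompressed PCM WAV file. `reference_duration_ok` is a hypothetical helper, not part of the TTS library.

```python
import wave


def reference_duration_ok(path: str, min_s: float = 6.0, max_s: float = 30.0) -> tuple[bool, float]:
    """Check that a WAV reference clip lasts between min_s and max_s seconds.

    Returns (ok, duration_in_seconds). Only uncompressed PCM WAV files are
    supported, because the standard-library wave module reads nothing else.
    """
    with wave.open(path, "rb") as wav:
        # Duration = number of audio frames / frames per second
        duration = wav.getnframes() / wav.getframerate()
    return min_s <= duration <= max_s, duration
```

For example, `ok, seconds = reference_duration_ok("your_voice.wav")` returns `ok == False` for a 3-second clip and `ok == True` for a 10-second one.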
Supported Languages
The base XTTS-v2 model supports 17 languages, including Chinese, English, Spanish, French, German, Japanese and Korean. This fine-tune was trained on Chinese data only.
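The `language=` argument of `synthesize` takes an XTTS-v2 language code. The set below lists the 17 codes published for the base XTTS-v2 model.

This is a minimal sketch that validates a code before synthesis. `normalize_language` and the `"zh"` alias are hypothetical conveniences, not part of the TTS library. XTTS-v2 itself expects `"zh-cn"` for Chinese.

```python
# Language codes accepted by the base XTTS-v2 model (17 in total).
XTTS_V2_LANGUAGES = frozenset({
    "en", "es", "fr", "de", "it", "pt", "pl", "tr", "ru",
    "nl", "cs", "ar", "zh-cn", "ja", "hu", "ko", "hi",
})

# Hypothetical alias table: maps a bare "zh" to the code XTTS-v2 uses.
_ALIASES = {"zh": "zh-cn"}


def normalize_language(code: str) -> str:
    """Return a language code XTTS-v2 accepts, or raise ValueError."""
    key = code.strip().lower()
    key = _ALIASES.get(key, key)
    if key not in XTTS_V2_LANGUAGES:
        raise ValueError(
            f"unsupported language {code!r}; expected one of {sorted(XTTS_V2_LANGUAGES)}"
        )
    return key
```

For example, `normalize_language("ZH")` returns `"zh-cn"`, and `normalize_language("xx")` raises `ValueError`.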
Usage
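import torch
import soundfile as sf
from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

# Load the model configuration shipped with this repository
config = XttsConfig()
config.load_json("config.json")

# Build the model, then load the fine-tuned weights (model.pth) and tokenizer vocabulary
model = Xtts.init_from_config(config)
model.load_checkpoint(
    config,
    checkpoint_dir="./",
    vocab_path="vocab.json",
    eval=True,
)
if torch.cuda.is_available():
    model.cuda()  # optional: run on the GPU when one is available

# Clone the voice in your_voice.wav (6-30 s of clean speech) and synthesize Chinese text
outputs = model.synthesize(
    text="大家好，歡迎前來參加學習課程。",  # "Hello everyone, welcome to the course."
    config=config,
    speaker_wav="your_voice.wav",
    language="zh-cn",
)

# outputs["wav"] holds the generated waveform; write it at the configured output rate
sf.write("output.wav", outputs["wav"], config.audio.output_sample_rate)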
REST API
A FastAPI wrapper with API key authentication is available at https://github.com/zayyanwaheed/zenox-tts-api
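The wrapper's own authentication code is not reproduced here. As an illustration of the general technique, a server that receives a key in a request header (for example, a hypothetical `X-API-Key` header) should compare it in constant time.

`api_key_is_valid` is a hypothetical helper built only on the standard library. It is not the wrapper's implementation.

```python
import hmac
import secrets
from typing import Optional


def api_key_is_valid(provided: Optional[str], expected: str) -> bool:
    """Compare a client-supplied key with the configured key in constant time.

    hmac.compare_digest avoids leaking, through response timing, how many
    leading characters of a guessed key were correct.
    """
    if not provided:
        return False  # missing or empty header
    return hmac.compare_digest(provided.encode("utf-8"), expected.encode("utf-8"))


if __name__ == "__main__":
    server_key = secrets.token_urlsafe(32)  # in practice, load from an environment variable
    print(api_key_is_valid(server_key, server_key))   # True
    print(api_key_is_valid("wrong-key", server_key))  # False
```

In a FastAPI app, a check like this would typically run inside a dependency that reads the header and raises an HTTP 401 error when the key does not match.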
Model Files
- model.pth: fine-tuned model weights
- config.json: model configuration
- vocab.json: tokenizer vocabulary
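Before calling `load_checkpoint`, it can help to confirm that all three files listed above are present in the directory passed as `checkpoint_dir`.

This is a minimal sketch. `missing_model_files` is a hypothetical helper, and it only checks that the files exist, not that their contents are valid.

```python
from pathlib import Path

# Files this model card lists as part of the repository.
REQUIRED_FILES = ("model.pth", "config.json", "vocab.json")


def missing_model_files(checkpoint_dir: str = "./") -> list[str]:
    """Return the required files that are absent from checkpoint_dir, in listed order."""
    root = Path(checkpoint_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

For example, `missing_model_files("./")` returns an empty list when all three files are present. A non-empty result can be turned into a `FileNotFoundError` before the model is built.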
Training Details
- Base model: XTTS-v2
- Training samples: 94 audio clips
- Epochs: 5
- Final loss: 0.020
- Language: Chinese (zh-cn)
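The card does not state the batch size. Assuming one, the number of optimizer updates implied by 94 clips and 5 epochs can be worked out to put the scale of the run in perspective.

This is a minimal sketch. The batch size of 8 in the example is hypothetical, and the helper assumes one update per batch with no gradient accumulation.

```python
import math


def optimizer_steps(num_clips: int, epochs: int, batch_size: int, drop_last: bool = False) -> int:
    """Number of optimizer updates for a simple epoch-based training loop.

    Assumes one update per batch (no gradient accumulation). With drop_last,
    an incomplete final batch in each epoch is skipped.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    per_epoch = num_clips // batch_size if drop_last else math.ceil(num_clips / batch_size)
    return per_epoch * epochs


if __name__ == "__main__":
    # 94 clips and 5 epochs come from the training details; batch size 8 is an assumption
    print(optimizer_steps(94, 5, 8))                  # 60
    print(optimizer_steps(94, 5, 8, drop_last=True))  # 55
```

Whatever the batch size, each of the 94 clips is seen 5 times, for 470 clip presentations in total.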
Credits
Built by ZENOX (Zayyan Waheed).
License
MIT
Tags
text-to-speech, voice-cloning, xtts, chinese, english (languages: zh, en)