Hojo-TTS-Light

Hojo-TTS-Light is an open-source, lightweight text-to-speech (TTS) model from the HojoAI team. Its backbone language model has only 80M (0.08B) parameters, yet it generates good-quality speech (average DNSMOS > 4.0 on the Seed-TTS eval dataset). Hojo-TTS-Light currently supports Chinese and English, and it can clone a voice from a few seconds of reference audio.

Features

  • Ultra-Lightweight Core Model --- The core language model has only 80M parameters, far fewer than models of comparable sound quality, which keeps the deployment threshold very low.
  • Native Bilingual Integration --- A single model handles synthesis and cross-lingual voice cloning for both Chinese and English, with no branch switching required.
  • Voice Cloning --- High-similarity voice cloning from a small amount of reference audio, with natural prosody and faithful reproduction of the speaker's voice.
  • Low Computational Cost & On-Device Friendly --- Low memory usage and low inference overhead let the model run smoothly on CPUs, ordinary GPUs, and embedded edge devices.
  • Ready to Use --- Simple inference scripts and a quick-call interface enable synthesis and cloning with one line of code, which eases downstream development and business integration.
  • Quick Pronunciation Correction --- Polyphonic characters and proper nouns in Chinese and English text are easy to mispronounce; users can correct such errors directly with Pinyin.

Model Details

  • The model follows the Token-LM paradigm.
  • The speech tokenizer consists of an 18M-parameter encoder and a 30M-parameter decoder.
  • We use FSQ (finite scalar quantization), which inherently enables higher codebook utilization. The codebook size is 8000 for audio and under 20000 in total.
  • The currently released version runs at a 50 Hz token rate; 12.5 Hz models will be released soon.
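
To make the FSQ and token-rate figures above concrete, here is a minimal pure-Python sketch. It is not the model's tokenizer code. The per-dimension level counts `[8, 8, 5, 5, 5]` are an assumption, chosen only because their product is 8000; the actual configuration is not stated here. The helper names (`fsq_quantize`, `codes_to_index`, `index_to_codes`, `tokens_for_duration`) are hypothetical, and the bounding step is a simplified form of FSQ. The sketch shows why every codebook entry is reachable (each entry is just a grid point, with no learned codebook to collapse) and how the token rate sets the sequence length.

```python
import math

# Assumed per-dimension level counts (hypothetical). Their product is 8000,
# matching the stated audio codebook size; the real levels are not published here.
LEVELS = [8, 8, 5, 5, 5]


def fsq_quantize(z, levels=LEVELS):
    """Bound each latent value with tanh, then round it to an integer level in [0, n - 1]."""
    codes = []
    for value, n in zip(z, levels):
        # tanh maps to [-1, 1]; rescale to [0, n - 1] and round to the nearest level.
        scaled = (math.tanh(value) + 1.0) / 2.0 * (n - 1)
        codes.append(int(round(scaled)))
    return codes


def codes_to_index(codes, levels=LEVELS):
    """Pack per-dimension codes into a single token id (mixed-radix encoding)."""
    index = 0
    for code, n in zip(codes, levels):
        index = index * n + code
    return index


def index_to_codes(index, levels=LEVELS):
    """Invert codes_to_index: unpack a token id into per-dimension codes."""
    codes = []
    for n in reversed(levels):
        index, code = divmod(index, n)
        codes.append(code)
    return codes[::-1]


def tokens_for_duration(seconds, token_rate_hz):
    """Number of speech tokens needed to cover `seconds` of audio at a given token rate."""
    return math.ceil(seconds * token_rate_hz)


if __name__ == "__main__":
    token_id = codes_to_index(fsq_quantize([0.3, -1.2, 0.0, 2.5, -0.7]))
    print("token id:", token_id, "codes:", index_to_codes(token_id))
    print("10 s at 50 Hz:", tokens_for_duration(10, 50), "tokens")
    print("10 s at 12.5 Hz:", tokens_for_duration(10, 12.5), "tokens")
```

Under these assumptions, 10 seconds of audio takes 500 tokens at 50 Hz and 125 tokens at 12.5 Hz, so the lower-rate models would give the language model a sequence one quarter as long.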