Roxi-TTS v3: Indian-English (alternate voice)
A second Indian-English LoRA fine-tune of MOSS-TTS-Nano, trained on a different
IndicTTS-English speaker than roxi-tts-v2 and on more data (~70 min), so the two voices
can be compared. Output is 48 kHz audio. Includes the cross-version compatibility fixes
(the code loads on both the transformers 4.57.1 and 5.x code paths; SDPA attention is the default and flash-attn is not required).
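For readers unfamiliar with LoRA: in the standard formulation, each adapted weight matrix stays frozen and a trainable low-rank update is added alongside it. The card does not state the rank $r$, the scaling $\alpha$, or which layers of MOSS-TTS-Nano were adapted, so this is the general form only:

```latex
h = W_0 x + \frac{\alpha}{r} B A x,
\qquad W_0 \in \mathbb{R}^{d \times k}\ \text{(frozen)},\quad
B \in \mathbb{R}^{d \times r},\ A \in \mathbb{R}^{r \times k},\quad r \ll \min(d, k)
```

Only $A$ and $B$ are trained. After training the update can be folded into the base weight as $W = W_0 + \frac{\alpha}{r} BA$, so a merged model runs at the same cost as the base model.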
v2 vs v3
| | roxi-tts-v2 | roxi-tts-v3 |
|---|---|---|
| Speaker | IndicTTS speaker A (~50 min) | IndicTTS speaker B (~70 min); distinct voice (0.66 speaker similarity to v2) |
| Speaker similarity to its target | 0.96 | 0.96 |
| Intelligibility (WER, lower is better) | 0.26 | 0.29 |
| Notes | mild read voice | different timbre; more data, fewer early cut-offs in testing |
Both are ~0.1B-parameter models trained on read-speech studio data, so both still sound somewhat synthetic; pick whichever voice you prefer by ear.
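The card does not say how the speaker-similarity and WER figures above were produced. A common approach, assumed here, is cosine similarity between speaker embeddings (the embedding model is unspecified) and word error rate between the input text and an ASR transcript of the generated audio. The sketch below covers only the scoring step; `cosine_similarity` and `word_error_rate` are illustrative names, and the text normalization (lowercase, whitespace split, no punctuation stripping) is a simplification.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two speaker-embedding vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("zero-length embedding")
    return dot / (norm_a * norm_b)


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed with a word-level Levenshtein distance."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev holds the edit-distance row for the reference prefix processed so far
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[len(hyp)] / len(ref)
```

Dedicated evaluation tooling usually also normalizes punctuation and numbers before scoring, which can shift WER noticeably on short sentences.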
Requirements & usage
Use transformers==4.57.1: this model's custom code misbehaves on transformers 5.x (NaN outputs or noise).
```bash
pip install "transformers==4.57.1" torch torchaudio soundfile sentencepiece librosa
```
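Since the card reports NaN/noise output on transformers 5.x, a script can check the installed version before loading the model. `check_transformers` and `major_version` below are hypothetical helpers, not part of the model repo; the installed version is read with the standard-library `importlib.metadata.version`.

```python
import warnings
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

PINNED_TRANSFORMERS = "4.57.1"  # version recommended by this card


def major_version(v: str) -> int:
    """Leading integer of a version string, e.g. '5.0.0rc1' -> 5."""
    head = v.split(".", 1)[0]
    digits = ""
    for ch in head:
        if not ch.isdigit():
            break
        digits += ch
    if not digits:
        raise ValueError(f"unrecognised version string: {v!r}")
    return int(digits)


def check_transformers(installed: Optional[str] = None) -> None:
    """Raise on transformers 5.x or later; warn on any version other than the pin.

    If `installed` is None, the version of the installed package is looked up.
    """
    if installed is None:
        try:
            installed = version("transformers")
        except PackageNotFoundError:
            raise RuntimeError(
                f'transformers is not installed; run: pip install "transformers=={PINNED_TRANSFORMERS}"'
            ) from None
    if major_version(installed) >= 5:
        raise RuntimeError(
            f"transformers {installed} detected; this model's custom code is reported "
            f"to produce NaN/noise on 5.x. Install transformers=={PINNED_TRANSFORMERS}."
        )
    if installed != PINNED_TRANSFORMERS:
        warnings.warn(
            f"transformers {installed} is untested here; the card recommends {PINNED_TRANSFORMERS}."
        )
```

Calling `check_transformers()` with no argument at the top of a script, before `from_pretrained`, fails fast instead of producing noisy audio.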
```python
import torch
from transformers import AutoModelForCausalLM

device = "cuda" if torch.cuda.is_available() else "cpu"

# trust_remote_code loads the repo's custom MOSS-TTS-Nano model code
model = AutoModelForCausalLM.from_pretrained(
    "IOTEverythin/roxi-tts-v3", trust_remote_code=True, dtype=torch.float32
).to(device).eval()

# Synthesize one sentence and write it to out.wav
res = model.inference(
    text="Welcome. Your appointment is confirmed for Monday at ten thirty in the morning.",
    output_audio_path="out.wav", mode="continuation",
    audio_tokenizer_type="moss-audio-tokenizer-nano",
    audio_tokenizer_pretrained_name_or_path="OpenMOSS-Team/MOSS-Audio-Tokenizer-Nano",
    device=device, audio_repetition_penalty=1.1, use_kv_cache=True,
)

# In a notebook, play the result inline
from IPython.display import Audio
Audio("out.wav")
```
Generation is stochastic: if a clip cuts off, re-run it (or use the retry+trim helper). Keep sentences short for reliability.
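The card mentions a retry+trim helper but does not include it. The following is a hypothetical stand-in that illustrates the idea rather than reproducing that helper: split text into short sentences, synthesize each one separately, re-run a sentence if its clip looks truncated, and trim trailing silence before concatenating. `synthesize` is an assumed callback (for example, a wrapper that calls `model.inference` and reads `out.wav` back as mono float samples); the 2.5 words/second speaking rate, the 0.5 ratio, and the silence threshold are guesses to tune by ear.

```python
import re
from typing import Callable, List, Sequence

Samples = List[float]  # mono audio samples in [-1, 1] (assumed representation)


def split_sentences(text: str) -> List[str]:
    """Naive split after '.', '!' or '?' followed by whitespace; keeps the punctuation."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def trim_trailing_silence(samples: Sequence[float], threshold: float = 1e-3) -> Samples:
    """Drop trailing samples whose absolute amplitude is below `threshold`."""
    end = len(samples)
    while end > 0 and abs(samples[end - 1]) < threshold:
        end -= 1
    return list(samples[:end])


def looks_cut_off(samples: Sequence[float], sample_rate: int, text: str,
                  words_per_sec: float = 2.5, min_ratio: float = 0.5) -> bool:
    """Heuristic: a clip is 'cut off' if its voiced length is far below a speaking-rate estimate."""
    expected_sec = len(text.split()) / words_per_sec
    actual_sec = len(trim_trailing_silence(samples)) / sample_rate
    return actual_sec < min_ratio * expected_sec


def synthesize_with_retry(text: str, synthesize: Callable[[str], Samples],
                          sample_rate: int = 48_000, max_tries: int = 3) -> Samples:
    """Synthesize sentence by sentence, retrying clips that look truncated."""
    out: Samples = []
    for sentence in split_sentences(text):
        clip = synthesize(sentence)
        for _ in range(max_tries - 1):
            if not looks_cut_off(clip, sample_rate, sentence):
                break
            clip = synthesize(sentence)  # generation is stochastic, so simply try again
        # If every attempt looked truncated, the last attempt is kept as-is.
        out.extend(trim_trailing_silence(clip))
    return out
```

The sentence splitter is deliberately simple and will also split after abbreviations such as "Dr.", so text with abbreviations may need a proper sentence tokenizer.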
Attribution & license
License: Apache-2.0. Built on MOSS-TTS-Nano (Apache-2.0) and the MOSS-Audio-Tokenizer-Nano audio tokenizer (Apache-2.0). Training data:
IIT-Madras Indic TTS (English) via SPRINGLab/IndicTTS-English. Required notice:
"COPYRIGHT 2016 TTS Consortium, TDIL, Meity - Hema A. Murthy & S. Umesh - IIT Madras. ALL RIGHTS RESERVED."
Do not use to impersonate real people or for deception; disclose AI-generated audio where required.
Base model: OpenMOSS-Team/MOSS-TTS-Nano-100M