Roxi-TTS v3: Indian-English (alternate voice)

A second Indian-English LoRA fine-tune of MOSS-TTS-Nano, trained on a different IndicTTS-English speaker than roxi-tts-v2 and on more data (~70 min) so the two voices can be compared. 48 kHz audio. Includes the cross-version compatibility fixes (loads on transformers 4.57.1 → 5.x code paths; SDPA by default, no flash-attn).

v2 vs v3

| | roxi-tts-v2 | roxi-tts-v3 |
| Speaker | IndicTTS spk A (~50 min) | IndicTTS spk B (~70 min), distinct voice (0.66 sim to v2) |
| Speaker-sim to its target | 0.96 | 0.96 |
| Intelligibility WER | 0.26 | 0.29 |
| Notes | mild read voice | different timbre; fuller data → fewer early cut-offs in testing |

Both are ~0.1B models trained on read-speech studio data, so both still sound somewhat synthetic. Pick whichever voice you prefer by ear.
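For reading the intelligibility row in the table above, it may help to recall the standard definition of word error rate: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a recognised transcript, divided by the number of reference words. The card does not say which recogniser or text normalisation produced its numbers, so the sketch below only illustrates the metric; the function name and the lowercase-and-split normalisation are assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumed normalisation: lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, a WER of 0.26 corresponds to about 26 word errors per 100 reference words.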

Requirements & usage

Use transformers==4.57.1. The custom code misbehaves on transformers 5.x, producing NaN values or noise.

pip install "transformers==4.57.1" torch torchaudio soundfile sentencepiece librosa

import torch
from transformers import AutoModelForCausalLM

# Use the GPU when one is available, otherwise fall back to CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# trust_remote_code=True is needed because the repo ships custom model code.
model = AutoModelForCausalLM.from_pretrained(
    "IOTEverythin/roxi-tts-v3", trust_remote_code=True, dtype=torch.float32
).to(device).eval()

# Synthesize one sentence and write the result to out.wav.
res = model.inference(
    text="Welcome. Your appointment is confirmed for Monday at ten thirty in the morning.",
    output_audio_path="out.wav", mode="continuation",
    audio_tokenizer_type="moss-audio-tokenizer-nano",
    audio_tokenizer_pretrained_name_or_path="OpenMOSS-Team/MOSS-Audio-Tokenizer-Nano",
    device=device, audio_repetition_penalty=1.1, use_kv_cache=True,
)

# In a notebook, play the result inline.
from IPython.display import Audio
Audio("out.wav")
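Because the card pins transformers to 4.57.1 and reports NaN/noise on 5.x, a small guard run before loading the model can catch a mismatched environment early. This is a minimal sketch, not part of the model repo; the names `REQUIRED_TRANSFORMERS`, `is_supported`, and `check_transformers` are hypothetical.

```python
from importlib import metadata

# Version pinned by this model card.
REQUIRED_TRANSFORMERS = "4.57.1"


def is_supported(installed: str, required: str = REQUIRED_TRANSFORMERS) -> bool:
    """Return True only when the installed version matches the pin exactly."""
    return installed.strip() == required


def check_transformers() -> None:
    """Raise a clear error if transformers is missing or not the pinned version."""
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        raise RuntimeError(
            f"transformers is not installed; install transformers=={REQUIRED_TRANSFORMERS}"
        ) from None
    if not is_supported(installed):
        raise RuntimeError(
            f"found transformers {installed}, but this model expects "
            f"{REQUIRED_TRANSFORMERS}"
        )
```

Calling `check_transformers()` at the top of a script, before `from_pretrained`, turns silent noise output into an explicit error.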

Generation is stochastic: if a clip cuts off, re-run (or use the retry+trim helper). Keep sentences short for reliability.
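The helper mentioned above is not shown on this card, so the following is only a sketch of what retrying, trimming, and sentence splitting could look like. Everything in it is an assumption: the function names, the silence threshold, and especially the cut-off heuristic, which treats a clip as truncated when it is shorter than a rough minimum duration per character of input text. `synthesize` stands for any callable that returns mono samples as floats.

```python
import re
from typing import Callable, List, Sequence


def split_sentences(text: str) -> List[str]:
    """Split text after sentence-ending punctuation so each piece stays short."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def trim_trailing_silence(samples: Sequence[float], threshold: float = 0.01) -> List[float]:
    """Drop trailing samples whose absolute amplitude is below the threshold."""
    end = len(samples)
    while end > 0 and abs(samples[end - 1]) < threshold:
        end -= 1
    return list(samples[:end])


def synthesize_with_retry(
    synthesize: Callable[[str], Sequence[float]],
    text: str,
    sample_rate: int = 48000,
    min_seconds_per_char: float = 0.03,  # assumed heuristic, tune by ear
    max_attempts: int = 4,
) -> List[float]:
    """Re-run synthesis until the trimmed clip looks long enough for the text.

    Returns the longest trimmed clip seen if no attempt passes the check.
    """
    best: List[float] = []
    min_seconds = min_seconds_per_char * len(text)
    for _ in range(max_attempts):
        audio = trim_trailing_silence(synthesize(text))
        if len(audio) > len(best):
            best = audio
        if len(audio) / sample_rate >= min_seconds:
            break
    return best
```

In practice `synthesize` would be a small wrapper (hypothetical here) that calls `model.inference(...)` with an `output_audio_path` and reads the file back, for example with `soundfile.read`. Running it once per item from `split_sentences` keeps each request short, in line with the advice above.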

Attribution & license

Apache-2.0. Built on MOSS-TTS-Nano (Apache-2.0) and the audio tokenizer (Apache-2.0). Training data: IIT-Madras Indic TTS (English) via SPRINGLab/IndicTTS-English. Required notice: "COPYRIGHT 2016 TTS Consortium, TDIL, Meity - Hema A. Murthy & S. Umesh - IIT Madras. ALL RIGHTS RESERVED." Do not use this model to impersonate real people or to deceive; disclose AI-generated audio where required.
