Roxi-TTS v3 — Indian-English (alternate voice)

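A second Indian-English LoRA fine-tune of MOSS-TTS-Nano. It is trained on a different IndicTTS-English speaker than roxi-tts-v2, and on more data (~70 min vs ~50 min), so the two voices can be compared. Output is 48 kHz audio. The remote code includes cross-version compatibility fixes so it loads on both the transformers 4.57.1 and 5.x code paths, although 4.57.1 is the recommended version (see Requirements & usage). Attention defaults to SDPA; flash-attn is not required.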

v2 vs v3

	roxi-tts-v2	roxi-tts-v3
Speaker (training data)	IndicTTS speaker A (~50 min)	IndicTTS speaker B (~70 min); distinct voice (0.66 similarity to v2)
Speaker similarity to its target speaker	0.96	0.96
Intelligibility (WER, lower is better)	0.26	0.29
Notes	mild read-speech voice	different timbre; more training data, fewer early cut-offs in testing

Both are ~0.1B-parameter models trained on read-speech studio recordings, so both still sound somewhat synthetic. Pick whichever voice you prefer by ear.
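For reading the table: WER (word error rate) is conventionally the word-level edit distance between a reference transcript and an ASR transcript of the generated audio, divided by the number of reference words, and speaker similarity is commonly the cosine similarity between two speaker-embedding vectors. The card does not say which ASR model, text normalization, or speaker-embedding model produced its numbers, so the following is a generic sketch of the two formulas under those assumptions, not the evaluation script behind these figures. The function names and the lower-case, letters/digits/apostrophes-only normalization are assumptions.

```python
import math
import re
from typing import List, Sequence


def _words(text: str) -> List[str]:
    # Assumed normalization: lower-case, keep only letters, digits and apostrophes
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = _words(reference), _words(hypothesis)
    if not ref:
        raise ValueError("reference transcript has no words")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # substitution or match
        prev = cur
    return prev[-1] / len(ref)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two speaker embeddings (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("zero-length embedding")
    return dot / norm
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.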

Requirements & usage

Use transformers==4.57.1. On transformers 5.x this custom code misbehaves (NaN or noise output).

pip install "transformers==4.57.1" torch torchaudio soundfile sentencepiece librosa
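Because the setup depends on the pinned version, a small check before loading can catch a mismatched environment early. This is a hypothetical helper, not part of the model's code. It uses only `importlib.metadata` from the standard library, and its warning messages paraphrase the version advice above.

```python
import warnings
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

PINNED = "4.57.1"  # the transformers version this card asks you to install


def check_transformers(installed: Optional[str] = None) -> bool:
    """Return True if transformers matches the pinned version; warn otherwise.

    Hypothetical helper: pass `installed` explicitly to test it, or leave it
    as None to read the version of the installed package.
    """
    if installed is None:
        try:
            installed = version("transformers")
        except PackageNotFoundError:
            raise RuntimeError(
                f'transformers is not installed; pip install "transformers=={PINNED}"'
            ) from None
    if installed == PINNED:
        return True
    major = int(installed.split(".")[0])
    if major >= 5:
        warnings.warn(f"transformers {installed}: 5.x has produced NaN/noise output "
                      f"with this model; install {PINNED}")
    else:
        warnings.warn(f"transformers {installed} differs from the tested pin {PINNED}")
    return False
```

Calling `check_transformers()` with no argument inspects the current environment; it only warns, so loading can still proceed if you want to experiment with another version.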

import torch
from transformers import AutoModelForCausalLM

device = "cuda" if torch.cuda.is_available() else "cpu"

# trust_remote_code=True is required: the model ships its own custom code
model = AutoModelForCausalLM.from_pretrained(
    "IOTEverythin/roxi-tts-v3", trust_remote_code=True, dtype=torch.float32
).to(device).eval()

# Synthesize one sentence and write it to out.wav (48 kHz)
res = model.inference(
    text="Welcome. Your appointment is confirmed for Monday at ten thirty in the morning.",
    output_audio_path="out.wav",
    mode="continuation",
    audio_tokenizer_type="moss-audio-tokenizer-nano",
    audio_tokenizer_pretrained_name_or_path="OpenMOSS-Team/MOSS-Audio-Tokenizer-Nano",
    device=device,
    audio_repetition_penalty=1.1,  # penalty on repeated audio tokens
    use_kv_cache=True,
)

# In a Jupyter notebook, play the result inline
from IPython.display import Audio
Audio("out.wav")
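The failure modes described on this card show up as NaN/noise output or clips that end early, so it can help to inspect the waveform after generation. The following is a hypothetical checker, not part of the model's API. It assumes a mono clip given as a sequence of floats, treats the stated 48 kHz as the expected rate, and uses two assumed thresholds (an RMS floor for near-silence and a minimum of 0.2 s per word for spotting truncation) that you would tune by ear.

```python
import math
import re
from typing import List, Sequence

EXPECTED_SR = 48_000        # the card states 48 kHz output
SILENCE_RMS = 1e-4          # assumption: below this RMS the clip is treated as silent
MIN_SECONDS_PER_WORD = 0.2  # assumption: loose floor; read speech is roughly 0.3-0.5 s/word


def audio_problems(samples: Sequence[float], sample_rate: int, text: str) -> List[str]:
    """Return human-readable problems found in a mono clip (empty list = looks OK)."""
    problems = []
    if sample_rate != EXPECTED_SR:
        problems.append(f"sample rate {sample_rate} Hz, expected {EXPECTED_SR} Hz")
    if len(samples) == 0:
        return problems + ["no audio samples"]
    if not all(math.isfinite(x) for x in samples):
        problems.append("non-finite samples (NaN/inf)")
        return problems  # loudness checks are meaningless with NaN/inf present
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    if rms < SILENCE_RMS:
        problems.append(f"near-silent output (RMS {rms:.2e})")
    n_words = len(re.findall(r"[A-Za-z0-9']+", text))
    duration = len(samples) / sample_rate
    if duration < n_words * MIN_SECONDS_PER_WORD:
        problems.append(f"{duration:.2f} s is short for {n_words} words; possible cut-off")
    return problems
```

To run it on the file written by the example, `soundfile.read("out.wav")` returns `(data, samplerate)`; if `data` is 2-D, average the channels first, then call `audio_problems(data, samplerate, text)` with the sentence you synthesized.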

Generation is stochastic: if a clip cuts off, re-run it (or use the retry+trim helper). Short sentences generate more reliably.
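The retry+trim helper mentioned above is not included in this card. As a hypothetical stand-in (not that helper), the sketch below calls a user-supplied `generate` function up to `max_tries` times, trims trailing near-silence from each clip, and returns the first trimmed clip that a user-supplied `accept` predicate approves. The function names, the 1e-3 amplitude threshold, and the 50 ms of retained tail are assumptions.

```python
from typing import Callable, List, Optional, Sequence


def trim_trailing_silence(samples: Sequence[float], sample_rate: int,
                          threshold: float = 1e-3, pad_s: float = 0.05) -> List[float]:
    """Drop samples after the last one louder than `threshold`, keeping `pad_s` s of tail."""
    last = -1
    for i in range(len(samples) - 1, -1, -1):  # scan backwards for the last loud sample
        if abs(samples[i]) > threshold:
            last = i
            break
    if last < 0:
        return []  # the whole clip is below the threshold
    end = min(len(samples), last + 1 + round(pad_s * sample_rate))
    return list(samples[:end])


def generate_with_retry(generate: Callable[[], Sequence[float]],
                        accept: Callable[[List[float]], bool],
                        sample_rate: int, max_tries: int = 3) -> Optional[List[float]]:
    """Return the first trimmed clip that `accept` approves, or None after `max_tries`."""
    for _ in range(max_tries):
        clip = trim_trailing_silence(generate(), sample_rate)
        if clip and accept(clip):
            return clip
    return None
```

Trimming happens before `accept` is called, so a duration-based predicate measures speech rather than trailing silence. One untested way to wire it up is a `generate` that calls `model.inference(...)` as in the usage example and returns the first channel of the audio read back from `out.wav`.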

Attribution & license

License: Apache-2.0. Built on MOSS-TTS-Nano (Apache-2.0) and the MOSS-Audio-Tokenizer-Nano audio tokenizer (Apache-2.0). Training data: IIT Madras Indic TTS (English), via SPRINGLab/IndicTTS-English. Required notice: "COPYRIGHT 2016 TTS Consortium, TDIL, Meity — Hema A. Murthy & S. Umesh — IIT Madras. ALL RIGHTS RESERVED." Do not use this model to impersonate real people or to deceive; disclose AI-generated audio where required.

Model details

Repository: IOTEverythin/roxi-tts-v3
Weights: Safetensors, F32 tensors, ~0.1B parameters
Base model: OpenMOSS-Team/MOSS-TTS-Nano-100M (this model is listed on the Hub as one of its 3 adapters)