IaraTTS — SFT v1 (Phase 2 checkpoint)

Brazilian Portuguese TTS: full supervised fine-tuning (SFT) of MOSS-TTS-Nano-100M on the Erinome dataset.

Results

Model	WER (50-prompt holdout)	N
Baseline MOSS-TTS-Nano-100M	0.5316	50
IaraTTS-SFT-v1 (this checkpoint)	0.1537	50
Δ	−0.3779 (−71% relative)

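WER is measured by round-trip evaluation: each synthesized clip is transcribed with Whisper-base (language=pt, fp16), and the transcript is scored against the prompt text with jiwer.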
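
As a rough illustration of the metric, the sketch below computes corpus-level WER as total word-level edit distance divided by total reference words, which should match how jiwer aggregates over a list of sentence pairs. It is a dependency-free stand-in, not the evaluation script behind the table: the function names and the example sentence pair are hypothetical, and text normalization (case, punctuation) is left out.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance (substitutions, deletions, insertions)."""
    prev = list(range(len(hyp_words) + 1))
    for i, r in enumerate(ref_words, start=1):
        curr = [i]
        for j, h in enumerate(hyp_words, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def corpus_wer(references, hypotheses):
    """Total word errors over total reference words across all prompts."""
    errors = sum(edit_distance(r.split(), h.split())
                 for r, h in zip(references, hypotheses))
    total = sum(len(r.split()) for r in references)
    return errors / total


def relative_change(baseline, new):
    """Signed relative change of `new` with respect to `baseline`."""
    return (new - baseline) / baseline


# Hypothetical reference prompt and ASR transcript, for illustration only.
refs = ["hoje a tarde está ensolarada"]
hyps = ["hoje a tarde esta ensolarada"]
print(corpus_wer(refs, hyps))           # one substitution over five words
print(relative_change(0.5316, 0.1537))  # about -0.71, as in the table
```

The last line reproduces the relative figure in the Δ row from the two WER values reported above.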

Training

Hyperparameter	Value
Base model	`OpenMOSS-Team/MOSS-TTS-Nano-100M`
Codec	`OpenMOSS-Team/MOSS-Audio-Tokenizer-Nano`
Dataset	`marcosremar2/gemini-dataset-erinome` (4929 valid text+wav pairs)
per_device_batch_size	8
gradient_accumulation_steps	4
global_batch_size	32
epochs	3 (465 steps)
learning_rate	5e-5 (cosine schedule, 5% warmup)
mixed_precision	bf16
attn_implementation	sdpa
GPU	RTX 4090 (Vast.ai)
Wall time	~10 min
Loss	5.5 → 4.7
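
The batch and step figures in the table are mutually consistent, which the sketch below checks. It assumes a single GPU, that the last partial batch of each epoch is kept (ceiling division), and linear warmup followed by cosine decay to zero. The card does not spell out these details, and the function name `lr_at` is hypothetical.

```python
import math

NUM_PAIRS = 4929          # valid text+wav pairs
PER_DEVICE_BATCH = 8
GRAD_ACCUM = 4
NUM_GPUS = 1              # assumption: a single RTX 4090
EPOCHS = 3
PEAK_LR = 5e-5
WARMUP_FRACTION = 0.05

global_batch = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_GPUS
# Assumption: the final partial batch of each epoch still counts as a step.
steps_per_epoch = math.ceil(NUM_PAIRS / global_batch)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = int(WARMUP_FRACTION * total_steps)


def lr_at(step):
    """Linear warmup to PEAK_LR, then cosine decay to zero (assumed shape)."""
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(global_batch, steps_per_epoch, total_steps, warmup_steps)
print(lr_at(0), lr_at(warmup_steps), lr_at(total_steps))
```

Under these assumptions the global batch is 32 and three epochs come to 465 optimizer steps, matching the table, with roughly the first 23 steps spent in warmup.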

Architecture roadmap

This checkpoint is Phase 2 of a multi-phase IaraTTS roadmap targeting browser deployment at ≤150M parameters. Subsequent phases, tracked in the companion repo iaratts-demo:

Phase 3 (zero-GPU): pt-BR text frontend (Gruut), RAS sampling, voice profile cache, Meta Quest viseme stream, in-session style continuity hybrid.
Phase 4 ($60–110): bilingual continued pretrain, hybrid TF + EOS sub-loss, paralinguistic tag SFT (<laugh>/<sigh>) + IndexTTS2 instruction LM.
Phase 5 ($80–150): X-Codec 2 codec swap, DCAR chunk-AR, Spark-TTS attribute tokens.
Phase 5.5 ($115–190): CosyVoice-2-style streaming AR distillation to 150M + Speech Speculative Decoding (1.4×) + Multi-Token Prediction 8 heads (4–5×) + SpeakStream interleaved training. Target TTFT 80–180ms WebGPU M1.

See the full roadmap in the companion repo.

Inference

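from transformers import AutoModel

# Load the fine-tuned checkpoint; trust_remote_code=True allows the custom model code in the repo to run.
model = AutoModel.from_pretrained("marcosremar2/iaratts-sft-v1", trust_remote_code=True)

# To synthesize audio, use upstream MOSS-TTS-Nano `infer.py` with this checkpoint: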
#   python infer.py --checkpoint ./iaratts-sft-v1 \
#     --audio-tokenizer-pretrained-name-or-path OpenMOSS-Team/MOSS-Audio-Tokenizer-Nano \
#     --text "Hoje a tarde está ensolarada." \
#     --output-audio-path out.wav --mode continuation --seed 42
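
For scripted use, the `infer.py` command can be assembled and launched from Python. The sketch below reuses only the flags shown in this card. The helper name `build_infer_command` is hypothetical, and it assumes that upstream `infer.py` sits in the working directory and that the checkpoint has been downloaded to `./iaratts-sft-v1`.

```python
import shlex
import subprocess


def build_infer_command(text, output_path, checkpoint="./iaratts-sft-v1", seed=42):
    """Build the argv list for upstream infer.py using the flags from this card."""
    return [
        "python", "infer.py",
        "--checkpoint", checkpoint,
        "--audio-tokenizer-pretrained-name-or-path",
        "OpenMOSS-Team/MOSS-Audio-Tokenizer-Nano",
        "--text", text,
        "--output-audio-path", output_path,
        "--mode", "continuation",
        "--seed", str(seed),
    ]


cmd = build_infer_command("Hoje a tarde está ensolarada.", "out.wav")
print(shlex.join(cmd))  # shell-quoted form, handy for logging

# Uncomment to run; requires infer.py and the checkpoint to be present locally.
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell-quoting problems with accented characters and spaces in the prompt text.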

License

MIT — same as upstream MOSS-TTS-Nano.

Citation

@misc{iaratts-sft-v1,
  author = {marcosremar2},
  title  = {IaraTTS SFT v1 — pt-BR fine-tune of MOSS-TTS-Nano-100M on Erinome},
  year   = {2026},
  url    = {https://huggingface.co/marcosremar2/iaratts-sft-v1}
}
