IaraTTS: SFT v1 (Phase 2 checkpoint)

Brazilian Portuguese TTS: a full supervised fine-tune (SFT) of MOSS-TTS-Nano-100M on the Erinome dataset.

Results

Model                              WER (50-prompt holdout)   N
Baseline MOSS-TTS-Nano-100M        0.5316                    50
IaraTTS-SFT-v1 (this checkpoint)   0.1537                    50
Δ                                  −0.3779 (−71% relative)

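WER is measured by round-trip evaluation: the synthesized audio is transcribed with Whisper-base (language=pt, fp16) and the transcript is scored against the prompt text with jiwer.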
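As an illustration of the metric only, the sketch below computes word error rate as word-level edit distance divided by the number of reference words, plus the relative change reported in the table. It is a pure-Python stand-in, not the jiwer code used for the numbers above; the function names and the lowercase-and-split normalization are assumptions, since the card does not state how text was normalized or how per-prompt scores were aggregated.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: text is only lowercased and split on whitespace.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_change(baseline: float, new: float) -> float:
    """Signed relative change of `new` with respect to `baseline`."""
    return (new - baseline) / baseline


# One wrong word out of five reference words.
example_wer = word_error_rate(
    "hoje a tarde está ensolarada",
    "hoje a tarde esta ensolarada",
)

# Relative change between the two WER values reported above.
reported_relative = relative_change(0.5316, 0.1537)
```

With the reported values, `relative_change(0.5316, 0.1537)` rounds to −0.71, which matches the −71% in the table.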

Training

Hyperparam                    Value
Base model                    OpenMOSS-Team/MOSS-TTS-Nano-100M
Codec                         OpenMOSS-Team/MOSS-Audio-Tokenizer-Nano
Dataset                       marcosremar2/gemini-dataset-erinome (4929 valid text+wav pairs)
per_device_batch_size         8
gradient_accumulation_steps   4
global_batch_size             32
epochs                        3 (465 steps)
learning_rate                 5e-5, cosine schedule, 5% warmup
mixed_precision               bf16
attn_implementation           sdpa
GPU                           RTX 4090 (Vast.ai)
Wall time                     ~10 min
Loss                          5.5 → 4.7
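The step count and schedule in the table can be cross-checked with a little arithmetic. The sketch below is a reconstruction under stated assumptions, not the training script: it assumes a single GPU, that the last partial batch of each epoch still counts as a step, that warmup is linear and its length is rounded down, and that the cosine decays to zero. All names are illustrative.

```python
import math

NUM_PAIRS = 4929          # valid text+wav pairs (from the table)
PER_DEVICE_BATCH = 8      # per_device_batch_size
GRAD_ACCUM = 4            # gradient_accumulation_steps
NUM_GPUS = 1              # assumption: one RTX 4090
EPOCHS = 3
PEAK_LR = 5e-5
WARMUP_FRACTION = 0.05

# 8 samples x 4 accumulation steps x 1 GPU = 32 samples per optimizer step.
global_batch = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_GPUS

# Assumption: the final partial batch is kept, hence ceil.
steps_per_epoch = math.ceil(NUM_PAIRS / global_batch)
total_steps = steps_per_epoch * EPOCHS

# Assumption: warmup length is rounded down to a whole step.
warmup_steps = int(WARMUP_FRACTION * total_steps)


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to zero (assumed shape)."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))
```

Under these assumptions the numbers line up with the table: 155 steps per epoch, 465 steps in total, and roughly 23 warmup steps before the rate starts to decay.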

Architecture roadmap

This is Phase 2 of a multi-phase IaraTTS roadmap targeting in-browser deployment at ≤150M parameters. Subsequent phases (tracked in the companion repo iaratts-demo):

  • Phase 3 (zero-GPU): pt-BR text frontend (Gruut), RAS sampling, voice profile cache, Meta Quest viseme stream, in-session style continuity hybrid.
  • Phase 4 ($60–110): bilingual continued pretrain, hybrid TF + EOS sub-loss, paralinguistic tag SFT (<laugh>/<sigh>) + IndexTTS2 instruction LM.
  • Phase 5 ($80–150): X-Codec 2 codec swap, DCAR chunk-AR, Spark-TTS attribute tokens.
  • Phase 5.5 ($115–190): CosyVoice-2-style streaming AR distillation to 150M + Speech Speculative Decoding (1.4×) + Multi-Token Prediction 8 heads (4–5×) + SpeakStream interleaved training. Target TTFT: 80–180 ms on WebGPU (M1).

See the full roadmap in the companion repo.

Inference

from transformers import AutoModel

# Load the fine-tuned weights; the repo ships custom model code, hence trust_remote_code=True.
model = AutoModel.from_pretrained("marcosremar2/iaratts-sft-v1", trust_remote_code=True)

# To synthesize audio, run the upstream MOSS-TTS-Nano `infer.py` with this checkpoint:
#   python infer.py --checkpoint ./iaratts-sft-v1 \
#     --audio-tokenizer-pretrained-name-or-path OpenMOSS-Team/MOSS-Audio-Tokenizer-Nano \
#     --text "Hoje a tarde está ensolarada." \
#     --output-audio-path out.wav --mode continuation --seed 42
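
For batch synthesis it can be convenient to assemble that command from Python. The helper below is a hypothetical convenience wrapper, not part of the upstream repo; it uses only the flags shown in the command above and assumes it is run from a local clone of MOSS-TTS-Nano where `infer.py` lives.

```python
import shlex


def build_infer_command(
    text: str,
    output_path: str,
    checkpoint: str = "./iaratts-sft-v1",
    seed: int = 42,
) -> list[str]:
    """Return the argv list for the upstream infer.py call shown above.

    Hypothetical helper: only flags that appear in this model card are used.
    """
    return [
        "python", "infer.py",
        "--checkpoint", checkpoint,
        "--audio-tokenizer-pretrained-name-or-path",
        "OpenMOSS-Team/MOSS-Audio-Tokenizer-Nano",
        "--text", text,
        "--output-audio-path", output_path,
        "--mode", "continuation",
        "--seed", str(seed),
    ]


cmd = build_infer_command("Hoje a tarde está ensolarada.", "out.wav")

# Shell-quoted form, useful for logging or copy-pasting.
printable = shlex.join(cmd)
print(printable)

# To actually run it (from inside the upstream repo clone):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell-quoting problems with accented characters and punctuation in the prompt text.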

License

MIT, same as upstream MOSS-TTS-Nano.

Citation

@misc{iaratts-sft-v1,
  author = {marcosremar2},
  title  = {IaraTTS SFT v1: pt-BR fine-tune of MOSS-TTS-Nano-100M on Erinome},
  year   = {2026},
  url    = {https://huggingface.co/marcosremar2/iaratts-sft-v1}
}