Route-2 v12: phoneme-conditioned flow-matching TTS (PT-BR)
Phoneme rebuild of Route 2. Pipeline: text → espeak-ng pt-br G2P → phoneme IDs →
nn.Embedding → Matcha OT-CFM (duration predictor + MAS) → mel → frozen Vocos →
24 kHz. Lip-sync from the duration predictor's w_ceil @ 93.75 Hz.
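A frame rate of 93.75 Hz at 24 kHz corresponds to 256 samples per mel frame (24000 / 93.75 = 256). The sketch below shows one way per-phoneme frame counts such as w_ceil could be turned into lip-sync time spans. The function name `phoneme_spans` and the plain-list input are assumptions for illustration, not code from this repo.

```python
from itertools import accumulate

SAMPLE_RATE = 24_000        # Hz, from the pipeline description
FRAME_RATE_HZ = 93.75       # mel frames per second, from the pipeline description
HOP_LENGTH = int(SAMPLE_RATE / FRAME_RATE_HZ)  # samples per frame


def phoneme_spans(w_ceil, y_length):
    """Convert per-phoneme frame counts into (start_s, end_s) spans.

    Hypothetical helper: w_ceil is assumed to be a list of integer frame
    counts, one per phoneme, and y_length the total number of mel frames.
    """
    # Invariant checked by the smoke test: durations must cover the mel exactly.
    if sum(w_ceil) != y_length:
        raise ValueError(f"sum(w_ceil)={sum(w_ceil)} != y_length={y_length}")
    ends = list(accumulate(w_ceil))   # cumulative end frame of each phoneme
    starts = [0] + ends[:-1]          # each phoneme starts where the last ended
    return [(s / FRAME_RATE_HZ, e / FRAME_RATE_HZ) for s, e in zip(starts, ends)]


spans = phoneme_spans([3, 5, 4], y_length=12)
```

Each span can then drive a viseme for the matching phoneme; the last span ends at y_length / 93.75 seconds, which is the duration of the generated audio.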
Production gate: ASR (whisper PT) text_similarity ≥ 0.55 (0.41 = prior AR plateau).
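The exact definition of text_similarity is not given here. As a rough sketch, a gate of this kind could compare the input text with the ASR transcript after light normalization. The use of `difflib.SequenceMatcher`, the normalization rules, and averaging over utterances are all assumptions, not the project's actual metric.

```python
import difflib
import re
import unicodedata

GATE = 0.55  # threshold from the production gate above


def normalize(text):
    """Lowercase, strip punctuation, and collapse whitespace (assumed rules)."""
    text = unicodedata.normalize("NFKC", text).lower()
    text = re.sub(r"[^\w\s]", " ", text)   # \w keeps accented letters
    return " ".join(text.split())


def text_similarity(reference, hypothesis):
    """Character-level similarity ratio in [0, 1] (stand-in metric)."""
    matcher = difflib.SequenceMatcher(None, normalize(reference), normalize(hypothesis))
    return matcher.ratio()


def passes_gate(pairs, threshold=GATE):
    """pairs: (input_text, asr_transcript) tuples. Returns (passed, mean score)."""
    scores = [text_similarity(ref, hyp) for ref, hyp in pairs]
    mean = sum(scores) / len(scores)
    return mean >= threshold, mean
```

The transcripts themselves would come from running the ASR model on the synthesized audio; that step is left out here.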
Status (2026-05-29)
- Smoke PASSED (smoke_v12.py): G2P, Embedding wiring, padding_idx zero, loss falls, lip-sync invariant sum(w_ceil)==y_length.
- Overfit PASSED (overfit_v12.py): 16 real utts, sim_mean 0.7154, median 0.787, 4 exact. Phoneme conditioning works; v11's failure was conditioning granularity, not Matcha/Vocos.
- Full train: code ready; not yet run (blocked on stable GPU pod availability).
Files
- r2_model_v12.py - TucanoMatchaTTS, Embedding in_proj (in_dim = phoneme vocab). 19.05M params.
- phon_util.py - espeak-ng pt-br G2P (punctuation-glue fixed). phonemize_batch / build_vocab / encode.
- gen_mel.py - Mimi decode (codes → audio) → Vocos mel targets. Out: <pref>_concat.npy + _lengths.npy + _ids.json. Mimi via MIMI_PATH env or kyutai/mimi.
- prep_phon_v12.py - 94k jsonl → phon_vocab.json + r2_phon_ids.json (per-id ID lists).
- r2_train_v12.py - phoneme loader + OT-CFM train (dur+prior+diff), AdamW, ckpt full-state (model+opt) + --resume.
- pod_pipeline.sh - on-pod: setup → pull dataset (R2) → gen_mel → prep_phon → train. Needs RCLONE_CONFIG_R2_* env.
- smoke_v12.py, overfit_v12.py - validation tests.
- route2_nova_arquitetura.html - architecture explainer + APA references.
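To illustrate the vocabulary step that feeds nn.Embedding, here is a minimal sketch of building a phoneme vocabulary and encoding strings to ID lists. The names `build_vocab` and `encode` match the helpers listed for phon_util.py, but the signatures, the per-character tokenization, and reserving ID 0 for padding are assumptions; the real implementations may differ.

```python
PAD = "<pad>"  # assumed padding symbol, mapped to ID 0


def build_vocab(phoneme_strings):
    """Map each distinct phoneme symbol to an integer ID, with padding at 0."""
    symbols = sorted({ch for s in phoneme_strings for ch in s})
    vocab = {PAD: 0}
    for sym in symbols:
        vocab[sym] = len(vocab)   # IDs 1..N in sorted order
    return vocab


def encode(phonemes, vocab):
    """Convert one phoneme string to IDs; raises KeyError on unseen symbols."""
    return [vocab[ch] for ch in phonemes]


def pad_batch(seqs, pad_id=0):
    """Right-pad ID lists to equal length so they can be stacked into a batch."""
    max_len = max(len(s) for s in seqs)
    return [s + [pad_id] * (max_len - len(s)) for s in seqs]


# Plain ASCII stands in for espeak-ng IPA output in this toy example.
vocab = build_vocab(["ola", "mundo"])
batch = pad_batch([encode("ola", vocab), encode("mundo", vocab)])
```

With padding fixed at ID 0, the embedding layer can be created with padding_idx=0 so padded positions contribute a zero vector.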
Data
94k utts / ~106 h PT-BR in R2 bucket tts-ptbr-training/data/ (train_94k.jsonl + val.jsonl).
Keys: speakable_text (G2P), mimi_codes_flat (→ mel), duration_frames, id.
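A small reader can confirm that every JSONL record carries the four keys before the expensive gen_mel and prep steps run. This is a sketch only: `iter_records` is a hypothetical helper, and the value types in the sample record are guesses, since only the key names are documented above.

```python
import json

REQUIRED_KEYS = ("speakable_text", "mimi_codes_flat", "duration_frames", "id")


def iter_records(lines):
    """Yield parsed records from JSONL lines, failing fast on missing keys."""
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        missing = [k for k in REQUIRED_KEYS if k not in record]
        if missing:
            raise KeyError(f"line {lineno}: missing keys {missing}")
        yield record


# Sample record with made-up values, for illustration only.
sample = json.dumps({
    "id": "utt_000001",
    "speakable_text": "bom dia",
    "mimi_codes_flat": [12, 7, 301],
    "duration_frames": 3,
})
records = list(iter_records([sample, ""]))
```

For a real file, pass the open file handle, for example `iter_records(open("train_94k.jsonl", encoding="utf-8"))`.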
Run (on a GPU pod)
# scp these scripts to the pod, set RCLONE_CONFIG_R2_* env, then:
STEPS=30000 BATCH=32 bash pod_pipeline.sh # setup -> data -> gen_mel -> prep -> train