Mongolian VITS — My-Voice Fine-Tune

Speaker-adapted fine-tune of Bokhbat/mongolian-vits-tts. It takes the multi-speaker Mongolian VITS model and adds one new voice (speaker01) without degrading the Mongolian synthesis of the original voices.

Base: multi-speaker VITS, 78 Mongolian speakers
This model: 79 speakers = original 78 (ids 0–77, unchanged) + speaker01 (id 78, the new voice)
Adaptation data: ~3.7 min (57 clips), single speaker
Best checkpoint: epoch 93 / step 609 (lowest eval loss; training was stopped early once the loss plateaued)
Sample rate: 22050 Hz
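
The speaker layout above (original ids 0–77 kept, the new voice appended as id 78) can be illustrated with a small check. This is a sketch only: `extend_speaker_map` is a hypothetical helper, and the `base_XX` names are placeholders, since the only speaker names given in this README are `spk_0053` and `speaker01`.

```python
def extend_speaker_map(base, new_name):
    """Return a copy of `base` with `new_name` appended at the next free id.

    Hypothetical helper for illustration; it is not part of Coqui TTS.
    """
    if new_name in base:
        raise ValueError(f"{new_name!r} already exists in the base map")
    if sorted(base.values()) != list(range(len(base))):
        raise ValueError("base ids must be contiguous and start at 0")
    extended = dict(base)            # existing name -> id pairs are untouched
    extended[new_name] = len(base)   # the new voice takes the next id
    return extended


# Placeholder names for the 78 original speakers (assumption, see above).
base_map = {f"base_{i:02d}": i for i in range(78)}
full_map = extend_speaker_map(base_map, "speaker01")

print(len(full_map), full_map["speaker01"])
```

Appending rather than renumbering means any script that already addresses an original voice by id keeps working with the fine-tuned checkpoint.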

How Mongolian ability was protected (Strategy A)

Original 78 speaker ids preserved; new voice appended as id 78 (so the speaker embedding table was expanded, not overwritten).
text_encoder (phonetics/text) and duration_predictor (rhythm/prosody) were frozen, so the parts of the model that carry the language's pronunciation and timing cannot drift on the small dataset.
Low learning rate of 2e-5 (the base model used 2e-4), with the best checkpoint selected by eval loss.
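
The three measures above can be sketched in plain PyTorch. This is a minimal illustration, not the training script used for this model: `ToyVits` is a hypothetical stand-in that only borrows the module names from this README (plus `emb_g` for the speaker embedding table), and the choice of AdamW and of the initialisation for the new row are assumptions.

```python
import torch
from torch import nn


class ToyVits(nn.Module):
    """Hypothetical stand-in for a multi-speaker VITS model."""

    def __init__(self, num_speakers: int, dim: int = 8):
        super().__init__()
        self.text_encoder = nn.Linear(dim, dim)
        self.duration_predictor = nn.Linear(dim, 1)
        self.emb_g = nn.Embedding(num_speakers, dim)  # speaker embedding table
        self.decoder = nn.Linear(dim, dim)


def append_speaker(model: nn.Module, init_std: float = 0.01) -> int:
    """Grow the speaker table by one row, keeping the existing rows."""
    old = model.emb_g
    new = nn.Embedding(old.num_embeddings + 1, old.embedding_dim)
    with torch.no_grad():
        new.weight[: old.num_embeddings].copy_(old.weight)  # original speakers
        new.weight[-1].normal_(0.0, init_std)               # new speaker (assumed init)
    model.emb_g = new
    return old.num_embeddings  # id assigned to the new speaker


def freeze(model: nn.Module, names=("text_encoder", "duration_predictor")) -> None:
    """Exclude the named submodules from gradient updates."""
    for name in names:
        for param in getattr(model, name).parameters():
            param.requires_grad = False


model = ToyVits(num_speakers=78)
original_rows = model.emb_g.weight.detach().clone()

new_id = append_speaker(model)
freeze(model)

# Only the remaining trainable parameters go to the optimizer, at the low LR.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=2e-5)  # AdamW is an assumption
```

In this sketch the copied rows for the original speakers stay trainable; the README does not state whether they were also frozen during fine-tuning.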

The original 78 voices still synthesize natural Mongolian; speaker01 is the newly learned voice. Note that 3.7 min is very little data: speaker01 is recognizable but rough, and more data would likely sharpen it.
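
To put the data size in perspective, the figures quoted in this README work out to roughly 3.9 s of audio per clip. The short calculation below uses only those figures (about 3.7 min, 57 clips, 22050 Hz), so the results are approximate.

```python
minutes = 3.7          # approximate total adaptation audio
clips = 57
sample_rate = 22050    # Hz

total_seconds = minutes * 60
avg_clip_seconds = total_seconds / clips
avg_clip_samples = avg_clip_seconds * sample_rate

print(f"{total_seconds:.0f} s total, "
      f"{avg_clip_seconds:.1f} s per clip, "
      f"about {avg_clip_samples:,.0f} samples per clip")
```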

Files

| File | Description |
| --- | --- |
| `best_model.pth` | Fine-tuned VITS checkpoint (79 speakers) |
| `config.json` | Coqui TTS config |
| `speakers.pth` | 79-speaker name→id map (`speaker01` = 78) |
| `tensorboard/` | Fine-tune training curves |
| `ft_yourvoice_spk01.wav` | Sample: new voice (`speaker01`) |
| `ft_original_spk0053.wav` | Sample: an original voice (`spk_0053`), used as a Mongolian-ability check |

Usage

from huggingface_hub import hf_hub_download
from TTS.utils.synthesizer import Synthesizer

repo = "Bokhbat/mongolian-vits-myvoice"
ckpt  = hf_hub_download(repo, "best_model.pth")
cfg   = hf_hub_download(repo, "config.json")
spk   = hf_hub_download(repo, "speakers.pth")

syn = Synthesizer(ckpt, cfg, tts_speakers_file=spk, use_cuda=False)
# the new voice:
wav = syn.tts("Сайн байна уу?", speaker_name="speaker01")
syn.save_wav(wav, "myvoice.wav")
# an original Mongolian voice still works:
wav = syn.tts("Сайн байна уу?", speaker_name="spk_0053")
syn.save_wav(wav, "original.wav")
