Request for reproducible Magpie-TTS fine-tuning setup for new language adaptation
Hi NMikka,
Thank you for sharing the Georgian Magpie-TTS fine-tuned model. I am also trying to fine-tune NVIDIA Magpie-TTS for a new language, specifically Slovene, and your model seems to be one of the few public examples of successful language adaptation.
I checked the model card and saw that the model was fine-tuned from nvidia/magpie_tts_multilingual_357m using NeMo, with Full SFT, LR 2e-5, 37 epochs, bf16-mixed precision, and the NeMo commit:
3d73c48aca1ae3be44657267b81f25dc3201161a
Would you be willing to share the exact fine-tuning setup you used?
Specifically, it would be very helpful if you could share:
- The exact `magpietts.yaml` / Hydra config used for training
- The full training command with all overrides
- Whether you modified any files in the NeMo repo
  - If yes, could you share the changed files, patch, or commit diff?
- The exact dataset manifest format you used
- Whether you precomputed `target_audio_codes_path` and `context_audio_codes_path`
- How you selected `context_audio_filepath` and `context_text` for each sample
- Which tokenizer configuration you used for Georgian
- Whether you used `google/byt5-small` as a byte-level tokenizer or made any language-specific tokenizer changes
- Whether you changed `alignment_loss_scale`, `prior_scaling_factor`, `cfg_unconditional_prob`, `context_duration_min/max`, or any decoder settings
- Whether you used `trainer.precision=32` first and later switched to `bf16-mixed`, or trained directly with `bf16-mixed`
- Any inference settings that helped avoid repetitions or artifacts, such as `temperature`, `topk`, `cfg_scale`, `max_decoder_steps`, or `use_local_transformer_for_inference`
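To make the manifest questions concrete, here is a minimal sketch of the kind of check I have in mind. It assumes a JSON-lines manifest (one JSON object per line) and uses the field names from the list above. The helper name `check_manifest` and the assumption that all four fields are expected on every sample are mine, not something taken from NeMo or from your setup.

```python
import json
from collections import Counter

# Field names come from the questions above. Whether every sample must carry
# all of them is an assumption for illustration, not a confirmed NeMo rule.
EXPECTED_KEYS = (
    "context_audio_filepath",
    "context_text",
    "target_audio_codes_path",
    "context_audio_codes_path",
)


def check_manifest(lines, expected_keys=EXPECTED_KEYS):
    """Return (entry_count, {key: number of entries where key is missing or empty})."""
    missing = Counter()
    total = 0
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines
        entry = json.loads(raw)
        total += 1
        for key in expected_keys:
            if entry.get(key) in (None, ""):
                missing[key] += 1
    return total, dict(missing)


# Two made-up entries: the second lacks the cached code paths and has empty context text.
sample = [
    json.dumps({
        "context_audio_filepath": "a.wav",
        "context_text": "hi",
        "target_audio_codes_path": "a.pt",
        "context_audio_codes_path": "c.pt",
    }),
    json.dumps({"context_audio_filepath": "b.wav", "context_text": ""}),
]
total, missing = check_manifest(sample)
print(total, missing)
```

In practice the `lines` argument could simply be an open manifest file, so the same function can be run over the real training manifest to see which fields are absent and how often.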
I am asking because my fine-tuned model trains, but the generated audio sometimes has artifacts, repeated words, or duplicated segments. I want to understand whether the issue is coming from my data preparation, tokenizer setup, cached codec extraction, NeMo version, training config, or inference settings.
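One rough way to quantify the repeated-word problem is to transcribe the generated audio and look for back-to-back repeated word sequences in the transcript. The sketch below assumes such a transcript is already available as plain text (how it is produced is outside the sketch), and the helper name `find_adjacent_repeats` is hypothetical. Input text that legitimately repeats a word would also be flagged, so the result is only meaningful when compared against the same check on the input text.

```python
def find_adjacent_repeats(text, max_n=4):
    """Find word n-grams (n = 1..max_n) that are immediately followed by themselves.

    Returns a list of (word_index, repeated_phrase) tuples.
    """
    words = text.lower().split()
    hits = []
    for n in range(1, max_n + 1):
        i = 0
        while i + 2 * n <= len(words):
            if words[i:i + n] == words[i + n:i + 2 * n]:
                hits.append((i, " ".join(words[i:i + n])))
                i += n  # jump past the first copy so longer runs are counted once per repeat
            else:
                i += 1
    return hits


# Made-up transcripts with a duplicated segment and a duplicated word.
print(find_adjacent_repeats("dober dan dober dan vsem skupaj"))
print(find_adjacent_repeats("to je je test"))
```

Counting these hits per utterance across a fixed set of test sentences would give a simple number to compare between checkpoints or inference settings.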
Thanks again for releasing the model. It would really help others who are trying to adapt Magpie-TTS to low-resource or unsupported languages.
Best regards,
Tauqeer