license: apache-2.0
language:
  - ro
tags:
  - text-to-speech
  - Grad-TTS
  - Diffusion
library_name: pytorch
datasets:
  - SWARA-1.0

Ro-Grad-TTS: Romanian Text-to-Speech

Romanian adaptation of Grad-TTS, trained on the SWARA 1.0 dataset.

Quick Start

This repository contains only the pretrained model weights for Romanian Grad-TTS. The inference package, along with installation and usage instructions, is hosted on GitHub at adrianstanea/Ro-Grad-TTS.

The Romanian Grad-TTS package downloads the weights from this repository automatically as needed. To install it and run Romanian TTS inference, follow the instructions in the GitHub repository named above.

Details

  • Architecture: Grad-TTS (diffusion-based TTS)
  • Language: Romanian
  • Phonemization: Espeak-ng
  • Vocoder: HiFi-GAN (universal v1)
  • Sample rate: 22050 Hz
  • Training data: SWARA 1.0 Romanian speech corpus

Available Models

Baseline Model

| Model | Type | Description |
| --- | --- | --- |
| swara | Baseline | Speaker-agnostic model trained on full SWARA dataset |

Fine-tuned Speaker Models

| Model | Speaker | Training Samples | Fine-tune Epochs | Use Case |
| --- | --- | --- | --- | --- |
| bas_10 | BAS (Female) | 10 samples | 100 | Few-shot learning / Low-resource |
| bas_950 | BAS (Female) | 950 samples | 100 | Production-ready speaker |
| sgs_10 | SGS (Male) | 10 samples | 100 | Few-shot learning / Low-resource |
| sgs_950 | SGS (Male) | 950 samples | 100 | Production-ready speaker |

Vocoder: all models use the universal HiFi-GAN (v1) vocoder.

Repository Structure

adrianstanea/Ro-Grad-TTS/
├── config.json                                      # Model hyperparameters
├── hifigan_config.json                              # Vocoder configuration
└── models/
    ├── swara/
    │   └── grad-tts-base-1000.pt                    # Baseline model
    ├── bas/
    │   └── grad-tts-bas-{10,950}_{15,50,100}.pt
    ├── sgs/
    │   └── grad-tts-sgs-{10,950}_{15,50,100}.pt
    └── vocoder/
        └── hifigan_univ_v1                          # Universal HiFi-GAN
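
The brace patterns in the tree expand to one checkpoint file per combination. The following is a minimal Python sketch of that expansion, with an optional manual download through `huggingface_hub.hf_hub_download`. The helper names (`checkpoint_path`, `all_checkpoint_paths`, `download`) are hypothetical and not part of the Ro-Grad-TTS package. The sketch assumes that the second number in each filename is the fine-tuning epoch count and that every combination shown in the tree exists in the repository. Normal use does not require any of this, since the package fetches weights on its own.

```python
from itertools import product
from typing import List

REPO_ID = "adrianstanea/Ro-Grad-TTS"  # repository named in this README

# Values copied from the brace patterns in the repository tree.
SPEAKERS = ("bas", "sgs")
SAMPLE_COUNTS = (10, 950)
EPOCHS = (15, 50, 100)  # assumption: the filename suffix is the epoch count

BASELINE_PATH = "models/swara/grad-tts-base-1000.pt"


def checkpoint_path(speaker: str, samples: int, epochs: int) -> str:
    """Build the in-repo path of a fine-tuned checkpoint (hypothetical helper)."""
    if speaker not in SPEAKERS:
        raise ValueError(f"unknown speaker {speaker!r}, expected one of {SPEAKERS}")
    if samples not in SAMPLE_COUNTS:
        raise ValueError(f"unknown sample count {samples}, expected one of {SAMPLE_COUNTS}")
    if epochs not in EPOCHS:
        raise ValueError(f"unknown epoch count {epochs}, expected one of {EPOCHS}")
    return f"models/{speaker}/grad-tts-{speaker}-{samples}_{epochs}.pt"


def all_checkpoint_paths() -> List[str]:
    """List the baseline checkpoint followed by every fine-tuned combination."""
    tuned = [
        checkpoint_path(speaker, samples, epochs)
        for speaker, samples, epochs in product(SPEAKERS, SAMPLE_COUNTS, EPOCHS)
    ]
    return [BASELINE_PATH] + tuned


def download(path: str) -> str:
    """Fetch one file into the local Hugging Face cache and return its local path.

    Requires network access and the huggingface_hub package, which is
    imported lazily so the path helpers above work without it.
    """
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=REPO_ID, filename=path)


if __name__ == "__main__":
    for path in all_checkpoint_paths():
        print(path)
```

For example, `download(checkpoint_path("bas", 950, 100))` would request `models/bas/grad-tts-bas-950_100.pt`, provided that file is present in the repository.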

Citation

If you use this Romanian adaptation in your research, please cite:

@ARTICLE{11269795,
  author={Răgman, Teodora and Bogdan Stânea, Adrian and Cucu, Horia and Stan, Adriana},
  journal={IEEE Access},
  title={How Open Is Open TTS? A Practical Evaluation of Open Source TTS Tools},
  year={2025},
  volume={13},
  number={},
  pages={203415-203428},
  keywords={Computer architecture;Training;Text to speech;Spectrogram;Decoding;Computational modeling;Codecs;Predictive models;Acoustics;Low latency communication;Speech synthesis;open tools;evaluation;computational requirements;TTS adaptation;text-to-speech;objective measures;listening test;Romanian},
  doi={10.1109/ACCESS.2025.3637322}
}

Original Grad-TTS Citation

@article{popov2021grad,
  title={Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech},
  author={Popov, Vadim and Vovk, Ivan and Gogoryan, Vladimir and Sadekova, Tasnima and Kudinov, Mikhail},
  journal={International Conference on Machine Learning},
  year={2021}
}

References