metadata
license: apache-2.0
language:
- ro
tags:
- text-to-speech
- Grad-TTS
- Diffusion
library_name: pytorch
datasets:
- SWARA-1.0
Ro-Grad-TTS: Romanian Text-to-Speech
Romanian adaptation of Grad-TTS, trained on the SWARA 1.0 dataset.
Quick Start
This repository contains only the pretrained model weights for Romanian Grad-TTS. The inference package, along with installation and usage instructions, is hosted on GitHub at adrianstanea/Ro-Grad-TTS.
The package downloads the weights from this repository automatically as needed. To install and run Romanian TTS inference, follow the instructions in the GitHub repository named above.
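If you only want to inspect a checkpoint without installing the full package, the file can also be fetched by hand. The following is a minimal sketch, not the package's own download logic. It assumes that this repository's Hub id is `adrianstanea/Ro-Grad-TTS` and that the baseline checkpoint sits at the path listed under Repository Structure. `fetch_checkpoint` is a hypothetical helper name, and `huggingface_hub` must be installed separately.

```python
from pathlib import Path

# Assumption: the Hub repo id matches the root shown under Repository Structure.
REPO_ID = "adrianstanea/Ro-Grad-TTS"
# Assumption: baseline checkpoint path as listed under Repository Structure.
BASELINE_CHECKPOINT = "models/swara/grad-tts-base-1000.pt"


def fetch_checkpoint(filename: str = BASELINE_CHECKPOINT,
                     repo_id: str = REPO_ID) -> Path:
    """Download one file from the Hub (or reuse the local cache) and return its local path."""
    # Imported lazily so this module can be imported without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return Path(hf_hub_download(repo_id=repo_id, filename=filename))
```

Calling `fetch_checkpoint()` needs network access on the first run; later calls reuse the local Hugging Face cache.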
Details
- Architecture: Grad-TTS (diffusion-based TTS)
- Language: Romanian
- Phonemization: eSpeak NG
- Vocoder: HiFi-GAN (universal v1)
- Sample rate: 22050 Hz
- Training data: SWARA 1.0 Romanian speech corpus
Available Models
Baseline Model
| Model | Type | Description |
|---|---|---|
| swara | Baseline | Speaker-agnostic model trained on full SWARA dataset |
Fine-tuned Speaker Models
| Model | Speaker | Training Samples | Fine-tune Epochs | Use Case |
|---|---|---|---|---|
| bas_10 | BAS (Female) | 10 samples | 100 | Few-shot learning / Low-resource |
| bas_950 | BAS (Female) | 950 samples | 100 | Production-ready speaker |
| sgs_10 | SGS (Male) | 10 samples | 100 | Few-shot learning / Low-resource |
| sgs_950 | SGS (Male) | 950 samples | 100 | Production-ready speaker |
Vocoder: Universal HiFi-GAN vocoder
Repository Structure
adrianstanea/Ro-Grad-TTS/
├── config.json                    # Model hyperparameters
├── hifigan_config.json            # Vocoder configuration
└── models/
    ├── swara/
    │   └── grad-tts-base-1000.pt  # Baseline model
    ├── bas/
    │   └── grad-tts-bas-{10,950}_{15,50,100}.pt
    ├── sgs/
    │   └── grad-tts-sgs-{10,950}_{15,50,100}.pt
    └── vocoder/
        └── hifigan_univ_v1        # Universal HiFi-GAN
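The checkpoint naming pattern above can be expanded programmatically. The sketch below is illustrative only: it assumes that `{10,950}` is the number of fine-tuning samples and `{15,50,100}` the number of fine-tuning epochs (an interpretation based on the Available Models tables, not something stated explicitly), and that all checkpoints live under `models/`. `checkpoint_path` is a hypothetical helper and is not part of the Ro-Grad-TTS package.

```python
from typing import Optional

# Values taken from the repository listing; their meaning is an assumption.
SPEAKERS = ("bas", "sgs")
SAMPLE_COUNTS = (10, 950)   # assumed: fine-tuning samples
EPOCHS = (15, 50, 100)      # assumed: fine-tuning epochs


def checkpoint_path(speaker: Optional[str] = None,
                    samples: int = 950,
                    epochs: int = 100) -> str:
    """Return the repo-relative checkpoint path for the baseline or a speaker model."""
    if speaker is None:
        # No speaker requested: fall back to the baseline SWARA model.
        return "models/swara/grad-tts-base-1000.pt"
    if speaker not in SPEAKERS:
        raise ValueError(f"unknown speaker {speaker!r}, expected one of {SPEAKERS}")
    if samples not in SAMPLE_COUNTS:
        raise ValueError(f"unsupported sample count {samples}, expected one of {SAMPLE_COUNTS}")
    if epochs not in EPOCHS:
        raise ValueError(f"unsupported epoch count {epochs}, expected one of {EPOCHS}")
    return f"models/{speaker}/grad-tts-{speaker}-{samples}_{epochs}.pt"
```

For example, `checkpoint_path("bas", 10, 50)` yields `models/bas/grad-tts-bas-10_50.pt`, and calling it with no arguments yields the baseline path.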
Citation
If you use this Romanian adaptation in your research, please cite:
@ARTICLE{11269795,
author={Răgman, Teodora and Bogdan Stânea, Adrian and Cucu, Horia and Stan, Adriana},
journal={IEEE Access},
title={How Open Is Open TTS? A Practical Evaluation of Open Source TTS Tools},
year={2025},
volume={13},
number={},
pages={203415-203428},
keywords={Computer architecture;Training;Text to speech;Spectrogram;Decoding;Computational modeling;Codecs;Predictive models;Acoustics;Low latency communication;Speech synthesis;open tools;evaluation;computational requirements;TTS adaptation;text-to-speech;objective measures;listening test;Romanian},
doi={10.1109/ACCESS.2025.3637322}
}
Original Grad-TTS Citation
@article{popov2021grad,
title={Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech},
author={Popov, Vadim and Vovk, Ivan and Gogoryan, Vladimir and Sadekova, Tasnima and Kudinov, Mikhail},
journal={International Conference on Machine Learning},
year={2021}
}
References
- adrianstanea/Ro-Grad-TTS - Training, documentation, and research details
- huawei-noah/Speech-Backbones - Base architecture and paper