license: apache-2.0
language:
  - ro
tags:
  - text-to-speech
  - Grad-TTS
  - Diffusion
library_name: pytorch
datasets:
  - SWARA-1.0

Ro-Grad-TTS: Romanian Text-to-Speech

Romanian adaptation of Grad-TTS, trained on the SWARA 1.0 dataset.

Quick Start

This repository contains only the pretrained model weights for Romanian Grad-TTS. The inference package, together with installation and usage instructions, is hosted on GitHub at adrianstanea/Ro-Grad-TTS.

The Romanian Grad-TTS package downloads the weights from this repository automatically as needed. To install it and run inference, follow the instructions in the GitHub repository linked above.

Details

Architecture: Grad-TTS (diffusion-based TTS)
Language: Romanian
Phonemization: Espeak-ng
Vocoder: HiFi-GAN (universal v1)
Sample rate: 22050 Hz
Training data: SWARA 1.0 Romanian speech corpus

Available Models

Baseline Model

Model	Type	Description
swara	Baseline	Speaker-agnostic model trained on full SWARA dataset

Fine-tuned Speaker Models

Model	Speaker	Training Samples	Fine-tune Epochs	Use Case
bas_10	BAS (Female)	10 samples	100	Few-shot learning / Low-resource
bas_950	BAS (Female)	950 samples	100	Production-ready speaker
sgs_10	SGS (Male)	10 samples	100	Few-shot learning / Low-resource
sgs_950	SGS (Male)	950 samples	100	Production-ready speaker

Vocoder: all models listed above use the universal HiFi-GAN vocoder (v1).

Repository Structure

adrianstanea/Ro-Grad-TTS/
├── config.json                                      # Model hyperparameters
├── hifigan_config.json                              # Vocoder configuration
└── models/
    ├── swara/
    │   └── grad-tts-base-1000.pt                    # Baseline model
    ├── bas/
    │   └── grad-tts-bas-{10,950}_{15,50,100}.pt
    ├── sgs/
    │   └── grad-tts-sgs-{10,950}_{15,50,100}.pt
    └── vocoder/
        └── hifigan_univ_v1                          # Universal HiFi-GAN
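
The naming pattern in the tree above can be turned into repo-relative checkpoint paths with a small helper. This is a minimal sketch, not part of the Ro-Grad-TTS package: `checkpoint_path` and `download_checkpoint` are hypothetical names, the Hub repository ID is assumed to match the tree root, and the two numbers in each fine-tuned filename are assumed to be the training sample count followed by the fine-tune epoch count. The download step uses `hf_hub_download` from `huggingface_hub` and is only needed if you want a file without going through the package, which fetches weights on its own.

```python
# Assumption: the Hub repo ID matches the root of the tree shown above.
REPO_ID = "adrianstanea/Ro-Grad-TTS"
BASELINE = "models/swara/grad-tts-base-1000.pt"

# Values taken from the {..} sets in the repository structure.
SPEAKERS = ("bas", "sgs")
SAMPLE_COUNTS = (10, 950)
EPOCH_COUNTS = (15, 50, 100)


def checkpoint_path(speaker=None, samples=950, epochs=100):
    """Return the repo-relative path of a checkpoint (hypothetical helper).

    With no speaker, the baseline SWARA checkpoint is returned.
    Assumes filenames read grad-tts-<speaker>-<samples>_<epochs>.pt.
    """
    if speaker is None:
        return BASELINE
    if speaker not in SPEAKERS:
        raise ValueError(f"unknown speaker: {speaker!r}")
    if samples not in SAMPLE_COUNTS:
        raise ValueError(f"unsupported sample count: {samples!r}")
    if epochs not in EPOCH_COUNTS:
        raise ValueError(f"unsupported epoch count: {epochs!r}")
    return f"models/{speaker}/grad-tts-{speaker}-{samples}_{epochs}.pt"


def download_checkpoint(path):
    """Fetch one file from the Hub and return its local cache path."""
    # Imported lazily so the path helper works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=REPO_ID, filename=path)
```

For example, `checkpoint_path("bas", samples=10)` yields `models/bas/grad-tts-bas-10_100.pt`, which can then be passed to `download_checkpoint` and loaded with the tooling from the GitHub repository.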

Citation

If you use this Romanian adaptation in your research, please cite:

@ARTICLE{11269795,
  author={Răgman, Teodora and Bogdan Stânea, Adrian and Cucu, Horia and Stan, Adriana},
  journal={IEEE Access},
  title={How Open Is Open TTS? A Practical Evaluation of Open Source TTS Tools},
  year={2025},
  volume={13},
  number={},
  pages={203415-203428},
  keywords={Computer architecture;Training;Text to speech;Spectrogram;Decoding;Computational modeling;Codecs;Predictive models;Acoustics;Low latency communication;Speech synthesis;open tools;evaluation;computational requirements;TTS adaptation;text-to-speech;objective measures;listening test;Romanian},
  doi={10.1109/ACCESS.2025.3637322}
}

Original Grad-TTS Citation

@article{popov2021grad,
  title={Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech},
  author={Popov, Vadim and Vovk, Ivan and Gogoryan, Vladimir and Sadekova, Tasnima and Kudinov, Mikhail},
  journal={International Conference on Machine Learning},
  year={2021}
}

References

adrianstanea/Ro-Grad-TTS - Training, documentation, and research details
huawei-noah/Speech-Backbones - Base architecture and paper