F5-TTS Central Kurdish
This repository contains three Central Kurdish (Sorani Kurdish) Text-to-Speech (TTS) models based on the F5-TTS architecture. The models were developed to provide high-quality speech synthesis for a low-resource language and support applications such as speech generation, voice cloning, speech translation, and data augmentation.
The models were developed within the TTS4All initiative at the JSALT 2025 workshop, whose objective is to facilitate the rapid development of speech synthesis systems for low-resource languages using publicly available resources.
Each model is a single-speaker F5-TTS model, trained on a different dataset and speaking style.
Available Models
| Model | Speaker | Training Data | Duration |
|---|---|---|---|
| audiobook-F (model-audiobook-female.pt) | Female | Audiobook recordings | 10h54m |
| audiobook-M (model-audiobook-male.pt) | Male | Audiobook recordings | 10h51m |
| studio-M (model-studio-male.pt) | Male | Studio-recorded speech | 13h35m |
audiobook-F (Female)
A female voice trained on audiobook recordings narrated by a native Central Kurdish speaker. This model achieved the highest subjective score among the evaluated systems (TTS MOS 4.08) and is suitable for general-purpose applications.
audiobook-M (Male)
A male voice trained on audiobook recordings from a different speaker and domain than audiobook-F. The model produces natural speech while preserving the speaking style and prosody of audiobook narration.
studio-M (Male)
A male voice trained on professionally recorded studio speech from the Giganet dataset. This model achieved the highest predicted MOS (~4.43) among the three models and is particularly suitable for applications requiring very clean speech generation.
Evaluation
The models were evaluated using both objective and subjective metrics on multiple Central Kurdish speech synthesis scenarios, including in-domain text, code-switching, news, expressions and idioms, and prosodic variability.
Subjective Evaluation (MOS)
A Mean Opinion Score (MOS) evaluation collected 3,101 ratings from 88 native speakers.
| System | Natural Speech MOS | TTS MOS |
|---|---|---|
| audiobook-F | 4.42 | 4.08 |
| audiobook-M | 4.34 | 4.01 |
| studio-M | 4.40 | 4.06 |
The audiobook-F model achieved the highest overall subjective score among the evaluated Central Kurdish TTS systems.
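One way to read the table above is to look at how far each synthetic voice falls below the natural recordings of the same speaker. The short sketch below only recomputes that difference from the reported scores; the variable and function names are illustrative and not part of the repository.

```python
# MOS values copied from the subjective evaluation table:
# (natural speech MOS, TTS MOS) per system.
MOS_SCORES = {
    "audiobook-F": (4.42, 4.08),
    "audiobook-M": (4.34, 4.01),
    "studio-M": (4.40, 4.06),
}


def mos_gap(system: str) -> float:
    """Return natural MOS minus TTS MOS, rounded to two decimals."""
    natural, tts = MOS_SCORES[system]
    return round(natural - tts, 2)


for name in MOS_SCORES:
    print(f"{name}: gap = {mos_gap(name):.2f}")
```

For all three voices the synthetic speech is rated roughly a third of a MOS point below the corresponding natural speech.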
Objective Evaluation
| Model | Predicted MOS | DF | Average CER |
|---|---|---|---|
| audiobook-F | ~4.36 | ~0.99 | ~3.7% |
| audiobook-M | ~4.38 | ~0.95 | ~4.7% |
| studio-M | ~4.43 | ~0.92 | ~5.6% |
All three models reach a predicted MOS of about 4.36 to 4.43 and keep the average character error rate (CER) between roughly 3.7% and 5.6%.
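For readers unfamiliar with CER, the following is a minimal sketch of the standard definition: the character-level edit distance between a reference transcript and a recognized transcript, divided by the reference length. This card does not state which recognizer or text normalization was used for the reported values, so the sketch illustrates the metric in general and is not the evaluation script behind the table.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Toy Latin-script example; real use would compare Kurdish transcripts.
print(cer("kitten", "sitting"))
```

An average CER would then be obtained by aggregating this value over all test utterances; whether the reported figures average per utterance or pool all characters is not specified here.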
Downstream Speech Translation
The models were also used to generate synthetic speech for training Central Kurdish → English speech translation systems.
Using synthetic speech generated from the three voices, a Whisper Large V3 speech translation model achieved:
| Evaluation Set | BLEU |
|---|---|
| Asosoft | 27.23 |
| FLEURS | 18.50 |
These results demonstrate the usefulness of the developed TTS models beyond speech synthesis, particularly for developing speech technologies in low-resource languages.
Usage
Each model can be used with the provided infer.py script together with the corresponding prompt audio and prompt transcription files. Example prompt recordings and generated outputs are included in the repository for reproducibility and demonstration purposes.
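Before running infer.py, the chosen checkpoint has to be available locally. The sketch below shows one possible way to fetch it with `hf_hub_download` from the `huggingface_hub` package. It assumes the checkpoint files sit at the root of the aranemini/central-kurdish-tts repository under the names listed in the Available Models table; the helper names are hypothetical. The arguments of infer.py are not documented in this card, so they are not shown.

```python
from pathlib import Path

REPO_ID = "aranemini/central-kurdish-tts"

# Checkpoint file names as listed in the Available Models table.
# Assumption: the files are stored at the repository root.
CHECKPOINTS = {
    "audiobook-F": "model-audiobook-female.pt",
    "audiobook-M": "model-audiobook-male.pt",
    "studio-M": "model-studio-male.pt",
}


def checkpoint_filename(model_name: str) -> str:
    """Map a model name from the table to its checkpoint file name."""
    if model_name not in CHECKPOINTS:
        raise ValueError(
            f"Unknown model {model_name!r}; choose from {sorted(CHECKPOINTS)}"
        )
    return CHECKPOINTS[model_name]


def download_checkpoint(model_name: str) -> Path:
    """Download one checkpoint into the local Hugging Face cache."""
    # Imported lazily so the name lookup works without the package installed.
    from huggingface_hub import hf_hub_download

    local_path = hf_hub_download(
        repo_id=REPO_ID, filename=checkpoint_filename(model_name)
    )
    return Path(local_path)


# Example (requires network access and `pip install huggingface_hub`):
# path = download_checkpoint("audiobook-F")
```

The returned path can then be passed to infer.py together with the matching prompt audio and prompt transcription files from the repository.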
License
This repository is released under the CC BY-NC-ND 4.0 license.
Restrictions
- Commercial use is strictly prohibited.
- The models may only be used for research, educational, and other non-commercial purposes.
- Redistribution of modified versions is not permitted under the license terms.
- Users must comply with the license conditions of the accompanying datasets and resources.
Citation
If you use these models, datasets, or any accompanying resources in your research, please cite:
@inproceedings{mohammadamini-etal-2026-central,
title = {Central Kurdish Text-to-Speech and Its Application in Speech-to-Text Translation},
author = {Mohammadamini, Mohammad and Shamsi, Meysam and Tahon, Marie},
booktitle = {Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)},
month = {May},
year = {2026},
pages = {664--673},
address = {Palma, Mallorca, Spain},
publisher = {European Language Resources Association (ELRA)},
editor = {Piperidis, Stelios and Bel, Núria and van den Heuvel, Henk and Ide, Nancy and Krek, Simon and Toral, Antonio},
doi = {10.63317/4hfwowidu34u}
}
@inproceedings{mohammadamini2026iwslt,
title = {LIUM Submission for IWSLT 2026 Low-Resource Speech Translation Track},
author = {Mohammad Mohammadamini and Marie Tahon},
year = {2026},
booktitle = {Proceedings of the International Conference on Spoken Language Translation (IWSLT) 2026},
}
Acknowledgements
This work was developed within the TTS4All project at the JSALT 2025 workshop and at LIUM, Le Mans University.
We thank the audiobook writers and narrators for granting permission to use their recordings and voices for research purposes. The curated datasets and models were created to support the development of speech technologies for low-resource languages.