---
license: apache-2.0
---

# BreezyVoice

[Playground](https://www.kaggle.com/code/a24998667/breezyvoice-playground); [GitHub](https://github.com/Splend1d/BreezyVoice); [Paper](https://arxiv.org/abs/2501.17790)

**BreezyVoice: Adapting TTS for Taiwanese Mandarin with Enhanced Polyphone Disambiguation -- Challenges and Insights**

BreezyVoice is a voice-cloning text-to-speech (TTS) system adapted specifically for Taiwanese Mandarin. It offers phonetic control through auxiliary 注音 (bopomofo) inputs, for example to specify the intended reading of a polyphonic character. BreezyVoice is partially derived from [CosyVoice](https://github.com/FunAudioLLM/CosyVoice).

## How to Run

**Following the instructions in the GitHub repository downloads the model automatically.**

You can also run the model from a local path by cloning the model repository:

```
git lfs install
git clone https://huggingface.co/MediaTek-Research/BreezyVoice-300M
```

Then use the model as shown in the `run_inference.py` script, passing the local model path through the `model_path` parameter.

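If you prefer not to use Git LFS, the same files can also be fetched with the `huggingface_hub` Python library. The sketch below is a minimal illustration, assuming `huggingface_hub` is installed (`pip install huggingface_hub`); `fetch_breezyvoice` and its default `local_dir` are hypothetical names chosen here, not part of the BreezyVoice codebase.

```python
from pathlib import Path

# Model repository on the Hugging Face Hub (the same one cloned with git).
REPO_ID = "MediaTek-Research/BreezyVoice-300M"


def fetch_breezyvoice(local_dir: str = "BreezyVoice-300M") -> Path:
    """Download the model files into local_dir and return that directory.

    Hypothetical helper for illustration; assumes huggingface_hub is installed.
    """
    # Imported inside the function so that importing this module does not
    # require huggingface_hub; the dependency is only needed to download.
    from huggingface_hub import snapshot_download

    # snapshot_download fetches the repository's files into local_dir and
    # returns the path of that folder as a string.
    return Path(snapshot_download(repo_id=REPO_ID, local_dir=local_dir))
```

Calling `fetch_breezyvoice()` downloads the model and returns a directory that can be passed as `model_path`, just like a cloned copy.
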
If you find our work useful, please cite:

```
@article{hsu2025breezyvoice,
  title={BreezyVoice: Adapting TTS for Taiwanese Mandarin with Enhanced Polyphone Disambiguation--Challenges and Insights},
  author={Hsu, Chan-Jan and Lin, Yi-Cheng and Lin, Chia-Chun and Chen, Wei-Chih and Chung, Ho Lam and Li, Chen-An and Chen, Yi-Chang and Yu, Chien-Yu and Lee, Ming-Ji and Chen, Chien-Cheng and others},
  journal={arXiv preprint arXiv:2501.17790},
  year={2025}
}
```