rashid0784 committed on
Commit
0625aa6
·
verified ·
1 Parent(s): 85fdbf4

Update README.md

Files changed (1)
  1. README.md +48 -220
README.md CHANGED
@@ -1,238 +1,66 @@
1
- # Uganda Text-to-Speech (TTS) Models
2
 
3
- **🔗 [Access Models on HuggingFace](https://huggingface.co/Uganda-lang)**
4
 
5
- A comprehensive suite of Text-to-Speech (TTS) models for Ugandan languages, supporting **English, Luganda, Runyonkole, Tesso, and Acholi** with multiple speaker voices for each language.
6
 
7
  ---
8
 
9
- ## Table of Contents
10
-
11
- - [Introduction](#introduction)
12
- - [Model Architecture](#model-architecture)
13
- - [Supported Languages & Voices](#supported-languages--voices)
14
- - [Audio Examples](#audio-examples)
15
- - [Access & Usage](#access--usage)
16
- - [Limitations](#limitations)
17
- - [Future Work](#future-work)
18
- - [Citation](#citation)
19
-
20
- ---
21
-
22
- ## Introduction
23
-
24
- This project introduces a groundbreaking collection of Text-to-Speech models specifically designed for Ugandan languages. These models represent a significant advancement in African language technology, addressing the critical gap in speech synthesis capabilities for Uganda's diverse linguistic landscape.
25
-
26
- The Uganda TTS model family consists of fine-tuned versions of the **Orpheus 3B model**, each specialized for different Ugandan languages while maintaining quality speech synthesis capabilities. Each language model supports multiple distinct speaker voices, enabling versatile applications in education, accessibility, content creation, and digital preservation of Ugandan languages.
27
-
28
- These models are designed to serve researchers, developers, educators, and content creators who need high-quality speech synthesis in Ugandan languages, contributing to the broader goal of digital language preservation and accessibility in Africa.
29
-
30
- ## Model Architecture
31
-
32
- The Uganda TTS models utilize a sophisticated two-stage architecture:
33
-
34
- 1. **Audio Token Generation**: The models generate audio tokens based on the SNAC (Structured Neural Audio Codec) framework
35
- 2. **Fine-tuned Processing**: These audio tokens are then processed through specialized fine-tuned versions of the Orpheus 3B model, each optimized for specific Ugandan languages
36
-
37
- This architecture enables efficient and high-quality speech synthesis while maintaining computational efficiency suitable for various deployment scenarios. More about the Orpheous Models [here](https://canopylabs.ai/releases/orpheus_can_speak_any_language)
 
 
 
 
 
38
 
39
  ---
40
 
41
- ## Supported Languages & Voices
42
 
43
- ### English
44
- **Supported Voices**: Barbara, Mary, Jennifer, Jessica, Susan, James, Linda, Patricia, Elizabeth, Christopher
45
 
46
- ### Luganda
47
- **Supported Voices**: Charles, Sandra, Christopher, Mark, Barbara, Michelle, Karen, James, Margaret, Daniel
48
 
49
- ### Runyonkole
50
- **Supported Voices**: Michelle, James, Patricia, Mark, Elizabeth, Charles, Daniel, Barbara, Christopher, Linda
51
 
52
- ### Tesso
53
- **Supported Voices**: Michelle, Barbara, Jessica, Christopher, James, Daniel, Charles, Mark
54
 
55
- ### Acholi
56
- **Supported Voices**: James, Barbara, Michelle, Mark, Christopher
57
 
58
  ---
59
 
60
- ## Audio Examples
61
-
62
- ### English
63
-
64
- **Christopher**
65
- *Prompt*: "Hello I can speak in English as christopher, one of the voices I can speak."
66
- <audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/English/christopher.wav"></audio>
67
-
68
- **Barbara**
69
- *Prompt*: "Or as barbra, this is one of my female voices. Pretty cool right?."
70
- <audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/English/barbara.wav"></audio>
71
-
72
- **Mary**
73
- *Prompt*: "I can also speak as Mary as well."
74
- <audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/English/mary.wav"></audio>
75
-
76
- **James**
77
- *Prompt*: "Or I can speak as james as you can see."
78
- <audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/English/james.wav"></audio>
79
-
80
- **Jessica**
81
- *Prompt*: "This is my other voice called jessica, I have more voices of jennifer, suzan, linda, patricia and elizabeth. But I will be sharing these voices once I am fully done from baking."
82
- <audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/English/jessica.wav"></audio>
83
-
84
- ### Luganda
85
-
86
- **Christopher**
87
- *Prompt*: "Nsobolla okwo'geranga Christopher nga wowulila kati."
88
- <audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Luganda/christopher.wav"></audio>
89
-
90
- **Charles**
91
- *Prompt*: "Oba neenjogela nga charles wenti."
92
- <audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Luganda/charles.wav"></audio>
93
-
94
- **Sandra**
95
- *Prompt*: "Nina neddoboozi lya Sandra bweliti."
96
- <audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Luganda/sandra.wav"></audio>
97
-
98
- **Michelle**
99
- *Prompt*: "Nsobolla ogwogella bwenti mulino eddoboozi."
100
- <audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Luganda/michelle.wav"></audio>
101
-
102
- **Daniel**
103
- *Prompt*: "Oba nemulino elye'kisajja nga woowulira."
104
- <audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Luganda/daniel.wav"></audio>
105
-
106
- **Margaret**
107
- *Prompt*: "Charlissi yimilila awo."
108
- <audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Luganda/margaret.wav"></audio>
109
-
110
- **Mark**
111
- *Prompt*: "Ninna amaloboozi amalala naye nja kugalaga nga mazze oku tureyininga."
112
- <audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Luganda/mark.wav"></audio>
113
-
114
- ### Runyonkole
115
-
116
- **Christopher**
117
- *Prompt*: "Nimbasa kugamba nka Christopher omwiraka eri."
118
- <audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Runyankole/christopher.wav"></audio>
119
-
120
- **Michelle**
121
- *Prompt*: "Nimbasa kugamba nka Michelle omwiraka eri."
122
- <audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Runyankole/michelle.wav"></audio>
123
-
124
- **James**
125
- *Prompt*: "Uganda eteire amaani aha buhingi n'oburiisa."
126
- <audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Runyankole/james.wav"></audio>
127
-
128
- **Patricia**
129
- *Prompt*: "Bimwe ebirikugambwa aha reediyo nibihwera abantu kumanya obutare burungi bw'amasharuura gaabo."
130
- <audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Runyankole/patricia.wav"></audio>
131
-
132
- **Charles**
133
- *Prompt*: "Okukyerererwa kufuuhirira nikyo kirikutokooza ebyokurya ebitwine ebiro ebi."
134
- <audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Runyankole/charles.wav"></audio>
135
-
136
- **Elizabeth**
137
- *Prompt*: "Omu disiturikiti ya Kayunga emisiri erikukira obwngi ekashangwa erimu ebicoori ebiine oburwaire."
138
- <audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Runyankole/elizabeth.wav"></audio>
139
-
140
- ### Tesso
141
-
142
- **Christopher**
143
- *Prompt*: "Epedorete akoriok aimedaun ejok kanejaas aicoreta nu itikitikere adeka."
144
- <audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Teso/christopher.wav"></audio>
145
-
146
- **Jessica**
147
- *Prompt*: "Akoru ikorion luegelegela nes ingarakini itunganan."
148
- <audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Teso/jessica.wav"></audio>
149
-
150
- **James**
151
- *Prompt*: "Iraasit yen emunaara aticepak ikur enyamitos."
152
- <audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Teso/james.wav"></audio>
153
-
154
- **Daniel**
155
- *Prompt*: "Aipagisanar nes ewai ecie lo ibwaikinet iboro toma aswam."
156
- <audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Teso/daniel.wav"></audio>
157
-
158
- **Barbara**
159
- *Prompt*: "Isisianakinete isomeroi kwana asiomak eipone lo isubusaere."
160
- <audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Teso/barbara.wav"></audio>
161
-
162
- ### Acholi
163
-
164
- **Mark**
165
- *Prompt*: "Uganda tye ka keme ki lok me pur."
166
- <audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Acholi/mark.wav"></audio>
167
-
168
- **Barbara**
169
- *Prompt*: "Lupur twero nongo kony ma dit ka gunongo ngec me gengo onyo cango two ma balo jami ma i poto."
170
- <audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Acholi/barbara.wav"></audio>
171
-
172
- **James**
173
- *Prompt*: "Ler ma pe gidodo ma woto ka yenyo cam i dye poto obalo cam weng ma tye i poto."
174
- <audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Acholi/james.wav"></audio>
175
-
176
- **Michelle**
177
- *Prompt*: "Gum madwong me timo biacara tye i te yub ma pe jenge i kom gamente."
178
- <audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Acholi/mitchelle.wav"></audio>
179
-
180
- ---
181
-
182
- ## Access & Usage
183
-
184
- The models are openly accessible and available for research and development purposes:
185
-
186
- **🔗 [HuggingFace Model Hub](https://huggingface.co/uganda-tts-models)**
187
-
188
- All models are provided under an open-source license, encouraging collaboration and further development in African language technologies.
189
-
190
- ---
191
-
192
- ## Limitations
193
-
194
- While these models represent significant progress in Ugandan language TTS, there are some current limitations:
195
-
196
- - **Non-English Language Quality**: The non-English models may occasionally produce lower quality outputs compared to the English model. This is primarily due to the SNAC audio codec not being pre-trained on these languages, which affects the initial audio token generation quality.
197
-
198
- - **Speaker Consistency**: Non-English voices may sometimes generate speech that does not perfectly match the specified speaker characteristics due to limited training data for certain voice-language combinations.
199
-
200
- - **Language Coverage**: Current models focus on five major Ugandan languages, with plans to expand to additional languages based on data availability and community needs.
201
-
202
- **Note**: We are actively working on an improved version that addresses these limitations, including training SNAC on a more diverse set of languages and expanding the training datasets for better speaker fidelity.
203
-
204
- ---
205
-
206
- ## Future Work
207
-
208
- ### Completed ✅
209
- - [x] Train the models for each of the languages
210
- - [x] Open source the models
211
-
212
- ### In Progress 🔄
213
- - [ ] Develop a Python package to act as an API for the models
214
- - [ ] Write a comprehensive white paper detailing the training process and methodology
215
- - [ ] Improve SNAC training for better non-English language support
216
- - [ ] Expand training datasets for enhanced speaker consistency
217
-
218
- ---
219
-
220
- ## Citation
221
-
222
- If you use these models in your research or applications, please cite:
223
-
224
- ```bibtex
225
- @misc{uganda_tts_2024,
226
- author = {Kisejjere Rashid and Magala Reuben},
227
- title = {Uganda Text-to-Speech (TTS) Models},
228
- year = {2024},
229
- howpublished = {\url{https://huggingface.co/Uganda-lang}},
230
- note = {Fine-tuned versions of Orpheus 3B for Ugandan languages}
231
- }
232
- ```
233
-
234
- ---
235
 
236
- **Contributing**: We welcome contributions, feedback, and collaboration from the community. Please feel free to open issues or submit pull requests to help improve these models.
237
 
238
- **Contact**: For questions, collaborations, or support, please reach out through the HuggingFace model repository or create an issue in our GitHub repository."
 
1
+ # Uganda Lang
2
 
3
+ **Uganda Lang** is an open initiative created by passionate machine learning researchers in Uganda to share exciting models and findings from the local research community.
4
 
5
+ We aim to contribute to the growth of African language technologies by building open, high-quality models for speech, language, and other AI applications.
6
 
7
  ---
8
 
9
+ ## Our Works
10
+
11
+ ### 1. Uganda Text-to-Speech (TTS)
12
+
13
+ A collection of fine-tuned Orpheus 3B models that generate natural-sounding speech in multiple Ugandan languages, including English, Luganda, Runyankole, Teso, and Acholi. These models were built on open-source datasets from Sunbird AI, Yogera, and Mozilla's Common Voice.
14
+
15
+ #### 🔉 Audio Examples (With Prompts)
16
+
17
+ | Language | Voice | Prompt | Audio Sample |
18
+ |------------|-------------|------------------------------------------------------------------------------------------|--------------|
19
+ | English | Christopher | Hello I can speak in English as Christopher, one of the voices I can speak. | <audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/English/christopher.wav"></audio> |
20
+ | English | Barbara | Or as Barbara, this is one of my female voices. Pretty cool, right? | <audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/English/barbara.wav"></audio> |
21
+ | English | Mary | I can also speak as Mary as well. | <audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/English/mary.wav"></audio> |
22
+ | English | James | Or I can speak as James, as you can see. | <audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/English/james.wav"></audio> |
23
+ | English | Jessica | This is my other voice called Jessica. I have more voices like Jennifer, Susan, Linda, Patricia, and Elizabeth, which I’ll share when they’re ready. | <audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/English/jessica.wav"></audio> |
24
+ | Luganda | Christopher | Nsobola okwo’geranga Christopher nga wowulila kati. | <audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Luganda/christopher.wav"></audio> |
25
+ | Luganda | Charles | Oba neenjogela nga Charles wenti. | <audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Luganda/charles.wav"></audio> |
26
+ | Luganda | Sandra | Nina neddoboozi lya Sandra bweliti. | <audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Luganda/sandra.wav"></audio> |
27
+ | Luganda | Michelle | Nsobola ogwogella bwenti mulino eddoboozi. | <audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Luganda/michelle.wav"></audio> |
28
+ | Luganda | Daniel | Oba nemulino elye’kisajja nga woowulira. | <audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Luganda/daniel.wav"></audio> |
29
+ | Runyankole | Christopher | Nimbasa kugamba nka Christopher omwiraka eri. | <audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Runyankole/christopher.wav"></audio> |
30
+ | Runyankole | Patricia | Bimwe ebirikugambwa aha reediyo nibihwera abantu kumanya obutare burungi bw’amasharuura gaabo. | <audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Runyankole/patricia.wav"></audio> |
31
+ | Runyankole | Elizabeth | Omu disiturikiti ya Kayunga emisiri erikukira obwngi ekashangwa erimu ebicoori ebiine oburwaire. | <audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Runyankole/elizabeth.wav"></audio> |
32
+ | Runyankole | Michelle | Nimbasa kugamba nka Michelle omwiraka eri. | <audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Runyankole/michelle.wav"></audio> |
33
+ | Runyankole | James | Uganda eteire amaani aha buhingi n’oburiisa. | <audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Runyankole/james.wav"></audio> |
34
+ | Teso | Christopher | Epedorete akoriok aimedaun ejok kanejaas aicoreta nu itikitikere adeka. | <audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Teso/christopher.wav"></audio> |
35
+ | Teso | Jessica | Akoru ikorion luegelegela nes ingarakini itunganan. | <audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Teso/jessica.wav"></audio> |
36
+ | Teso | James | Iraasit yen emunaara aticepak ikur enyamitos. | <audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Teso/james.wav"></audio> |
37
+ | Teso | Daniel | Aipagisanar nes ewai ecie lo ibwaikinet iboro toma aswam. | <audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Teso/daniel.wav"></audio> |
38
+ | Teso | Barbara | Isisianakinete isomeroi kwana asiomak eipone lo isubusaere. | <audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Teso/barbara.wav"></audio> |
39
+ | Acholi | Mark | Uganda tye ka keme ki lok me pur. | <audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Acholi/mark.wav"></audio> |
40
+ | Acholi | Barbara | Lupur twero nongo kony ma dit ka gunongo ngec me gengo onyo cango two ma balo jami ma i poto. | <audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Acholi/barbara.wav"></audio> |
41
+ | Acholi | Michelle | Gum madwong me timo biacara tye i te yub ma pe jenge i kom gamente. | <audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Acholi/mitchelle.wav"></audio> |
42
+ | Acholi | James | Ler ma pe gidodo ma woto ka yenyo cam i dye poto obalo cam weng ma tye i poto. | <audio controls src="https://github.com/Uganda-lang/Ganda-TTS/raw/refs/heads/main/Example/Acholi/james.wav"></audio> |
43
 
44
  ---
45
 
46
+ ### 🛠️ How it Works
47
 
48
+ These TTS models are built using a two-stage architecture:
 
49
 
50
+ 1. **Audio Token Generation**
51
+ A fine-tuned version of the Orpheus 3B model converts the input text into discrete audio tokens.
52
 
53
+ 2. **Audio Decoding**
54
+ The SNAC (Multi-Scale Neural Audio Codec) decoder converts those audio tokens into a speech waveform in the target Ugandan language.
55
 
56
+ > ⚠️ Note: Some non-English outputs may sound lower in quality due to SNAC not being pretrained on local African phonetics.
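
The hand-off between the language model and the SNAC codec can be sketched as follows. This is a minimal illustration, not the project's code: it assumes the fine-tuned models keep the token layout of the upstream Orpheus reference decoder (7 audio tokens per frame, each position offset by a 4096-entry codebook), which this README does not confirm. The name `split_frames` and both constants are hypothetical.

```python
from typing import List, Tuple

CODEBOOK_SIZE = 4096   # assumed size of each SNAC codebook
TOKENS_PER_FRAME = 7   # assumed: 1 coarse + 2 mid + 4 fine codes per frame


def split_frames(codes: List[int]) -> Tuple[List[int], List[int], List[int]]:
    """De-interleave a flat list of audio codes into three codebook layers.

    Assumes the token at position p within a frame is stored with an
    offset of p * CODEBOOK_SIZE, which is removed here. Trailing tokens
    that do not fill a whole frame are dropped.
    """
    coarse: List[int] = []
    mid: List[int] = []
    fine: List[int] = []
    n_frames = len(codes) // TOKENS_PER_FRAME
    for f in range(n_frames):
        frame = codes[f * TOKENS_PER_FRAME:(f + 1) * TOKENS_PER_FRAME]
        # Remove the per-position offset so every code is a plain codebook index.
        c = [tok - pos * CODEBOOK_SIZE for pos, tok in enumerate(frame)]
        coarse.append(c[0])
        mid.extend([c[1], c[4]])
        fine.extend([c[2], c[3], c[5], c[6]])
    return coarse, mid, fine
```

Under these assumptions, the three returned lists would be handed to a SNAC decoder to produce the waveform. That step is omitted here because it depends on the codec package and model weights.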
 
57
 
58
+ 📦 Code: [GitHub Repository for Uganda TTS](https://github.com/Uganda-lang/Ganda-TTS)
 
59
 
60
  ---
61
 
62
+ ### 2. Uganda Text Generation (Advanced)
63
 
64
+ We are also training powerful models that can understand and generate text in low-resource Ugandan languages.
65
 
66
+ 📦 Code: [GitHub Repository for Uganda Text Generation](https://github.com/Uganda-lang/Uganda-TextGen)