---
license: other
license_name: coqui-public-model-license
license_link: https://coqui.ai/cpml
library_name: coqui
pipeline_tag: text-to-speech
widget:
- text: "Abraham said today is a good day to sound like an African"
---

# Afro-TTS
Afro-TTS is the first pan-African accented English speech synthesis system, capable of generating speech in 86 African accents. It includes 1000 personas representing the rich phonological diversity across the continent, for applications in Education, Public Health, and Automated Content Creation. Afro-TTS lets you clone voices into different African accents using just a 6-second audio clip.
The model was adapted from the XTTS model developed by [Coqui Studio](https://coqui.ai/).

Read more about this model in our paper: https://arxiv.org/abs/2406.11727
### Features
- Supports 86 unique African accents
- Voice cloning with just a 6-second audio clip
- Emotion and style transfer by cloning
- Multi-accent English speech generation
- 24 kHz sampling rate for high-quality audio

### Performance
Afro-TTS achieves near-ground-truth Mean Opinion Scores (MOS) for naturalness and accentedness. Objective and subjective evaluations show that the model generates natural-sounding accented speech, bridging the current gap in the representation of African voices in speech synthesis.

### Languages
Afro-TTS currently supports English only.
Stay tuned as we continue to add support for more languages. If you have any language requests, feel free to reach out!

### Code
The codebase accompanying this model's paper can be found [here](https://github.com/intron-innovation/AfriSpeech-TTS).

### License
This model is licensed under the [Coqui Public Model License](https://coqui.ai/cpml). There's a lot that goes into a license for generative models, and you can read more about [the origin story of CPML here](https://coqui.ai/blog/tts/cpml).

### Contact
Come and join our Bioramp community. We're active on the [Masakhane Slack Server](https://join.slack.com/t/masakhane-nlp/shared_invite/zt-1zgnxx911-YWvICNas~mpeKDNqiO3r3g) and on our [website](https://bioramp.org/).
You can also email the authors at sewade.ogun@inria.fr or tobi@intron.io.

### Using Afro-TTS

Install the Coqui TTS package:

```bash
pip install TTS
```
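
The inference snippet below loads the config and checkpoint from a local `intronhealth/afro-tts/` folder. If you have not downloaded the model files yet, one option is to fetch them from the Hugging Face Hub first. This is a minimal sketch using `huggingface_hub` (not an official step from this card; install it with `pip install huggingface_hub` if needed):

```python
# Sketch (not from the model card): download the Afro-TTS config and checkpoint
# from the Hugging Face Hub into the local folder the inference snippet expects.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="intronhealth/afro-tts",    # this model's Hub repository
    local_dir="intronhealth/afro-tts",  # matches the paths used in the snippet below
)
```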

Then run the following code:
```python
from scipy.io.wavfile import write

from TTS.tts.configs.xtts_config import XttsConfig
from TTS.tts.models.xtts import Xtts

# Load the model configuration and checkpoint from the local model directory
config = XttsConfig()
config.load_json("intronhealth/afro-tts/config.json")
model = Xtts.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir="intronhealth/afro-tts/", eval=True)
model.cuda()  # move the model to the GPU

# Clone the accent from a short reference clip and synthesize the text
outputs = model.synthesize(
    "Abraham said today is a good day to sound like an African.",
    config,
    speaker_wav="audios/reference_accent.wav",  # ~6-second reference audio
    gpt_cond_len=3,
    language="en",
)

# Save the generated waveform at the model's 24 kHz sampling rate
write("audios/output.wav", 24000, outputs["wav"])
```
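
Because the model only needs a short reference clip per voice, you can reuse the loaded model to render the same sentence in several accents. A minimal sketch, assuming you have one reference clip per accent (the accent names and file paths below are placeholders):

```python
# Sketch with placeholder file names: loop over hypothetical reference clips,
# reusing the model and config loaded above.
for accent in ["accent_a", "accent_b", "accent_c"]:
    out = model.synthesize(
        "Abraham said today is a good day to sound like an African.",
        config,
        speaker_wav=f"audios/{accent}_reference.wav",  # placeholder reference clip
        gpt_cond_len=3,
        language="en",
    )
    write(f"audios/output_{accent}.wav", 24000, out["wav"])
```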

### BibTeX entry and citation info
```bibtex
@misc{ogun20241000,
  title={1000 African Voices: Advancing inclusive multi-speaker multi-accent speech synthesis},
  author={Sewade Ogun and Abraham T. Owodunni and Tobi Olatunji and Eniola Alese and Babatunde Oladimeji and Tejumade Afonja and Kayode Olaleye and Naome A. Etori and Tosin Adewumi},
  year={2024},
  eprint={2406.11727},
  archivePrefix={arXiv},
  primaryClass={eess.AS}
}
```