Commit 0d5fcee
ButterCream committed
2 Parent(s): 915a9a9 0c6aaeb

Merge branch 'main' of https://huggingface.co/ShoukanLabs/Vokan

Files changed (2):
  1. Model/config.yml +1 -1
  2. README.md +6 -3
Model/config.yml CHANGED
@@ -62,7 +62,7 @@ model_params:
     dist:
       estimate_sigma_data: true
       mean: -3
-      sigma_data: .nan
+      sigma_data: .18
       std: 1
     embedding_mask_proba: 0.1
     transformer:
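The net effect of this hunk is a single value change: the diffusion distribution's `sigma_data` goes from YAML's `.nan` literal to an explicit `.18`, while `estimate_sigma_data: true` stays in place. Below is a minimal sketch of how the affected values could be read back and sanity-checked with PyYAML; the nested key path `model_params -> diffusion -> dist` is an assumption based on the usual StyleTTS2 config layout and is not shown in full by this diff.

```python
import math
import yaml  # PyYAML

# Minimal sketch: load the model config and inspect the diffusion
# distribution parameters touched by this commit. The nested key path
# below is an assumption (typical StyleTTS2 layout); adjust as needed.
with open("Model/config.yml") as f:
    config = yaml.safe_load(f)

dist = config["model_params"]["diffusion"]["dist"]

# PyYAML resolves the `.nan` literal to float('nan'), so before this
# commit any code reading sigma_data directly (rather than re-estimating
# it via estimate_sigma_data) would have seen a non-finite value.
sigma_data = dist["sigma_data"]
if isinstance(sigma_data, float) and math.isnan(sigma_data):
    print("sigma_data is NaN; relying on estimate_sigma_data =", dist["estimate_sigma_data"])
else:
    print("sigma_data =", sigma_data, "| estimate_sigma_data =", dist["estimate_sigma_data"])
```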
README.md CHANGED
@@ -7,6 +7,7 @@ datasets:
 language:
 - en
 pipeline_tag: text-to-speech
+base_model: yl4579/StyleTTS2-LibriTTS
 ---
 
 <style>
@@ -61,7 +62,7 @@ pipeline_tag: text-to-speech
 </div>
 
 **Vokan** is an advanced finetuned **StyleTTS2** model crafted for authentic and expressive zero-shot performance. Designed to serve as a better
-base model fo further finetuning in the future!
+base model for further finetuning in the future!
 It leverages a diverse dataset and extensive training to generate high-quality synthesized speech.
 Trained on a combination of the AniSpeech, VCTK, and LibriTTS-R datasets, Vokan ensures authenticity and naturalness across various accents and contexts.
 With over 6+ days worth of audio data and 672 diverse and expressive speakers,
@@ -116,11 +117,13 @@ You can read more about it on our article on [DagsHub!](https://dagshub.com/blog
 
 V2 is currently in the works, aiming to be bigger and better in every way! Including multilingual support!
 This is where you come in, if you have any large single speaker datasets you'd like to contribute,
-in any langauge, you can contribute to our **Vokan dataset**. A large **community dataset** that combines a bunch of
+in any language, you can contribute to our **Vokan dataset**. A large **community dataset** that combines a bunch of
 smaller single speaker datasets to create one big multispeaker one.
-You can upload your uberduck or [FakeYou](https://fakeyou.com/) compliant datasets via the
+You can upload your uberduck or FakeYou compliant datasets via the
 **[Vokan](https://huggingface.co/ShoukanLabs/Vokan)** bot on the **[ShoukanLabs Discord Server](https://discord.gg/hdVeretude)**.
 The more data we have, the better the models we produce will be!
+
+[This model is also available on DagsHub](https://dagshub.com/ShoukanLabs/Vokan)
 <hr>
 
 <p align="center", style="font-size: 2vw; font-weight: bold; color: #ff593e;">Citations!</p>
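With `base_model: yl4579/StyleTTS2-LibriTTS` now recorded in the README front matter, the finetuned checkpoint and the config changed above can be fetched straight from the Hub. A minimal sketch, assuming only standard `huggingface_hub` calls; the repo id `ShoukanLabs/Vokan` and the `Model/config.yml` path come from this diff, nothing else is Vokan-specific.

```python
from huggingface_hub import hf_hub_download, snapshot_download

# Sketch only: grab the config file changed in this commit straight
# from the ShoukanLabs/Vokan repository on the Hugging Face Hub.
config_path = hf_hub_download(repo_id="ShoukanLabs/Vokan", filename="Model/config.yml")
print("config downloaded to:", config_path)

# Or mirror the full repository (weights, config, README) locally,
# e.g. as a starting point for the further finetuning the model card mentions.
local_dir = snapshot_download(repo_id="ShoukanLabs/Vokan")
print("repository snapshot at:", local_dir)
```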