LearnItAnyway committed 6856c3b (parent: 9c617e3): Update README.md
README.md: +30 -0

---
license: other
---
# VALL-E Korean Model

## Introduction

The VALL-E Korean model is an implementation of the VALL-E architecture for the Korean language. It is a zero-shot text-to-speech (TTS) synthesizer that generates natural-sounding Korean speech from text input. Its components include:

- the espeak phonemizer with the `language='ko'` option, which converts input text into phonemes;
- the EnCodec audio tokenizer from [Facebook Research's EnCodec repository](https://github.com/facebookresearch/encodec), which converts audio into discrete tokens.

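The sketch below shows how the outputs of the two tokenizers might fit together in a VALL-E-style zero-shot prompt. It assumes the layout described in the VALL-E paper:

- the phonemes of the prompt recording's transcript come first, followed by the phonemes of the target text;
- the first-codebook EnCodec tokens of the prompt recording are passed alongside them, and the AR model continues them.

`build_symbol_table`, `encode_phonemes`, and `build_zero_shot_inputs` are hypothetical helpers, not this repository's API. The phoneme strings and code values are hand-written placeholders, not real espeak or EnCodec output.

```python
from typing import Dict, List, Sequence, Tuple

PAD_ID = 0  # hypothetical: ID 0 reserved for padding


def build_symbol_table(phoneme_strings: Sequence[str]) -> Dict[str, int]:
    """Assign a stable integer ID (after PAD_ID) to every distinct phoneme symbol."""
    symbols = sorted({sym for s in phoneme_strings for sym in s.split()})
    return {sym: PAD_ID + 1 + i for i, sym in enumerate(symbols)}


def encode_phonemes(phonemes: str, table: Dict[str, int]) -> List[int]:
    """Map a space-separated phoneme string to integer IDs."""
    return [table[sym] for sym in phonemes.split()]


def build_zero_shot_inputs(
    prompt_phonemes: str,
    target_phonemes: str,
    prompt_codes_first: Sequence[int],
    table: Dict[str, int],
) -> Tuple[List[int], List[int]]:
    """Return (phoneme IDs, acoustic prompt) in a VALL-E-style zero-shot layout.

    The prompt transcript's phonemes come first, then the target text's
    phonemes. The acoustic prompt is the prompt recording's first-codebook
    EnCodec tokens, which the AR model continues in the same voice.
    """
    text_ids = encode_phonemes(prompt_phonemes, table) + encode_phonemes(target_phonemes, table)
    return text_ids, list(prompt_codes_first)


# Hand-written placeholders: NOT real espeak (language='ko') or EnCodec output.
prompt_ph = "a n n j ʌ ŋ"
target_ph = "k a m s a"
table = build_symbol_table([prompt_ph, target_ph])
text_ids, audio_prompt = build_zero_shot_inputs(prompt_ph, target_ph, [17, 903, 42], table)
print(text_ids)  # [1, 5, 5, 2, 8, 7, 3, 1, 4, 6, 1]
```

The symbol table is sorted so that IDs are stable across runs. In a real pipeline, the table would be built once from the training corpus rather than from two example strings.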
## Model Details

- **Architecture**: The VALL-E Korean model consists of two models: an autoregressive (AR) model and a non-autoregressive (NAR) model.
- **Hidden Dimension**: 1024.
- **Transformer Layers**: 12.
- **Attention Heads**: 16 per layer.

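The following is a minimal sketch of how the two models divide the work. It assumes this implementation follows the original VALL-E design:

- the AR model generates first-codebook EnCodec tokens one frame at a time until an end-of-sequence token;
- the NAR model then fills in each remaining codebook in a single parallel pass, conditioned on the codebooks already generated.

`ar_step`, `nar_stage`, and the dummy functions are hypothetical stand-ins for the trained networks. A real model also conditions on the phonemes and the acoustic prompt. The 8-codebook setting is an assumption based on EnCodec's 24 kHz model at 6 kbps; this card does not state it.

```python
from dataclasses import dataclass
from typing import Callable, List

Grid = List[List[int]]  # codes[q][t]: codebook q, frame t


@dataclass(frozen=True)
class ValleConfig:
    d_model: int = 1024     # hidden dimension (from the model card)
    num_layers: int = 12    # transformer layers (from the model card)
    num_heads: int = 16     # attention heads per layer (from the model card)
    num_codebooks: int = 8  # ASSUMPTION: EnCodec 24 kHz at 6 kbps

    @property
    def head_dim(self) -> int:
        # Multi-head attention splits d_model evenly across the heads.
        if self.d_model % self.num_heads != 0:
            raise ValueError("d_model must be divisible by num_heads")
        return self.d_model // self.num_heads


def synthesize_codes(
    cfg: ValleConfig,
    ar_step: Callable[[List[int]], int],
    nar_stage: Callable[[int, Grid], List[int]],
    max_frames: int,
    eos: int,
) -> Grid:
    # Stage 1 (AR): predict first-codebook tokens one frame at a time until EOS.
    first: List[int] = []
    while len(first) < max_frames:
        token = ar_step(first)
        if token == eos:
            break
        first.append(token)

    # Stage 2 (NAR): predict each remaining codebook for all frames in one pass,
    # conditioned on every codebook generated so far.
    codes: Grid = [first]
    for q in range(1, cfg.num_codebooks):
        layer = nar_stage(q, codes)
        if len(layer) != len(first):
            raise ValueError("NAR stage must return one token per frame")
        codes.append(layer)
    return codes


# Hypothetical stand-ins for the trained networks, for demonstration only.
EOS = -1


def dummy_ar_step(prefix: List[int]) -> int:
    return len(prefix) if len(prefix) < 5 else EOS  # emits 0..4, then EOS


def dummy_nar_stage(q: int, codes: Grid) -> List[int]:
    return [(t + q) % 1024 for t in codes[0]]


cfg = ValleConfig()
demo = synthesize_codes(cfg, dummy_ar_step, dummy_nar_stage, max_frames=100, eos=EOS)
print(cfg.head_dim)             # 64
print(len(demo), len(demo[0]))  # 8 5
```

This split is what makes the design efficient. The AR stage's cost grows with the number of frames, while the NAR stage needs only one pass per remaining codebook, whatever the utterance length.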
## Training Data

The model was trained on approximately 2,000 hours of Korean audio-text pairs from the [AI-Hub 한국인 대화음성 (Korean conversational speech) dataset](https://www.aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&aihubDataSe=realm&dataSetSn=130).

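To give a sense of scale, the snippet below estimates how many EnCodec tokens 2,000 hours of audio correspond to. It assumes the 24 kHz EnCodec model at 6 kbps (75 frames per second, 8 codebooks of 1,024 entries), the setting used in the VALL-E paper. This card does not state which bandwidth the model uses, so treat the numbers as rough estimates. `token_budget` is a hypothetical helper.

```python
from typing import Tuple

SECONDS_PER_HOUR = 3600

# From the model card: approximately 2,000 hours of Korean speech.
TRAINING_HOURS = 2000

# ASSUMPTION: EnCodec 24 kHz model at 6 kbps, as in the VALL-E paper.
FRAME_RATE_HZ = 75   # 24,000 samples/s divided by a 320-sample hop
NUM_CODEBOOKS = 8    # residual quantizers active at 6 kbps
BITS_PER_CODE = 10   # each codebook has 1,024 = 2**10 entries


def token_budget(hours: float, frame_rate_hz: int = FRAME_RATE_HZ,
                 num_codebooks: int = NUM_CODEBOOKS) -> Tuple[int, int]:
    """Return (frames, codes): frames = first-codebook (AR) targets, codes = all codebooks."""
    frames = round(hours * SECONDS_PER_HOUR * frame_rate_hz)
    return frames, frames * num_codebooks


# Sanity check: 75 frames/s * 8 codebooks * 10 bits = 6,000 bits/s = 6 kbps.
bitrate_bps = FRAME_RATE_HZ * NUM_CODEBOOKS * BITS_PER_CODE

frames, total_codes = token_budget(TRAINING_HOURS)
print(f"bitrate: {bitrate_bps} bps")                  # bitrate: 6000 bps
print(f"first-codebook tokens: {frames:,}")           # 540,000,000
print(f"codes across all codebooks: {total_codes:,}") # 4,320,000,000
```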
## Example Usage

For an example of how to use the VALL-E Korean model, see the Google Colab notebook [tester_colab.ipynb](https://huggingface.co/LearnItAnyway/vall-e_korean/blob/main/tester_colab.ipynb), which demonstrates text-to-speech synthesis with the model. The notebook also uses the Vocos decoder from [Plachtaa's VALL-E-X repository](https://github.com/Plachtaa/VALL-E-X) to convert the generated EnCodec tokens into a waveform.

## References

- [Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers](https://arxiv.org/abs/2301.02111)
- [VALL-E by lifeiteng](https://github.com/lifeiteng/vall-e)
- [VALL-E by enhuiz](https://github.com/enhuiz/vall-e)
- [VALL-E-X by Plachtaa](https://github.com/Plachtaa/VALL-E-X)
- [Vocos by charactr-platform](https://github.com/charactr-platform/vocos)

For more information on using the model, see the references and resources above.