---
license: cc-by-4.0
---

# Fastspeech2 Model using Hybrid Segmentation (HS)

This repository contains Fastspeech2 models for 16 Indian languages (both male and female voices), built using Hybrid Segmentation (HS) for speech synthesis. The models generate mel-spectrograms from text input and can be used to synthesize speech.

This repository is large: we use [Git LFS](https://git-lfs.com/) because of GitHub's size constraints. Please install the latest Git LFS from that link; the commands current at the time of writing are given below.

```shell
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt-get install git-lfs
git lfs install
```

The language model files are stored with Git LFS, so please run:

```shell
git lfs fetch --all
git lfs pull
```

to fetch the original files into your directory.
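
A quick way to confirm that the pull worked (our suggestion, not part of the original instructions) is to check a model file's size: unresolved LFS pointers are tiny text stubs, while the real checkpoints are large binaries. The path below is a hypothetical example; use whichever `model.pth` exists in your checkout:

```python
from pathlib import Path

# Hypothetical path; substitute any model.pth from your checkout.
path = Path("model.pth")

size = path.stat().st_size
if size < 1024:
    # Git LFS pointer stubs are only ~130 bytes of text.
    print(f"{path} is {size} bytes - likely an LFS pointer; run `git lfs pull`.")
else:
    print(f"{path} is {size / 1e6:.1f} MB - real weights are present.")
```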

## Model Files

The model for each language includes the following files:

- `config.yaml`: Configuration file for the Fastspeech2 model.
- `energy_stats.npz`: Energy statistics for normalization during synthesis.
- `feats_stats.npz`: Feature statistics for normalization during synthesis.
- `feats_type`: Feature type information.
- `pitch_stats.npz`: Pitch statistics for normalization during synthesis.
- `model.pth`: Pre-trained Fastspeech2 model weights.
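
If you want to see what one of the stats archives contains, you can open it with NumPy. This is only an illustrative sketch; the array names stored inside the archives are not documented here, so the snippet just lists whatever keys are present:

```python
import numpy as np

# Inspect one of the normalization archives shipped with each model.
stats = np.load("pitch_stats.npz")
for key in stats.files:
    arr = stats[key]
    print(f"{key}: shape={arr.shape}, dtype={arr.dtype}")
```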

## Installation

1. Install [Miniconda](https://docs.conda.io/projects/miniconda/en/latest/) first. Create a conda environment using the provided `environment.yml` file:

```shell
conda env create -f environment.yml
```

2. Activate the conda environment (the environment name is defined in `environment.yml`):

```shell
conda activate tts-hs-hifigan
```

3. Install PyTorch separately (you can install the specific version based on your requirements):

```shell
conda install pytorch cudatoolkit
pip install torchaudio
pip install numpy==1.23.0
```
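
As a quick sanity check after installation (our addition, not part of the original instructions), verify that the packages import and that a GPU is visible:

```python
import torch
import torchaudio

# Confirm the installed versions and CUDA visibility.
print("torch:", torch.__version__)
print("torchaudio:", torchaudio.__version__)
print("CUDA available:", torch.cuda.is_available())
```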

## Vocoder

For generating WAV files from mel-spectrograms, you can use a vocoder of your choice. One popular option is the [HIFIGAN](https://github.com/jik876/hifi-gan) vocoder: clone that repository and place it in the current working directory. Please refer to the documentation of the vocoder you choose for installation and usage instructions.

(**We have used the HIFIGAN vocoder and provide vocoders tuned on Aryan and Dravidian languages.**)
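
To illustrate how a mel-spectrogram can be turned into audio with that HiFi-GAN code, here is a minimal sketch modeled on the upstream repository's own inference scripts. The checkpoint, config, and mel file paths are placeholders, and we assume the cloned `hifi-gan` directory sits in the working directory:

```python
import json
import sys

import torch
from scipy.io.wavfile import write

sys.path.append("hifi-gan")  # make the cloned repo's modules importable
from env import AttrDict
from models import Generator

device = "cuda" if torch.cuda.is_available() else "cpu"

# Placeholder paths: point these at your vocoder checkpoint and its config.
with open("hifi-gan/config.json") as f:
    h = AttrDict(json.load(f))

generator = Generator(h).to(device)
state = torch.load("hifi-gan/generator.pth", map_location=device)
generator.load_state_dict(state["generator"])
generator.eval()
generator.remove_weight_norm()

# `mel` is a (1, n_mels, frames) tensor saved from the Fastspeech2 model.
mel = torch.load("mel.pt").to(device)
with torch.no_grad():
    audio = generator(mel).squeeze()

# Scale float audio to 16-bit PCM, as in hifi-gan's inference scripts.
audio = (audio * 32768.0).clamp(-32768, 32767).cpu().numpy().astype("int16")
write("output.wav", h.sampling_rate, audio)
```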

## Usage

The directory paths are relative. (If needed, edit **text_preprocess_for_inference.py** and **inference.py** and update the folder/file paths wherever required.)

**Please give the language and gender in lower case and put the sample text between quotes. Adjust the output speed using the alpha parameter (higher values give slower output and vice versa). The output argument is optional; the provided name will be used for the output file.**

Use the inference script to synthesize speech from text input:

```shell
python inference.py --sample_text "Your input text here" --language <language> --gender <gender> --alpha <alpha> --output_file <file_name.wav OR path/to/file_name.wav>
```

**Example:**

```shell
python inference.py --sample_text "श्रीलंका और पाकिस्तान में खेला जा रहा एशिया कप अब तक का सबसे विवादित टूर्नामेंट होता जा रहा है।" --language hindi --gender male --alpha 1 --output_file male_hindi_output.wav
```

The file will be stored as `male_hindi_output.wav` in the current working directory. If the **--output_file** argument is not given, the output is stored as `<language>_<gender>_output.wav` in the current working directory.
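
If you need to synthesize the same sentence across several languages or voices, one option (our suggestion, not part of the original instructions) is to drive `inference.py` from a small script; the language subset below is illustrative:

```python
import subprocess

# Illustrative subset of the 16 supported languages.
languages = ["hindi", "tamil", "kannada"]
text = "Your input text here"

for lang in languages:
    for gender in ("male", "female"):
        subprocess.run(
            [
                "python", "inference.py",
                "--sample_text", text,
                "--language", lang,
                "--gender", gender,
                "--alpha", "1",
                "--output_file", f"{gender}_{lang}_output.wav",
            ],
            check=True,
        )
```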

### Citation

If you use this Fastspeech2 model in your research or work, please consider citing:

COPYRIGHT
2023, Speech Technology Consortium,

Bhashini, MeitY and by Hema A Murthy & S Umesh,

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
and
ELECTRICAL ENGINEERING,
IIT MADRAS. ALL RIGHTS RESERVED

Shield: [![CC BY 4.0][cc-by-shield]][cc-by]

This work is licensed under a
[Creative Commons Attribution 4.0 International License][cc-by].

[![CC BY 4.0][cc-by-image]][cc-by]

[cc-by]: http://creativecommons.org/licenses/by/4.0/
[cc-by-image]: https://i.creativecommons.org/l/by/4.0/88x31.png
[cc-by-shield]: https://img.shields.io/badge/License-CC%20BY%204.0-lightgrey.svg