rahulvramesh committed 596ddfc (parent: 63af446): Update README.md

---
license: openrail++
---
<p align="center">
<img src="Instagram post - 4.png" alt="Tansen Logo" width="300" height="300"/>
</p>

---

<p align="center"><i>Democratizing access to LLMs and multi-modal Gen AI models for the open-source community.<br>Let's advance AI, together.</i></p>

---

Tansen is a text-to-speech program built with the following priorities:

1. Strong multi-voice capabilities.
2. Highly realistic prosody and intonation.
3. Speaking-rate control.

[Hugging Face 🤗 Models](https://huggingface.co/budecosystem/Tansen)

<h2 align="left">🎧 Demos</h2>

[random_0_0.webm](https://github.com/BudEcosystem/Tansen/assets/4546714/9a6ce191-2646-497e-bf48-003f2bf0bb8d)

[random_0_1.webm](https://github.com/BudEcosystem/Tansen/assets/4546714/87bf5f7c-ae47-4aa4-a110-b5c9899e4446)

[random_0_2.webm](https://github.com/BudEcosystem/Tansen/assets/4546714/5549c464-c670-4e7a-987c-c5d79b32bf4b)

<h2 align="left">💻 Getting Started on GitHub</h2>

Ready to dive in? Here's how to get started with our GitHub repository.

<h3 align="left">1️⃣ : Set up the environment and clone our GitHub repository</h3>

Open a terminal, navigate to the directory where you want the repository to be cloned, and run the following commands. They create a Python 3.9 conda environment with PyTorch (CUDA 11.7) and Transformers, then clone the repository:

```bash
# Create and activate a Python 3.9 environment that also includes numba and inflect
conda create --name Tansen python=3.9 numba inflect
conda activate Tansen
# PyTorch built against CUDA 11.7, plus the pinned Transformers version
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
conda install transformers=4.29.2
# Clone the repository and move into it
git clone https://github.com/BudEcosystem/Tansen.git
cd Tansen
```

<h3 align="left">2️⃣ : Install dependencies</h3>

From inside the cloned `Tansen` directory, run:

```bash
python setup.py install
```
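
Before generating audio, it can help to confirm that the environment matches the versions pinned above. The following is a minimal sketch and is not part of the Tansen repository. It checks only the Python version, that the packages installed above are importable, that `transformers` is at 4.29.2, and (assuming you intend to run on a GPU) that PyTorch can see a CUDA device. The names `check_environment`, `REQUIRED_MODULES` and `PINNED_VERSIONS` are hypothetical.

```python
import importlib.util
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import List

# Pins taken from the setup commands above.
EXPECTED_PYTHON = (3, 9)
PINNED_VERSIONS = {"transformers": "4.29.2"}
REQUIRED_MODULES = ["torch", "torchaudio", "numba", "inflect", "transformers"]


def check_environment() -> List[str]:
    """Return a list of problems found; an empty list means the basic setup looks right."""
    problems: List[str] = []

    # The conda environment above is created with Python 3.9.
    if sys.version_info[:2] != EXPECTED_PYTHON:
        found = ".".join(map(str, sys.version_info[:2]))
        problems.append(f"Python {found} found, expected 3.9")

    # Every module installed in step 1 should at least be importable.
    for module in REQUIRED_MODULES:
        if importlib.util.find_spec(module) is None:
            problems.append(f"module '{module}' is not importable")

    # Compare installed distribution versions against the pins.
    for dist, pinned in PINNED_VERSIONS.items():
        try:
            installed = version(dist)
        except PackageNotFoundError:
            continue  # no package metadata to compare against
        if installed != pinned:
            problems.append(f"{dist} {installed} found, expected {pinned}")

    # Assumption: you want GPU inference, so PyTorch should see a CUDA device.
    if importlib.util.find_spec("torch") is not None:
        import torch  # imported lazily so the check still runs without PyTorch

        if not torch.cuda.is_available():
            problems.append("PyTorch is installed but cannot see a CUDA GPU")

    return problems
```

Run it inside the activated `Tansen` environment, for example `for problem in check_environment(): print(problem)`. An empty result only means nothing obvious is missing; it does not guarantee that generation will succeed.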

<h3 align="left">3️⃣ : Generate Audio</h3>

### do_tts.py

This script speaks a single phrase with one or more voices.

```shell
python do_tts.py --text "I'm going to speak this" --voice random --preset fast
```

### read.py

This script reads large amounts of text aloud.

```shell
python Tansen/read.py --textfile <your text to be read> --voice random
```

It breaks the text file into sentences and converts them to speech one at a time, outputting each spoken clip as it is generated. Once all the clips are generated, it combines them into a single file and outputs that as well.

Sometimes Tansen produces a bad clip. You can regenerate any bad clips by re-running `read.py` with the `--regenerate` argument.
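
The behaviour of `read.py` described above (split into sentences, synthesize them one at a time, combine the clips, optionally redo selected clips) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the actual `read.py` code: the regex sentence splitter is a naive stand-in, `synthesize` is a hypothetical placeholder for the real TTS call, and clips are kept in memory as lists of float samples instead of being written to disk.

```python
import re
from typing import Callable, Iterable, List, Optional, Sequence, Tuple

Clip = List[float]  # mono samples for one sentence


def split_into_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def read_text(
    text: str,
    synthesize: Callable[[str], Sequence[float]],
    previous_clips: Optional[List[Clip]] = None,
    regenerate: Iterable[int] = (),
) -> Tuple[List[Clip], Clip]:
    """Speak `text` one sentence at a time and join the clips.

    `synthesize` is a hypothetical stand-in for the actual TTS call.
    When `previous_clips` is given, only the clip indices listed in
    `regenerate` are synthesized again; every other clip is reused as-is.
    """
    sentences = split_into_sentences(text)
    redo = set(regenerate)
    clips: List[Clip] = []
    for i, sentence in enumerate(sentences):
        if previous_clips is not None and i < len(previous_clips) and i not in redo:
            clips.append(previous_clips[i])  # keep a clip that was already fine
        else:
            # read.py outputs each clip as soon as it is generated
            clips.append(list(synthesize(sentence)))
    combined: Clip = [s for clip in clips for s in clip]  # concatenate in sentence order
    return clips, combined
```

Working sentence by sentence keeps each generation short and, as the `previous_clips`/`regenerate` parameters illustrate, lets a single bad clip be redone without re-synthesizing the rest of the text.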
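
Interested in using Tansen as an API?

### 🐍 Usage in Python

Tansen can also be used programmatically. The import paths assume the package directory is `Tansen`, as in the `read.py` path above:

```python
from Tansen import api  # assumption: the package is importable as `Tansen`
from Tansen.utils.audio import load_audio
clips_paths = ["voice_1.wav", "voice_2.wav"]  # hypothetical reference clips of the target voice
reference_clips = [load_audio(p, 22050) for p in clips_paths]  # second argument: sample rate (22,050 Hz)
tts = api.TextToSpeech(use_deepspeed=True, kv_cache=True, half=True)
pcm_audio = tts.tts_with_preset("your text here", voice_samples=reference_clips, preset='fast')
```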
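
`tts_with_preset` returns the generated audio (`pcm_audio` above) rather than writing a file. One way to save it without extra dependencies is Python's standard `wave` module. This sketch assumes mono float samples in [-1, 1] and a 24 kHz output rate; both are assumptions, so check the sample rate Tansen actually produces before relying on it. `save_wav` is a hypothetical helper, not part of Tansen.

```python
import struct
import wave
from typing import Sequence

OUTPUT_SAMPLE_RATE = 24000  # assumption: verify the rate Tansen actually outputs


def save_wav(path: str, samples: Sequence[float], sample_rate: int = OUTPUT_SAMPLE_RATE) -> None:
    """Write mono float samples in [-1, 1] to a 16-bit PCM WAV file."""
    clipped = (max(-1.0, min(1.0, float(s))) for s in samples)  # guard against overshoot
    frames = b"".join(struct.pack("<h", int(s * 32767)) for s in clipped)  # little-endian int16
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)  # mono
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(frames)
```

If `pcm_audio` is a PyTorch tensor (an assumption), flatten it to a Python list first, e.g. `save_wav("generated.wav", pcm_audio.squeeze().cpu().tolist())`. 16-bit PCM is chosen because it is the most widely supported WAV encoding.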

## Loss Curves

<p align="center">
<img src="results/images/loss_mel_ce.png" alt="loss_mel_ce curve" />
<span>loss_mel_ce</span>
</p>

<p align="center">
<img src="results/images/loss_text_ce.png" alt="loss_text_ce curve" />
<span>loss_text_ce</span>
</p>

## Training Information

Device: a single A100 GPU

Dataset: 876 hours