<div align="center">

<h1>MahaTTS: An Open-Source Large Speech Generation Model in the making</h1>
a Dubverse Black initiative <br> <br>

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1-eOQqznKWwAfMdusJ_LDtDhjIyAlSMrG?usp=sharing)

</div>

------

## Description
MahaTTS (Maha means 'great' in Sanskrit) is a speech generation model inspired by tortoise-tts, except that it uses Seamless M4T wav2vec2 to extract semantic tokens.
Because Seamless M4T wav2vec2 is trained on multilingual data, the model is easier to scale to multiple languages.

<img width="993" alt="Screenshot 2023-11-19 at 11 53 52 PM" src="https://github.com/dubverse-ai/MahaTTS/assets/32906806/7429d3b6-3f19-4bd8-9005-ff9e16a698f8">
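
As a rough illustration of what "semantic tokens" means here: continuous frame-level features from a speech encoder are commonly turned into discrete IDs by looking up the nearest vector in a codebook. The sketch below shows only that idea on toy data. The `quantize_frames` function, the codebook and the dimensions are hypothetical, and the actual MahaTTS tokenization may differ.

```python
import numpy as np


def quantize_frames(features: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Map each frame vector to the index of its nearest codebook vector."""
    # Pairwise squared Euclidean distances, shape (num_frames, num_codes).
    diffs = features[:, None, :] - codebook[None, :, :]
    dists = np.sum(diffs ** 2, axis=-1)
    return np.argmin(dists, axis=1)


# Toy stand-ins (assumptions): a 4-entry codebook and 4 slightly perturbed frames.
codebook = np.eye(4)
features = codebook[[3, 3, 1, 0]] + 0.05
tokens = quantize_frames(features, codebook)
print(tokens.tolist())  # [3, 3, 1, 0]
```

In a real pipeline the features would come from the speech encoder and the codebook would be learned from data; both are replaced by toy arrays here.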


## Features
1. Multilinguality
2. Realistic prosody and intonation
3. Multi-voice capabilities

## Current Progress
The first model, 'Smolie', is trained on 200 hours of LibriTTS.

## Installation
Install directly from the GitHub repository:
```bash
pip install git+https://github.com/dubverse-ai/MahaTTS.git
```

Or install the package from PyPI:
```bash
pip install maha-tts
```
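
To confirm that the install succeeded, one option is to ask Python's packaging metadata for the installed version. This is a minimal sketch that assumes the distribution is registered under the name `maha-tts` (taken from the `pip install` command above). The helper name `installed_version` is hypothetical, and the sketch does not import or call the MahaTTS API itself.

```python
from importlib import metadata


def installed_version(dist_name: str = "maha-tts"):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version:
        print(f"maha-tts {version} is installed")
    else:
        print("maha-tts is not installed")
```
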
## Roadmap
- [x] Smolie - eng
- [ ] Smolie - indic
- [ ] Optimizations for inference

## Some Generated Samples
text:
0 -> "I seriously laughed so much hahahaha (seals with headphones...) and appreciate both the interviewer and the subject. Major respect for two extraordinary humans - and in this time of gratefulness, I'm thankful for you both and this forum!"

1 -> "I freakin love how Elon came to life the moment they started talking about gaming and specifically diablo, you can tell that he didn't want that part of the discussion to end, while Lex to move on to the next subject! Once a true gamer, always a true gamer!"

2 -> "hello there! how are you?" (This one didn't work well; the M1 model hallucinated.)

3 -> "Who doesn't love a good scary story, something to send a chill across your skin in the middle of summer's heat or really, any other time? And this year, we're celebrating the two hundredth birthday of one of the most famous scary stories of all time: Frankenstein."



https://github.com/dubverse-ai/MahaTTS/assets/32906806/66fc7a08-3e8a-4d63-a3fa-88bc705a172a



https://github.com/dubverse-ai/MahaTTS/assets/32906806/5acf5a4b-aeb8-4f14-94fe-45811868a886



https://github.com/dubverse-ai/MahaTTS/assets/32906806/0af2ce6e-4172-4aac-9322-4fd545f1d4ac



https://github.com/dubverse-ai/MahaTTS/assets/32906806/2d5b0335-d1fc-473a-aea8-c5bb6afbce27



https://github.com/dubverse-ai/MahaTTS/assets/32906806/a63ba39f-a261-4fe6-8d06-a172a993acc1



https://github.com/dubverse-ai/MahaTTS/assets/32906806/4355f633-9b27-4290-a284-96d650f5f4b8



https://github.com/dubverse-ai/MahaTTS/assets/32906806/7c93d81e-02bc-4819-a97b-d48e39ec5689



https://github.com/dubverse-ai/MahaTTS/assets/32906806/63456535-0b38-429a-a8a0-686cfb6a92c5



https://github.com/dubverse-ai/MahaTTS/assets/32906806/960aa78c-888f-4f0b-a380-145a87f65a99



https://github.com/dubverse-ai/MahaTTS/assets/32906806/5027f0eb-3601-468b-9dda-6b436b774741



https://github.com/dubverse-ai/MahaTTS/assets/32906806/266285e0-a8f3-4784-81dc-f98b0a9c9373



https://github.com/dubverse-ai/MahaTTS/assets/32906806/68ba18d6-430b-41e7-84e5-e15990064836



https://github.com/dubverse-ai/MahaTTS/assets/32906806/0f7321a7-efb1-407c-8b8c-69e812865739



https://github.com/dubverse-ai/MahaTTS/assets/32906806/dcedffe6-d81b-4eff-95c0-cbd00279fdb7



https://github.com/dubverse-ai/MahaTTS/assets/32906806/8050db3e-7acb-44be-a039-7e0b9e6a9905



https://github.com/dubverse-ai/MahaTTS/assets/32906806/6486af1c-2e14-420b-8419-bf5e01fe49a5