romitjain committed: "Updated README"
Commit d3e8b89 · 1 parent: 0abccce
README.md CHANGED
@@ -17,6 +17,14 @@ pipeline_tag: text-to-speech
 library_name: transformers
 ---

+| Platform | Link |
+|----------|------|
+| 🌎 Live Demo | [indrivoice.ai](https://indrivoice.ai/) |
+| 𝕏 Twitter | [@11mlabs](https://x.com/11mlabs) |
+| 🐱 GitHub | [Indri Repository](https://github.com/cmeraki/indri) |
+| 🤗 Hugging Face (Collection) | [Indri collection](https://huggingface.co/collections/11mlabs/indri-673dd4210b4369037c736bfe) |
+| 📝 Release Blog | [Release Blog](#) |
+
 # Model Card for indri-0.1-124m-tts

 Indri is a series of audio models that can do TTS, ASR, and audio continuation. This is the smallest model (124M) in our series and supports TTS tasks in 2 languages:
@@ -24,12 +32,6 @@ Indri is a series of audio models that can do TTS, ASR, and audio continuation.
 1. English
 2. Hindi

-We have open-sourced our training scripts, inference, and other details.
-
-- **Repository:** [GitHub](https://github.com/cmeraki/indri)
-- **Demo:** [Website](https://www.indrivoice.ai/)
-- **Implementation details**: [Release Blog](#TODO)
-
 ## Model Details

 ### Model Description
@@ -37,9 +39,20 @@ We have open-sourced our training scripts, inference, and other details.
 `indri-0.1-124m-tts` is a novel, ultra-small, and lightweight TTS model based on the transformer architecture.
 It models audio as tokens and can generate high-quality audio with consistent style cloning of the speaker.

+### Samples
+
+| Text | Sample |
+| --- | --- |
+|अतीत गौरवशाली, वर्तमान आशावादी, भविष्य उज्जवल| <audio controls src="data/417f5f1b-d641-4393-b922-9da9644dcd1b.wav" title="Title"></audio> |
+|भाइयों और बहनों, ये हमारा सौभाग्य है कि हम सब मिलकर इस महान देश को नई ऊंचाइयों पर ले जाने का सपना देख रहे हैं।| <audio controls src="data/6e0a4879-0379-4166-a52c-03220a3f2922.wav" title="Title"></audio> |
+|Hello दोस्तों, future of speech technology mein अपका स्वागत है | <audio controls src="data/5848b722-efe3-4e1f-a15e-5e7d431cd475.wav" title="Title"></audio> |
+|Artificial Intelligence's collaborative hub: Transforming Machine Learning together| <audio controls src="data/12e5a00e-834b-4c3c-a8b8-7f545ba7088c.wav" title="Title"></audio> |
+|Intelligent machines processing data at lightning-fast electronic speeds| <audio controls src="data/e21efa09-e179-42b7-982a-b686038a8f60.wav" title="Title"></audio> |
+
+
 ### Key features

-1.
+1. Extremely small, based on GPT-2 small architecture. The methodology can be extended to any autoregressive transformer-based architecture.
 2. Supports voice cloning with small prompts (<5s).
 3. Code mixing text input in 2 languages - English and Hindi.
 4. Ultra-fast. Can generate 5 seconds of audio per second on Amphere generation NVIDIA GPUs, and up to 10 seconds of audio per second on Ada generation NVIDIA GPUs.
@@ -51,6 +64,10 @@ It models audio as tokens and can generate high-quality audio with consistent st
 3. Language Support: English, Hindi
 4. License: CC BY 4.0

+### Speed
+
+
+
 ## Technical details

 Here's a brief of how the model works:
@@ -63,6 +80,7 @@ Please read our blog [here](#TODO) for more technical details on how it was buil

 ## How to Get Started with the Model

+### 🤗 pipelines
 Use the code below to get started with the model. Pipelines are the best way to get started with the model.

 ```python
@@ -85,6 +103,21 @@ output = pipe(['Hi, my name is Indri and I like to talk.'])
 torchaudio.save('output.wav', output[0]['audio'][0], sample_rate=24000)
 ```

+### Self hosted service
+
+```bash
+git clone https://github.com/cmeraki/indri.git
+cd indri
+pip install -r requirements.txt
+
+# Install ffmpeg (for Mac/Windows, refer here: https://www.ffmpeg.org/download.html)
+sudo apt update -y
+sudo apt upgrade -y
+sudo apt install ffmpeg -y
+
+python -m inference --model_path 11mlabs/indri-0.1-124m-tts --device cuda:0 --port 8000
+```
+
 ## Citation

 If you use this model in your research, please cite:
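The "Ultra-fast" item in the README's Key features list can be restated as a real-time factor (RTF): wall-clock generation time divided by the duration of the generated audio. The minimal sketch below does only that arithmetic, using the two throughput figures the README states (5 s of audio per second on Ampere GPUs, up to 10 s on Ada GPUs). `real_time_factor` and `generation_time` are hypothetical helper names. Measured throughput is likely to vary with text length, batch size and hardware, and the README does not specify any of these.

```python
# Throughput figures as stated in the README's "Key features" list:
# seconds of audio generated per second of wall-clock time.
STATED_THROUGHPUT = {
    "Ampere": 5.0,  # "5 seconds of audio per second"
    "Ada": 10.0,    # "up to 10 seconds of audio per second" (an upper bound)
}


def real_time_factor(audio_per_second: float) -> float:
    """RTF = wall-clock seconds / audio seconds; values below 1.0 are faster than real time."""
    if audio_per_second <= 0:
        raise ValueError("throughput must be positive")
    return 1.0 / audio_per_second


def generation_time(audio_seconds: float, audio_per_second: float) -> float:
    """Wall-clock seconds needed to synthesize `audio_seconds` of speech at the given throughput."""
    return audio_seconds * real_time_factor(audio_per_second)


if __name__ == "__main__":
    for gpu, speed in STATED_THROUGHPUT.items():
        print(f"{gpu}: RTF={real_time_factor(speed):.2f}, "
              f"60 s of audio in ~{generation_time(60, speed):.0f} s")
```

At the stated figures, the RTF is 0.20 on Ampere and 0.10 on Ada. A 60-second clip would therefore take roughly 12 s and 6 s to generate.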
data/12e5a00e-834b-4c3c-a8b8-7f545ba7088c.wav ADDED (binary file, 41.7 kB)
data/417f5f1b-d641-4393-b922-9da9644dcd1b.wav ADDED (binary file, 39 kB)
data/5848b722-efe3-4e1f-a15e-5e7d431cd475.wav ADDED (binary file, 32.7 kB)
data/6e0a4879-0379-4166-a52c-03220a3f2922.wav ADDED (binary file, 69.2 kB)
data/e21efa09-e179-42b7-982a-b686038a8f60.wav ADDED (binary file, 45.5 kB)
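This commit adds five sample clips under `data/`, and the README's pipeline example saves its output with `sample_rate=24000`. To check a clip's sample rate and duration locally, the minimal sketch below uses Python's standard-library `wave` module. It assumes the files are uncompressed PCM WAV, which the commit does not state; `wave` raises `wave.Error` for formats it cannot parse. `describe_wav` is a hypothetical helper name.

```python
import wave
from pathlib import Path


def describe_wav(path: str) -> dict:
    """Read a PCM WAV header and return sample rate, channels, sample width and duration."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "sample_rate": rate,
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "duration_s": frames / rate,
        }


if __name__ == "__main__":
    # Sample clips added in this commit (assumed to be uncompressed PCM WAV).
    for wav_path in sorted(Path("data").glob("*.wav")):
        info = describe_wav(str(wav_path))  # wave.open expects a str path or file object
        print(f"{wav_path.name}: {info['sample_rate']} Hz, "
              f"{info['channels']} ch, {info['duration_s']:.2f} s")
```

Run it from the repository root so that `data/` resolves. The same function also works on the `output.wav` written by the pipeline example.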
|