Updated README

Files changed (6)

README.md +40 -7
data/12e5a00e-834b-4c3c-a8b8-7f545ba7088c.wav +0 -0
data/417f5f1b-d641-4393-b922-9da9644dcd1b.wav +0 -0
data/5848b722-efe3-4e1f-a15e-5e7d431cd475.wav +0 -0
data/6e0a4879-0379-4166-a52c-03220a3f2922.wav +0 -0
data/e21efa09-e179-42b7-982a-b686038a8f60.wav +0 -0

README.md CHANGED

@@ -17,6 +17,14 @@ pipeline_tag: text-to-speech
 library_name: transformers
 ---
 # Model Card for indri-0.1-124m-tts
 Indri is a series of audio models that can do TTS, ASR, and audio continuation. This is the smallest model (124M) in our series and supports TTS tasks in 2 languages:
@@ -24,12 +32,6 @@ Indri is a series of audio models that can do TTS, ASR, and audio continuation.
 1. English
 2. Hindi
-We have open-sourced our training scripts, inference, and other details.
-- **Repository:** [GitHub](https://github.com/cmeraki/indri)
-- **Demo:** [Website](https://www.indrivoice.ai/)
-- **Implementation details**: [Release Blog](#TODO)
 ## Model Details
 ### Model Description
@@ -37,9 +39,20 @@ We have open-sourced our training scripts, inference, and other details.
 `indri-0.1-124m-tts` is a novel, ultra-small, and lightweight TTS model based on the transformer architecture.
 It models audio as tokens and can generate high-quality audio with consistent style cloning of the speaker.
 ### Key features
-1. Based on GPT-2 architecture. The methodology can be extended to any transformer-based architecture.
 2. Supports voice cloning with small prompts (<5s).
 3. Code mixing text input in 2 languages - English and Hindi.
 4. Ultra-fast. Can generate 5 seconds of audio per second on Ampere-generation NVIDIA GPUs, and up to 10 seconds of audio per second on Ada-generation NVIDIA GPUs.
@@ -51,6 +64,10 @@ It models audio as tokens and can generate high-quality audio with consistent st
 3. Language Support: English, Hindi
 4. License: CC BY 4.0
 ## Technical details
 Here's a brief overview of how the model works:
@@ -63,6 +80,7 @@ Please read our blog [here](#TODO) for more technical details on how it was buil
 ## How to Get Started with the Model
 Use the code below to get started with the model. Pipelines are the best way to get started with the model.
 ```python
@@ -85,6 +103,21 @@ output = pipe(['Hi, my name is Indri and I like to talk.'])
 torchaudio.save('output.wav', output[0]['audio'][0], sample_rate=24000)
 ```
 ## Citation
 If you use this model in your research, please cite:

 library_name: transformers
 ---
+| Platform | Link |
+|----------|------|
+| 🌎 Live Demo | [indrivoice.ai](https://indrivoice.ai/) |
+| 𝕏 Twitter | [@11mlabs](https://x.com/11mlabs) |
+| 🐱 GitHub | [Indri Repository](https://github.com/cmeraki/indri) |
+| 🤗 Hugging Face (Collection) | [Indri collection](https://huggingface.co/collections/11mlabs/indri-673dd4210b4369037c736bfe) |
+| 📝 Release Blog | [Release Blog](#) |
 # Model Card for indri-0.1-124m-tts
 Indri is a series of audio models that can do TTS, ASR, and audio continuation. This is the smallest model (124M) in our series and supports TTS tasks in 2 languages:
 1. English
 2. Hindi
 ## Model Details
 ### Model Description
 `indri-0.1-124m-tts` is a novel, ultra-small, and lightweight TTS model based on the transformer architecture.
 It models audio as tokens and can generate high-quality audio with consistent style cloning of the speaker.
+### Samples
+| Text | Sample |
+| --- | --- |
+|अतीत गौरवशाली, वर्तमान आशावादी, भविष्य उज्जवल| <audio controls src="data/417f5f1b-d641-4393-b922-9da9644dcd1b.wav" title="Title"></audio> |
+|भाइयों और बहनों, ये हमारा सौभाग्य है कि हम सब मिलकर इस महान देश को नई ऊंचाइयों पर ले जाने का सपना देख रहे हैं।| <audio controls src="data/6e0a4879-0379-4166-a52c-03220a3f2922.wav" title="Title"></audio> |
+|Hello दोस्तों, future of speech technology mein अपका स्वागत है | <audio controls src="data/5848b722-efe3-4e1f-a15e-5e7d431cd475.wav" title="Title"></audio> |
+|Artificial Intelligence's collaborative hub: Transforming Machine Learning together| <audio controls src="data/12e5a00e-834b-4c3c-a8b8-7f545ba7088c.wav" title="Title"></audio> |
+|Intelligent machines processing data at lightning-fast electronic speeds| <audio controls src="data/e21efa09-e179-42b7-982a-b686038a8f60.wav" title="Title"></audio> |
 ### Key features
+1. Extremely small, based on GPT-2 small architecture. The methodology can be extended to any autoregressive transformer-based architecture.
 2. Supports voice cloning with small prompts (<5s).
 3. Code mixing text input in 2 languages - English and Hindi.
 4. Ultra-fast. Can generate 5 seconds of audio per second on Ampere-generation NVIDIA GPUs, and up to 10 seconds of audio per second on Ada-generation NVIDIA GPUs.
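
The throughput figures in the last feature translate directly into wall-clock generation time. A minimal sketch of that arithmetic, assuming the quoted rates hold as sustained averages; the constant and function names are illustrative and not part of the Indri codebase:

```python
# Rates quoted in the feature list: seconds of audio produced per second of compute.
AMPERE_RATE = 5.0   # "5 seconds of audio per second"
ADA_RATE = 10.0     # "up to 10 seconds of audio per second", i.e. a best case


def generation_time_s(audio_seconds: float, rate: float) -> float:
    """Estimate the wall-clock seconds needed to generate `audio_seconds` of audio.

    `rate` is the number of audio seconds generated per second of compute.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    return audio_seconds / rate


if __name__ == "__main__":
    clip = 60.0  # one minute of speech, chosen only for illustration
    for name, rate in (("Ampere", AMPERE_RATE), ("Ada", ADA_RATE)):
        print(f"{name}: about {generation_time_s(clip, rate):.1f} s for {clip:.0f} s of audio")
```

Because the Ada figure is stated as "up to", the estimate for that GPU generation is a lower bound on generation time rather than a typical value.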
 3. Language Support: English, Hindi
 4. License: CC BY 4.0
+### Speed
 ## Technical details
 Here's a brief overview of how the model works:
 ## How to Get Started with the Model
+### 🤗 pipelines
 Use the code below to get started with the model. Pipelines are the best way to get started with the model.
 ```python
 torchaudio.save('output.wav', output[0]['audio'][0], sample_rate=24000)
 ```
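
The `sample_rate=24000` argument in the save call fixes the relationship between the number of samples in the waveform and its playback time. A small sketch of that conversion, assuming only the 24 kHz rate shown in the snippet; `duration_seconds` is a hypothetical helper, not part of `torchaudio` or the pipeline:

```python
SAMPLE_RATE_HZ = 24000  # taken from the torchaudio.save call in the README


def duration_seconds(num_samples: int, sample_rate: int = SAMPLE_RATE_HZ) -> float:
    """Return the playback length, in seconds, of `num_samples` samples per channel."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    return num_samples / sample_rate


if __name__ == "__main__":
    # 120000 samples per channel is an illustrative length, not a measured output.
    print(duration_seconds(120_000))
```

Saving the same waveform with a different `sample_rate` would change the playback speed and pitch, so the value passed to the save call should match the rate the model generates at.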
+### Self hosted service
+```bash
+git clone https://github.com/cmeraki/indri.git
+cd indri
+pip install -r requirements.txt
+# Install ffmpeg (for Mac/Windows, refer here: https://www.ffmpeg.org/download.html)
+sudo apt update -y
+sudo apt upgrade -y
+sudo apt install ffmpeg -y
+python -m inference --model_path 11mlabs/indri-0.1-124m-tts --device cuda:0 --port 8000
+```
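
The setup steps above depend on `git` and `ffmpeg` being available on the machine. A minimal pre-flight check, offered as a sketch rather than part of the Indri repository, can confirm both are on `PATH` before launching the service; `missing_tools` is a hypothetical helper name:

```python
import shutil


def missing_tools(tools=("git", "ffmpeg")):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing required tools:", ", ".join(missing))
    else:
        print("All required tools found on PATH.")
```

Running this before `python -m inference ...` gives a clearer failure than an error raised partway through audio processing.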
 ## Citation
 If you use this model in your research, please cite:

data/12e5a00e-834b-4c3c-a8b8-7f545ba7088c.wav ADDED

Binary file (41.7 kB).

data/417f5f1b-d641-4393-b922-9da9644dcd1b.wav ADDED

Binary file (39 kB).

data/5848b722-efe3-4e1f-a15e-5e7d431cd475.wav ADDED

Binary file (32.7 kB).

data/6e0a4879-0379-4166-a52c-03220a3f2922.wav ADDED

Binary file (69.2 kB).

data/e21efa09-e179-42b7-982a-b686038a8f60.wav ADDED

Binary file (45.5 kB).