cmeraki committed on
Commit c1e068f • 1 Parent(s): 557ab08

Update README.md

Files changed (1)
  1. README.md +24 -5
README.md CHANGED
@@ -52,10 +52,10 @@ It models audio as tokens and can generate high-quality audio with consistent st
  ### Key features
 
  1. Extremely small, based on GPT-2 small architecture. The methodology can be extended to any autoregressive transformer-based architecture.
- 2. Ultra-fast. Using our [self hosted service option](#self-hosted-service), the model can achieve speeds up to 400 toks/s (4s of audio generation per s) and under 20ms time to first token on RTX6000Ada NVIDIA GPU.
- 1. On RTX6000Ada, it can support a batch size of 1k with full context length of 1024 tokens
- 3. Supports voice cloning with small prompts (<5s).
- 4. Code mixing text input in 2 languages - English and Hindi.
 
  ### Details
 
@@ -94,11 +94,30 @@ pipe = pipeline(
  trust_remote_code=True
  )
 
- output = pipe(['Hi, my name is Indri and I like to talk.'])
 
  torchaudio.save('output.wav', output[0]['audio'][0], sample_rate=24000)
  ```
 
  ### Self hosted service
 
  ```bash
 
  ### Key features
 
  1. Extremely small, based on GPT-2 small architecture. The methodology can be extended to any autoregressive transformer-based architecture.
+ 2. Ultra-fast. Using our [self hosted service option](#self-hosted-service), on RTX6000Ada NVIDIA GPU the model can achieve speeds up to 400 toks/s (4s of audio generation per s) and under 20ms time to first token.
+ 3. On RTX6000Ada, it can support a batch size of 1k with full context length of 1024 tokens
+ 4. Supports voice cloning with small prompts (<5s).
+ 5. Code mixing text input in 2 languages - English and Hindi.
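As a rough illustration of the throughput figures in the list, the sketch below derives the token rate per second of audio implied by the two quoted numbers (400 toks/s and 4 s of audio per second), then estimates how much audio a full 1024-token context could hold. The constant names are hypothetical, and the context estimate assumes every token in the context is an audio token, which the README does not state.

```python
# Figures quoted in the feature list.
TOKENS_PER_SECOND = 400        # generation speed, tokens per wall-clock second
AUDIO_SECONDS_PER_SECOND = 4   # seconds of audio produced per wall-clock second
CONTEXT_LENGTH = 1024          # full context length in tokens

# Implied number of tokens needed for one second of audio.
tokens_per_audio_second = TOKENS_PER_SECOND / AUDIO_SECONDS_PER_SECOND

# Upper bound on audio per context window, ASSUMING the whole context
# holds audio tokens only (no text or speaker tokens).
max_audio_seconds = CONTEXT_LENGTH / tokens_per_audio_second

print(tokens_per_audio_second)
print(max_audio_seconds)
```

Under these assumptions the model spends about 100 tokens per second of audio, so one full context covers roughly ten seconds of speech at most.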
 
  ### Details
 
 
  trust_remote_code=True
  )
 
+ output = pipe(['Hi, my name is Indri and I like to talk.'], speaker = '[spkr_63]')
 
  torchaudio.save('output.wav', output[0]['audio'][0], sample_rate=24000)
  ```
 
+ **Available speakers**
+
+ |Speaker ID|Speaker name|
+ |---|---|
+ |`[spkr_63]`|🇬🇧 👨 book reader|
+ |`[spkr_67]`|🇺🇸 👨 influencer|
+ |`[spkr_68]`|🇮🇳 👨 book reader|
+ |`[spkr_69]`|🇮🇳 👨 book reader|
+ |`[spkr_70]`|🇮🇳 👨 motivational speaker|
+ |`[spkr_62]`|🇮🇳 👨 book reader heavy|
+ |`[spkr_53]`|🇮🇳 👩 recipe reciter|
+ |`[spkr_60]`|🇮🇳 👩 book reader|
+ |`[spkr_74]`|🇺🇸 👨 book reader|
+ |`[spkr_75]`|🇮🇳 👨 entrepreneur|
+ |`[spkr_76]`|🇬🇧 👨 nature lover|
+ |`[spkr_77]`|🇮🇳 👨 influencer|
+ |`[spkr_66]`|🇮🇳 👨 politician|
+
+
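A small lookup table can guard against typos in speaker IDs before calling the pipeline. The sketch below is illustrative only: `SPEAKERS` and `resolve_speaker` are hypothetical names that are not part of the Indri codebase, and the IDs and descriptions are copied from the speakers table added in this commit.

```python
# Speaker IDs and descriptions copied from the "Available speakers" table.
# SPEAKERS and resolve_speaker are hypothetical helpers, not part of Indri.
SPEAKERS = {
    '[spkr_63]': '🇬🇧 👨 book reader',
    '[spkr_67]': '🇺🇸 👨 influencer',
    '[spkr_68]': '🇮🇳 👨 book reader',
    '[spkr_69]': '🇮🇳 👨 book reader',
    '[spkr_70]': '🇮🇳 👨 motivational speaker',
    '[spkr_62]': '🇮🇳 👨 book reader heavy',
    '[spkr_53]': '🇮🇳 👩 recipe reciter',
    '[spkr_60]': '🇮🇳 👩 book reader',
    '[spkr_74]': '🇺🇸 👨 book reader',
    '[spkr_75]': '🇮🇳 👨 entrepreneur',
    '[spkr_76]': '🇬🇧 👨 nature lover',
    '[spkr_77]': '🇮🇳 👨 influencer',
    '[spkr_66]': '🇮🇳 👨 politician',
}


def resolve_speaker(speaker_id: str) -> str:
    """Return speaker_id unchanged if it is a listed speaker, else raise."""
    if speaker_id not in SPEAKERS:
        raise ValueError(
            f'Unknown speaker {speaker_id!r}; choose one of {sorted(SPEAKERS)}'
        )
    return speaker_id


speaker = resolve_speaker('[spkr_63]')
# The validated ID can then be passed as in the README example:
# output = pipe(['Hi, my name is Indri and I like to talk.'], speaker=speaker)
```

Failing early with a `ValueError` keeps a mistyped ID from reaching the model, where its effect is not documented in this README.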
  ### Self hosted service
 
  ```bash