sriting committed on
Commit
9ff5e53
·
1 Parent(s): a707aa1

feat: update report link

Files changed (2)
  1. index.html +73 -38
  2. style.css +1 -0
index.html CHANGED
@@ -28,7 +28,7 @@
28
  Encoder</h4>
29
  <p class="author">
30
  MiniMax Team <span class="date">May 2025</span><br />
31
- <a style="font-size: 1.1rem;"
32
  href="https://huggingface.co/spaces/MiniMaxAI/MiniMax-Speech-Tech-Report/blob/main/MiniMax_Speech.pdf">[Tech
33
  Report]</a>
34
  </p>
@@ -58,7 +58,9 @@
58
  via LoRA; text to voice (T2V) by synthesizing timbre features directly from text description; and professional
59
  voice
60
  cloning (PVC) by fine-tuning timbre features with additional data. We encourage readers to visit
61
- <a href="https://github.com/MiniMax-AI">https://minimax-ai.github.io/tts_tech_report</a> for more examples.
 
 
62
  </p>
63
  </div>
64
 
@@ -73,7 +75,6 @@
73
  <ol>
74
  <li><a href="#showcase-with-high-versatility">Showcase with High Versatility</a></li>
75
  <li><a href="#showcase-with-multiple-generation-attempts">Showcase with Multiple Generation Attempts</a></li>
76
- <li><a href="#examples-with-more-possibilities">Examples with More Possibilities</a></li>
77
  </ol>
78
  </li>
79
  <li><a href="#zero-shot-vs-one-shot-demonstrations">Zero-Shot vs. One-Shot Demonstrations</a></li>
@@ -158,41 +159,45 @@
158
  <audio class="audio-md" src="assets/audios/Warm%20and%20Magnetic.mp3" controls></audio>
159
  </td>
160
  </tr>
161
- </tbody>
162
- </table>
163
- </div>
164
-
165
- <h3 id="showcase-with-multiple-generation-attempts">Showcase with Multiple Generation Attempts, Post-Processing
166
- Audio Effects and Added Sound Effects</h3>
167
- <div class="scroll-wrapper">
168
- <table style="width: 100%;">
169
- <tbody>
170
  <tr class="border-bottom-thin">
171
- <th scope="col" style="width: 50%;">Description</th>
172
- <th scope="col" style="width: 50%; text-align: center;">Generated Audio</th>
173
  </tr>
174
  <tr class="border-bottom-thin">
175
  <td>
176
- A Husky Male Voice: From Soft Murmur to Excitement to Anger, then to Whispers
177
  </td>
178
  <td>
179
- <audio class="audio-lg" src="assets/audios/Murmur-Excitement-Anger-%20Whispers.MP3" controls></audio>
 
 
 
180
  </td>
181
  </tr>
182
  <tr class="border-bottom-thin">
183
  <td>
184
- An Angry Female Voice: From Soft Murmur to Rage to Reminiscence, then to Weeping
185
  </td>
186
  <td>
187
- <audio class="audio-lg" src="assets/audios/Neutral-Rage-Reminiscence-Weeping.MP3" controls></audio>
 
 
 
188
  </td>
189
  </tr>
190
  </tbody>
191
  </table>
192
  </div>
193
 
194
- <h3 id="examples-with-more-possibilities">Examples with More Possibilities, Audio Effects and Sound Effects are
195
- Generated</h3>
196
  <div class="scroll-wrapper">
197
  <table style="width: 100%;">
198
  <tbody>
@@ -202,26 +207,18 @@
202
  </tr>
203
  <tr class="border-bottom-thin">
204
  <td>
205
- An ASMR Whispering Voice with Generated Breathing and Sound Effects
206
- </td>
207
- <td>
208
- <audio class="audio-lg" src="assets/audios/Breathy%20ASMR.MP3" controls></audio>
209
- </td>
210
- </tr>
211
- <tr class="border-bottom-thin">
212
- <td>
213
- A Robotic Voice with Rich Bass Resonance and Spatial Presence
214
  </td>
215
  <td>
216
- <audio class="audio-lg" src="assets/audios/Lucky%20Robot.mp3" controls></audio>
217
  </td>
218
  </tr>
219
  <tr class="border-bottom-thin">
220
  <td>
221
- A Sardonic Mature Female Voice
222
  </td>
223
  <td>
224
- <audio class="audio-lg" src="assets/audios/Onee-san.wav" controls></audio>
225
  </td>
226
  </tr>
227
  </tbody>
@@ -885,8 +882,8 @@
885
  </tr>
886
  <tr class="border-bottom-thin">
887
  <td>
888
- 男性中年声音,说中文,音色浑厚醇厚,带有自然的磁性,语速偏慢,<br>
889
- 音量适中,音调偏低沉。声音整体给人沉稳可靠的感觉,<br>
890
  在深度访谈场景中表现出专业性和亲和力,音质清晰,吐字规整有力。
891
  </td>
892
  <td>
@@ -901,9 +898,9 @@
901
  </tr>
902
  <tr class="border-bottom-thin">
903
  <td>
904
- 说中文的女青年,音色偏甜美,语速比较快,说话时带着一种轻快的感觉,<br>
905
- 整体音调较高,像是在直播带货,整体氛围比较活跃,<br>
906
- 声音清晰,听起来很有亲和力。
907
  </td>
908
  <td>
909
  亲爱的宝宝们,等了好久的神仙面霜终于到货啦!<br>
@@ -929,7 +926,7 @@
929
  <audio class="audio-md" src="assets/audios/体育解说男青年.wav" controls></audio>
930
  </td>
931
  </tr>
932
- <tr>
933
  <td>
934
  中国女青年的声音,音色清脆,说话速度偏快,语调活泼,<br>
935
  像是在做游戏直播,声音中带着愉快的感觉,整体音调较高,<br>
@@ -944,6 +941,44 @@
944
  <audio class="audio-md" src="assets/audios/游戏主播女青年.wav" controls></audio>
945
  </td>
946
  </tr>
947
  </tbody>
948
  </table>
949
  </div>
 
28
  Encoder</h4>
29
  <p class="author">
30
  MiniMax Team <span class="date">May 2025</span><br />
31
+ <a style="font-size: 1.1rem;" target="_blank"
32
  href="https://huggingface.co/spaces/MiniMaxAI/MiniMax-Speech-Tech-Report/blob/main/MiniMax_Speech.pdf">[Tech
33
  Report]</a>
34
  </p>
 
58
  via LoRA; text to voice (T2V) by synthesizing timbre features directly from text description; and professional
59
  voice
60
  cloning (PVC) by fine-tuning timbre features with additional data. We encourage readers to visit
61
+ <a
62
+ href="https://huggingface.co/spaces/MiniMaxAI/MiniMax-Speech-Tech-Report">https://minimax-ai.github.io/tts_tech_report</a>
63
+ for more examples.
64
  </p>
65
  </div>
66
 
 
75
  <ol>
76
  <li><a href="#showcase-with-high-versatility">Showcase with High Versatility</a></li>
77
  <li><a href="#showcase-with-multiple-generation-attempts">Showcase with Multiple Generation Attempts</a></li>
 
78
  </ol>
79
  </li>
80
  <li><a href="#zero-shot-vs-one-shot-demonstrations">Zero-Shot vs. One-Shot Demonstrations</a></li>
 
159
  <audio class="audio-md" src="assets/audios/Warm%20and%20Magnetic.mp3" controls></audio>
160
  </td>
161
  </tr>
162
  <tr class="border-bottom-thin">
163
+ <td>
164
+ An ASMR Whispering Voice with Generated Breathing and Sound Effects
165
+ </td>
166
+ <td>
167
+ <audio class="audio-md" src="assets/audios/Breathy%20ASMR_Sourse.wav" controls></audio>
168
+ </td>
169
+ <td>
170
+ <audio class="audio-md" src="assets/audios/Breathy%20ASMR.MP3" controls></audio>
171
+ </td>
172
  </tr>
173
  <tr class="border-bottom-thin">
174
  <td>
175
+ A Robotic Voice with Rich Bass Resonance and Spatial Presence
176
  </td>
177
  <td>
178
+ <audio class="audio-md" src="assets/audios/Lucky%20Robot_Sourse.wav" controls></audio>
179
+ </td>
180
+ <td>
181
+ <audio class="audio-md" src="assets/audios/Lucky%20Robot.mp3" controls></audio>
182
  </td>
183
  </tr>
184
  <tr class="border-bottom-thin">
185
  <td>
186
+ A Sardonic Mature Female Voice
187
  </td>
188
  <td>
189
+ <audio class="audio-md" src="assets/audios/Onee-san_Sourse.wav" controls></audio>
190
+ </td>
191
+ <td>
192
+ <audio class="audio-md" src="assets/audios/Onee-san.wav" controls></audio>
193
  </td>
194
  </tr>
195
  </tbody>
196
  </table>
197
  </div>
198
 
199
+ <h3 id="showcase-with-multiple-generation-attempts">Showcase with Multiple Generation Attempts, Post-Processing
200
+ Audio Effects and Added Sound Effects</h3>
201
  <div class="scroll-wrapper">
202
  <table style="width: 100%;">
203
  <tbody>
 
207
  </tr>
208
  <tr class="border-bottom-thin">
209
  <td>
210
+ A Husky Male Voice: From Soft Murmur to Excitement to Anger, then to Whispers
211
  </td>
212
  <td>
213
+ <audio class="audio-lg" src="assets/audios/Murmur-Excitement-Anger-%20Whispers.MP3" controls></audio>
214
  </td>
215
  </tr>
216
  <tr class="border-bottom-thin">
217
  <td>
218
+ An Angry Female Voice: From Soft Murmur to Rage to Reminiscence, then to Weeping
219
  </td>
220
  <td>
221
+ <audio class="audio-lg" src="assets/audios/Neutral-Rage-Reminiscence-Weeping.MP3" controls></audio>
222
  </td>
223
  </tr>
224
  </tbody>
 
882
  </tr>
883
  <tr class="border-bottom-thin">
884
  <td>
885
+ 男性中年声音,说中文,音色浑厚醇厚,带有自然的磁性,<br>
886
+ 语速偏慢,音量适中,音调偏低沉。声音整体给人沉稳可靠的感觉,<br>
887
  在深度访谈场景中表现出专业性和亲和力,音质清晰,吐字规整有力。
888
  </td>
889
  <td>
 
898
  </tr>
899
  <tr class="border-bottom-thin">
900
  <td>
901
+ 说中文的女青年,音色偏甜美,语速比较快,<br>
902
+ 说话时带着一种轻快的感觉,整体音调较高,像是在直播带货,<br>
903
+ 整体氛围比较活跃,声音清晰,听起来很有亲和力。
904
  </td>
905
  <td>
906
  亲爱的宝宝们,等了好久的神仙面霜终于到货啦!<br>
 
926
  <audio class="audio-md" src="assets/audios/体育解说男青年.wav" controls></audio>
927
  </td>
928
  </tr>
929
+ <tr class="border-bottom-thin">
930
  <td>
931
  中国女青年的声音,音色清脆,说话速度偏快,语调活泼,<br>
932
  像是在做游戏直播,声音中带着愉快的感觉,整体音调较高,<br>
 
941
  <audio class="audio-md" src="assets/audios/游戏主播女青年.wav" controls></audio>
942
  </td>
943
  </tr>
944
+ <tr class="border-bottom-thin">
945
+ <td>
946
+ English-speaking female voice, sounding relatively young,<br>
947
+ with a sweet and pleasant tone. Speaking at a moderate pace<br>
948
+ with a touch of energy, similar to someone narrating a<br>
949
+ beauty/makeup tutorial video. The overall atmosphere is<br>
950
+ relaxed and cheerful.
951
+ </td>
952
+ <td>
953
+ Hi everyone! Today I'll be sharing a soft, romantic<br>
954
+ makeup look that's perfect for dates. Many of you have <br>
955
+ been asking how to apply this eyeshadow naturally - the<br>
956
+ key is using gentle techniques. Let's go through the<br>
957
+ steps together...
958
+ </td>
959
+ <td>
960
+ <audio class="audio-md" src="assets/audios/美妆女博主.wav" controls></audio>
961
+ </td>
962
+ </tr>
963
+ <tr>
964
+ <td>
965
+ English-speaking middle-aged male voice, slightly husky, <br>
966
+ speaking at a moderate-to-slow pace with a deep tone. Like<br>
967
+ someone telling an old story, conveying a nostalgic feeling,<br>
968
+ with a relaxed and composed manner of speaking.
969
+ </td>
970
+ <td>
971
+ That was back in the late 1970s. I remember when our <br>
972
+ village first got electricity - everyone was so excited. <br>
973
+ In the evenings, people would bring their stools and <br>
974
+ gather under the big banyan tree by the village committee <br>
975
+ office to watch movies projected on the wall. Even now, <br>
976
+ thinking back to those moments still fills me with warmth.
977
+ </td>
978
+ <td>
979
+ <audio class="audio-md" src="assets/audios/回忆男中年.wav" controls></audio>
980
+ </td>
981
+ </tr>
982
  </tbody>
983
  </table>
984
  </div>
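
Review note on the "We encourage readers to visit" link changed in this commit: the new `href` points to the Hugging Face Space, while the visible link text still shows the older `minimax-ai.github.io` address. The fragment below is only a sketch of how the anchor might read if the text were meant to match the destination. Whether the old text was kept on purpose is not stated in the commit, so treat the changed link text as an assumption. The `target="_blank"` and `rel` attributes are likewise assumptions, mirroring the `target="_blank"` this commit adds to the [Tech Report] link.

```html
<!-- Sketch only: link text aligned with the href introduced in this commit. -->
<!-- target and rel are assumed additions, not part of the original change. -->
<a href="https://huggingface.co/spaces/MiniMaxAI/MiniMax-Speech-Tech-Report"
  target="_blank"
  rel="noopener noreferrer">https://huggingface.co/spaces/MiniMaxAI/MiniMax-Speech-Tech-Report</a>
for more examples.
```

Current browsers already treat `target="_blank"` as implying `noopener`, so the explicit `rel` mainly documents intent and covers older browsers.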
style.css CHANGED
@@ -837,5 +837,6 @@ h3,
837
  h4,
838
  h5,
839
  h6 {
 
840
  margin-bottom: 1rem;
841
  }
 
837
  h4,
838
  h5,
839
  h6 {
840
+ text-align: left;
841
  margin-bottom: 1rem;
842
  }