Respair committed on
Commit
1432f65
1 Parent(s): 0ab1623

Update README.md

Files changed (1)
  1. README.md +7 -7
README.md CHANGED
@@ -28,8 +28,8 @@ tags:

  Part of a [personal project](https://github.com/Respaired/Project-Kanade), focusing on further advancing Japanese speech field.

- - Use the HuggingFace Space for **Tsukasa** (24khz): [![huggingface](https://img.shields.io/badge/Interactive_Demo-HuggingFace-yellow)](https://huggingface.co/spaces/Respair/Shiki)
- - HuggingFace Space for **Tsumugi** (48khz): [![huggingface](https://img.shields.io/badge/Interactive_Demo-HuggingFace-yellow)](https://huggingface.co/spaces/Respair/Shiki)
+ - Use the HuggingFace Space for **Tsukasa** (24khz): [![huggingface](https://img.shields.io/badge/Interactive_Demo-HuggingFace-yellow)](https://huggingface.co/spaces/Respair/Tsukasa_Speech)
+ - HuggingFace Space for **Tsumugi** (48khz): [![huggingface](https://img.shields.io/badge/Interactive_Demo-HuggingFace-yellow)](https://huggingface.co/spaces/Respair/Tsumugi_48khz)

  - Join Shoukan lab's discord server, a comfy place I frequently visit -> [![Discord](https://img.shields.io/discord/1197679063150637117?logo=discord&logoColor=white&label=Join%20our%20Community)](https://discord.gg/JrPSzdcM)

@@ -54,15 +54,15 @@ This is a speech generation network, aimed at maximizing the expressiveness and
  - Fixed DDP and BF16 Training (mostly!)


- There are two checkpoints you can use. Tsukasa & Tsumugi (placeholder).
+ There are two checkpoints you can use. Tsukasa & Tsumugi 48khz (placeholder).

  Tsukasa was trained on ~800 hours of studio grade, high quality data. sourced mainly from games and novels, part of it from a private dataset.
  So the Japanese is going to be the "anime japanese" (it's different than what people usually speak in real-life.)

- For Tsumugi (placeholder) a subset of this data was used; at around ~300 hours but in a more controlled manner with additional manual cleaning & annotations.
+ For Tsumugi (placeholder) a subset of this data was used with a 48khz config; at around ~300 hours but in a more controlled manner with additional manual cleaning & annotations.

- Unfortuantely Tsumugi's context length is capped and that means the model will not have enough information to handle the intonations as good as Tsukasa.
- it also only supports the first mode of Kotodama's inference, which means no voice design.
+ **Unfortuantely Tsumugi (48khz)'s context length is capped and that means the model will not have enough information to handle the intonations as good as Tsukasa.
+ it also only supports the first mode of Kotodama's inference, which means no voice design.**


  Brought to you by:
@@ -74,7 +74,7 @@ Brought to you by:

  Special thanks to Yinghao Aaron Li, the Author of StyleTTS which this work is based on top of that. <br> He is one of the most talented Engineers I've ever seen in this field.
  Also Karesto and Raven for their help in debugging some of the scripts. wonderful people.
-
+ ___________________________________________________________________________________
  ## Why does it matter?

  Recently, there's a big trend towards larger models, increasing the scale. We're going the opposite way, trying to see how far we can push the limits by utilizing existing tools.