Update README.md
README.md
CHANGED
@@ -28,8 +28,8 @@ tags:
 
 Part of a [personal project](https://github.com/Respaired/Project-Kanade), focusing on further advancing Japanese speech field.
 
-- Use the HuggingFace Space for **Tsukasa** (24khz): [![huggingface](https://img.shields.io/badge/Interactive_Demo-HuggingFace-yellow)](https://huggingface.co/spaces/Respair/
-- HuggingFace Space for **Tsumugi** (48khz): [![huggingface](https://img.shields.io/badge/Interactive_Demo-HuggingFace-yellow)](https://huggingface.co/spaces/Respair/
+- Use the HuggingFace Space for **Tsukasa** (24khz): [![huggingface](https://img.shields.io/badge/Interactive_Demo-HuggingFace-yellow)](https://huggingface.co/spaces/Respair/Tsukasa_Speech)
+- HuggingFace Space for **Tsumugi** (48khz): [![huggingface](https://img.shields.io/badge/Interactive_Demo-HuggingFace-yellow)](https://huggingface.co/spaces/Respair/Tsumugi_48khz)
 
 - Join Shoukan lab's discord server, a comfy place I frequently visit -> [![Discord](https://img.shields.io/discord/1197679063150637117?logo=discord&logoColor=white&label=Join%20our%20Community)](https://discord.gg/JrPSzdcM)
 
@@ -54,15 +54,15 @@ This is a speech generation network, aimed at maximizing the expressiveness and
 - Fixed DDP and BF16 Training (mostly!)
 
 
-There are two checkpoints you can use. Tsukasa & Tsumugi (placeholder).
+There are two checkpoints you can use. Tsukasa & Tsumugi 48khz (placeholder).
 
 Tsukasa was trained on ~800 hours of studio grade, high quality data. sourced mainly from games and novels, part of it from a private dataset.
 So the Japanese is going to be the "anime japanese" (it's different than what people usually speak in real-life.)
 
-For Tsumugi (placeholder) a subset of this data was used; at around ~300 hours but in a more controlled manner with additional manual cleaning & annotations.
+For Tsumugi (placeholder) a subset of this data was used with a 48khz config; at around ~300 hours but in a more controlled manner with additional manual cleaning & annotations.
 
-Unfortuantely Tsumugi's context length is capped and that means the model will not have enough information to handle the intonations as good as Tsukasa.
-it also only supports the first mode of Kotodama's inference, which means no voice design
+**Unfortuantely Tsumugi (48khz)'s context length is capped and that means the model will not have enough information to handle the intonations as good as Tsukasa.
+it also only supports the first mode of Kotodama's inference, which means no voice design.**
 
 
 Brought to you by:
@@ -74,7 +74,7 @@ Brought to you by:
 
 Special thanks to Yinghao Aaron Li, the Author of StyleTTS which this work is based on top of that. <br> He is one of the most talented Engineers I've ever seen in this field.
 Also Karesto and Raven for their help in debugging some of the scripts. wonderful people.
-
+___________________________________________________________________________________
 ## Why does it matter?
 
 Recently, there's a big trend towards larger models, increasing the scale. We're going the opposite way, trying to see how far we can push the limits by utilizing existing tools.