Update README.md
README.md
CHANGED
@@ -69,6 +69,7 @@ Brought to you by:

- [Cryptowooser](https://github.com/cryptowooser)
- [Buttercream](https://github.com/korakoe)

## Why does it matter?

Recently, there has been a big trend towards larger models and ever-increasing scale. We're going the opposite way, trying to see how far we can push the limits by utilizing existing tools.

@@ -156,27 +157,26 @@ pip install -r requirements.txt

***Your input is too long for a single inference run; use the Longform inference function. This is particularly challenging with the Tsumugi (placeholder) checkpoint, since the context length of its mLSTM layer is capped at 512, meaning you cannot generate more than ~10 seconds of audio without relying on the Longform function. This shouldn't be an issue with the other checkpoint. All in all, it is not a serious problem, as there is no theoretical limit on the output thanks to the Longform algorithm.***
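
For intuition only, the longform idea is to split the text into pieces small enough for the model's context, run an ordinary short inference on each piece, and join the audio. Below is a minimal sketch of that idea; the names (`longform_sketch`, `synthesize`, `max_chars`) are hypothetical placeholders, not the repository's actual API.

```python
import numpy as np

def longform_sketch(text, synthesize, max_chars=180, pause_sec=0.15, sr=24000):
    """Hypothetical sketch of longform inference: chunk the text so each piece
    fits the model's limited context, synthesize each chunk separately, and
    join the audio with short pauses. Not the repository's real implementation."""
    # naive sentence-based chunking; a real longform path is more careful here
    chunks, current = [], ""
    for sentence in text.replace("。", "。\n").replace(". ", ".\n").splitlines():
        if current and len(current) + len(sentence) > max_chars:
            chunks.append(current)
            current = ""
        current += sentence
    if current:
        chunks.append(current)

    pause = np.zeros(int(pause_sec * sr), dtype=np.float32)
    pieces = []
    for chunk in chunks:
        pieces.append(synthesize(chunk))  # one ordinary, short inference run per chunk
        pieces.append(pause)
    return np.concatenate(pieces) if pieces else np.zeros(0, dtype=np.float32)
```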

4. Short inputs sound unimpressive:

***Everything said in 2 applies here; make sure your style vector is suitable for that kind of input. Generally, though, using a very short input is not recommended.***

5. About the names used in kotodama inference:

***They are all random names mapped to the IDs; they have no relation to the speaker, their role in a series, or anything else. There are hundreds of names, so I should provide the metadata later, though the model should work with any random name thrown at it.***

6. NaNs in the 2nd stage:

***Your gradients are probably exploding: try clipping, or your batch size may be way too high. If that doesn't work, feel free to run the first few epochs (the pre-training stage) using the original DP script, or use DP entirely.***
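
If you want to try clipping, this is the usual PyTorch pattern. It is a minimal, self-contained sketch with a toy model and loss standing in for the actual second-stage modules; it is not taken from the training script itself.

```python
import torch

# Toy stand-in model and data; in a real run these would be the second-stage
# modules, optimizer, and loss from the training script.
model = torch.nn.Linear(128, 80)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
x, target = torch.randn(8, 128), torch.randn(8, 80)

optimizer.zero_grad()
loss = torch.nn.functional.l1_loss(model(x), target)
loss.backward()

# Clip the global gradient norm before the optimizer step; a max_norm around
# 1.0 is a common starting point when exploding gradients produce NaNs.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```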

7. Supporting English (or other languages):

***There is a wide gap between English and other languages, so I mostly focus on non-English projects. The good folks at Shoukan labs are trying to train a multilingual model with English included; however, if I ever do it myself, it will be focused on something specific (say, accents).***

8. Any questions?

```email
saoshiant@protonmail.com
```

Or simply DM me on Discord.

## Some cool projects: