Respair committed on
Commit 3c4fc5d
1 Parent(s): 2c67092

Update README.md

Files changed (1)
  1. README.md +10 -10
README.md CHANGED
@@ -69,6 +69,7 @@ Brought to you by:
 - [Cryptowooser](https://github.com/cryptowooser)
 - [Buttercream](https://github.com/korakoe)
 
+
 ## Why does it matter?
 
 Recently, there's a big trend towards larger models, increasing the scale. We're going the opposite way, trying to see how far we can push the limits by utilizing existing tools.
@@ -156,27 +157,26 @@ pip install -r requirements.txt
 
 ***your input is too long for a single inference run. use the Longform inference function. this is particularly challenging with the Tsumugi (placeholder) checkpoint as the context length of the mLSTM layer is capped at 512, meaning you cannot generate more than ~10 seconds of audio without relying on the Longform function. but this shouldn't be an issue with the other checkpoint. all in all, this should not be a serious problem. as there's no theoretical limit to the output thanks to the Longform algoirthm.***
 
-3. short inputs sound un-impressive:
+4. short inputs sound un-impressive:
 
 ***everything said in 2, applies here. make sure your style vector is suitable for that. but generally it's not recommended to use a very short input.***
 
-4. About the Names used in kotodama inference:
-
+5. About the Names used in kotodama inference:
 ***They are all random names mapped to the ids. they have no relation to the speaker, their role in a series or anything. there are hundreds of names so I should provide a metadata later. though the model should work with any random names thrown at it.***
 
-5. Nans in 2nd Stage:
+6. Nans in 2nd Stage:
 
 ***Your gradients are probably exploding. try clipping or your batch size is way too high. if that didn't work, feel free to do the first few epochs which is the pre-training stage, using the original DP script. or use DP entriely.***
 
-6. Supporting English (or other languages):
+7. Supporting English (or other languages):
 
 ***There is a wide gap between English and other languages, so I mostly focus on non-English projects. but the good folks at Shoukan labs are trying to train a multilingual model with English included. however, if i ever do myself, it'll be focused on something specific (let's say accents).***
 
-7. any questions:
-
-```
-*saoshiant@Protonmail.com* or dm me on discord.
-```
+8. Any questions?
+```email
+saoshiant@protonmail.com
+```
+or simply DM me on discord.
 
 ## Some cool projects:
 
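The "try clipping" advice for NaNs in the 2nd stage refers to gradient-norm clipping. As a minimal illustration (not the repo's actual training code), here is a pure-Python sketch of global-norm clipping, the same idea PyTorch's `torch.nn.utils.clip_grad_norm_` implements; in a real training loop you would call that utility between `loss.backward()` and `optimizer.step()`:

```python
import math

def clip_grad_norm(grads, max_norm):
    """Scale a list of gradient vectors in place so their combined (global)
    L2 norm does not exceed max_norm. Returns the norm before clipping."""
    total_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    if total_norm > max_norm:
        # Small epsilon guards against division issues near zero norm.
        scale = max_norm / (total_norm + 1e-6)
        for grad in grads:
            for i in range(len(grad)):
                grad[i] *= scale
    return total_norm

# Example: two gradient vectors with global norm sqrt(3^2 + 4^2) = 5.0
# get rescaled so the global norm is (approximately) 1.0.
grads = [[3.0, 0.0], [0.0, 4.0]]
norm_before = clip_grad_norm(grads, max_norm=1.0)
```

If clipping alone doesn't stabilize training, the text's fallback (lowering the batch size, or running the pre-training epochs with the original DP script) is the next thing to try.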