Update README.md
README.md
CHANGED
@@ -100,12 +100,9 @@ This repo contains the following configurations under `./models/`:
 * The "confidence" issue on voices it hasn't seen / hasn't seen much of is much more noticeable, as RVQ level 0 is much more susceptible to it.
 * Unlike the base model, this is trained with the current dataset without iteratively dripfeeding additional sources (like tacking on Emilia afterwards).
 * ...except STT; this received no STT training out of fear of botching the model.
-*
-
-*
-* While more training seems to make it adhere to the prompt better, more training does not make the output more stable.
-* It seems exactly the same as the previous, erroneously-trained model (where it was actually trained to predict the next token, rather than the token in place).
-* I would say that a bigger model might help; ignoring RVQ levels 1+ and solely focusing on NAR RVQ level 0 does not seem to matter.
+* Weights will be added as the model is trained.
+* This *was* expected to be a dud, but one very, very small oversight in the sampling code proved to be the culprit...
+* In other words, the model *does* work.
 
 Some additional configurations have been explored, but the experiments have not been fruitful:
 * Exotic wrappers like `BitNet` seemed to yield little gain in inferencing, somehow. The memory savings are pretty much unnecessary, as the models are already manageable at ~200M parameters.
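
A minimal sketch of the target-alignment distinction mentioned in the removed notes (predicting the *next* token versus the token *in place* for NAR RVQ level 0). This is illustrative only; the tensor shapes, names, and loss setup are assumptions for demonstration, not the repo's actual training code.

```python
# Illustrative only: contrasts an AR-style shifted objective (predict the next
# token) with a NAR in-place objective (predict the token at each position)
# for RVQ level 0 codes. Shapes and names are assumptions, not the repo's code.
import torch
import torch.nn.functional as F

vocab_size = 1024
logits = torch.randn(1, 8, vocab_size)          # [batch, seq_len, vocab] model outputs
tokens = torch.randint(0, vocab_size, (1, 8))   # ground-truth RVQ level 0 codes

# Erroneous (AR-style) alignment: logits at position t are scored against token t+1.
ar_loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, vocab_size),
    tokens[:, 1:].reshape(-1),
)

# Intended (NAR, in-place) alignment: logits at position t are scored against token t.
nar_loss = F.cross_entropy(
    logits.reshape(-1, vocab_size),
    tokens.reshape(-1),
)

print(ar_loss.item(), nar_loss.item())
```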