ecker committed (verified)
Commit d82ee7c · 1 Parent(s): 374fde8

Update README.md

Files changed (1)
  1. README.md +3 -6
README.md CHANGED
@@ -100,12 +100,9 @@ This repo contains the following configurations under `./models/`:
  * The "confidence" issue on voices it hasn't seen / hasn't seen much of is much more noticeable as RVQ level 0 is much more susceptible to it.
  * Unlike the base model, this is trained with the current dataset without iteratively dripfeeding additional sources (like tacking on Emilia afterwards).
  * ...except STT, this received no STT training out of fear of botching the model.
- * ~~Weights will be added as the model is trained.~~
- * I don't think the model can perform well at the current size.
- * Longer utterances degrade and stutter.
- * While more training seems to make it adhere to the prompt better, more training does not make the output more stable.
- * It seems the exact same as the previous-erroneously-trained model (where it was actually trained to predict the next token, rather than the token in place).
- * I would say that a bigger model might help; ignoring RVQ levels 1+ and solely focusing on NAR RVQ level 0 does not seem to matter.
+ * Weights will be added as the model is trained.
+ * This *was* expected to be a dud, but one very, very small oversight in the sampling code proved to be the culprit......
+ * In other words, the model *does* work.
 
  Some additional configurations have been explored with, but experiments have not been fruitful:
  * Exotic wrappers like `BitNet` seemed to yield little gains in inferencing, somehow. The memory savings is pretty much unnecessary as the models are already manageable at ~200M parameters.
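The removed note above contrasts being "trained to predict the next token" with predicting "the token in place". As a purely illustrative aside (a toy sketch, not code from this repo; the token values are made up), this is how the two training targets differ in alignment:

```python
# Toy sketch only; not code from this repo.
tokens = [5, 9, 2, 7, 3]  # hypothetical RVQ level 0 codes for one utterance

# Autoregressive ("next token") objective: the target at position t is the
# token at position t+1, so inputs and targets are shifted by one.
ar_inputs, ar_targets = tokens[:-1], tokens[1:]

# In-place (NAR-style) objective: the model predicts the token at the very
# position it occupies, so targets line up one-to-one with the positions.
# (In practice the in-place inputs would be masked or come from conditioning,
# not the ground-truth codes themselves.)
nar_targets = tokens

print(ar_inputs, ar_targets)  # [5, 9, 2, 7] [9, 2, 7, 3]
print(nar_targets)            # [5, 9, 2, 7, 3]
```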