ecker committed
Commit 374fde8 · verified · 1 Parent(s): e302cd8

Update README.md

Files changed (1)
  1. README.md +6 -1
README.md CHANGED
@@ -100,7 +100,12 @@ This repo contains the following configurations under `./models/`:
   * The "confidence" issue on voices it hasn't seen / hasn't seen much of is much more noticeable, as RVQ level 0 is much more susceptible to it.
   * Unlike the base model, this is trained with the current dataset without iteratively drip-feeding additional sources (like tacking on Emilia afterwards).
   * ...except for STT; this received no STT training out of fear of botching the model.
- * Weights will be added as the model is trained.
+ * ~~Weights will be added as the model is trained.~~
+ * I don't think the model can perform well at the current size.
+ * Longer utterances degrade and stutter.
+ * While more training seems to make it adhere to the prompt better, more training does not make the output more stable.
+ * It seems exactly the same as the previous, erroneously trained model (where it was actually trained to predict the next token, rather than the token in place).
+ * I would say that a bigger model might help; ignoring RVQ levels 1+ and solely focusing on NAR RVQ level 0 does not seem to matter.

   Some additional configurations have been explored, but experiments have not been fruitful:
   * Exotic wrappers like `BitNet` seemed to yield little gains in inferencing, somehow. The memory savings are pretty much unnecessary as the models are already manageable at ~200M parameters.
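
As an aside on the "next token" vs. "token in place" distinction mentioned in the added notes above, here is a minimal sketch of how the two loss targets differ for a non-autoregressive RVQ level 0 objective. The tensor names, shapes, and vocabulary size are illustrative assumptions, not taken from the repo:

```python
# Minimal sketch (not from the repo): how "next token" targets differ from
# "token in place" targets for a non-autoregressive RVQ level 0 objective.
# Shapes, names, and the vocabulary size (1024) are illustrative assumptions.
import torch
import torch.nn.functional as F

vocab = 1024
codes = torch.randint(0, vocab, (1, 8))   # hypothetical RVQ level 0 codes: (batch, time)
logits = torch.randn(1, 8, vocab)         # hypothetical model outputs: (batch, time, vocab)

# Erroneous "next token" objective: position t is scored against code t+1,
# i.e. targets are shifted left by one step (an autoregressive-style target).
next_token_loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, vocab),
    codes[:, 1:].reshape(-1),
)

# Intended "token in place" objective: position t is scored against code t itself,
# with no shift, as expected for a NAR level-0 predictor.
in_place_loss = F.cross_entropy(
    logits.reshape(-1, vocab),
    codes.reshape(-1),
)

print(next_token_loss.item(), in_place_loss.item())
```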