ecker committed on
Commit cbdbdab
1 Parent(s): 4554bf4

Update README.md

Files changed (1):
  1. README.md +5 -1
README.md CHANGED
@@ -61,6 +61,8 @@ This repo contains the following configurations under `./models/`:
   * Despite the model *technically* receiving some (wrong) training for this modality, it does work well enough from an existing model, albeit not with quality on par with the base AR+NAR modality.
   * Weights will update as training progresses for NAR-len, and may pivot to become the default modality.
   * If all goes well, these weights will revert back to the original snapshot, while the reference model will be renamed to `ar+nar-len-llama-8` instead.
+  * Training a LoRA under the `NAR-len` modality does work, but it is still somewhat susceptible to the lesser quality of the base `NAR-len` outputs.
+    * In other words, finetuning for a specific speaker doesn't fully fix the quality issue.
 
 * ~~`config.llama-tts+stt.yaml` / `ar+nar-tts+stt-llama-8`~~: The above, but partially trained for STT.
   + These weights use the above weights but with additional training for the default `tts` task and a new `stt` task (at a 3:1 ratio).
@@ -134,4 +136,6 @@ This repo also contains some LoRAs to serve as a reference under `./loras/`.
 Using a LoRA is the same as using a base model, except you're required to have the base model already (obviously). Just load from the LoRA's config YAML instead to use it.
 
 The only caveat is that my original dataset *does* contain (most of) these samples already, but given its sheer size, they're probably underutilized.
-* However, the base model already has *almost adequate* output from these speakers, but not enough to be satisfactory.
+* However, the base model already has *almost adequate* output from these speakers, but not enough to be satisfactory.
+
+LoRAs under `ckpt[ar+nar-old-llama-8]` are LoRAs married to an older checkpoint, while `ckpt` *should* work under the reference model.
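The 3:1 `tts`:`stt` task ratio mentioned above can be pictured as a weighted task sampler at batch-construction time. This is only a minimal illustrative sketch — the function and constant names are hypothetical and not taken from this repo's training code:

```python
import random

# Hypothetical task weights reflecting the 3:1 tts:stt ratio described above.
TASK_WEIGHTS = {"tts": 3, "stt": 1}

def sample_task(rng: random.Random) -> str:
    """Pick a training task with probability proportional to its weight."""
    tasks = list(TASK_WEIGHTS)
    weights = [TASK_WEIGHTS[t] for t in tasks]
    return rng.choices(tasks, weights=weights, k=1)[0]

# Sanity check: over many draws, tts should appear roughly 3x as often as stt.
rng = random.Random(0)
counts = {"tts": 0, "stt": 0}
for _ in range(10_000):
    counts[sample_task(rng)] += 1
```

With these weights, about 75% of sampled batches would train the default `tts` task and about 25% the new `stt` task.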