Update README.md
README.md (CHANGED)
@@ -61,6 +61,8 @@ This repo contains the following configurations under `./models/`:
   * Despite the model *technically* receiving some (wrong) training for this modality, it does work well enough when initialized from an existing model, albeit not with quality on par with the base AR+NAR modality.
   * Weights will update as training progresses for NAR-len, and it may pivot to become the default modality.
   * If all goes well, these weights will revert back to the original snapshot, while the reference model will be renamed to `ar+nar-len-llama-8` instead.
+  * Training a LoRA under the `NAR-len` modality does work, but it is still somewhat susceptible to the lesser quality of the base `NAR-len` outputs.
+    * In other words, finetuning for a specific speaker doesn't fully fix the quality issue.
 * ~~`config.llama-tts+stt.yaml` / `ar+nar-tts+stt-llama-8`~~: The above, but partially trained for STT.
   + These weights build on the above weights, with additional training for the default `tts` task and a new `stt` task (at a 3:1 ratio).
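The 3:1 `tts`:`stt` task ratio mentioned above can be pictured with a minimal sketch; this is illustrative only and not this repo's actual dataloader logic:

```python
import random

def sample_task(rng: random.Random) -> str:
    """Pick a training task at a 3:1 tts:stt ratio (illustrative sketch)."""
    # weights=[3, 1] yields "tts" three times as often as "stt" on average
    return rng.choices(["tts", "stt"], weights=[3, 1], k=1)[0]

rng = random.Random(0)
tasks = [sample_task(rng) for _ in range(10_000)]
print(tasks.count("tts") / tasks.count("stt"))  # roughly 3.0
```

In other words, the model still sees `tts` batches most of the time, with `stt` mixed in often enough to learn the new task without displacing the default one.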
@@ -134,4 +136,6 @@ This repo also contains some LoRAs to serve as a reference under `./loras/`.
 Using a LoRA is the same as using a base model, except you're required to already have the base model (obviously). Just load from the LoRA's config YAML instead.
 
 The only caveat is that my original dataset *does* contain (most of) these samples already, but given its sheer size, they're probably underutilized.
+* However, the base model already has *almost adequate* output for these speakers, just not enough to be satisfactory.
+
+LoRAs under `ckpt[ar+nar-old-llama-8]` are married to an older checkpoint, while those under `ckpt` *should* work with the reference model.