Update README.md
README.md CHANGED
@@ -80,9 +80,10 @@ your desired response length:
 ![settings](https://files.catbox.moe/6lcz0u.png)

 ## Training procedure
-[Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) was used for training
-The model has been trained as a 4-bit LoRA adapter
-of 256 was used. It's suggested to merge
+[Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) was used for training
+on a single NVidia RTX3090 GPU. The model has been trained as a 4-bit LoRA adapter, which
+is so large because a LoRA rank of 256 was used. It's suggested to merge the adapter to
+the base Llama2-7B model (or other Llama2-based models).

 ### Training hyperparameters
 For the first pass these settings were used:
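The updated paragraph above suggests merging the adapter into the base Llama2-7B model (or another Llama2-based model). A minimal sketch of one way to do that with the Hugging Face `transformers` and `peft` libraries is given below; the base-model ID, adapter path, and output path are placeholder assumptions, not values taken from this repository.

```python
# Sketch: merge a LoRA adapter into its base model with peft.
# The model ID and paths below are assumptions for illustration only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-2-7b-hf"   # assumed base; any Llama2-based model should work
adapter_dir = "./lora-adapter"         # assumed local path to the rank-256 adapter

# Load the base weights, attach the adapter, then fold the LoRA weights in.
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, adapter_dir)
merged = model.merge_and_unload()

# Save a standalone checkpoint that loads like a regular Llama2 model.
merged.save_pretrained("./merged-model")
AutoTokenizer.from_pretrained(base_id).save_pretrained("./merged-model")
```

Once merged, the checkpoint loads like a regular Llama2 model with no `peft` dependency at inference time.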
@@ -106,5 +107,6 @@ In the second pass, the `lora_model_dir` option was used to load and train the a
 previously trained on a stories dataset. These settings were also changed:

 - lora_dropout: 0.0
+- micro_batch_size: 1
 - gradient_accumulation_steps: 8
 - learning_rate: 0.0006
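For reference, here is a hedged sketch of how the second-pass settings above might sit in an Axolotl YAML config when `lora_model_dir` is used to continue training the existing adapter. Only the values listed in the README (LoRA rank 256, `lora_dropout`, `micro_batch_size`, `gradient_accumulation_steps`, `learning_rate`) are taken from it; the remaining keys and paths are assumed placeholders.

```yaml
# Sketch of a second-pass Axolotl config fragment; the base model and paths
# are assumptions, only the hyperparameter values come from the README.
base_model: meta-llama/Llama-2-7b-hf      # assumed Llama2 base
adapter: qlora                            # 4-bit LoRA training
load_in_4bit: true
lora_model_dir: ./first-pass-adapter      # assumed path to the adapter from the first pass
lora_r: 256
lora_dropout: 0.0
micro_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 0.0006
```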