Update README.md
README.md CHANGED
@@ -47,11 +47,14 @@ Character: {utterance}
 - `User` and `Character` should be replaced with appropriate names.
 
 
-## Training
+## Training procedure
 [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) was used for training.
 The model has been trained as a 4-bit LoRA adapter. It's so large because a LoRA rank
 of 256 was used. It's suggested to merge it to the base Llama2-7B model.
 
+### Training hyperparameters
+For both passes these settings were used:
+
 - learning_rate: 0.0002
 - lr_scheduler_type: constant
 - lora_r: 256
@@ -67,5 +70,5 @@ of 256 was used. It's suggested to merge it to the base Llama2-7B model.
 - gradient_accumulation_steps: 1
 - optimizer: adamw_torch
 
-
-previously
+In the second pass, the `lora_model_dir` option was used to load and train the adapter
+previously trained on a stories dataset.
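
The updated section recommends merging the rank-256 adapter back into the base Llama2-7B model. As a minimal sketch (not part of this commit), one common way to do that is with `transformers` and `peft`; the `meta-llama/Llama-2-7b-hf` base id, the local adapter path, and the output directory below are placeholder assumptions:

```python
# Sketch: merge a LoRA adapter into the Llama2-7B base model.
# Paths and repo ids are placeholders, not values from this repository.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-2-7b-hf"   # assumed base model id
adapter_dir = "path/to/this-adapter"   # placeholder: local copy of the adapter

# Load the base model and tokenizer.
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Attach the LoRA adapter and fold its weights into the base layers.
model = PeftModel.from_pretrained(base, adapter_dir)
merged = model.merge_and_unload()

# Save a standalone merged checkpoint.
merged.save_pretrained("llama2-7b-merged")
tokenizer.save_pretrained("llama2-7b-merged")
```

Merging folds the LoRA deltas into the base weights, so the result loads like an ordinary Llama2-7B checkpoint and no longer needs `peft` at inference time.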