lemonilia committed
Commit 23bb4d7
1 Parent(s): 1ae8a82

Update README.md

Files changed (1)
  1. README.md +6 -3
README.md CHANGED
@@ -47,11 +47,14 @@ Character: {utterance}
 - `User` and `Character` should be replaced with appropriate names.


-## Training Hyperparameters
+## Training procedure
 [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) was used for training.
 The model has been trained as a 4-bit LoRA adapter. It's so large because a LoRA rank
 of 256 was used. It's suggested to merge it to the base Llama2-7B model.

+### Training hyperparameters
+For both passes these settings were used:
+
 - learning_rate: 0.0002
 - lr_scheduler_type: constant
 - lora_r: 256
@@ -67,5 +70,5 @@ of 256 was used. It's suggested to merge it to the base Llama2-7B model.
 - gradient_accumulation_steps: 1
 - optimizer: adamw_torch

-For the multi-stage training, the `lora_model_dir` option was used to load and train the
-previously created adapter.
+In the second pass, the `lora_model_dir` option was used to load and train the adapter
+previously trained on a stories dataset.
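
The updated section suggests merging the adapter into the base Llama2-7B model. A minimal sketch of one way to do that with the Hugging Face `peft` library is below; the base-model id and the adapter/output paths are placeholders, not values taken from this repository, and merging is done here against half-precision base weights even though the adapter was trained with 4-bit quantization, which is a common but not mandated choice.

```python
# Sketch: fold the LoRA adapter weights into the base Llama2-7B checkpoint with peft.
# "meta-llama/Llama-2-7b-hf" and the adapter/output paths are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf", torch_dtype=torch.float16
)
model = PeftModel.from_pretrained(base, "path/to/this-lora-adapter")

merged = model.merge_and_unload()           # folds the low-rank deltas into the base weights
merged.save_pretrained("llama2-7b-merged")  # standalone merged checkpoint

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
tokenizer.save_pretrained("llama2-7b-merged")
```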
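The hyperparameter list and the `lora_model_dir` note describe a two-pass Axolotl setup. As a hedged sketch only, the settings named in the README could be collected into a config for the second pass as shown below; the base-model id, dataset/output paths, and `load_in_4bit`/`adapter` keys are assumptions rather than the author's actual config, and key spellings should be checked against Axolotl's documented schema.

```python
# Sketch only: the options named in the README, assembled into a dict and written
# as YAML for Axolotl. Paths and the base-model id are placeholders; verify key
# names against the Axolotl docs for your version.
import yaml  # pip install pyyaml

second_pass = {
    "base_model": "meta-llama/Llama-2-7b-hf",  # assumed Hugging Face id for Llama2-7B
    "adapter": "lora",
    "load_in_4bit": True,                      # "trained as a 4-bit LoRA adapter"
    "lora_r": 256,
    "learning_rate": 0.0002,
    "lr_scheduler_type": "constant",
    "gradient_accumulation_steps": 1,
    "optimizer": "adamw_torch",
    # Second pass: resume from the adapter produced by the first (stories) pass.
    "lora_model_dir": "./lora-out-stories",    # placeholder path
    "output_dir": "./lora-out-second-pass",    # placeholder path
}

with open("second-pass.yml", "w") as f:
    yaml.safe_dump(second_pass, f, sort_keys=False)
```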