Update README.md
README.md
@@ -95,26 +95,26 @@ print(sequences[0]['generated_text'])
 
 
 ## Training hyperparameters
-
-
-r=16
-lora_alpha=16
-lora_dropout=0.05
-bias="none"
-task_type="CAUSAL_LM"
-target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']
-
-
-per_device_train_batch_size=4
-gradient_accumulation_steps=4
-gradient_checkpointing=True
-learning_rate=5e-5
-lr_scheduler_type="cosine"
-max_steps=200
-optim="paged_adamw_32bit"
-warmup_steps=100
-
-
-beta=0.1
-max_prompt_length=1024
-max_length=1536
+
+**LoRA**:
+* r=16
+* lora_alpha=16
+* lora_dropout=0.05
+* bias="none"
+* task_type="CAUSAL_LM"
+* target_modules=['k_proj', 'gate_proj', 'v_proj', 'up_proj', 'q_proj', 'o_proj', 'down_proj']
+
+**Training arguments**:
+* per_device_train_batch_size=4
+* gradient_accumulation_steps=4
+* gradient_checkpointing=True
+* learning_rate=5e-5
+* lr_scheduler_type="cosine"
+* max_steps=200
+* optim="paged_adamw_32bit"
+* warmup_steps=100
+
+**DPOTrainer**:
+* beta=0.1
+* max_prompt_length=1024
+* max_length=1536
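With `r=16` and `lora_alpha=16`, the LoRA update is scaled by `lora_alpha / r = 1.0`. The following is a minimal plain-Python sketch, not PEFT's implementation, of one adapted linear layer: `y = W x + (lora_alpha / r) * B A dropout(x)`, where `W` is frozen and only `A` and `B` would be trained. It also counts the trainable parameters LoRA adds per layer (`r * (d_in + d_out)`). The helper names and the 4096 x 4096 layer size are hypothetical; real sizes depend on the base model, which this hunk does not state. Biases are omitted, in line with `bias="none"`.

```python
import random

R = 16
LORA_ALPHA = 16
LORA_DROPOUT = 0.05
SCALING = LORA_ALPHA / R  # 1.0 with this config


def lora_param_count(d_in: int, d_out: int, r: int = R) -> int:
    """Trainable parameters added to one d_in -> d_out linear layer: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W, A, B, x, training=False, rng=None):
    """Hypothetical helper: y = W x + SCALING * B A dropout(x); W stays frozen."""
    if training and LORA_DROPOUT > 0.0:
        rng = rng or random.Random(0)
        keep = 1.0 - LORA_DROPOUT
        # Inverted dropout on the LoRA branch input only; the frozen path sees the raw x.
        x_lora = [xi / keep if rng.random() < keep else 0.0 for xi in x]
    else:
        x_lora = list(x)
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x_lora))
    return [b + SCALING * d for b, d in zip(base, delta)]


if __name__ == "__main__":
    # Hypothetical 4096 x 4096 projection; real sizes depend on the base model.
    print(lora_param_count(4096, 4096))  # 131072
    # With B initialized to zeros (as in the LoRA paper), the layer starts out identical to W.
    W = [[1.0, 0.0], [0.0, 1.0]]
    A = [[0.5, 0.5] for _ in range(R)]
    B = [[0.0] * R for _ in range(2)]
    print(lora_forward(W, A, B, [2.0, 3.0]))  # [2.0, 3.0]
```

Since every projection listed in `target_modules` gets its own `A`/`B` pair, the total trainable count is this per-layer figure summed over all seven projections in every transformer block.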
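The training arguments imply a few derived quantities. The plain-Python sketch below (no `transformers` import, single device assumed) computes the effective batch size, `per_device_train_batch_size * gradient_accumulation_steps = 16` preference pairs per optimizer step, and a linear-warmup, cosine-decay learning-rate curve intended to follow the formula used by `transformers`' `get_cosine_schedule_with_warmup` for `lr_scheduler_type="cosine"`. The `num_devices` parameter is a hypothetical addition. Note that `warmup_steps=100` with `max_steps=200` makes half of the run warmup.

```python
import math

# Values copied from the training arguments in this README.
PER_DEVICE_TRAIN_BATCH_SIZE = 4
GRADIENT_ACCUMULATION_STEPS = 4
LEARNING_RATE = 5e-5
MAX_STEPS = 200
WARMUP_STEPS = 100


def effective_batch_size(num_devices: int = 1) -> int:
    """Preference pairs per optimizer step (num_devices is a hypothetical parameter)."""
    return PER_DEVICE_TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * num_devices


def lr_at_step(step: int) -> float:
    """Linear warmup to LEARNING_RATE, then half-cosine decay to 0 at MAX_STEPS."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, MAX_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size())  # 16
    print("pairs seen in 200 steps:", effective_batch_size() * MAX_STEPS)  # 3200
    for step in (0, 50, 100, 150, 200):
        print(step, lr_at_step(step))
```

The learning rate peaks at `5e-5` exactly at step 100 and decays back to zero by step 200.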
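`beta=0.1` is the DPO temperature: it scales the difference between the policy's and the frozen reference model's log-probability ratios on the chosen and rejected responses. A minimal sketch of the standard sigmoid DPO loss for a single preference pair, which should correspond to TRL's default `loss_type="sigmoid"`, assuming summed sequence log-probabilities have already been computed. The function names and arguments are hypothetical, not TRL's API.

```python
import math

BETA = 0.1


def softplus(t: float) -> float:
    """Numerically stable log(1 + exp(t))."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = BETA) -> float:
    """-log sigmoid(beta * (chosen log-ratio - rejected log-ratio)) for one pair."""
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(z)) == softplus(-z), which avoids overflow for large |z|.
    return softplus(-logits)


if __name__ == "__main__":
    print(dpo_loss(-10.0, -10.0, -10.0, -10.0))  # log(2), about 0.693
    print(dpo_loss(-5.0, -10.0, -10.0, -10.0))   # lower: policy favors the chosen response more than the reference does
```

When policy and reference agree, the loss is `log 2`; it drops below that as the policy raises the chosen response's log-ratio relative to the rejected one, and a smaller `beta` makes the loss less sensitive to a given log-ratio gap.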
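`max_prompt_length=1024` and `max_length=1536` work together. The sketch below is a simplified, hypothetical reimplementation on token-ID lists, modeled loosely on how older TRL releases truncate inside `DPOTrainer`: if the prompt plus the longer response exceeds `max_length`, the prompt is shortened to `max_prompt_length` tokens (keeping its end by default), and if that is still too long, both responses are cut to `max_length - max_prompt_length = 512` tokens. The exact behavior depends on the installed TRL version.

```python
MAX_PROMPT_LENGTH = 1024
MAX_LENGTH = 1536


def truncate_example(prompt, chosen, rejected,
                     max_prompt_length=MAX_PROMPT_LENGTH, max_length=MAX_LENGTH,
                     keep_end=True):
    """Hypothetical helper: fit prompt + longer response into max_length tokens."""
    longer = max(len(chosen), len(rejected))
    # Step 1: if too long, shorten the prompt to max_prompt_length tokens.
    if len(prompt) + longer > max_length:
        prompt = prompt[-max_prompt_length:] if keep_end else prompt[:max_prompt_length]
    # Step 2: if still too long, cut both responses to the remaining fixed budget.
    if len(prompt) + longer > max_length:
        budget = max_length - max_prompt_length  # 512 tokens with this config
        chosen, rejected = chosen[:budget], rejected[:budget]
    return prompt, chosen, rejected


if __name__ == "__main__":
    p, c, r = truncate_example(list(range(2000)), [1] * 800, [2] * 300)
    print(len(p), len(c), len(r))  # 1024 512 300
    print(p[0])  # 976: keep_end drops the start of the prompt
```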