Felladrin committed on
Commit
fddb2c5
1 Parent(s): aba0753

Update README.md

Files changed (1)
  1. README.md +27 -0
README.md CHANGED
@@ -155,3 +155,30 @@ output = generate(
 
  print(output[0]["generated_text"])
  ```
+
+ ## How it was trained
+
+ This model was trained with [SFT Trainer](https://huggingface.co/docs/trl/main/en/sft_trainer) and [DPO Trainer](https://huggingface.co/docs/trl/main/en/dpo_trainer), in several sessions, using the following settings:
+
+ For Supervised Fine-Tuning:
+
+ | Hyperparameter | Value |
+ | :-------------------------- | :-------------------------------------------- |
+ | learning_rate | 2e-5 |
+ | total_train_batch_size | 24 |
+ | max_seq_length | 2048 |
+ | weight_decay | 0 |
+ | warmup_ratio | 0.02 |
+
+ For Direct Preference Optimization:
+
+ | Hyperparameter | Value |
+ | :-------------------------- | :-------------------------------------------- |
+ | learning_rate | 7.5e-7 |
+ | total_train_batch_size | 6 |
+ | max_length | 2048 |
+ | max_prompt_length | 1536 |
+ | max_steps | 200 |
+ | weight_decay | 0 |
+ | warmup_ratio | 0.02 |
+ | beta | 0.1 |
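
The settings in the two tables imply a few derived quantities that the README does not spell out. The sketch below is not part of the commit. It copies the table values into plain dictionaries and works out three things under stated assumptions: the number of DPO warmup steps (assuming warmup lasts the ceiling of `max_steps * warmup_ratio`), the ways a total batch size could be split into a per-device batch size and gradient accumulation steps (the device count is not given in the source, so one device is assumed), and the minimum token budget left for a completion (assuming `max_length` bounds prompt plus completion). The helper names `warmup_steps`, `batch_splits`, and `min_completion_budget` are hypothetical.

```python
import math

# Values copied from the hyperparameter tables in the diff above.
SFT_SETTINGS = {
    "learning_rate": 2e-5,
    "total_train_batch_size": 24,
    "max_seq_length": 2048,
    "weight_decay": 0,
    "warmup_ratio": 0.02,
}

DPO_SETTINGS = {
    "learning_rate": 7.5e-7,
    "total_train_batch_size": 6,
    "max_length": 2048,
    "max_prompt_length": 1536,
    "max_steps": 200,
    "weight_decay": 0,
    "warmup_ratio": 0.02,
    "beta": 0.1,
}


def warmup_steps(max_steps, warmup_ratio):
    """Assumption: warmup lasts ceil(max_steps * warmup_ratio) optimizer steps."""
    return math.ceil(max_steps * warmup_ratio)


def batch_splits(total_batch_size, num_devices=1):
    """Return (per_device_batch_size, gradient_accumulation_steps) pairs whose
    product, multiplied by num_devices, equals total_batch_size.

    num_devices is an assumption; the source does not state it.
    """
    if total_batch_size % num_devices != 0:
        return []
    per_step = total_batch_size // num_devices
    return [(b, per_step // b) for b in range(1, per_step + 1) if per_step % b == 0]


def min_completion_budget(max_length, max_prompt_length):
    """Assumption: max_length caps prompt plus completion, so this many tokens
    remain for the completion even when the prompt uses its full allowance."""
    return max_length - max_prompt_length


if __name__ == "__main__":
    dpo = DPO_SETTINGS
    print("DPO warmup steps:", warmup_steps(dpo["max_steps"], dpo["warmup_ratio"]))
    print("DPO batch splits:", batch_splits(dpo["total_train_batch_size"]))
    print("SFT batch splits:", batch_splits(SFT_SETTINGS["total_train_batch_size"]))
    print(
        "DPO minimum completion budget:",
        min_completion_budget(dpo["max_length"], dpo["max_prompt_length"]),
    )
```

The SFT table lists no step count, so no warmup length is derived for it. Which batch split was actually used cannot be recovered from the totals alone.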