01GangaPutraBheeshma committed
Commit 156eb2a
1 parent: a305790

Update README.md

Files changed (1):
  1. README.md +31 -26
README.md CHANGED
@@ -169,21 +169,24 @@ trainer = SFTTrainer(
 )
 ```
 
-output_dir: Directory to save the trained model and logs.
-per_device_train_batch_size: Number of training samples per GPU.
-gradient_accumulation_steps: Number of steps to accumulate gradients before updating the model.
-optim: Optimizer for training (e.g., "paged_adamw_32bit").
-save_steps: Save model checkpoints every N steps.
-logging_steps: Log training information every N steps.
-learning_rate: Initial learning rate for training.
-max_grad_norm: Maximum gradient norm for gradient clipping.
-max_steps: Maximum number of training steps.
-warmup_ratio: Ratio of warmup steps during learning rate warmup.
-lr_scheduler_type: Type of learning rate scheduler (e.g., "constant").
-fp16: Enable mixed-precision training.
-group_by_length: Group training samples by length for efficiency.
-ddp_find_unused_parameters: Enable distributed training parameter setting.
-push_to_hub: Push the trained model to the Hugging Face Model Hub.
+| Parameter                      | Description                                                         |
+|--------------------------------|---------------------------------------------------------------------|
+| `output_dir`                   | Directory to save the trained model and logs.                       |
+| `per_device_train_batch_size`  | Number of training samples per GPU.                                 |
+| `gradient_accumulation_steps`  | Number of steps to accumulate gradients before updating the model.  |
+| `optim`                        | Optimizer for training (e.g., "paged_adamw_32bit").                 |
+| `save_steps`                   | Save model checkpoints every N steps.                               |
+| `logging_steps`                | Log training information every N steps.                             |
+| `learning_rate`                | Initial learning rate for training.                                 |
+| `max_grad_norm`                | Maximum gradient norm for gradient clipping.                        |
+| `max_steps`                    | Maximum number of training steps.                                   |
+| `warmup_ratio`                 | Ratio of warmup steps during learning rate warmup.                  |
+| `lr_scheduler_type`            | Type of learning rate scheduler (e.g., "constant").                 |
+| `fp16`                         | Enable mixed-precision training.                                    |
+| `group_by_length`              | Group training samples by length for efficiency.                    |
+| `ddp_find_unused_parameters`   | Enable distributed training parameter setting.                      |
+| `push_to_hub`                  | Push the trained model to the Hugging Face Model Hub.               |
+
 
 ### Training Data
 
@@ -191,17 +194,19 @@ push_to_hub: Push the trained model to the Hugging Face Model Hub.
 
 #### Metrics
 
-Step Training Loss
-100 2.189900
-200 2.014100
-300 1.957200
-400 1.990000
-500 1.985200
-600 1.986500
-700 1.964300
-800 1.951900
-900 1.936900
-1000 2.011200
+| Step  | Training Loss |
+|-------|---------------|
+| 100   | 2.189900      |
+| 200   | 2.014100      |
+| 300   | 1.957200      |
+| 400   | 1.990000      |
+| 500   | 1.985200      |
+| 600   | 1.986500      |
+| 700   | 1.964300      |
+| 800   | 1.951900      |
+| 900   | 1.936900      |
+| 1000  | 2.011200      |
+
 
 ### Results
 
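The table added in this commit documents fields of `transformers.TrainingArguments`, which feed the `SFTTrainer` call shown at the top of the first hunk. As a rough illustration of how those fields fit together, here is a minimal sketch; every value below is an assumption for demonstration (the metrics table implies `logging_steps=100` and a run of at least 1000 steps, but the remaining numbers are placeholders, not necessarily this repository's actual settings):

```python
# Illustrative sketch only: parameter VALUES are assumptions, not the
# settings used in this commit. The parameter NAMES match the table above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",            # where checkpoints and logs are written
    per_device_train_batch_size=4,     # samples per GPU per step
    gradient_accumulation_steps=4,     # effective batch = 4 x 4 per device
    optim="paged_adamw_32bit",         # paged AdamW optimizer (bitsandbytes)
    save_steps=100,                    # checkpoint every N steps
    logging_steps=100,                 # matches the 100-step logging cadence above
    learning_rate=2e-4,                # initial learning rate (placeholder)
    max_grad_norm=0.3,                 # gradient-clipping threshold (placeholder)
    max_steps=1000,                    # metrics above run through step 1000
    warmup_ratio=0.03,                 # fraction of steps spent warming up
    lr_scheduler_type="constant",      # LR schedule after warmup
    fp16=True,                         # mixed-precision training
    group_by_length=True,              # bucket samples of similar length
    ddp_find_unused_parameters=False,  # DDP unused-parameter detection flag
    push_to_hub=True,                  # upload the trained model to the Hub
)
```

The resulting object would typically be passed to the trainer via `SFTTrainer(..., args=training_args)`.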