01GangaPutraBheeshma committed
Commit 156eb2a
1 parent: a305790

Update README.md

Files changed (1):
  1. README.md +31 -26
README.md CHANGED
@@ -169,21 +169,24 @@ trainer = SFTTrainer(
 )
 ```
 
-output_dir: Directory to save the trained model and logs.
-per_device_train_batch_size: Number of training samples per GPU.
-gradient_accumulation_steps: Number of steps to accumulate gradients before updating the model.
-optim: Optimizer for training (e.g., "paged_adamw_32bit").
-save_steps: Save model checkpoints every N steps.
-logging_steps: Log training information every N steps.
-learning_rate: Initial learning rate for training.
-max_grad_norm: Maximum gradient norm for gradient clipping.
-max_steps: Maximum number of training steps.
-warmup_ratio: Ratio of warmup steps during learning rate warmup.
-lr_scheduler_type: Type of learning rate scheduler (e.g., "constant").
-fp16: Enable mixed-precision training.
-group_by_length: Group training samples by length for efficiency.
-ddp_find_unused_parameters: Enable distributed training parameter setting.
-push_to_hub: Push the trained model to the Hugging Face Model Hub.
+| Parameter                      | Description                                                         |
+|--------------------------------|---------------------------------------------------------------------|
+| `output_dir`                   | Directory to save the trained model and logs.                       |
+| `per_device_train_batch_size`  | Number of training samples per GPU.                                 |
+| `gradient_accumulation_steps`  | Number of steps to accumulate gradients before updating the model.  |
+| `optim`                        | Optimizer for training (e.g., "paged_adamw_32bit").                 |
+| `save_steps`                   | Save model checkpoints every N steps.                               |
+| `logging_steps`                | Log training information every N steps.                             |
+| `learning_rate`                | Initial learning rate for training.                                 |
+| `max_grad_norm`                | Maximum gradient norm for gradient clipping.                        |
+| `max_steps`                    | Maximum number of training steps.                                   |
+| `warmup_ratio`                 | Ratio of warmup steps during learning rate warmup.                  |
+| `lr_scheduler_type`            | Type of learning rate scheduler (e.g., "constant").                 |
+| `fp16`                         | Enable mixed-precision training.                                    |
+| `group_by_length`              | Group training samples by length for efficiency.                    |
+| `ddp_find_unused_parameters`   | Enable distributed training parameter setting.                      |
+| `push_to_hub`                  | Push the trained model to the Hugging Face Model Hub.               |
+
 
 ### Training Data
 
@@ -191,17 +194,19 @@ push_to_hub: Push the trained model to the Hugging Face Model Hub.
 
 #### Metrics
 
-Step Training Loss
-100 2.189900
-200 2.014100
-300 1.957200
-400 1.990000
-500 1.985200
-600 1.986500
-700 1.964300
-800 1.951900
-900 1.936900
-1000 2.011200
+| Step  | Training Loss |
+|-------|---------------|
+| 100   | 2.189900      |
+| 200   | 2.014100      |
+| 300   | 1.957200      |
+| 400   | 1.990000      |
+| 500   | 1.985200      |
+| 600   | 1.986500      |
+| 700   | 1.964300      |
+| 800   | 1.951900      |
+| 900   | 1.936900      |
+| 1000  | 2.011200      |
+
 
 ### Results
 
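The table added in this commit documents fields of `transformers.TrainingArguments`, which feed the `SFTTrainer` call shown at the top of the first hunk. As a rough illustration of how those fields fit together, here is a minimal sketch; every value below is an assumption for demonstration (the metrics table implies `logging_steps=100` and a run of at least 1000 steps, but the remaining numbers are placeholders, not necessarily this repository's actual settings):

```python
# Illustrative sketch only: parameter VALUES are assumptions, not the
# settings used in this commit. The parameter NAMES match the table above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",            # where checkpoints and logs are written
    per_device_train_batch_size=4,     # samples per GPU per step
    gradient_accumulation_steps=4,     # effective batch = 4 x 4 per device
    optim="paged_adamw_32bit",         # paged AdamW optimizer (bitsandbytes)
    save_steps=100,                    # checkpoint every N steps
    logging_steps=100,                 # matches the 100-step logging cadence above
    learning_rate=2e-4,                # initial learning rate (placeholder)
    max_grad_norm=0.3,                 # gradient-clipping threshold (placeholder)
    max_steps=1000,                    # metrics above run through step 1000
    warmup_ratio=0.03,                 # fraction of steps spent warming up
    lr_scheduler_type="constant",      # LR schedule after warmup
    fp16=True,                         # mixed-precision training
    group_by_length=True,              # bucket samples of similar length
    ddp_find_unused_parameters=False,  # DDP unused-parameter detection flag
    push_to_hub=True,                  # upload the trained model to the Hub
)
```

The resulting object would typically be passed to the trainer via `SFTTrainer(..., args=training_args)`.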