ytcheng/llama-3-8b-hf-ft-chat-lora

Generated from Trainer


ytcheng committed on May 10

Commit 20adfa6 • 1 Parent(s): 8f06284

Model save

Files changed (1)

README.md +9 -13

README.md CHANGED

@@ -19,7 +19,7 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [ytcheng/llama-3-8b-hf-sm-lora-merged](https://huggingface.co/ytcheng/llama-3-8b-hf-sm-lora-merged) on the generator dataset.
 It achieves the following results on the evaluation set:
-- Loss: 3.0952
+- Loss: 4.8363
 ## Model description
@@ -39,29 +39,25 @@ More information needed
 The following hyperparameters were used during training:
 - learning_rate: 0.0001
-- train_batch_size: 32
-- eval_batch_size: 32
+- train_batch_size: 8
+- eval_batch_size: 8
 - seed: 42
 - distributed_type: multi-GPU
 - gradient_accumulation_steps: 2
-- total_train_batch_size: 64
+- total_train_batch_size: 16
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_ratio: 0.3
-- num_epochs: 8
+- num_epochs: 4
 ### Training results
 | Training Loss | Epoch  | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
-| 3.6219        | 0.9851 | 33   | 3.2342          |
-| 2.6017        | 2.0    | 67   | 2.4826          |
-| 2.2366        | 2.9851 | 100  | 2.3797          |
-| 2.0617        | 4.0    | 134  | 2.6861          |
-| 1.9633        | 4.9851 | 167  | 3.0894          |
-| 1.8968        | 6.0    | 201  | 3.1514          |
-| 1.8985        | 6.9851 | 234  | 3.1113          |
-| 1.8886        | 7.8806 | 264  | 3.0952          |
+| 2.3345        | 0.9963 | 133  | 2.4116          |
+| 1.9916        | 2.0    | 267  | 3.9850          |
+| 1.8552        | 2.9963 | 400  | 4.8651          |
+| 1.8627        | 3.9850 | 532  | 4.8363          |
 ### Framework versions
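As a consistency check on the hyperparameter changes: in this card, `total_train_batch_size` equals `train_batch_size` × `gradient_accumulation_steps` (32 × 2 = 64 before, 8 × 2 = 16 after). A 4× smaller effective batch should give roughly 4× more optimizer steps per epoch, and the tables agree: epoch 2.0 falls at step 67 before and at step 267 after. The sketch below does this arithmetic on values copied from the diff. The helper names are hypothetical, and the warmup rounding (ratio × total steps, rounded up) is an assumption about how the Trainer converts `lr_scheduler_warmup_ratio` into a step count.

```python
import math

def effective_batch_size(train_batch_size: int, grad_accum: int) -> int:
    # Examples per optimizer step. No device factor is applied because the
    # card's totals already equal train_batch_size * grad_accum (hypothetical helper).
    return train_batch_size * grad_accum

def warmup_steps(total_steps: int, warmup_ratio: float) -> int:
    # Assumption: warmup length is ratio * total steps, rounded up.
    return math.ceil(total_steps * warmup_ratio)

# Values copied from the diff; "steps_at_epoch_2" is the Step of the Epoch 2.0 row.
RUNS = {
    "before": {"train_batch_size": 32, "grad_accum": 2, "total_steps": 264, "steps_at_epoch_2": 67},
    "after": {"train_batch_size": 8, "grad_accum": 2, "total_steps": 532, "steps_at_epoch_2": 267},
}

def summarize(run: dict) -> dict:
    bs = effective_batch_size(run["train_batch_size"], run["grad_accum"])
    steps_per_epoch = run["steps_at_epoch_2"] / 2
    return {
        "effective_batch": bs,
        "steps_per_epoch": steps_per_epoch,
        # Approximate: the last batch of an epoch may be partial.
        "approx_examples_per_epoch": round(steps_per_epoch * bs),
        "warmup_steps": warmup_steps(run["total_steps"], 0.3),
    }

for name, run in RUNS.items():
    print(name, summarize(run))
```

Both runs come out at roughly 2,140 examples per epoch, which is what one would expect if only the batch size and epoch count changed between the two versions of the card.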
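`lr_scheduler_type: cosine` with `lr_scheduler_warmup_ratio: 0.3` denotes a linear warmup from 0 to the peak learning rate, followed by a half-cosine decay to 0. The function below is a minimal re-implementation modeled on my understanding of the Transformers cosine-with-warmup schedule; `lr_at` is a hypothetical name, and the rounded-up warmup length is an assumption. For the updated run (532 steps), warmup covers the first 160 steps, so the row with the lowest validation loss (step 133) was logged while the learning rate was still ramping up.

```python
import math

def lr_at(step: int, total_steps: int, warmup_ratio: float = 0.3, base_lr: float = 1e-4) -> float:
    # Assumption: warmup length is ratio * total steps, rounded up.
    warmup = math.ceil(total_steps * warmup_ratio)
    if step < warmup:
        # Linear ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup)
    # Half-cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - warmup) / max(1, total_steps - warmup)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))

# Updated run: 532 total steps, learning_rate 0.0001.
for step in (0, 80, 133, 160, 346, 532):
    print(step, f"{lr_at(step, 532):.2e}")
```

With the old settings (264 steps), the same function puts the end of warmup at step 80, again shortly before the lowest validation loss in that table (step 100).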
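In both versions, the reported evaluation `Loss` is the validation loss of the last table row, not the lowest one: training loss keeps falling while validation loss bottoms out early and then rises (after the change, from 2.4116 at step 133 to 4.8363 at step 532). The sketch below picks the lowest-validation-loss row from the tables in this diff. `EvalRow`, `best_by_val_loss`, and `final_gap` are hypothetical names. In the Trainer, keeping such a checkpoint automatically would involve `TrainingArguments` options like `load_best_model_at_end` and `metric_for_best_model="eval_loss"`; nothing in this diff says whether the run set them.

```python
from typing import NamedTuple, List

class EvalRow(NamedTuple):
    train_loss: float
    epoch: float
    step: int
    val_loss: float

# Rows copied from the "Training results" tables in this diff.
BEFORE: List[EvalRow] = [
    EvalRow(3.6219, 0.9851, 33, 3.2342),
    EvalRow(2.6017, 2.0, 67, 2.4826),
    EvalRow(2.2366, 2.9851, 100, 2.3797),
    EvalRow(2.0617, 4.0, 134, 2.6861),
    EvalRow(1.9633, 4.9851, 167, 3.0894),
    EvalRow(1.8968, 6.0, 201, 3.1514),
    EvalRow(1.8985, 6.9851, 234, 3.1113),
    EvalRow(1.8886, 7.8806, 264, 3.0952),
]
AFTER: List[EvalRow] = [
    EvalRow(2.3345, 0.9963, 133, 2.4116),
    EvalRow(1.9916, 2.0, 267, 3.9850),
    EvalRow(1.8552, 2.9963, 400, 4.8651),
    EvalRow(1.8627, 3.9850, 532, 4.8363),
]

def best_by_val_loss(rows: List[EvalRow]) -> EvalRow:
    # Row with the lowest validation loss.
    return min(rows, key=lambda r: r.val_loss)

def final_gap(rows: List[EvalRow]) -> float:
    # How much worse the last (reported) validation loss is than the best one.
    return rows[-1].val_loss - best_by_val_loss(rows).val_loss

for name, rows in (("before", BEFORE), ("after", AFTER)):
    best = best_by_val_loss(rows)
    print(name, best.step, best.val_loss, round(final_gap(rows), 4))
```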