ytcheng committed
Commit 91efe70
Parent: 035fb6e

Model save

Files changed (1)
  1. README.md +13 -13
README.md CHANGED
```diff
@@ -19,7 +19,7 @@ should probably proofread and complete it, then remove this comment. -->
 
 This model is a fine-tuned version of [ytcheng/llama-3-8b-hf-sm-lora-merged](https://huggingface.co/ytcheng/llama-3-8b-hf-sm-lora-merged) on the generator dataset.
 It achieves the following results on the evaluation set:
-- Loss: 2.4204
+- Loss: 3.0057
 
 ## Model description
 
@@ -38,7 +38,7 @@ More information needed
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
-- learning_rate: 3e-05
+- learning_rate: 0.0001
 - train_batch_size: 32
 - eval_batch_size: 32
 - seed: 42
@@ -47,21 +47,21 @@ The following hyperparameters were used during training:
 - total_train_batch_size: 64
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
-- lr_scheduler_warmup_ratio: 0.1
+- lr_scheduler_warmup_ratio: 0.3
 - num_epochs: 8
 
 ### Training results
 
-| Training Loss | Epoch  | Step | Validation Loss |
-|:-------------:|:------:|:----:|:---------------:|
-| 3.9305        | 0.9855 | 34   | 3.4654          |
-| 2.8968        | 2.0    | 69   | 2.6918          |
-| 2.5375        | 2.9855 | 103  | 2.5330          |
-| 2.4472        | 4.0    | 138  | 2.4676          |
-| 2.3782        | 4.9855 | 172  | 2.4321          |
-| 2.3757        | 6.0    | 207  | 2.4202          |
-| 2.3563        | 6.9855 | 241  | 2.4204          |
-| 2.3775        | 7.8841 | 272  | 2.4204          |
+| Training Loss | Epoch | Step | Validation Loss |
+|:-------------:|:-----:|:----:|:---------------:|
+| 3.6712        | 1.0   | 33   | 3.2633          |
+| 2.6495        | 2.0   | 66   | 2.5112          |
+| 2.3212        | 3.0   | 99   | 2.3937          |
+| 2.0921        | 4.0   | 132  | 2.4587          |
+| 1.9862        | 5.0   | 165  | 2.8611          |
+| 1.9494        | 6.0   | 198  | 2.8478          |
+| 1.9216        | 7.0   | 231  | 3.0062          |
+| 1.9042        | 8.0   | 264  | 3.0057          |
 
 
 ### Framework versions
```
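
For orientation, here is a minimal sketch of how the post-commit hyperparameters could be expressed as a `transformers.TrainingArguments` config. The `output_dir` and the use of gradient accumulation (2 steps × per-device batch 32 = total batch 64, rather than two devices) are assumptions, not read from the commit; the Adam betas and epsilon listed in the card are the `TrainingArguments` defaults.

```python
# Minimal sketch, assuming the run used transformers' Trainer.
# output_dir and gradient_accumulation_steps are assumptions: the
# total_train_batch_size of 64 could equally come from 2 devices.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama-3-8b-hf-sm-lora",  # hypothetical path
    learning_rate=1e-4,                  # raised from 3e-5 in this commit
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=2,       # 32 * 2 = total_train_batch_size 64
    seed=42,
    lr_scheduler_type="cosine",
    warmup_ratio=0.3,                    # raised from 0.1 in this commit
    num_train_epochs=8,
    # adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-8 are the defaults,
    # matching the optimizer line in the card.
)
```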
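The other change in this commit, `lr_scheduler_warmup_ratio` 0.1 → 0.3, controls what fraction of training is spent ramping the learning rate up before cosine decay begins. A sketch of the implied schedule, assuming `Trainer`'s `"cosine"` type maps to `transformers.get_cosine_schedule_with_warmup` and taking the 264 total steps from the results table:

```python
# Sketch of the warmup/cosine schedule implied by warmup_ratio=0.3 over the
# 264 optimizer steps shown in the results table (8 epochs x 33 steps).
import torch
from transformers import get_cosine_schedule_with_warmup

total_steps = 264
warmup_steps = int(0.3 * total_steps)  # ~79 steps ramping the LR 0 -> 1e-4

opt = torch.optim.AdamW([torch.nn.Parameter(torch.zeros(1))], lr=1e-4)
sched = get_cosine_schedule_with_warmup(opt, warmup_steps, total_steps)
```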