ytcheng committed on
Commit
20adfa6
1 Parent(s): 8f06284

Model save

Files changed (1)
  1. README.md +9 -13
README.md CHANGED
@@ -19,7 +19,7 @@ should probably proofread and complete it, then remove this comment. -->
 
 This model is a fine-tuned version of [ytcheng/llama-3-8b-hf-sm-lora-merged](https://huggingface.co/ytcheng/llama-3-8b-hf-sm-lora-merged) on the generator dataset.
 It achieves the following results on the evaluation set:
-- Loss: 3.0952
+- Loss: 4.8363
 
 ## Model description
 
@@ -39,29 +39,25 @@ More information needed
 
 The following hyperparameters were used during training:
 - learning_rate: 0.0001
-- train_batch_size: 32
-- eval_batch_size: 32
+- train_batch_size: 8
+- eval_batch_size: 8
 - seed: 42
 - distributed_type: multi-GPU
 - gradient_accumulation_steps: 2
-- total_train_batch_size: 64
+- total_train_batch_size: 16
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_ratio: 0.3
-- num_epochs: 8
+- num_epochs: 4
 
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
-| 3.6219 | 0.9851 | 33 | 3.2342 |
-| 2.6017 | 2.0 | 67 | 2.4826 |
-| 2.2366 | 2.9851 | 100 | 2.3797 |
-| 2.0617 | 4.0 | 134 | 2.6861 |
-| 1.9633 | 4.9851 | 167 | 3.0894 |
-| 1.8968 | 6.0 | 201 | 3.1514 |
-| 1.8985 | 6.9851 | 234 | 3.1113 |
-| 1.8886 | 7.8806 | 264 | 3.0952 |
+| 2.3345 | 0.9963 | 133 | 2.4116 |
+| 1.9916 | 2.0 | 267 | 3.9850 |
+| 1.8552 | 2.9963 | 400 | 4.8651 |
+| 1.8627 | 3.9850 | 532 | 4.8363 |
 
 
 ### Framework versions
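
For context, a minimal sketch of how the updated hyperparameters could be expressed as `transformers.TrainingArguments`. The repository does not include the training script, so `output_dir` and anything not listed on the card are assumptions, not the author's actual configuration:

```python
# Hypothetical reconstruction of the hyperparameters listed in this commit.
# output_dir is a placeholder; only the values commented below come from the card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="out",               # placeholder; not stated on the card
    learning_rate=1e-4,             # learning_rate: 0.0001
    per_device_train_batch_size=8,  # train_batch_size: 8 (was 32)
    per_device_eval_batch_size=8,   # eval_batch_size: 8 (was 32)
    gradient_accumulation_steps=2,  # 8 * 2 = 16 = total_train_batch_size
    num_train_epochs=4,             # num_epochs: 4 (was 8)
    seed=42,
    lr_scheduler_type="cosine",
    warmup_ratio=0.3,               # lr_scheduler_warmup_ratio: 0.3
    adam_beta1=0.9,                 # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,              # epsilon=1e-08
)
```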