ytcheng committed on
Commit 63dab02
1 Parent(s): 92d8948

Model save

Files changed (1)
  1. README.md +16 -16
README.md CHANGED
@@ -5,6 +5,8 @@ tags:
 - sft
 - generated_from_trainer
 base_model: ytcheng/llama-3-8b-hf-sm-lora-merged
+datasets:
+- generator
 model-index:
 - name: llama-3-8b-hf-ft-chat-lora
   results: []
@@ -15,9 +17,9 @@ should probably proofread and complete it, then remove this comment. -->
 
 # llama-3-8b-hf-ft-chat-lora
 
-This model is a fine-tuned version of [ytcheng/llama-3-8b-hf-sm-lora-merged](https://huggingface.co/ytcheng/llama-3-8b-hf-sm-lora-merged) on an unknown dataset.
+This model is a fine-tuned version of [ytcheng/llama-3-8b-hf-sm-lora-merged](https://huggingface.co/ytcheng/llama-3-8b-hf-sm-lora-merged) on the generator dataset.
 It achieves the following results on the evaluation set:
-- Loss: 1.2877
+- Loss: 2.3111
 
 ## Model description
 
@@ -36,32 +38,30 @@ More information needed
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
-- learning_rate: 5e-05
+- learning_rate: 3e-06
 - train_batch_size: 32
 - eval_batch_size: 32
 - seed: 42
 - distributed_type: multi-GPU
-- num_devices: 4
 - gradient_accumulation_steps: 2
-- total_train_batch_size: 256
-- total_eval_batch_size: 128
+- total_train_batch_size: 64
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
-- lr_scheduler_type: linear
-- lr_scheduler_warmup_ratio: 0.03
+- lr_scheduler_type: cosine
+- lr_scheduler_warmup_ratio: 0.1
 - num_epochs: 8
 
 ### Training results
 
 | Training Loss | Epoch  | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
-| 2.2208        | 0.9870 | 38   | 1.9844          |
-| 1.7491        | 2.0    | 77   | 1.6850          |
-| 1.4968        | 2.9870 | 115  | 1.4926          |
-| 1.3633        | 4.0    | 154  | 1.3957          |
-| 1.2802        | 4.9870 | 192  | 1.3377          |
-| 1.2565        | 6.0    | 231  | 1.3025          |
-| 1.229         | 6.9870 | 269  | 1.2892          |
-| 1.1984        | 7.8961 | 304  | 1.2877          |
+| 2.4275        | 0.9836 | 30   | 2.4239          |
+| 2.4061        | 2.0    | 61   | 2.3996          |
+| 2.3688        | 2.9836 | 91   | 2.3683          |
+| 2.3269        | 4.0    | 122  | 2.3389          |
+| 2.3136        | 4.9836 | 152  | 2.3166          |
+| 2.3001        | 6.0    | 183  | 2.3114          |
+| 2.318         | 6.9836 | 213  | 2.3110          |
+| 2.2928        | 7.8689 | 240  | 2.3111          |
 
 
 ### Framework versions
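
For reference, the updated hyperparameters imply an effective training batch size of 32 × 2 = 64 (per-device batch size × gradient-accumulation steps), which matches the new `total_train_batch_size`. Below is a minimal, hypothetical sketch of how these values could be expressed with `transformers.TrainingArguments`; the actual training script, dataset preparation, and any LoRA/PEFT settings are not part of this commit, and the `output_dir` name is assumed.

```python
# Hypothetical reconstruction of the updated training configuration shown in the
# diff above. output_dir is an assumption; the real training script is not
# included in this commit.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama-3-8b-hf-ft-chat-lora",  # assumed; matches the model name
    learning_rate=3e-6,                       # updated from 5e-05
    per_device_train_batch_size=32,           # train_batch_size
    per_device_eval_batch_size=32,            # eval_batch_size
    gradient_accumulation_steps=2,            # 32 * 2 = 64 effective train batch size
    num_train_epochs=8,
    lr_scheduler_type="cosine",               # updated from linear
    warmup_ratio=0.1,                         # updated from 0.03
    seed=42,
)
```

The Adam betas (0.9, 0.999) and epsilon 1e-08 listed in the card are the Trainer defaults, so they need no explicit arguments in this sketch.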