chansung committed
Commit 18a20e0
1 Parent(s): af8c8f2

Model save

README.md CHANGED
@@ -2,13 +2,12 @@
 license: gemma
 library_name: peft
 tags:
-- alignment-handbook
 - trl
 - sft
 - generated_from_trainer
 base_model: google/gemma-2b
 datasets:
-- llama-duo/synth_summarize_dataset_dedup
+- generator
 model-index:
 - name: gemma2b-summarize-gpt4o-128k
   results: []
@@ -19,9 +18,9 @@ should probably proofread and complete it, then remove this comment. -->
 
 # gemma2b-summarize-gpt4o-128k
 
-This model is a fine-tuned version of [google/gemma-2b](https://huggingface.co/google/gemma-2b) on the llama-duo/synth_summarize_dataset_dedup dataset.
+This model is a fine-tuned version of [google/gemma-2b](https://huggingface.co/google/gemma-2b) on the generator dataset.
 It achieves the following results on the evaluation set:
-- Loss: 2.5233
+- Loss: 2.7978
 
 ## Model description
 
@@ -52,27 +51,22 @@ The following hyperparameters were used during training:
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_ratio: 0.1
-- num_epochs: 15
+- num_epochs: 10
 
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-----:|:----:|:---------------:|
-| 1.2085        | 1.0   | 293  | 2.4863          |
-| 1.1135        | 2.0   | 586  | 2.4516          |
-| 1.0715        | 3.0   | 879  | 2.4473          |
-| 1.0471        | 4.0   | 1172 | 2.4524          |
-| 1.0357        | 5.0   | 1465 | 2.4685          |
-| 0.993         | 6.0   | 1758 | 2.4703          |
-| 0.9941        | 7.0   | 2051 | 2.4906          |
-| 0.9844        | 8.0   | 2344 | 2.4896          |
-| 0.9779        | 9.0   | 2637 | 2.5025          |
-| 0.9639        | 10.0  | 2930 | 2.5126          |
-| 0.952         | 11.0  | 3223 | 2.5192          |
-| 0.9505        | 12.0  | 3516 | 2.5205          |
-| 0.9442        | 13.0  | 3809 | 2.5223          |
-| 0.9469        | 14.0  | 4102 | 2.5227          |
-| 0.9444        | 15.0  | 4395 | 2.5233          |
+| 1.1249        | 1.0   | 293  | 2.4641          |
+| 1.0415        | 2.0   | 586  | 2.4514          |
+| 0.9915        | 3.0   | 879  | 2.4750          |
+| 0.9551        | 4.0   | 1172 | 2.5292          |
+| 0.9287        | 5.0   | 1465 | 2.5925          |
+| 0.8733        | 6.0   | 1758 | 2.6555          |
+| 0.8577        | 7.0   | 2051 | 2.7316          |
+| 0.8364        | 8.0   | 2344 | 2.7742          |
+| 0.8311        | 9.0   | 2637 | 2.7971          |
+| 0.8243        | 10.0  | 2930 | 2.7978          |
 
 
 ### Framework versions
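
The card above declares `library_name: peft` with `base_model: google/gemma-2b`, so this repository ships a LoRA-style adapter (the 78 MB `adapter_model.safetensors` below) rather than full model weights. A minimal loading sketch, assuming the repo id is `llama-duo/gemma2b-summarize-gpt4o-128k` (inferred from the model name and the `llama-duo` namespace of the dataset this commit removed; adjust if the namespace differs):

```python
# Minimal sketch: apply the PEFT adapter on top of the base model.
# The repo id below is an assumption inferred from this card's model name
# and the llama-duo namespace of the dataset that this commit removed.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("google/gemma-2b")
model = PeftModel.from_pretrained(base, "llama-duo/gemma2b-summarize-gpt4o-128k")
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")

prompt = "Summarize the following text:\n..."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```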
adapter_model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:ac8057f28991d560ba81fa781ad1f23b1cfb0bb8237f05c02f83bf70833ec035
+oid sha256:f15b613b4b9132b5415d28f8f8112434a3f6b55580e613d9c3c48eeb9510e0af
 size 78480320
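
Only the `oid sha256:` line of the Git LFS pointer changes; the adapter stays exactly 78480320 bytes, as expected when retraining rewrites tensors of the same shapes with different values. A standard-library sketch for checking a downloaded file against its pointer:

```python
# Sketch: verify a downloaded file against the sha256 oid in its LFS pointer.
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

expected = "f15b613b4b9132b5415d28f8f8112434a3f6b55580e613d9c3c48eeb9510e0af"
assert sha256_of("adapter_model.safetensors") == expected
```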
all_results.json CHANGED
@@ -1,14 +1,9 @@
 {
-    "epoch": 15.0,
-    "eval_loss": 2.523277521133423,
-    "eval_runtime": 0.5332,
-    "eval_samples": 25,
-    "eval_samples_per_second": 18.753,
-    "eval_steps_per_second": 1.875,
-    "total_flos": 2.581505823377195e+18,
-    "train_loss": 1.0488379673203783,
-    "train_runtime": 23446.7186,
+    "epoch": 10.0,
+    "total_flos": 1.7464232891960525e+18,
+    "train_loss": 0.9647074054125633,
+    "train_runtime": 17674.2713,
     "train_samples": 129221,
-    "train_samples_per_second": 8.983,
-    "train_steps_per_second": 0.187
+    "train_samples_per_second": 7.945,
+    "train_steps_per_second": 0.166
 }
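
This commit strips the `eval_*` keys from `all_results.json`, leaving only training metrics; the evaluation loss (2.7978) now lives only in the README. A sketch that reads the file and derives perplexity from the card's eval loss (hard-coded here because the JSON no longer carries it):

```python
# Sketch: read the aggregated metrics and derive perplexity from the
# eval loss. eval_loss is hard-coded from the README (2.7978) because
# this commit removed the eval_* keys from all_results.json.
import json
import math

with open("all_results.json") as f:
    results = json.load(f)

print(f"epoch:       {results['epoch']}")
print(f"train_loss:  {results['train_loss']:.4f}")
print(f"runtime (s): {results['train_runtime']:.1f}")

eval_loss = 2.7978
print(f"eval perplexity: {math.exp(eval_loss):.2f}")  # ~16.41
```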
runs/Jun10_18-07-27_user-HP-Z8-Fury-G5-Workstation-Desktop-PC/events.out.tfevents.1718010465.user-HP-Z8-Fury-G5-Workstation-Desktop-PC.8637.0 CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:4ecaf3e1573e9f0c3a5dafe90e14ed799d74f5c23912fbfea8f6fdd5f914a74b
-size 130527
+oid sha256:f5ab0e33b4d4c1f897135ba921c654553191b29c08d4b354ddd6477cc7642357
+size 132418
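
The tfevents file holds the raw scalar logs behind the README's training-results table and can be read back without the TensorBoard UI. A sketch using TensorBoard's event reader; the tag name `eval/loss` is an assumption (the usual HF Trainer default), so list `ea.Tags()` first if it doesn't match:

```python
# Sketch: inspect the scalars logged to the tfevents file.
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

path = ("runs/Jun10_18-07-27_user-HP-Z8-Fury-G5-Workstation-Desktop-PC/"
        "events.out.tfevents.1718010465.user-HP-Z8-Fury-G5-Workstation-Desktop-PC.8637.0")
ea = EventAccumulator(path)
ea.Reload()
print(ea.Tags()["scalars"])            # list what was actually logged
for event in ea.Scalars("eval/loss"):  # tag name is an assumption
    print(event.step, event.value)     # should track the README table
```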
train_results.json CHANGED
@@ -1,9 +1,9 @@
 {
-    "epoch": 15.0,
-    "total_flos": 2.581505823377195e+18,
-    "train_loss": 1.0488379673203783,
-    "train_runtime": 23446.7186,
+    "epoch": 10.0,
+    "total_flos": 1.7464232891960525e+18,
+    "train_loss": 0.9647074054125633,
+    "train_runtime": 17674.2713,
     "train_samples": 129221,
-    "train_samples_per_second": 8.983,
-    "train_steps_per_second": 0.187
+    "train_samples_per_second": 7.945,
+    "train_steps_per_second": 0.166
 }
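
The throughput numbers can be cross-checked against the step count in the README table (2930 optimizer steps over 10 epochs). A quick sanity-check sketch:

```python
# Sketch: cross-check the reported throughput against the step count
# from the README table (2930 optimizer steps over 10 epochs).
import json

with open("train_results.json") as f:
    r = json.load(f)

steps = 2930
print(steps / r["train_runtime"])  # ~0.166, matches train_steps_per_second
samples_seen = r["train_samples_per_second"] * r["train_runtime"]
print(samples_seen / steps)        # ~48 sequences per optimizer step
```

The roughly 48 sequences per step suggest an effective batch size of 48; the gap versus the raw `train_samples` count (129221) is consistent with sequence packing, which the `generator` dataset name also hints at.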
trainer_state.json CHANGED
The diff for this file is too large to render. See raw diff
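
`trainer_state.json` is too large to diff inline, but it is plain JSON whose `log_history` list (a standard HF Trainer field) records one dict per logging or evaluation event, so the README's table can be reconstructed from the raw file:

```python
# Sketch: rebuild the eval-loss table from trainer_state.json, whose
# diff is too large to render above. log_history is the standard HF
# Trainer field holding one dict per logging/eval event.
import json

with open("trainer_state.json") as f:
    state = json.load(f)

for entry in state["log_history"]:
    if "eval_loss" in entry:
        print(entry["epoch"], entry["step"], entry["eval_loss"])
```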