n3wtou committed on
Commit
478fb71
1 Parent(s): 1416f7d

Training in progress epoch 0

Files changed (4):
  1. README.md +6 -78
  2. config.json +1 -1
  3. generation_config.json +1 -1
  4. tf_model.h5 +1 -1
README.md CHANGED
@@ -14,9 +14,9 @@ probably proofread and complete it, then remove this comment. -->
 
 This model is a fine-tuned version of [google/mt5-small](https://huggingface.co/google/mt5-small) on an unknown dataset.
 It achieves the following results on the evaluation set:
- - Train Loss: 0.1500
- - Validation Loss: 5.6063
- - Epoch: 72
+ - Train Loss: 5.6636
+ - Validation Loss: 2.9818
+ - Epoch: 0
 
 ## Model description
 
@@ -35,91 +35,19 @@ More information needed
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
- - optimizer: {'name': 'AdamWeightDecay', 'learning_rate': {'class_name': 'WarmUp', 'config': {'initial_learning_rate': 0.0003, 'decay_schedule_fn': {'class_name': 'PolynomialDecay', 'config': {'initial_learning_rate': 0.0003, 'decay_steps': 99900, 'end_learning_rate': 0.0, 'power': 1.0, 'cycle': False, 'name': None}, '__passive_serialization__': True}, 'warmup_steps': 100, 'power': 1.0, 'name': None}}, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-08, 'amsgrad': False, 'weight_decay_rate': 0.001}
+ - optimizer: {'name': 'AdamWeightDecay', 'learning_rate': {'class_name': 'WarmUp', 'config': {'initial_learning_rate': 0.0003, 'decay_schedule_fn': {'class_name': 'PolynomialDecay', 'config': {'initial_learning_rate': 0.0003, 'decay_steps': 19900, 'end_learning_rate': 0.0, 'power': 1.0, 'cycle': False, 'name': None}, '__passive_serialization__': True}, 'warmup_steps': 100, 'power': 1.0, 'name': None}}, 'decay': 0.0, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-08, 'amsgrad': False, 'weight_decay_rate': 0.001}
 - training_precision: mixed_float16
 
 ### Training results
 
 | Train Loss | Validation Loss | Epoch |
 |:----------:|:---------------:|:-----:|
- | 7.7378 | 4.4308 | 0 |
- | 4.9883 | 3.7424 | 1 |
- | 4.2872 | 3.4004 | 2 |
- | 3.8475 | 3.1811 | 3 |
- | 3.5692 | 3.0209 | 4 |
- | 3.3360 | 2.9025 | 5 |
- | 3.1530 | 2.8074 | 6 |
- | 3.0035 | 2.7699 | 7 |
- | 2.8622 | 2.7444 | 8 |
- | 2.7423 | 2.7162 | 9 |
- | 2.6218 | 2.7089 | 10 |
- | 2.5106 | 2.6997 | 11 |
- | 2.4090 | 2.7081 | 12 |
- | 2.3063 | 2.7278 | 13 |
- | 2.2076 | 2.7389 | 14 |
- | 2.1084 | 2.7752 | 15 |
- | 2.0043 | 2.8056 | 16 |
- | 1.9061 | 2.8248 | 17 |
- | 1.8142 | 2.8616 | 18 |
- | 1.7280 | 2.9050 | 19 |
- | 1.6480 | 2.9312 | 20 |
- | 1.5664 | 3.0067 | 21 |
- | 1.4709 | 3.0329 | 22 |
- | 1.4106 | 3.0626 | 23 |
- | 1.3306 | 3.1512 | 24 |
- | 1.2525 | 3.1912 | 25 |
- | 1.1883 | 3.2798 | 26 |
- | 1.1302 | 3.3261 | 27 |
- | 1.0607 | 3.4132 | 28 |
- | 1.0138 | 3.4018 | 29 |
- | 0.9581 | 3.4898 | 30 |
- | 0.9053 | 3.6052 | 31 |
- | 0.8553 | 3.6480 | 32 |
- | 0.8045 | 3.7776 | 33 |
- | 0.7669 | 3.7579 | 34 |
- | 0.7209 | 3.7751 | 35 |
- | 0.6860 | 3.9205 | 36 |
- | 0.6473 | 4.0297 | 37 |
- | 0.6129 | 4.0663 | 38 |
- | 0.5853 | 4.0667 | 39 |
- | 0.5518 | 4.2401 | 40 |
- | 0.5205 | 4.2675 | 41 |
- | 0.4964 | 4.2551 | 42 |
- | 0.4765 | 4.3178 | 43 |
- | 0.4589 | 4.4624 | 44 |
- | 0.4319 | 4.4997 | 45 |
- | 0.4107 | 4.5586 | 46 |
- | 0.3886 | 4.6677 | 47 |
- | 0.3755 | 4.6753 | 48 |
- | 0.3536 | 4.7340 | 49 |
- | 0.3382 | 4.8393 | 50 |
- | 0.3225 | 4.7817 | 51 |
- | 0.3140 | 4.8783 | 52 |
- | 0.2949 | 4.9444 | 53 |
- | 0.2853 | 5.0210 | 54 |
- | 0.2739 | 4.9796 | 55 |
- | 0.2646 | 5.0427 | 56 |
- | 0.2492 | 5.0848 | 57 |
- | 0.2408 | 5.2522 | 58 |
- | 0.2334 | 5.2251 | 59 |
- | 0.2233 | 5.3535 | 60 |
- | 0.2110 | 5.3478 | 61 |
- | 0.2097 | 5.2551 | 62 |
- | 0.2003 | 5.3240 | 63 |
- | 0.1914 | 5.5138 | 64 |
- | 0.1863 | 5.4430 | 65 |
- | 0.1796 | 5.4543 | 66 |
- | 0.1755 | 5.5029 | 67 |
- | 0.1673 | 5.4727 | 68 |
- | 0.1587 | 5.5600 | 69 |
- | 0.1569 | 5.5672 | 70 |
- | 0.1508 | 5.7395 | 71 |
- | 0.1500 | 5.6063 | 72 |
+ | 5.6636 | 2.9818 | 0 |
 
 
 ### Framework versions
 
- - Transformers 4.29.2
+ - Transformers 4.30.2
 - TensorFlow 2.12.0
 - Datasets 2.12.0
 - Tokenizers 0.13.3
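The serialized optimizer entry above describes a `WarmUp` wrapper around a `PolynomialDecay` schedule: the learning rate ramps linearly from 0 to 3e-4 over 100 warmup steps, then decays with `power: 1.0` (i.e. linearly) to 0 over 19,900 decay steps. A minimal plain-Python sketch of that schedule follows; the function name is made up for illustration, and the detail of offsetting the decay by the warmup steps is an assumption about how the two pieces are composed, not taken from this commit.

```python
def lr_at_step(step: int,
               init_lr: float = 3e-4,
               warmup_steps: int = 100,
               decay_steps: int = 19900,
               end_lr: float = 0.0,
               power: float = 1.0) -> float:
    """Learning rate under linear warmup followed by polynomial decay."""
    if step < warmup_steps:
        # Linear ramp from 0 up to init_lr over the warmup window.
        return init_lr * step / warmup_steps
    # Polynomial decay (power=1.0 is linear) toward end_lr; the schedule
    # is assumed to be clamped at end_lr once decay_steps are exhausted.
    decayed = min(step - warmup_steps, decay_steps)
    remaining = 1.0 - decayed / decay_steps
    return (init_lr - end_lr) * remaining ** power + end_lr
```

For example, halfway through warmup (`step=50`) this gives 1.5e-4, the peak 3e-4 is reached at step 100, and the rate falls back to 0 by step 20,000, matching `warmup_steps + decay_steps`.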
config.json CHANGED
@@ -28,7 +28,7 @@
   "relative_attention_num_buckets": 32,
   "tie_word_embeddings": false,
   "tokenizer_class": "T5Tokenizer",
-  "transformers_version": "4.29.2",
+  "transformers_version": "4.30.2",
   "use_cache": true,
   "vocab_size": 250112
 }
generation_config.json CHANGED
@@ -3,5 +3,5 @@
   "decoder_start_token_id": 0,
   "eos_token_id": 1,
   "pad_token_id": 0,
-  "transformers_version": "4.29.2"
+  "transformers_version": "4.30.2"
 }
tf_model.h5 CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
- oid sha256:d33b726aa4bc201fe641c601cf53e2c3ea7f6f2f079af891380fade03fb062fe
+ oid sha256:f10143b692b9945dbdae6886abbdad14c3d14b7b666ec50d742931a545f56610
 size 2225556280