bri25yu committed
Commit c5953a4 (parent: 1da2cfb)

update model card README.md

Files changed (1): README.md (+176, -0)
README.md ADDED
@@ -0,0 +1,176 @@
---
license: apache-2.0
tags:
- generated_from_trainer
datasets:
- wmt19
metrics:
- bleu
model-index:
- name: wmt19-ende-t5-small
  results:
  - task:
      name: Sequence-to-sequence Language Modeling
      type: text2text-generation
    dataset:
      name: wmt19
      type: wmt19
      config: de-en
      split: validation
      args: de-en
    metrics:
    - name: Bleu
      type: bleu
      value: 16.085214160195623
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# wmt19-ende-t5-small

This model is a fine-tuned version of [t5-small](https://huggingface.co/t5-small) on the wmt19 dataset.
It achieves the following results on the evaluation set:
- Loss: 1.5150
- Bleu: 16.0852
- Brevity Penalty: 0.5512

## Model description

More information needed

## Intended uses & limitations

More information needed
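
While this section is a placeholder, a minimal inference sketch may be useful; it is not part of the original card. The hub id `bri25yu/wmt19-ende-t5-small` is an assumption inferred from the committer and model name, and the task prefix follows the base t5-small convention.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "bri25yu/wmt19-ende-t5-small"  # assumed hub id, not confirmed by the card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# t5-small derivatives condition on a task prefix prepended to the source sentence.
text = "translate English to German: The house is wonderful."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```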

## Training and evaluation data

More information needed
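
The metadata above declares the wmt19 de-en configuration, with BLEU reported on the validation split. A loading sketch with the 🤗 Datasets library (not from the original card):

```python
from datasets import load_dataset

# wmt19 de-en, as declared in the model-index metadata above.
dataset = load_dataset("wmt19", "de-en")
train, validation = dataset["train"], dataset["validation"]
print(validation[0]["translation"])  # {'de': '...', 'en': '...'}
```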

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 256
- eval_batch_size: 512
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 512
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: constant
- training_steps: 10000
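
A sketch of how these settings could map onto `Seq2SeqTrainingArguments` (Transformers 4.30 argument names). The eval cadence of 100 steps is inferred from the results table below; `output_dir` and `predict_with_generate` are assumptions, not stated in the card.

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="wmt19-ende-t5-small",  # placeholder, not from the card
    learning_rate=1e-4,
    per_device_train_batch_size=256,
    per_device_eval_batch_size=512,
    seed=42,
    gradient_accumulation_steps=2,     # 256 * 2 = total train batch size of 512
    lr_scheduler_type="constant",
    max_steps=10_000,                  # training_steps above
    evaluation_strategy="steps",       # the results table evaluates every 100 steps
    eval_steps=100,
    predict_with_generate=True,        # assumption: generate during eval to score BLEU
)
```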

### Training results

| Training Loss | Epoch | Step | Validation Loss | Bleu | Brevity Penalty |
|:-------------:|:-----:|:-----:|:---------------:|:-------:|:---------------:|
| 2.7369 | 0.01 | 100 | 2.0018 | 9.0851 | 0.5107 |
| 3.3896 | 0.02 | 200 | 1.9108 | 9.9970 | 0.5127 |
| 3.0442 | 0.03 | 300 | 1.8627 | 10.7670 | 0.5245 |
| 2.5136 | 0.04 | 400 | 1.8244 | 10.9280 | 0.5132 |
| 2.4092 | 0.05 | 500 | 1.7951 | 11.4717 | 0.5260 |
| 3.2441 | 0.06 | 600 | 1.7736 | 11.7350 | 0.5197 |
| 2.6997 | 0.07 | 700 | 1.7563 | 12.0741 | 0.5260 |
| 2.5072 | 0.08 | 800 | 1.7416 | 12.3735 | 0.5283 |
| 2.3788 | 0.09 | 900 | 1.7267 | 12.4288 | 0.5285 |
| 2.3533 | 0.1 | 1000 | 1.7247 | 12.4395 | 0.5249 |
| 2.2911 | 0.11 | 1100 | 1.7078 | 12.3887 | 0.5201 |
| 2.3949 | 0.12 | 1200 | 1.6997 | 12.8109 | 0.5288 |
| 2.2343 | 0.13 | 1300 | 1.6930 | 12.8213 | 0.5283 |
| 2.2525 | 0.14 | 1400 | 1.6851 | 13.1221 | 0.5285 |
| 2.2604 | 0.15 | 1500 | 1.6795 | 13.0896 | 0.5261 |
| 2.3146 | 0.16 | 1600 | 1.6723 | 13.1741 | 0.5291 |
| 2.5767 | 0.17 | 1700 | 1.6596 | 13.4224 | 0.5248 |
| 2.698 | 0.18 | 1800 | 1.6576 | 13.6733 | 0.5334 |
| 2.6416 | 0.19 | 1900 | 1.6514 | 13.7184 | 0.5350 |
| 3.0841 | 0.2 | 2000 | 1.6448 | 13.9079 | 0.5357 |
| 2.5039 | 0.21 | 2100 | 1.6375 | 13.9860 | 0.5361 |
| 2.5829 | 0.22 | 2200 | 1.6366 | 13.9246 | 0.5328 |
| 2.5332 | 0.23 | 2300 | 1.6348 | 13.4895 | 0.5209 |
| 2.5832 | 0.24 | 2400 | 1.6240 | 14.0445 | 0.5349 |
| 2.8577 | 0.25 | 2500 | 1.6182 | 14.1085 | 0.5344 |
| 2.9157 | 0.26 | 2600 | 1.6285 | 13.7982 | 0.5365 |
| 2.6758 | 0.27 | 2700 | 1.6249 | 13.8638 | 0.5392 |
| 2.0391 | 0.28 | 2800 | 1.6205 | 13.9645 | 0.5396 |
| 2.8146 | 0.29 | 2900 | 1.6210 | 14.2823 | 0.5409 |
| 2.6602 | 0.3 | 3000 | 1.6219 | 13.9663 | 0.5391 |
| 1.7745 | 0.31 | 3100 | 1.6088 | 14.4206 | 0.5413 |
| 2.3483 | 0.32 | 3200 | 1.6050 | 14.6208 | 0.5471 |
| 1.9911 | 0.33 | 3300 | 1.6004 | 14.5458 | 0.5396 |
| 1.8973 | 0.34 | 3400 | 1.5985 | 14.5387 | 0.5400 |
| 2.6956 | 0.35 | 3500 | 1.6005 | 14.7482 | 0.5458 |
| 2.322 | 0.36 | 3600 | 1.5949 | 14.7322 | 0.5448 |
| 1.5147 | 0.37 | 3700 | 1.5966 | 14.8456 | 0.5431 |
| 2.0606 | 0.38 | 3800 | 1.5899 | 14.6267 | 0.5333 |
| 3.0341 | 0.39 | 3900 | 1.5842 | 14.7705 | 0.5414 |
| 1.5069 | 0.4 | 4000 | 1.5911 | 14.6861 | 0.5372 |
| 2.339 | 0.41 | 4100 | 1.5949 | 14.6970 | 0.5481 |
| 2.5221 | 0.42 | 4200 | 1.5870 | 14.6996 | 0.5403 |
| 1.6398 | 0.43 | 4300 | 1.5790 | 14.8826 | 0.5431 |
| 2.2758 | 0.44 | 4400 | 1.5818 | 14.5580 | 0.5375 |
| 2.2622 | 0.45 | 4500 | 1.5821 | 15.0062 | 0.5428 |
| 1.3329 | 0.46 | 4600 | 1.5792 | 14.7609 | 0.5377 |
| 1.7537 | 0.47 | 4700 | 1.5744 | 15.1037 | 0.5425 |
| 2.5379 | 0.48 | 4800 | 1.5756 | 15.2684 | 0.5479 |
| 2.1236 | 0.49 | 4900 | 1.5822 | 14.8229 | 0.5478 |
| 2.9621 | 0.5 | 5000 | 1.5747 | 14.9948 | 0.5443 |
| 1.9832 | 0.51 | 5100 | 1.5838 | 14.8682 | 0.5468 |
| 1.4962 | 0.52 | 5200 | 1.5836 | 14.8094 | 0.5397 |
| 2.4318 | 0.53 | 5300 | 1.5826 | 14.8213 | 0.5422 |
| 1.9338 | 0.54 | 5400 | 1.5869 | 14.5571 | 0.5402 |
| 1.404 | 0.55 | 5500 | 1.5891 | 14.5103 | 0.5414 |
| 2.2803 | 0.56 | 5600 | 1.5864 | 14.6338 | 0.5417 |
| 2.3725 | 0.57 | 5700 | 1.5893 | 14.3405 | 0.5385 |
| 1.1436 | 0.58 | 5800 | 1.5703 | 15.3309 | 0.5457 |
| 2.1695 | 0.59 | 5900 | 1.5690 | 15.3571 | 0.5438 |
| 1.7295 | 0.6 | 6000 | 1.5653 | 15.3547 | 0.5421 |
| 1.3033 | 0.61 | 6100 | 1.5649 | 15.3084 | 0.5442 |
| 2.396 | 0.62 | 6200 | 1.5592 | 15.5594 | 0.5440 |
| 2.133 | 0.63 | 6300 | 1.5634 | 15.3689 | 0.5420 |
| 1.1775 | 0.64 | 6400 | 1.5639 | 15.4869 | 0.5389 |
| 2.0793 | 0.65 | 6500 | 1.5541 | 15.6320 | 0.5453 |
| 1.7569 | 0.66 | 6600 | 1.5588 | 15.7405 | 0.5429 |
| 1.1035 | 0.67 | 6700 | 1.5520 | 15.7011 | 0.5450 |
| 1.5799 | 0.68 | 6800 | 1.5517 | 15.9203 | 0.5490 |
| 1.7737 | 0.69 | 6900 | 1.5473 | 15.8992 | 0.5480 |
| 1.3071 | 0.7 | 7000 | 1.5491 | 15.7140 | 0.5446 |
| 2.2214 | 0.71 | 7100 | 1.5460 | 15.9360 | 0.5479 |
| 1.7848 | 0.72 | 7200 | 1.5431 | 15.9338 | 0.5490 |
| 1.1231 | 0.73 | 7300 | 1.5398 | 15.8774 | 0.5444 |
| 1.7741 | 0.74 | 7400 | 1.5399 | 15.9724 | 0.5451 |
| 1.7098 | 0.75 | 7500 | 1.5361 | 15.9098 | 0.5447 |
| 1.0787 | 0.76 | 7600 | 1.5393 | 15.9781 | 0.5457 |
| 1.9856 | 0.77 | 7700 | 1.5348 | 15.9521 | 0.5462 |
| 2.1294 | 0.78 | 7800 | 1.5345 | 16.0042 | 0.5463 |
| 1.1938 | 0.79 | 7900 | 1.5314 | 16.0554 | 0.5495 |
| 1.9579 | 0.8 | 8000 | 1.5307 | 15.9349 | 0.5482 |
| 1.844 | 0.81 | 8100 | 1.5285 | 15.8589 | 0.5448 |
| 1.1464 | 0.82 | 8200 | 1.5413 | 15.9210 | 0.5435 |
| 2.2903 | 0.83 | 8300 | 1.5230 | 16.0164 | 0.5405 |
| 2.1489 | 0.84 | 8400 | 1.5263 | 15.9423 | 0.5443 |
| 1.8138 | 0.85 | 8500 | 1.5350 | 15.8267 | 0.5464 |
| 2.4025 | 0.86 | 8600 | 1.5275 | 15.8493 | 0.5430 |
| 1.6758 | 0.87 | 8700 | 1.5206 | 15.9246 | 0.5464 |
| 1.3671 | 0.88 | 8800 | 1.5235 | 15.9662 | 0.5460 |
| 2.3341 | 0.89 | 8900 | 1.5221 | 16.0465 | 0.5456 |
| 1.8405 | 0.9 | 9000 | 1.5201 | 16.0834 | 0.5454 |
| 1.4133 | 0.91 | 9100 | 1.5250 | 15.8619 | 0.5442 |
| 2.4374 | 0.92 | 9200 | 1.5261 | 15.8174 | 0.5429 |
| 1.3627 | 0.93 | 9300 | 1.5257 | 15.7541 | 0.5450 |
| 1.5003 | 0.94 | 9400 | 1.5249 | 15.9109 | 0.5463 |
| 2.2002 | 0.95 | 9500 | 1.5252 | 15.8338 | 0.5434 |
| 2.3461 | 0.96 | 9600 | 1.5262 | 15.9195 | 0.5469 |
| 1.2607 | 0.97 | 9700 | 1.5197 | 15.8370 | 0.5459 |
| 2.3737 | 0.98 | 9800 | 1.5178 | 16.0579 | 0.5475 |
| 1.3968 | 0.99 | 9900 | 1.5132 | 16.1729 | 0.5522 |
| 1.1816 | 1.0 | 10000 | 1.5150 | 16.0852 | 0.5512 |
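
The Bleu and Brevity Penalty columns look like sacreBLEU's corpus `score` and `bp` fields (an assumption; the card only declares metric type `bleu`). A brevity penalty around 0.55 indicates the generated translations are much shorter than the references, which strongly depresses the reported BLEU. A minimal scoring sketch via the 🤗 evaluate library:

```python
import evaluate

# Toy predictions/references purely for illustration.
predictions = ["Das Haus ist wunderbar."]
references = [["Das Haus ist wundervoll."]]

sacrebleu = evaluate.load("sacrebleu")
result = sacrebleu.compute(predictions=predictions, references=references)
print(result["score"])  # corpus BLEU on a 0-100 scale
print(result["bp"])     # brevity penalty; < 1 when hypotheses are shorter than references
```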

### Framework versions

- Transformers 4.30.2
- Pytorch 2.0.1+cu118
- Datasets 2.12.0
- Tokenizers 0.13.3