sayanmandal committed df13e9d (parent: c220c25): update model card README.md (ADDED)
---
tags:
- translation
- generated_from_trainer
datasets:
- cmu_hinglish_do_g
metrics:
- bleu
model-index:
- name: t5-small_6_3-hi_en-to-en
  results:
  - task:
      name: Sequence-to-sequence Language Modeling
      type: text2text-generation
    dataset:
      name: cmu_hinglish_do_g
      type: cmu_hinglish_do_g
      args: hi_en-en
    metrics:
    - name: Bleu
      type: bleu
      value: 18.0863
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# t5-small_6_3-hi_en-to-en

This model was trained from scratch on the cmu_hinglish_do_g dataset.
It achieves the following results on the evaluation set:
- Loss: 2.3662
- Bleu: 18.0863
- Gen Len: 15.2708

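The Bleu figure above is a corpus-level BLEU score computed during evaluation. As a rough illustration of what the metric measures (not the exact implementation the evaluation used, which adds smoothing and detokenization), here is a minimal single-sentence BLEU sketch: the geometric mean of modified n-gram precisions, scaled by a brevity penalty:

```python
import math
from collections import Counter

def simple_bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU: geometric mean of modified n-gram
    precisions (n = 1..max_n) times a brevity penalty. For illustration only;
    production scorers like sacrebleu add smoothing and tokenization rules."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        # "Modified" precision: each reference n-gram can be matched at most
        # as many times as it occurs in the reference (Counter intersection).
        overlap = sum((cand_ngrams & ref_ngrams).values())
        total = max(sum(cand_ngrams.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty: punish candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return 100 * bp * geo_mean

print(simple_bleu("the cat sat on the mat", "the cat sat on the mat"))  # → 100.0
```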
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 100
- mixed_precision_training: Native AMP

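Two of these settings interact: with gradient_accumulation_steps of 2, an optimizer update happens every two forward/backward passes, which is why total_train_batch_size is 64 rather than 32; and the linear scheduler decays the learning rate from its 5e-05 peak down to 0 over the total number of optimizer steps. A minimal sketch of both calculations (the 12,600-step total is taken from the results table; no warmup is assumed, which matches the Trainer default of warmup_steps=0):

```python
# Effective batch size: per-device batch size times accumulation steps.
train_batch_size = 32
gradient_accumulation_steps = 2
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)  # → 64

# Linear schedule with no warmup: the learning rate falls linearly from the
# peak to 0 over the total number of optimizer steps.
def linear_lr(step, total_steps=12600, peak_lr=5e-05):
    return peak_lr * max(0.0, (total_steps - step) / total_steps)

print(linear_lr(0))      # peak at the start: 5e-05
print(linear_lr(6300))   # halfway through: 2.5e-05
print(linear_lr(12600))  # fully decayed: 0.0
```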
### Training results

| Training Loss | Epoch | Step | Validation Loss | Bleu | Gen Len |
|:-------------:|:-----:|:-----:|:---------------:|:-------:|:-------:|
| No log | 1.0 | 126 | 3.0601 | 4.7146 | 11.9904 |
| No log | 2.0 | 252 | 2.8885 | 5.9584 | 12.3418 |
| No log | 3.0 | 378 | 2.7914 | 6.649 | 12.3758 |
| 3.4671 | 4.0 | 504 | 2.7347 | 7.3305 | 12.3854 |
| 3.4671 | 5.0 | 630 | 2.6832 | 8.3132 | 12.4268 |
| 3.4671 | 6.0 | 756 | 2.6485 | 8.339 | 12.3641 |
| 3.4671 | 7.0 | 882 | 2.6096 | 8.7269 | 12.414 |
| 3.0208 | 8.0 | 1008 | 2.5814 | 9.2163 | 12.2675 |
| 3.0208 | 9.0 | 1134 | 2.5542 | 9.448 | 12.3875 |
| 3.0208 | 10.0 | 1260 | 2.5339 | 9.9011 | 12.4321 |
| 3.0208 | 11.0 | 1386 | 2.5043 | 9.7529 | 12.5149 |
| 2.834 | 12.0 | 1512 | 2.4848 | 9.9606 | 12.4193 |
| 2.834 | 13.0 | 1638 | 2.4737 | 9.9368 | 12.3673 |
| 2.834 | 14.0 | 1764 | 2.4458 | 10.3182 | 12.4352 |
| 2.834 | 15.0 | 1890 | 2.4332 | 10.486 | 12.4671 |
| 2.7065 | 16.0 | 2016 | 2.4239 | 10.6921 | 12.414 |
| 2.7065 | 17.0 | 2142 | 2.4064 | 10.7426 | 12.4607 |
| 2.7065 | 18.0 | 2268 | 2.3941 | 11.0509 | 12.4087 |
| 2.7065 | 19.0 | 2394 | 2.3826 | 11.2407 | 12.3386 |
| 2.603 | 20.0 | 2520 | 2.3658 | 11.3711 | 12.3992 |
| 2.603 | 21.0 | 2646 | 2.3537 | 11.42 | 12.5032 |
| 2.603 | 22.0 | 2772 | 2.3475 | 12.0665 | 12.5074 |
| 2.603 | 23.0 | 2898 | 2.3398 | 12.0343 | 12.4342 |
| 2.5192 | 24.0 | 3024 | 2.3298 | 12.1011 | 12.5096 |
| 2.5192 | 25.0 | 3150 | 2.3216 | 12.2562 | 12.4809 |
| 2.5192 | 26.0 | 3276 | 2.3131 | 12.4585 | 12.4427 |
| 2.5192 | 27.0 | 3402 | 2.3052 | 12.7094 | 12.534 |
| 2.4445 | 28.0 | 3528 | 2.2984 | 12.7432 | 12.5053 |
| 2.4445 | 29.0 | 3654 | 2.2920 | 12.8409 | 12.4501 |
| 2.4445 | 30.0 | 3780 | 2.2869 | 12.6365 | 12.4936 |
| 2.4445 | 31.0 | 3906 | 2.2777 | 12.8523 | 12.5234 |
| 2.3844 | 32.0 | 4032 | 2.2788 | 12.9216 | 12.4204 |
| 2.3844 | 33.0 | 4158 | 2.2710 | 12.9568 | 12.5064 |
| 2.3844 | 34.0 | 4284 | 2.2643 | 12.9641 | 12.4299 |
| 2.3844 | 35.0 | 4410 | 2.2621 | 12.9787 | 12.448 |
| 2.3282 | 36.0 | 4536 | 2.2554 | 13.1264 | 12.4374 |
| 2.3282 | 37.0 | 4662 | 2.2481 | 13.1853 | 12.4416 |
| 2.3282 | 38.0 | 4788 | 2.2477 | 13.3259 | 12.4119 |
| 2.3282 | 39.0 | 4914 | 2.2448 | 13.2017 | 12.4278 |
| 2.2842 | 40.0 | 5040 | 2.2402 | 13.3772 | 12.4437 |
| 2.2842 | 41.0 | 5166 | 2.2373 | 13.2184 | 12.414 |
| 2.2842 | 42.0 | 5292 | 2.2357 | 13.5267 | 12.4342 |
| 2.2842 | 43.0 | 5418 | 2.2310 | 13.5754 | 12.4087 |
| 2.2388 | 44.0 | 5544 | 2.2244 | 13.653 | 12.4427 |
| 2.2388 | 45.0 | 5670 | 2.2243 | 13.6028 | 12.431 |
| 2.2388 | 46.0 | 5796 | 2.2216 | 13.7128 | 12.4151 |
| 2.2388 | 47.0 | 5922 | 2.2231 | 13.749 | 12.4172 |
| 2.2067 | 48.0 | 6048 | 2.2196 | 13.7256 | 12.4034 |
| 2.2067 | 49.0 | 6174 | 2.2125 | 13.8237 | 12.396 |
| 2.2067 | 50.0 | 6300 | 2.2131 | 13.6642 | 12.4416 |
| 2.2067 | 51.0 | 6426 | 2.2115 | 13.8876 | 12.4119 |
| 2.1688 | 52.0 | 6552 | 2.2091 | 14.0323 | 12.4639 |
| 2.1688 | 53.0 | 6678 | 2.2082 | 13.916 | 12.3843 |
| 2.1688 | 54.0 | 6804 | 2.2071 | 13.924 | 12.3758 |
| 2.1688 | 55.0 | 6930 | 2.2046 | 13.9563 | 12.4416 |
| 2.1401 | 56.0 | 7056 | 2.2020 | 14.0592 | 12.483 |
| 2.1401 | 57.0 | 7182 | 2.2047 | 13.8879 | 12.4076 |
| 2.1401 | 58.0 | 7308 | 2.2018 | 13.9267 | 12.3949 |
| 2.1401 | 59.0 | 7434 | 2.1964 | 14.0518 | 12.4363 |
| 2.1092 | 60.0 | 7560 | 2.1926 | 14.1518 | 12.4883 |
| 2.1092 | 61.0 | 7686 | 2.1972 | 14.132 | 12.4034 |
| 2.1092 | 62.0 | 7812 | 2.1939 | 14.2066 | 12.4151 |
| 2.1092 | 63.0 | 7938 | 2.1905 | 14.2923 | 12.4459 |
| 2.0932 | 64.0 | 8064 | 2.1932 | 14.2476 | 12.3418 |
| 2.0932 | 65.0 | 8190 | 2.1925 | 14.2057 | 12.3907 |
| 2.0932 | 66.0 | 8316 | 2.1906 | 14.2978 | 12.4055 |
| 2.0932 | 67.0 | 8442 | 2.1903 | 14.3276 | 12.4427 |
| 2.0706 | 68.0 | 8568 | 2.1918 | 14.4681 | 12.4034 |
| 2.0706 | 69.0 | 8694 | 2.1882 | 14.3751 | 12.4225 |
| 2.0706 | 70.0 | 8820 | 2.1870 | 14.5904 | 12.4204 |
| 2.0706 | 71.0 | 8946 | 2.1865 | 14.6409 | 12.4512 |
| 2.0517 | 72.0 | 9072 | 2.1831 | 14.6505 | 12.4352 |
| 2.0517 | 73.0 | 9198 | 2.1835 | 14.7485 | 12.4363 |
| 2.0517 | 74.0 | 9324 | 2.1824 | 14.7344 | 12.4586 |
| 2.0517 | 75.0 | 9450 | 2.1829 | 14.8097 | 12.4575 |
| 2.0388 | 76.0 | 9576 | 2.1822 | 14.6681 | 12.4108 |
| 2.0388 | 77.0 | 9702 | 2.1823 | 14.6421 | 12.4342 |
| 2.0388 | 78.0 | 9828 | 2.1816 | 14.7014 | 12.4459 |
| 2.0388 | 79.0 | 9954 | 2.1810 | 14.744 | 12.4565 |
| 2.0224 | 80.0 | 10080 | 2.1839 | 14.7889 | 12.4437 |
| 2.0224 | 81.0 | 10206 | 2.1793 | 14.802 | 12.4565 |
| 2.0224 | 82.0 | 10332 | 2.1776 | 14.7702 | 12.4214 |
| 2.0224 | 83.0 | 10458 | 2.1809 | 14.6772 | 12.4236 |
| 2.0115 | 84.0 | 10584 | 2.1786 | 14.709 | 12.4214 |
| 2.0115 | 85.0 | 10710 | 2.1805 | 14.7693 | 12.3981 |
| 2.0115 | 86.0 | 10836 | 2.1790 | 14.7628 | 12.4172 |
| 2.0115 | 87.0 | 10962 | 2.1785 | 14.7538 | 12.3992 |
| 2.0007 | 88.0 | 11088 | 2.1788 | 14.7493 | 12.3726 |
| 2.0007 | 89.0 | 11214 | 2.1788 | 14.8793 | 12.4045 |
| 2.0007 | 90.0 | 11340 | 2.1786 | 14.8318 | 12.3747 |
| 2.0007 | 91.0 | 11466 | 2.1769 | 14.8061 | 12.4013 |
| 1.9967 | 92.0 | 11592 | 2.1757 | 14.8108 | 12.3843 |
| 1.9967 | 93.0 | 11718 | 2.1747 | 14.8036 | 12.379 |
| 1.9967 | 94.0 | 11844 | 2.1764 | 14.7447 | 12.3737 |
| 1.9967 | 95.0 | 11970 | 2.1759 | 14.7759 | 12.3875 |
| 1.9924 | 96.0 | 12096 | 2.1760 | 14.7695 | 12.3875 |
| 1.9924 | 97.0 | 12222 | 2.1762 | 14.8022 | 12.3769 |
| 1.9924 | 98.0 | 12348 | 2.1763 | 14.7519 | 12.3822 |
| 1.9924 | 99.0 | 12474 | 2.1760 | 14.7756 | 12.3832 |
| 1.9903 | 100.0 | 12600 | 2.1761 | 14.7713 | 12.3822 |

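The step counts in the table are consistent with the batch settings above: 126 optimizer steps per epoch at an effective batch size of 64 implies at most 126 × 64 = 8,064 training examples per epoch (the exact count may be slightly lower, since the final batch of an epoch can be partial). A quick sanity check:

```python
# Values taken from the training results table and hyperparameters above.
steps_per_epoch = 126
total_train_batch_size = 64
num_epochs = 100

# Total optimizer steps over the run matches the final "Step" column entry.
total_steps = steps_per_epoch * num_epochs
print(total_steps)  # → 12600

# Upper bound on training examples seen per epoch.
print(steps_per_epoch * total_train_batch_size)  # → 8064
```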
### Framework versions

- Transformers 4.20.0.dev0
- Pytorch 1.8.0
- Datasets 2.1.0
- Tokenizers 0.12.1