carlosep93 committed · Commit d408e65 · 1 parent: edd2fde

Update README.md
README.md CHANGED
```diff
@@ -62,30 +62,30 @@ print(tokenizer.detokenize(translated[0][0]['tokens']))
 
 The model was trained on a combination of the following datasets:
 
-| Dataset | Sentences |
-
-| Global Voices | 21.342 |
-| Memories Lluires | 1.173.055 |
-| Wikimatrix | 1.205.908 |
-| TED Talks | 50.979 |
-| Tatoeba | 5.500 |
-| CoVost 2 ca-en | 79.633 |
-| CoVost 2 en-ca | 263.891 |
-| Europarl | 1.965.734 |
-| jw300 | 97.081 |
-| Crawled Generalitat| 38.595 |
-| Opus Books | 4.580 |
-| CC Aligned | 5.787.682 |
-| COVID_Wikipedia | 1.531 |
-| EuroBooks | 3.746 |
-| Gnome | 2.183 |
-| KDE 4 | 144.153 |
-| OpenSubtitles | 427.913 |
-| QED | 69.823 |
-| Ubuntu | 6.781 |
-| Wikimedia | 208.073 |
-
-| **Total** | **11.558.183** |
+| Dataset | Sentences |
+|--------------------|----------------|
+| Global Voices | 21.342 |
+| Memories Lluires | 1.173.055 |
+| Wikimatrix | 1.205.908 |
+| TED Talks | 50.979 |
+| Tatoeba | 5.500 |
+| CoVost 2 ca-en | 79.633 |
+| CoVost 2 en-ca | 263.891 |
+| Europarl | 1.965.734 |
+| jw300 | 97.081 |
+| Crawled Generalitat| 38.595 |
+| Opus Books | 4.580 |
+| CC Aligned | 5.787.682 |
+| COVID_Wikipedia | 1.531 |
+| EuroBooks | 3.746 |
+| Gnome | 2.183 |
+| KDE 4 | 144.153 |
+| OpenSubtitles | 427.913 |
+| QED | 69.823 |
+| Ubuntu | 6.781 |
+| Wikimedia | 208.073 |
+|--------------------|----------------|
+| **Total** | **11.558.183** |
 
 ### Training procedure
 
@@ -103,26 +103,26 @@ The model was trained on a combination of the following datasets:
 The model is based on the Transformer-XLarge proposed by [Subramanian et al.](https://aclanthology.org/2021.wmt-1.18.pdf)
 The following hyperparameters were set on the Fairseq toolkit:
 
-| Hyperparameter | Value
-
-| Architecture |
-| Embedding size | 1024
-| Feedforward size | 4096
-| Number of heads | 16
-| Encoder layers | 24
-| Decoder layers | 6
-| Normalize before attention | True
-| --share-decoder-input-output-embed | True
-| --share-all-embeddings | True
-| Effective batch size | 96.000
-| Optimizer | adam
-| Adam betas | (0.9, 0.980)
-| Clip norm | 0.0
-| Learning rate | 1e-3
-| Lr. scheduler | inverse sqrt
-| Warmup updates | 4000
-| Dropout | 0.1
-| Label smoothing | 0.1
+| Hyperparameter | Value |
+|------------------------------------|-----------------------------------|
+| Architecture | transformer_vaswani_wmt_en_de_big |
+| Embedding size | 1024 |
+| Feedforward size | 4096 |
+| Number of heads | 16 |
+| Encoder layers | 24 |
+| Decoder layers | 6 |
+| Normalize before attention | True |
+| --share-decoder-input-output-embed | True |
+| --share-all-embeddings | True |
+| Effective batch size | 96.000 |
+| Optimizer | adam |
+| Adam betas | (0.9, 0.980) |
+| Clip norm | 0.0 |
+| Learning rate | 1e-3 |
+| Lr. scheduler | inverse sqrt |
+| Warmup updates | 4000 |
+| Dropout | 0.1 |
+| Label smoothing | 0.1 |
 
 The model was trained for a total of 35.000 updates. Weights were saved every 1000 updates and reported results are the average of the last 16 checkpoints.
 
@@ -139,17 +139,16 @@ Below are the evaluation results on the machine translation from Catalan to English
 
 | Test set             | SoftCatalà | Google Translate | mt-aina-ca-en |
 |----------------------|------------|------------------|---------------|
-| Spanish Constitution |
-| United Nations |
-| aina_aapp |
-
-| Flores 101
-
-
-| wmt
-| wmt 13 news | | 39,8 | 39,3 |
+| Spanish Constitution | 35,8 | 43,2 | 40,3 |
+| United Nations | 44,4 | 47,4 | 44,8 |
+| aina_aapp | 48,8 | 53 | 51,5 |
+| Flores 101 dev | 42,7 | 47,5 | 46,1 |
+| Flores 101 devtest | 42,5 | 46,9 | 45,2 |
+| Cybersecurity | 52,5 | 58 | 54,2 |
+| wmt 19 biomedical | 18,3 | 23,4 | 21,6 |
+| wmt 13 news | 37,8 | 39,8 | 39,3 |
 |----------------------|------------|------------------|---------------|
-| Average |
+| Average | 39,2 | 45,0 | 41,6 |
 
 
 ## Additional information
```
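For readers who want to relate the hyperparameter table in the diff above to a concrete training run, the sketch below shows one way those values would typically be expressed as `fairseq-train` options. It is only an illustration under stated assumptions: the data directory, the `--max-tokens`/`--update-freq` split used to reach the effective batch size of 96.000, and the exact normalization flag are guesses, not values taken from the card.

```python
# Hedged sketch: mapping the hyperparameter table onto fairseq-train options.
# Paths and the --max-tokens/--update-freq split are assumptions, not from the card.
import sys

from fairseq_cli.train import cli_main  # fairseq's training entry point

fairseq_args = [
    "data-bin/ca-en",                                 # hypothetical binarized data dir
    "--arch", "transformer_vaswani_wmt_en_de_big",    # Architecture
    "--encoder-layers", "24",                         # Encoder layers
    "--decoder-layers", "6",                          # Decoder layers
    "--encoder-embed-dim", "1024",                    # Embedding size
    "--encoder-ffn-embed-dim", "4096",                # Feedforward size
    "--encoder-attention-heads", "16",                # Number of heads
    "--encoder-normalize-before",                     # Normalize before attention
    "--share-decoder-input-output-embed",
    "--share-all-embeddings",
    "--optimizer", "adam", "--adam-betas", "(0.9, 0.980)",
    "--clip-norm", "0.0",
    "--lr", "1e-3", "--lr-scheduler", "inverse_sqrt", "--warmup-updates", "4000",
    "--dropout", "0.1",
    "--criterion", "label_smoothed_cross_entropy", "--label-smoothing", "0.1",
    "--max-tokens", "4000", "--update-freq", "24",    # one way to reach ~96.000 tokens per update
    "--max-update", "35000",                          # 35.000 updates in total
    "--save-interval-updates", "1000",                # a checkpoint every 1000 updates
]

if __name__ == "__main__":
    sys.argv = ["fairseq-train"] + fairseq_args
    cli_main()
```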
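The card states that reported results come from averaging the last 16 of the checkpoints saved every 1000 updates. Fairseq ships `scripts/average_checkpoints.py` for exactly this; the snippet below is a minimal, self-contained sketch of the same idea, with a hypothetical checkpoint directory and file naming.

```python
# Minimal sketch of averaging the last 16 checkpoints, as described in the card.
# Directory layout and file names are assumptions; fairseq's
# scripts/average_checkpoints.py performs the same operation.
from pathlib import Path

import torch

ckpt_dir = Path("checkpoints")  # hypothetical save directory
paths = sorted(ckpt_dir.glob("checkpoint_*.pt"), key=lambda p: p.stat().st_mtime)[-16:]

avg_state = None
for path in paths:
    state = torch.load(path, map_location="cpu")["model"]  # fairseq stores weights under 'model'
    if avg_state is None:
        avg_state = {k: v.clone().float() for k, v in state.items()}
    else:
        for k, v in state.items():
            avg_state[k] += v.float()

avg_state = {k: v / len(paths) for k, v in avg_state.items()}

# Reuse the metadata of the newest checkpoint and swap in the averaged weights.
merged = torch.load(paths[-1], map_location="cpu")
merged["model"] = avg_state
torch.save(merged, ckpt_dir / "checkpoint_avg_last16.pt")
```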
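The evaluation table filled in by this commit compares per-test-set scores for SoftCatalà, Google Translate and mt-aina-ca-en. The metric is not named in this excerpt; assuming the scores are BLEU computed with sacreBLEU, one cell of the table could be reproduced along these lines (file names are placeholders, not taken from the card):

```python
# Hedged sketch: scoring one system on one test set, assuming the table reports
# BLEU computed with sacreBLEU. File names are placeholders.
from sacrebleu.metrics import BLEU

def read_lines(path):
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]

refs = read_lines("flores101.devtest.en")            # English references
hyps = read_lines("flores101.devtest.mt-aina.hyp")   # system output for the Catalan source

# sacreBLEU takes the hypotheses plus a list of reference streams.
bleu = BLEU()
print(bleu.corpus_score(hyps, [refs]))  # e.g. a score around 45,2 for the mt-aina-ca-en column
```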