carlosep93 committed · Commit d408e65 · 1 parent: edd2fde

Update README.md
README.md CHANGED
```diff
@@ -62,30 +62,30 @@ print(tokenizer.detokenize(translated[0][0]['tokens']))
 
 The model was trained on a combination of the following datasets:
 
-| Dataset | Sentences |
-
-| Global Voices | 21.342 |
-| Memories Lluires | 1.173.055 |
-| Wikimatrix | 1.205.908 |
-| TED Talks | 50.979 |
-| Tatoeba | 5.500 |
-| CoVost 2 ca-en | 79.633 |
-| CoVost 2 en-ca | 263.891 |
-| Europarl | 1.965.734 |
-| jw300 | 97.081 |
-| Crawled Generalitat| 38.595 |
-| Opus Books | 4.580 |
-| CC Aligned | 5.787.682 |
-| COVID_Wikipedia | 1.531 |
-| EuroBooks | 3.746 |
-| Gnome | 2.183 |
-| KDE 4 | 144.153 |
-| OpenSubtitles | 427.913 |
-| QED | 69.823 |
-| Ubuntu | 6.781 |
-| Wikimedia | 208.073 |
-
-| **Total** | **11.558.183** |
+| Dataset | Sentences |
+|--------------------|----------------|
+| Global Voices | 21.342 |
+| Memories Lluires | 1.173.055 |
+| Wikimatrix | 1.205.908 |
+| TED Talks | 50.979 |
+| Tatoeba | 5.500 |
+| CoVost 2 ca-en | 79.633 |
+| CoVost 2 en-ca | 263.891 |
+| Europarl | 1.965.734 |
+| jw300 | 97.081 |
+| Crawled Generalitat| 38.595 |
+| Opus Books | 4.580 |
+| CC Aligned | 5.787.682 |
+| COVID_Wikipedia | 1.531 |
+| EuroBooks | 3.746 |
+| Gnome | 2.183 |
+| KDE 4 | 144.153 |
+| OpenSubtitles | 427.913 |
+| QED | 69.823 |
+| Ubuntu | 6.781 |
+| Wikimedia | 208.073 |
+|--------------------|----------------|
+| **Total** | **11.558.183** |
 
 ### Training procedure
 
@@ -103,26 +103,26 @@ The model was trained on a combination of the following datasets:
 The model is based on the Transformer-XLarge proposed by [Subramanian et al.](https://aclanthology.org/2021.wmt-1.18.pdf)
 The following hyperparameters were set on the Fairseq toolkit:
 
-| Hyperparameter | Value
-
-| Architecture |
-| Embedding size | 1024
-| Feedforward size | 4096
-| Number of heads | 16
-| Encoder layers | 24
-| Decoder layers | 6
-| Normalize before attention | True
-| --share-decoder-input-output-embed | True
-| --share-all-embeddings | True
-| Effective batch size | 96.000
-| Optimizer | adam
-| Adam betas | (0.9, 0.980)
-| Clip norm | 0.0
-| Learning rate | 1e-3
-| Lr. scheduler | inverse sqrt
-| Warmup updates | 4000
-| Dropout | 0.1
-| Label smoothing | 0.1
+| Hyperparameter | Value |
+|------------------------------------|-----------------------------------|
+| Architecture | transformer_vaswani_wmt_en_de_big |
+| Embedding size | 1024 |
+| Feedforward size | 4096 |
+| Number of heads | 16 |
+| Encoder layers | 24 |
+| Decoder layers | 6 |
+| Normalize before attention | True |
+| --share-decoder-input-output-embed | True |
+| --share-all-embeddings | True |
+| Effective batch size | 96.000 |
+| Optimizer | adam |
+| Adam betas | (0.9, 0.980) |
+| Clip norm | 0.0 |
+| Learning rate | 1e-3 |
+| Lr. scheduler | inverse sqrt |
+| Warmup updates | 4000 |
+| Dropout | 0.1 |
+| Label smoothing | 0.1 |
 
 The model was trained for a total of 35.000 updates. Weights were saved every 1000 updates and reported results are the average of the last 16 checkpoints.
 
@@ -139,17 +139,16 @@ Below are the evaluation results on the machine translation from Catalan to English
 
 | Test set             | SoftCatalà | Google Translate | mt-aina-ca-en |
 |----------------------|------------|------------------|---------------|
-| Spanish Constitution |
-| United Nations |
-| aina_aapp |
-
-| Flores 101
-
-
-| wmt
-| wmt 13 news | | 39,8 | 39,3 |
+| Spanish Constitution | 35,8 | 43,2 | 40,3 |
+| United Nations | 44,4 | 47,4 | 44,8 |
+| aina_aapp | 48,8 | 53 | 51,5 |
+| Flores 101 dev | 42,7 | 47,5 | 46,1 |
+| Flores 101 devtest | 42,5 | 46,9 | 45,2 |
+| Cybersecurity | 52,5 | 58 | 54,2 |
+| wmt 19 biomedical | 18,3 | 23,4 | 21,6 |
+| wmt 13 news | 37,8 | 39,8 | 39,3 |
 |----------------------|------------|------------------|---------------|
-| Average |
+| Average | 39,2 | 45,0 | 41,6 |
 
 
 ## Additional information
```
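For readers who want to relate the hyperparameter table in the diff above to a concrete training run, the sketch below shows one way those values would typically be expressed as `fairseq-train` options. It is only an illustration under stated assumptions: the data directory, the `--max-tokens`/`--update-freq` split used to reach the effective batch size of 96.000, and the exact normalization flag are guesses, not values taken from the card.

```python
# Hedged sketch: mapping the hyperparameter table onto fairseq-train options.
# Paths and the --max-tokens/--update-freq split are assumptions, not from the card.
import sys

from fairseq_cli.train import cli_main  # fairseq's training entry point

fairseq_args = [
    "data-bin/ca-en",                                 # hypothetical binarized data dir
    "--arch", "transformer_vaswani_wmt_en_de_big",    # Architecture
    "--encoder-layers", "24",                         # Encoder layers
    "--decoder-layers", "6",                          # Decoder layers
    "--encoder-embed-dim", "1024",                    # Embedding size
    "--encoder-ffn-embed-dim", "4096",                # Feedforward size
    "--encoder-attention-heads", "16",                # Number of heads
    "--encoder-normalize-before",                     # Normalize before attention
    "--share-decoder-input-output-embed",
    "--share-all-embeddings",
    "--optimizer", "adam", "--adam-betas", "(0.9, 0.980)",
    "--clip-norm", "0.0",
    "--lr", "1e-3", "--lr-scheduler", "inverse_sqrt", "--warmup-updates", "4000",
    "--dropout", "0.1",
    "--criterion", "label_smoothed_cross_entropy", "--label-smoothing", "0.1",
    "--max-tokens", "4000", "--update-freq", "24",    # one way to reach ~96.000 tokens per update
    "--max-update", "35000",                          # 35.000 updates in total
    "--save-interval-updates", "1000",                # a checkpoint every 1000 updates
]

if __name__ == "__main__":
    sys.argv = ["fairseq-train"] + fairseq_args
    cli_main()
```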
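The card states that reported results come from averaging the last 16 of the checkpoints saved every 1000 updates. Fairseq ships `scripts/average_checkpoints.py` for exactly this; the snippet below is a minimal, self-contained sketch of the same idea, with a hypothetical checkpoint directory and file naming.

```python
# Minimal sketch of averaging the last 16 checkpoints, as described in the card.
# Directory layout and file names are assumptions; fairseq's
# scripts/average_checkpoints.py performs the same operation.
from pathlib import Path

import torch

ckpt_dir = Path("checkpoints")  # hypothetical save directory
paths = sorted(ckpt_dir.glob("checkpoint_*.pt"), key=lambda p: p.stat().st_mtime)[-16:]

avg_state = None
for path in paths:
    state = torch.load(path, map_location="cpu")["model"]  # fairseq stores weights under 'model'
    if avg_state is None:
        avg_state = {k: v.clone().float() for k, v in state.items()}
    else:
        for k, v in state.items():
            avg_state[k] += v.float()

avg_state = {k: v / len(paths) for k, v in avg_state.items()}

# Reuse the metadata of the newest checkpoint and swap in the averaged weights.
merged = torch.load(paths[-1], map_location="cpu")
merged["model"] = avg_state
torch.save(merged, ckpt_dir / "checkpoint_avg_last16.pt")
```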
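The evaluation table filled in by this commit compares per-test-set scores for SoftCatalà, Google Translate and mt-aina-ca-en. The metric is not named in this excerpt; assuming the scores are BLEU computed with sacreBLEU, one cell of the table could be reproduced along these lines (file names are placeholders, not taken from the card):

```python
# Hedged sketch: scoring one system on one test set, assuming the table reports
# BLEU computed with sacreBLEU. File names are placeholders.
from sacrebleu.metrics import BLEU

def read_lines(path):
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]

refs = read_lines("flores101.devtest.en")            # English references
hyps = read_lines("flores101.devtest.mt-aina.hyp")   # system output for the Catalan source

# sacreBLEU takes the hypotheses plus a list of reference streams.
bleu = BLEU()
print(bleu.corpus_score(hyps, [refs]))  # e.g. a score around 45,2 for the mt-aina-ca-en column
```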