Update README.md
README.md CHANGED

```diff
@@ -26,7 +26,7 @@ license: apache-2.0
 
 ## Model description
 
-This model was trained from scratch using the [Fairseq toolkit](https://fairseq.readthedocs.io/en/latest/) on a combination of Catalan-French datasets, which after filtering and cleaning comprised 18.634.844 sentence pairs. The model is evaluated on the Flores
+This model was trained from scratch using the [Fairseq toolkit](https://fairseq.readthedocs.io/en/latest/) on a combination of Catalan-French datasets, which after filtering and cleaning comprised 18.634.844 sentence pairs. The model is evaluated on the Flores and NTREX evaluation sets.
 
 ## Intended uses and limitations
 
@@ -83,7 +83,7 @@ All datasets are deduplicated and filtered to remove any sentence pairs with a c
 
 #### Tokenization
 
-All data is tokenized using sentencepiece, with 50 thousand token sentencepiece model
+All data is tokenized using sentencepiece, with 50 thousand token sentencepiece model learned from the combination of all filtered training data. This model is included.
 
 #### Hyperparameters
 
```