Update README.md
README.md
CHANGED
@@ -50,7 +50,7 @@ The recommended environments include the following transfomer versions: 4.12.3 ,
 
 ### Training Data
 
-The
+The Basque-English data collected from the web was a combination of the following datasets:
 
 
 | Dataset | Sentences before cleaning |
@@ -67,7 +67,7 @@ The Catalan-Basque data collected from the web was a combination of the followin
 | WikiMatrix | 119,480 |
 | **Total** | **15,653,108** |
 
-The 9,033,998 sentence pairs of synthetic parallel data were created by translating a compendium of ES-EU parallel corpora into
+The 9,033,998 sentence pairs of synthetic parallel data were created by translating a compendium of ES-EU parallel corpora into Basque using the [ES-EU translator from HiTZ](https://huggingface.co/HiTZ/mt-hitz-es-eu).
 
 
 ### Training Procedure