sandrarrey committed
Commit 95ef0ae
1 Parent(s): 8aba977
Update README_English.md
README_English.md +3 -3

README_English.md CHANGED
@@ -29,12 +29,12 @@ onmt_translate -src input_text -model NOS-MT-en-gl -output ./output_file.txt -r
 
 **Training**
 
-In the training we have used authentic and synthetic corpora from [ProxectoNós](https://github.com/proxectonos/corpora). The former are corpora of translations directly produced by human translators. The latter are corpora of
+In the training we have used authentic and synthetic corpora from [ProxectoNós](https://github.com/proxectonos/corpora). The former are corpora of translations directly produced by human translators. The latter are corpora of English-Portuguese translations, which we have converted into English-Galician by means of Portuguese-Galician translation with Opentrad/Apertium and transliteration for out-of-vocabulary words.
 
 **Training process**
 
 + Tokenization of the datasets made with linguakit tokeniser https://github.com/citiususc/Linguakit
-+ The vocabulary for the models was generated through the script [learn_bpe.py](https://github.com/OpenNMT/OpenNMT-py/blob/master/tools/learn_bpe.py)
++ The vocabulary for the models was generated through the script [learn_bpe.py](https://github.com/OpenNMT/OpenNMT-py/blob/master/tools/learn_bpe.py) of OpenNMT
 + Using .yaml in this repository you can replicate the training process as follows
 
 ```bash
@@ -48,7 +48,7 @@ The parameters used for the development of the model can be directly consulted i
 
 **Evaluation**
 
-The BLEU evaluation of the models is made with a mixture of internally developed tests (gold1, gold2, test-suite)
+The BLEU evaluation of the models is made with a mixture of internally developed tests (gold1, gold2, test-suite) and other datasets available in Galician (Flores).
 
 | GOLD 1 | GOLD 2 | FLORES | TEST-SUITE|
 | ------------- |:-------------:| -------:|----------:|
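The training-process steps named in this revision (Linguakit tokenization, a BPE vocabulary via OpenNMT's `learn_bpe.py`, training from a `.yaml` config) could be sketched as shell commands. This is a hedged sketch, not the repository's actual recipe: the file names `train.en`/`train.gl`, the merge count `32000`, the config name `en-gl.yaml`, and the exact Linguakit invocation are all assumptions not taken from the README.

```shell
# Sketch only -- paths, merge count, and config name are placeholders.

# 1. Tokenize both sides of the corpus with the Linguakit tokenizer
#    (https://github.com/citiususc/Linguakit; check its docs for the exact CLI).
./linguakit tok en train.en > train.tok.en
./linguakit tok gl train.gl > train.tok.gl

# 2. Learn BPE merge codes with OpenNMT-py's learn_bpe.py
#    (reads plain text on stdin, writes merge operations on stdout).
python tools/learn_bpe.py -s 32000 < train.tok.en > codes.en
python tools/learn_bpe.py -s 32000 < train.tok.gl > codes.gl

# 3. Build the vocabulary and train from the .yaml config in the repository.
onmt_build_vocab -config en-gl.yaml -n_sample -1
onmt_train -config en-gl.yaml
```

The `.yaml` config would carry the corpus paths, transforms, and model hyperparameters; see the repository's own config for the values actually used.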
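To reproduce a score like the FLORES column in the table above, one standard approach is to translate the test source with the released model and score the output against the reference. This is an assumption-laden sketch: the README does not say which BLEU implementation was used (sacreBLEU is assumed here), and the `flores.*` file names are placeholders.

```shell
# Assumption: sacreBLEU as the scorer; flores.* file names are placeholders.
onmt_translate -src flores.src.en -model NOS-MT-en-gl -output flores.hyp.gl
sacrebleu flores.ref.gl -i flores.hyp.gl -m bleu
```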