sandrarrey committed
Commit 95ef0ae
1 Parent(s): 8aba977
Update README_English.md
README_English.md +3 -3

README_English.md CHANGED
@@ -29,12 +29,12 @@ onmt_translate -src input_text -model NOS-MT-en-gl -output ./output_file.txt -r
 
 **Training**
 
-In the training we have used authentic and synthetic corpora from [ProxectoNós](https://github.com/proxectonos/corpora). The former are corpora of translations directly produced by human translators. The latter are corpora of
+In the training we have used authentic and synthetic corpora from [ProxectoNós](https://github.com/proxectonos/corpora). The former are corpora of translations directly produced by human translators. The latter are corpora of English-Portuguese translations, which we have converted into English-Galician by means of Portuguese-Galician translation with Opentrad/Apertium and transliteration for out-of-vocabulary words.
 
 **Training process**
 
 + Tokenization of the datasets made with linguakit tokeniser https://github.com/citiususc/Linguakit
-+ The vocabulary for the models was generated through the script [learn_bpe.py](https://github.com/OpenNMT/OpenNMT-py/blob/master/tools/learn_bpe.py)
++ The vocabulary for the models was generated through the script [learn_bpe.py](https://github.com/OpenNMT/OpenNMT-py/blob/master/tools/learn_bpe.py) of OpenNMT
 + Using .yaml in this repository you can replicate the training process as follows
 
 ```bash
@@ -48,7 +48,7 @@ The parameters used for the development of the model can be directly consulted i
 
 **Evaluation**
 
-The BLEU evaluation of the models is made with a mixture of internally developed tests (gold1, gold2, test-suite)
+The BLEU evaluation of the models is made with a mixture of internally developed tests (gold1, gold2, test-suite) and other datasets available in Galician (Flores).
 
 | GOLD 1 | GOLD 2 | FLORES | TEST-SUITE|
 | ------------- |:-------------:| -------:|----------:|
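The training-process steps named in this revision (Linguakit tokenization, a BPE vocabulary via OpenNMT's `learn_bpe.py`, training from a `.yaml` config) could be sketched as shell commands. This is a hedged sketch, not the repository's actual recipe: the file names `train.en`/`train.gl`, the merge count `32000`, the config name `en-gl.yaml`, and the exact Linguakit invocation are all assumptions not taken from the README.

```shell
# Sketch only -- paths, merge count, and config name are placeholders.

# 1. Tokenize both sides of the corpus with the Linguakit tokenizer
#    (https://github.com/citiususc/Linguakit; check its docs for the exact CLI).
./linguakit tok en train.en > train.tok.en
./linguakit tok gl train.gl > train.tok.gl

# 2. Learn BPE merge codes with OpenNMT-py's learn_bpe.py
#    (reads plain text on stdin, writes merge operations on stdout).
python tools/learn_bpe.py -s 32000 < train.tok.en > codes.en
python tools/learn_bpe.py -s 32000 < train.tok.gl > codes.gl

# 3. Build the vocabulary and train from the .yaml config in the repository.
onmt_build_vocab -config en-gl.yaml -n_sample -1
onmt_train -config en-gl.yaml
```

The `.yaml` config would carry the corpus paths, transforms, and model hyperparameters; see the repository's own config for the values actually used.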
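To reproduce a score like the FLORES column in the table above, one standard approach is to translate the test source with the released model and score the output against the reference. This is an assumption-laden sketch: the README does not say which BLEU implementation was used (sacreBLEU is assumed here), and the `flores.*` file names are placeholders.

```shell
# Assumption: sacreBLEU as the scorer; flores.* file names are placeholders.
onmt_translate -src flores.src.en -model NOS-MT-en-gl -output flores.hyp.gl
sacrebleu flores.ref.gl -i flores.hyp.gl -m bleu
```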