rockdrigoma committed
Commit: ba4cd1f
Parent: 792ecc4

Update app.py

Files changed (1): app.py (+3 -3)
--- a/app.py
+++ b/app.py
@@ -7,7 +7,7 @@ Nahuatl is the most widely spoken indigenous language in Mexico. However, traini
 
 ## Motivation
 
-One of the Sustainable Development Goals is (["Reduced Inequalities"](https://www.un.org/sustainabledevelopment/inequality/)). We know for sure that language is one of the most powerful tools we have and a way to distribute knowledge and experience. But most of the progress that has been done among important topics like technology, education, human rights and law, news and so on, is biased due to lack of resources in different languages. We expect this approach to become an important platform for others in order to reduce inequality and get all Nahuatl speakers closer to what they need to thrive and why not, share with us they valuable knowledge, costumes and way of living.
+One of the Sustainable Development Goals is ["Reduced Inequalities"](https://www.un.org/sustainabledevelopment/inequality/). We know for sure that language is one of the most powerful tools we have and a way to distribute knowledge and experience. But most of the progress that has been done among important topics like technology, education, human rights and law, news and so on, is biased due to lack of resources in different languages. We expect this approach to become an important platform for others in order to reduce inequality and get all Nahuatl speakers closer to what they need to thrive and why not, share with us their valuable knowledge, costumes and way of living.
 
 
 ## Model description
@@ -57,7 +57,7 @@ Also, to increase the amount of data we collected 3,000 extra samples from the w
 We employ two training-stages using a multilingual T5-small. This model was chosen because it can handle different vocabularies and suffixes. T5-small is pretrained on different tasks and languages (French, Romanian, English, German).
 
 ### Training-stage 1 (learning Spanish)
-In training stage 1 we first introduce Spanish to the model. The objective is to learn a new language rich in data (Spanish) and not lose the previous knowledge acquired. We use the English-Spanish [Anki](https://www.manythings.org/anki/) dataset, which consists of 118.964 text pairs. We train the model till convergence adding the suffix "Translate Spanish to English: ".
+In training stage 1 we first introduce Spanish to the model. The objective is to learn a new language rich in data (Spanish) and not lose the previous knowledge acquired. We use the English-Spanish [Anki](https://www.manythings.org/anki/) dataset, which consists of 118,964 text pairs. We train the model till convergence adding the suffix "Translate Spanish to English: ".
 
 ### Training-stage 2 (learning Nahuatl)
 We use the pretrained Spanish-English model to learn Spanish-Nahuatl. Since the amount of Nahuatl pairs is limited, we also add to our dataset 20,000 samples from the English-Spanish Anki dataset. This two-task-training avoids overfitting end makes the model more robust.
@@ -108,7 +108,7 @@ gr.Interface(
     ],
     theme="peach",
     title='🌽 Spanish to Nahuatl Automatic Translation',
-    description='This model is a T5 Transformer (t5-small) fine-tuned on spanish and nahuatl sentences collected from the web. The dataset is normalized using "sep" normalization from py-elotl.',
+    description='Insert your text in Spanish in the left text box and you will get its Nahuatl translation on the right text box',
     examples=[
         'conejo',
         'estrella',
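
The second hunk's context describes the two prefix-based training stages. As a rough illustration only, a fine-tuning loop along those lines might look like the minimal sketch below, assuming the Hugging Face transformers and PyTorch APIs; the toy sentence pairs, learning rate, and single-pass loop are placeholders, not the authors' actual setup.

# Minimal sketch of the prefix-based fine-tuning the diff describes.
# Assumptions: a t5-small checkpoint, AdamW with an arbitrary learning
# rate, and two toy pairs standing in for the Anki / Nahuatl datasets.
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

# Stage 1 uses English-Spanish Anki pairs with the task prefix; stage 2
# would mix the limited Spanish-Nahuatl pairs with 20,000 Anki samples.
pairs = [
    ("Translate Spanish to English: hola", "hello"),
    ("Translate Spanish to English: estrella", "star"),
]

model.train()
for source, target in pairs:  # real training runs this until convergence
    batch = tokenizer(source, return_tensors="pt")
    labels = tokenizer(target, return_tensors="pt").input_ids
    loss = model(**batch, labels=labels).loss  # seq2seq cross-entropy
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()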
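
The third hunk shows only a slice of the gr.Interface call, so for orientation here is a hedged reconstruction of how that call plausibly fits together. Only the theme, title, the new description, and the two visible examples come from the diff; the translate function, the checkpoint name, the task prefix, and the text input/output components are assumptions.

# Hedged reconstruction of the Gradio app around the third hunk; everything
# except theme, title, description, and examples is assumed, not taken from
# the actual app.py.
import gradio as gr
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")  # placeholder checkpoint
model = T5ForConditionalGeneration.from_pretrained("t5-small")

def translate(text):
    # Hypothetical task prefix mirroring the prefixes used during training.
    batch = tokenizer("translate Spanish to Nahuatl: " + text,
                      return_tensors="pt")
    outputs = model.generate(**batch, max_length=128)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

gr.Interface(
    fn=translate,
    inputs="text",
    outputs="text",
    theme="peach",
    title='🌽 Spanish to Nahuatl Automatic Translation',
    description='Insert your text in Spanish in the left text box and you will get its Nahuatl translation on the right text box',
    examples=['conejo', 'estrella'],
).launch()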