milmor commited on
Commit
d5e88ac
1 Parent(s): a4946fb

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +1 -1
README.md CHANGED
@@ -11,7 +11,7 @@ widget:
11
  ---
12
 
13
  # t5-small-spanish-nahuatl
14
- Nahuatl is the most widely spoken indigenous language in Mexico. However, training a neural network for the neural machine translation task is challenging due to the lack of structured data. The most popular datasets, such as the Axolot and bible-corpus, only consist of ~16,000 and ~7,000 samples, respectively. Moreover, there are multiple variants of Nahuatl, which makes this task even more difficult. For example, it is possible to find a single word from the Axolot dataset written in more than three different ways. Therefore, we leverage the T5 text-to-text prefix training strategy to compensate for the lack of data. We first train the multilingual model to learn Spanish and then adapt the model to Nahuatl. The resulting model successfully translates short sentences. Finally, we report Chrf and BLEU results.
15
 
16
 
17
  ## Model description
 
11
  ---
12
 
13
  # t5-small-spanish-nahuatl
14
+ Nahuatl is the most widely spoken indigenous language in Mexico. However, training a neural network for the neural machine translation task is challenging due to the lack of structured data. The most popular datasets, such as the Axolot and bible-corpus, only consist of ~16,000 and ~7,000 samples, respectively. Moreover, there are multiple variants of Nahuatl, which makes this task even more difficult. For example, it is possible to find a single word from the Axolot dataset written in more than three different ways. Therefore, we leverage the T5 text-to-text prefix training strategy to compensate for the lack of data. We first train the multilingual model to learn Spanish and then adapt it to Nahuatl. The resulting T5 Transformer successfully translates short sentences. Finally, we report Chrf and BLEU results.
15
 
16
 
17
  ## Model description