Commit 9e3a770 (1 parent: 83e06ca), committed by m-nagoudi

Update Readme.md

Files changed (1)
  1. Readme.md +3 -16
Readme.md CHANGED
@@ -1,5 +1,5 @@
  # IndT5: A Text-to-Text Transformer for 10 Indigenous Languages
- <img src="IND_langs_large7.png" alt="drawing" width="45%" height="45%" align="right"/>
+ <img src="https://huggingface.co/UBC-NLP/IndT5/raw/main/IND_langs_large7.png" alt="drawing" width="45%" height="45%" align="right"/>
  In this work, we introduce IndT5, the first Transformer language model for Indigenous languages. To train IndT5, we build IndCorpus, a new corpus for 10 Indigenous languages and Spanish. We also present the application of IndT5 to machine translation by investigating different approaches to translate between Spanish and the Indigenous languages as part of our contribution to the AmericasNLP 2021 Shared Task on Open Machine Translation.
 
  &nbsp;
@@ -49,22 +49,9 @@ We build IndCorpus, a collection of 10 Indigenous languages and Spanish comprisin
  |Total | 1.15K | 5.22M | 19.8 | 125.3K|
 
 
- # Parallel datasets for machine translation
- The datasets are provided by AmericasNLP 2021 Shared Task on Open Machine Translation (https://github.com/AmericasNLP/americasnlp2021).
- ### Number of sentences in parallel dataset (train, dev and test set)
- | **Language Pair** | **Train** | **Dev** | **Test** |
- |-------------------|------------------|-------------------|------------------------|
- |es-aym | 6,531 | 996 | 1,003 |
- |es-cni | 3,883 | 883 | 1,003 |
- |es-bzd | 7,506 | 996 | 1,003 |
- |es-gn | 26,032 | 995 | 1,003 |
- |es-oto | 4,889 | 599 | 1,003 |
- |es-nah | 16,145 | 672 | 1,003 |
- |es-quy | 125,008 | 996 | 1,003 |
- |es-tar | 14,720 | 995 | 1,003 |
- |es-shp | 14,592 | 996 | 1,003 |
- |es-hch | 8,966 | 994 | 1,003 |
+ # Github
 
+ More details about our model can be found here: https://github.com/UBC-NLP/IndT5
 
  # BibTex