Davlan committed 879a5d5 (parent: 34094ac)

Update README.md

Files changed (1):
  1. README.md +187 -1
@@ -1,5 +1,191 @@
  ---
  license: mit
+ tags:
+ - generated_from_trainer
+ model-index:
+ - name: afro-xlmr-large-76L
+   results: []
+ language:
+ - en
+ - am
+ - ar
+ - so
+ - sw
+ - pt
+ - af
+ - fr
+ - zu
+ - mg
+ - ha
+ - sn
+ - arz
+ - ny
+ - ig
+ - xh
+ - yo
+ - st
+ - rw
+ - tn
+ - ti
+ - ts
+ - om
+ - run
+ - nso
+ - ee
+ - ln
+ - tw
+ - pcm
+ - gaa
+ - loz
+ - lg
+ - guw
+ - bem
+ - efi
+ - lue
+ - lua
+ - toi
+ - ve
+ - tum
+ - tll
+ - iso
+ - kqn
+ - zne
+ - umb
+ - mos
+ - tiv
+ - lu
+ - ff
+ - kwy
+ - bci
+ - rnd
+ - luo
+ - wal
+ - ss
+ - lun
+ - wo
+ - nyk
+ - kj
+ - ki
+ - fon
+ - bm
+ - cjk
+ - din
+ - dyu
+ - kab
+ - kam
+ - kbp
+ - kr
+ - kmb
+ - kg
+ - nus
+ - sg
+ - taq
+ - tzm
+ - nqo
  ---
 
- SIB-200 paper
+
+ # afro-xlmr-large-76L
+
+ AfroXLMR-large-76L was created by masked language model (MLM) adaptation of the expanded XLM-R-large model on 76 languages widely spoken in Africa,
+ including 4 high-resource languages.
+
+ ### Pre-training corpus
+ A mix of mC4, Wikipedia and OPUS data.
+
+ ### Languages
+
+ The languages covered are:
+ - English (eng)
+ - Amharic (amh)
+ - Arabic (ara)
+ - Somali (som)
+ - Kiswahili (swa)
+ - Portuguese (por)
+ - Afrikaans (afr)
+ - French (fra)
+ - isiZulu (zul)
+ - Malagasy (mlg)
+ - Hausa (hau)
+ - chiShona (sna)
+ - Egyptian Arabic (arz)
+ - Chichewa (nya)
+ - Igbo (ibo)
+ - isiXhosa (xho)
+ - Yorùbá (yor)
+ - Sesotho (sot)
+ - Kinyarwanda (kin)
+ - Tigrinya (tir)
+ - Tsonga (tso)
+ - Oromo (orm)
+ - Rundi (run)
+ - Northern Sotho (nso)
+ - Ewe (ewe)
+ - Lingala (lin)
+ - Twi (twi)
+ - Nigerian Pidgin (pcm)
+ - Ga (gaa)
+ - Lozi (loz)
+ - Luganda (lug)
+ - Gun (guw)
+ - Bemba (bem)
+ - Efik (efi)
+ - Luvale (lue)
+ - Luba-Lulua (lua)
+ - Tonga (toi)
+ - Tshivenḓa (ven)
+ - Tumbuka (tum)
+ - Tetela (tll)
+ - Isoko (iso)
+ - Kaonde (kqn)
+ - Zande (zne)
+ - Umbundu (umb)
+ - Mossi (mos)
+ - Tiv (tiv)
+ - Luba-Katanga (lub)
+ - Fula (fuv)
+ - San Salvador Kongo (kwy)
+ - Baoulé (bci)
+ - Ruund (rnd)
+ - Luo (luo)
+ - Wolaitta (wal)
+ - Swazi (ssw)
+ - Lunda (lun)
+ - Wolof (wol)
+ - Nyaneka (nyk)
+ - Kwanyama (kua)
+ - Kikuyu (kik)
+ - Fon (fon)
+ - Bambara (bam)
+ - Chokwe (cjk)
+ - Dinka (dik)
+ - Dyula (dyu)
+ - Kabyle (kab)
+ - Kamba (kam)
+ - Kabiyè (kbp)
+ - Kanuri (knc)
+ - Kimbundu (kmb)
+ - Kikongo (kon)
+ - Nuer (nus)
+ - Sango (sag)
+ - Tamasheq (taq)
+ - Tamazight (tzm)
+ - N'ko (nqo)
+
+
+ ### Acknowledgment
+ We would like to thank Google Cloud for providing us access to TPU v3-8 through free cloud credits. The model was trained with Flax and then converted to PyTorch.
+
+
+ ### BibTeX entry and citation info
+ ```
+ @misc{adelani2023sib200,
+   title={SIB-200: A Simple, Inclusive, and Big Evaluation Dataset for Topic Classification in 200+ Languages and Dialects},
+   author={David Ifeoluwa Adelani and Hannah Liu and Xiaoyu Shen and Nikita Vassilyev and Jesujoba O. Alabi and Yanke Mao and Haonan Gao and Annie En-Shiun Lee},
+   year={2023},
+   eprint={2309.07445},
+   archivePrefix={arXiv},
+   primaryClass={cs.CL}
+ }
+
+ ```
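The model card says AfroXLMR-large-76L was built by masked language model (MLM) adaptation of XLM-R-large, but it does not give the masking settings. As a rough illustration of what the MLM objective does to a training sequence, here is a minimal standard-library Python sketch of BERT-style masking. The 15% selection rate and the 80/10/10 split are the well-known BERT defaults and are assumed here, not taken from this model's training configuration. `MASK_ID`, `IGNORE_INDEX`, `mask_tokens` and the toy token IDs are hypothetical names chosen for the example.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical IDs for illustration only; a real XLM-R tokenizer defines its own.
MASK_ID = 4          # stands in for the tokenizer's <mask> token ID
IGNORE_INDEX = -100  # label value that the loss function is told to skip


def mask_tokens(
    token_ids: List[int],
    vocab_size: int,
    mask_prob: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """Apply BERT-style MLM corruption to one sequence of token IDs.

    Each position is selected with probability `mask_prob` (assumed default
    0.15). A selected position is replaced by MASK_ID 80% of the time, by a
    random token 10% of the time, and left unchanged 10% of the time.
    Labels keep the original ID at selected positions and IGNORE_INDEX
    elsewhere, so the loss only covers the selected positions.
    Real pipelines also skip special and padding tokens; omitted here.
    """
    if rng is None:
        rng = random.Random()
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if rng.random() >= mask_prob:
            continue  # not selected: no prediction target at this position
        labels[i] = tok  # the model must recover the original token here
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = MASK_ID  # 80%: replace with <mask>
        elif roll < 0.9:
            inputs[i] = rng.randrange(vocab_size)  # 10%: random token
        # remaining 10%: keep the original token as input
    return inputs, labels


if __name__ == "__main__":
    toy_ids = list(range(10, 40))  # toy token IDs, not real XLM-R IDs
    masked, targets = mask_tokens(toy_ids, vocab_size=1000, rng=random.Random(0))
    print(masked)
    print(targets)
```

The design follows BERT's rationale for this split. Because some selected tokens are left unchanged or randomised, the model cannot assume that only `<mask>` positions need predicting. This narrows the gap between pre-training inputs and fine-tuning inputs, which contain no `<mask>` tokens.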