ToluClassics committed on
Commit 475acc8
1 Parent(s): d60e552

Create README.md

Files changed (1)
  1. README.md +51 -0
README.md ADDED
@@ -0,0 +1,51 @@
---
language:
- om
- am
- rw
- rn
- ha
- ig
- pcm
- so
- sw
- ti
- yo
- multilingual
- T5

---
# afriteva_small

## Model description

AfriTeVa small is a sequence-to-sequence model pretrained on 10 African languages.

## Languages

Afaan Oromoo (orm), Amharic (amh), Gahuza (gah), Hausa (hau), Igbo (igb), Nigerian Pidgin (pcm), Somali (som), Swahili (swa), Tigrinya (tig), Yoruba (yor)

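The front matter lists 11 language tags, while the list above names 10 languages. The sketch below shows one way to reconcile the two. It assumes that Gahuza covers both Kinyarwanda (`rw`) and Kirundi (`rn`); the model card does not state this, so treat the mapping as an illustrative assumption. The names `FRONT_MATTER_TAGS`, `LABEL_TO_TAGS`, and `tags_match` are hypothetical helpers, not part of the AfriTeVa codebase.

```python
# Language tags copied from the YAML front matter. "multilingual" and "T5"
# are left out because they are not language codes.
FRONT_MATTER_TAGS = ["om", "am", "rw", "rn", "ha", "ig", "pcm", "so", "sw", "ti", "yo"]

# Three-letter labels exactly as written in the Languages section, mapped to
# the front-matter tags they plausibly correspond to (assumed mapping).
LABEL_TO_TAGS = {
    "orm": ["om"],        # Afaan Oromoo
    "amh": ["am"],        # Amharic
    "gah": ["rw", "rn"],  # Gahuza; assumption: Kinyarwanda + Kirundi
    "hau": ["ha"],        # Hausa
    "igb": ["ig"],        # Igbo
    "pcm": ["pcm"],       # Nigerian Pidgin
    "som": ["so"],        # Somali
    "swa": ["sw"],        # Swahili
    "tig": ["ti"],        # Tigrinya (label as written in the card)
    "yor": ["yo"],        # Yoruba
}


def tags_match(label_to_tags, front_matter_tags):
    """Return True if the mapped tags cover the front-matter tags exactly."""
    covered = [tag for tags in label_to_tags.values() for tag in tags]
    return sorted(covered) == sorted(front_matter_tags)


if __name__ == "__main__":
    print(len(LABEL_TO_TAGS), "languages,", len(FRONT_MATTER_TAGS), "tags")
    print("consistent:", tags_match(LABEL_TO_TAGS, FRONT_MATTER_TAGS))
```

Under that assumption the 10 listed languages account for all 11 tags, because only Gahuza maps to two of them.
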
### More information on the model and dataset

### The model

- 64M-parameter encoder-decoder architecture (T5-like)
- 6 layers, 8 attention heads, and a 512-token sequence length

### The dataset

- Multilingual: the 10 African languages listed above
- 143 million tokens (1 GB of text data)
- Tokenizer vocabulary size: 70,000 tokens

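The dataset figures above can be put in proportion with a little arithmetic. This is only a rough sketch: it assumes 1 GB means 10^9 bytes, and the card does not say whether "tokens" refers to subword tokens or whitespace-separated words, so the ratios are indicative at best. The function names are hypothetical.

```python
# Figures taken from the dataset section of the model card.
TOTAL_TOKENS = 143_000_000
VOCAB_SIZE = 70_000

# Assumption: "1GB" is read as 10**9 bytes (it could also mean 2**30).
TOTAL_BYTES = 1_000_000_000


def bytes_per_token(total_bytes=TOTAL_BYTES, total_tokens=TOTAL_TOKENS):
    """Average number of bytes of raw text per token."""
    return total_bytes / total_tokens


def tokens_per_vocab_entry(total_tokens=TOTAL_TOKENS, vocab_size=VOCAB_SIZE):
    """Average number of corpus tokens per vocabulary entry."""
    return total_tokens / vocab_size


if __name__ == "__main__":
    print(f"bytes per token: {bytes_per_token():.2f}")
    print(f"tokens per vocabulary entry: {tokens_per_vocab_entry():.0f}")
```

With these assumptions the corpus averages about 7 bytes of text per token and roughly two thousand corpus tokens per vocabulary entry.
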
## Training Procedure

For information on training procedures, please refer to the AfriTeVa [paper](#) or [repository](https://github.com/castorini/afriteva).

## BibTeX entry and citation info

Coming soon ...