Update README.md
README.md
CHANGED
@@ -7,16 +7,30 @@ This model adapt T5 on Arabic Language by pre-training T5 on ArabicWikipedia, Ma
 
 ## Pre-training Settings and Results on TyDi QA Development Dataset ( Model in this card is highlighted in bold )
 
-| Model | Hidden Layer | Atten. head | Atten. Layers | Vocab | Hardware |Training Steps | Batch | Train x Batch Factor |Corpora |
-
-| AraT5-Base | 768 | 12 | 12 | 110K |TPUv3-8 | 1M | 128 | 1.0x |248GB 29B tokens (MSA + Tweets) |
-| AraT5-Base-MSA | 768 | 12 | 12 | 110K |TPUv3-8 | 1M | 128 | 1.0x |70GB (MSA) |
-| AraT5-Base-Tweets| 768 | 12 | 12 | 110K |TPUv3-8 | 1M | 128 | 1.0x |178GB (Tweets) |
-| mT5-Base | 768 | 12 | 12 | 250K |TPUv3-32 | 1M | 1024 | 8.0x |6.3T tokens (mC4)|
-| ArabicT5-Base | 512 | 8 | 20 | 32K |TPUv3-32 | 256K | 256 | 0.5x |17GB (MSA) |
-| ArabicT5-Large | 768 | 12 | 16 | 32K |TPUv3-128 | 500K | 512 | 2.0x |17GB (MSA) |
-| **ArabicT5-xLarge** | **768** | **12** | **36** | **32K** |**TPUv3-128** | **500K** | **512** | **2.0x** |**17GB (MSA)** |
-
+| Model | Hidden Layer | Atten. head | Atten. Layers | Vocab | Hardware |Training Steps | Batch | Train x Batch Factor |Corpora |
+|------------------|--------------|-------------|---------------|-------|-----------|---------------|--------|-----------------------|------------------------|
+| AraT5-Base | 768 | 12 | 12 | 110K |TPUv3-8 | 1M | 128 | 1.0x |248GB 29B tokens (MSA + Tweets) |
+| AraT5-Base-MSA | 768 | 12 | 12 | 110K |TPUv3-8 | 1M | 128 | 1.0x |70GB (MSA) |
+| AraT5-Base-Tweets| 768 | 12 | 12 | 110K |TPUv3-8 | 1M | 128 | 1.0x |178GB (Tweets) |
+| mT5-Base | 768 | 12 | 12 | 250K |TPUv3-32 | 1M | 1024 | 8.0x |6.3T tokens (mC4)|
+| ArabicT5-Base | 512 | 8 | 20 | 32K |TPUv3-32 | 256K | 256 | 0.5x |17GB (MSA) |
+| ArabicT5-Large | 768 | 12 | 16 | 32K |TPUv3-128 | 500K | 512 | 2.0x |17GB (MSA) |
+| **ArabicT5-xLarge** | **768** | **12** | **36** | **32K** |**TPUv3-128** | **500K** | **512** | **2.0x** |**17GB (MSA)** |
+
+
+## Results on TyDi QA, HARD, Sentiment Analysis, Sarcasm Detection ( Best Score is highlighted in bold )
+
+| Model | <center>TyDi QA (Dev) | <center>HARD (Hotel Review) | <center>ArSarcasm-v2 (Sentiment Analysis) | <center>ArSarcasm-v2 (Sarcasm Detection) |
+|----------------------|---------------|---------------------|-------------------------------------|----------------------------------|
+| AraT5-Base | <center>70.36/84.21 |<center>96.49|<center>69.7/72.63|<center>60.44|
+| AraT5-Base-MSA | <center>70.90/84.00 |<center>**96.52**|<center>70.03/72.73|<center>60.69|
+| AraT5-Base-Tweets | <center>65.14/79.00 |<center>96.26|<center>70.67/73.52|<center>61.11|
+| mT5-Base | <center>72.20/84.13 |<center>96.24|<center>67.33/68.78|<center>52.18|
+| ArabicT5-Base | <center>70.79/84.76 |<center>96.36|<center>68.93/71.20|<center>58.93|
+| ArabicT5-Large | <center>73.29/86.08 |<center>96.40|<center>70.4/73.01|<center>59.79|
+| ArabicT5-xLarge | <center>**75.46/87.12** |<center>96.50| <center>**72.23/75.17**|<center>**61.66**|
+
+Evaluation Metrics : TyDi QA (EM/F1), HARD (Accuracy), Sentiment Analysis (Accuracy / F1-PN positive-negative), Sarcasm Detection (F1-sarcastic)
 
 # Paper
 
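Note on the "Train x Batch Factor" column: the README does not define it, but the values are consistent with (training steps x batch size) divided by the AraT5-Base budget of 1M steps at batch 128. The sketch below checks that reading against the table rows. The baseline choice, the constant and function names, and the interpretation of "K"/"M" as thousands/millions are assumptions made for illustration, not something the README states.

```python
# Assumed baseline: the AraT5-Base row (1M steps, batch 128), which the table lists as 1.0x.
BASELINE_STEPS = 1_000_000
BASELINE_BATCH = 128

# (training steps, batch size) copied from the pre-training settings table.
# "K" is read as x1,000 and "M" as x1,000,000 (an assumption).
SETTINGS = {
    "AraT5-Base": (1_000_000, 128),
    "mT5-Base": (1_000_000, 1024),
    "ArabicT5-Base": (256_000, 256),
    "ArabicT5-Large": (500_000, 512),
    "ArabicT5-xLarge": (500_000, 512),
}


def train_batch_factor(steps: int, batch: int) -> float:
    """Total sequences seen, relative to the assumed AraT5-Base baseline."""
    return (steps * batch) / (BASELINE_STEPS * BASELINE_BATCH)


# Round to one decimal place, matching the precision used in the table.
FACTORS = {
    name: round(train_batch_factor(steps, batch), 1)
    for name, (steps, batch) in SETTINGS.items()
}

for name, factor in FACTORS.items():
    print(f"{name}: {factor:.1f}x")
```

Under this reading, ArabicT5-Base saw roughly half as many training sequences as AraT5-Base, while ArabicT5-Large and ArabicT5-xLarge saw about twice as many.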