---
language:
- ar
datasets:
- mc4
- oscar
- arabic_billion_words
---
# arabic-t5-small
This is a T5v1.1 (small) model trained on the concatenation of the Arabic Billion Words corpus and the Arabic subsets of the mC4 and OSCAR datasets. Due to time limitations, the model was trained on only about `10%` of the full dataset.
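
The snippet below is an illustrative loading sketch rather than an official usage example: it assumes the checkpoint is published on the Hugging Face Hub with Flax weights, and `arabic-t5-small` is used only as a placeholder repo id. Note that pre-trained T5v1.1 checkpoints generally need task-specific fine-tuning before they are useful downstream.

```python
# Illustrative sketch only -- replace the placeholder repo id with the actual
# Hub path of this model before running.
from transformers import AutoTokenizer, FlaxT5ForConditionalGeneration

model_id = "arabic-t5-small"  # placeholder Hub repo id (assumption)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = FlaxT5ForConditionalGeneration.from_pretrained(model_id)

# Encode an Arabic sentence and run it through the encoder as a smoke test.
inputs = tokenizer("مرحبا بالعالم", return_tensors="np")
encoder_outputs = model.encode(input_ids=inputs["input_ids"],
                               attention_mask=inputs["attention_mask"])
print(encoder_outputs.last_hidden_state.shape)  # (1, sequence_length, d_model)
```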
## Training parameters
| Parameter             | Value         |
| :-------------------: | :-----------: |
| Steps                 | `22,000`      |
| Training batch size   | `384`         |
| Evaluation batch size | `768`         |
| Learning rate         | `1e-2`        |
| dtype                 | `jnp.float32` |
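
For orientation, the hyperparameters above could be wired into a JAX/Flax training setup roughly as follows. The optimizer and learning-rate schedule are not stated in this card, so the Adafactor choice below is an assumption for illustration, not a record of the actual training run.

```python
import jax.numpy as jnp
import optax

# Values taken from the table above.
TRAIN_BATCH_SIZE = 384
EVAL_BATCH_SIZE = 768
TOTAL_STEPS = 22_000
LEARNING_RATE = 1e-2
DTYPE = jnp.float32  # dtype used for the model parameters

# Assumed optimizer: Adafactor is a common choice for T5 pre-training,
# but the card does not specify which optimizer or schedule was used.
optimizer = optax.adafactor(learning_rate=LEARNING_RATE)
```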