metadata
language:
  - ar
datasets:
  - mc4
  - oscar
  - arabic_billion_words

arabic-t5-small

This is a T5 v1.1 (small) model trained on the concatenation of the Arabic Billion Words corpus and the Arabic subsets of the mC4 and OSCAR datasets. Due to time limitations, the model was trained on only about 10% of the combined dataset.

Training parameters

Steps: 22,000
Training batch size: 384
Evaluation batch size: 768
Learning rate: 1e-2
dtype: jnp.float32
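
To give a sense of scale, the parameters above can be combined into a rough count of training sequences processed. The following is a minimal sketch rather than the original training script: the `TrainingConfig` class and the `sequences_seen` helper are hypothetical names introduced for illustration, and the calculation assumes one full training batch per step with no gradient accumulation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hypothetical container; values copied from the training parameters above."""
    steps: int = 22_000
    train_batch_size: int = 384
    eval_batch_size: int = 768
    learning_rate: float = 1e-2
    dtype: str = "jnp.float32"  # kept as a string so the sketch runs without JAX


def sequences_seen(config: TrainingConfig) -> int:
    """Training sequences processed, assuming one full batch per step."""
    return config.steps * config.train_batch_size


config = TrainingConfig()
total = sequences_seen(config)
print(f"{total:,} training sequences")  # 8,448,000 training sequences
```

Under these assumptions, the run processed about 8.4 million training sequences.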