---
language:
- ar
datasets:
- mc4
- oscar
- arabic_billion_words
---
# arabic-t5-small
This is a T5v1.1 (small) model trained on the concatenation of the Arabic Billion Words corpus and the Arabic subsets of the mC4 and OSCAR datasets. Due to time limitations, the model was trained on only about `10%` of the full dataset.
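
The snippet below is an illustrative loading sketch rather than an official usage example: it assumes the checkpoint is published on the Hugging Face Hub with Flax weights, and `arabic-t5-small` is used only as a placeholder repo id. Note that pre-trained T5v1.1 checkpoints generally need task-specific fine-tuning before they are useful downstream.

```python
# Illustrative sketch only -- replace the placeholder repo id with the actual
# Hub path of this model before running.
from transformers import AutoTokenizer, FlaxT5ForConditionalGeneration

model_id = "arabic-t5-small"  # placeholder Hub repo id (assumption)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = FlaxT5ForConditionalGeneration.from_pretrained(model_id)

# Encode an Arabic sentence and run it through the encoder as a smoke test.
inputs = tokenizer("مرحبا بالعالم", return_tensors="np")
encoder_outputs = model.encode(input_ids=inputs["input_ids"],
                               attention_mask=inputs["attention_mask"])
print(encoder_outputs.last_hidden_state.shape)  # (1, sequence_length, d_model)
```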
## Training parameters
| Parameter             | Value         |
| :-------------------: | :-----------: |
| Steps                 | `22,000`      |
| Training batch size   | `384`         |
| Evaluation batch size | `768`         |
| Learning rate         | `1e-2`        |
| dtype                 | `jnp.float32` |
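
For orientation, the hyperparameters above could be wired into a JAX/Flax training setup roughly as follows. The optimizer and learning-rate schedule are not stated in this card, so the Adafactor choice below is an assumption for illustration, not a record of the actual training run.

```python
import jax.numpy as jnp
import optax

# Values taken from the table above.
TRAIN_BATCH_SIZE = 384
EVAL_BATCH_SIZE = 768
TOTAL_STEPS = 22_000
LEARNING_RATE = 1e-2
DTYPE = jnp.float32  # dtype used for the model parameters

# Assumed optimizer: Adafactor is a common choice for T5 pre-training,
# but the card does not specify which optimizer or schedule was used.
optimizer = optax.adafactor(learning_rate=LEARNING_RATE)
```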