Contributed by

KUIS AI Lab university
Arabic-ALBERT Base

Arabic edition of the ALBERT Base pretrained language model.

Pretraining data

The models were pretrained on ~4.4 billion words.

Notes on training data:

  • The final version of our corpus contains some inline non-Arabic words, which we did not remove from sentences since doing so would affect some tasks like NER.
  • Non-Arabic characters were lowercased as a preprocessing step. Since Arabic characters have no upper or lower case, there are no separate cased and uncased versions of the model.
  • The corpus and vocabulary set are not restricted to Modern Standard Arabic; they contain some dialectal Arabic too.

Pretraining details

  • These models were trained using Google's ALBERT GitHub repository on a single TPU v3-8 provided free of charge by TFRC.
  • Our pretraining procedure follows the training settings of BERT with some changes: the models were trained for 7M training steps with a batch size of 64, instead of 125K steps with a batch size of 4096.
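
To put the two schedules quoted above side by side, the sketch below multiplies steps by batch size to get the number of training sequences each one processes. The step counts and batch sizes come from the note above; the function and variable names are illustrative. Comparing sequence counts also assumes both schedules use the same sequence length, which the note does not state.

```python
# Step counts and batch sizes as quoted in the pretraining notes.
ARABIC_ALBERT = {"steps": 7_000_000, "batch_size": 64}
REFERENCE = {"steps": 125_000, "batch_size": 4096}


def sequences_seen(steps: int, batch_size: int) -> int:
    """Total training sequences = optimizer steps x sequences per step."""
    return steps * batch_size


ours = sequences_seen(**ARABIC_ALBERT)
ref = sequences_seen(**REFERENCE)

print(f"Arabic-ALBERT schedule: {ours:,} sequences")
print(f"Reference schedule:     {ref:,} sequences")
print(f"Ratio (ours / ref):     {ours / ref:.3f}")
```

Under that assumption, the smaller batch size is largely offset by the much larger number of steps, so the two schedules process a similar number of sequences.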


                   albert-base   albert-large   albert-xlarge
Hidden layers      12            24             24
Attention heads    12            16             32
Hidden size        768           1024           2048


For further details on the models' performance or any other queries, please refer to Arabic-ALBERT.

How to use

You can use these models by installing torch or tensorflow together with the Hugging Face transformers library. You can then initialize the tokenizer and model directly like this:

from transformers import AutoTokenizer, AutoModel

# Load the tokenizer
base_tokenizer = AutoTokenizer.from_pretrained("kuisailab/albert-base-arabic")

# Load the model
base_model = AutoModel.from_pretrained("kuisailab/albert-base-arabic")
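
As a possible next step after loading, the sketch below asks the model to fill in a masked word. It assumes the checkpoint ships a masked-language-modelling head usable through the transformers `fill-mask` pipeline, and that network access is available for the first download. The helper names `mask_word` and `top_predictions` and the sample sentence are illustrative and not part of the original release.

```python
MODEL_ID = "kuisailab/albert-base-arabic"


def mask_word(sentence: str, position: int, mask_token: str) -> str:
    """Replace the whitespace-separated word at `position` with the mask token."""
    words = sentence.split()
    words[position] = mask_token
    return " ".join(words)


def top_predictions(sentence: str, position: int, k: int = 5):
    """Return the k most likely fills for the masked word as (token, score) pairs.

    transformers is imported lazily; the first call downloads the checkpoint.
    """
    from transformers import pipeline

    fill = pipeline("fill-mask", model=MODEL_ID)
    # Read the mask token from the tokenizer rather than hard-coding it.
    masked = mask_word(sentence, position, fill.tokenizer.mask_token)
    return [(p["token_str"], p["score"]) for p in fill(masked, top_k=k)]


# Example call (not executed here because it needs network access):
#   for token, score in top_predictions("عاصمة تركيا هي أنقرة", position=3):
#       print(token, round(score, 4))
```

Masking by whitespace position keeps the sketch simple; for real use it may be better to mask at the tokenizer level, since one word can span several subword tokens.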


Thanks to Google for providing a free TPU for the training process and to Hugging Face for hosting these models on their servers 😊