ARBERT is one of two models described in the paper "ARBERT & MARBERT: Deep Bidirectional Transformers for Arabic". ARBERT is a large-scale pre-trained masked language model focused on Modern Standard Arabic (MSA). To train ARBERT, we use the same architecture as BERT-base: 12 attention layers, each with 12 attention heads and 768 hidden dimensions, and a vocabulary of 100K WordPieces, for a total of ∼163M parameters. We train ARBERT on a collection of Arabic datasets comprising 61GB of text (6.2B tokens). For more information, please visit our GitHub repo.
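Since ARBERT is a BERT-style masked language model, it can be loaded with the Hugging Face `transformers` library and used for masked-token prediction. The sketch below assumes `UBC-NLP/ARBERT` is the model identifier on the Hugging Face Hub and uses an illustrative MSA sentence; adjust the identifier and input to your setup.

```python
# Minimal sketch: loading ARBERT for masked-token prediction with transformers.
# The identifier "UBC-NLP/ARBERT" is assumed; check the model hub page for the exact name.
from transformers import AutoTokenizer, AutoModelForMaskedLM, pipeline

model_name = "UBC-NLP/ARBERT"  # assumed hub identifier
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Predict the masked word in a Modern Standard Arabic sentence.
fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer)
for prediction in fill_mask("اللغة [MASK] هي لغة الضاد."):
    print(prediction["token_str"], round(prediction["score"], 3))
```

The same checkpoint can also be loaded with `AutoModel` and fine-tuned on downstream Arabic NLP tasks such as text classification or named entity recognition.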
