---
|
datasets: |
|
- AfnanTS/Final_ArLAMA_DS_tokenized_for_ARBERTv2 |
|
language: |
|
- ar |
|
base_model: |
|
- UBC-NLP/ARBERTv2 |
|
pipeline_tag: fill-mask |
|
--- |
|
|
|
|
|
<img src="./arab_icon2.png" alt="Model Logo" width="30%" height="30%" align="right"/> |
|
|
|
**ArBERTV1_MLM** is a pre-trained Arabic language model further trained with a Masked Language Modeling (MLM) objective. The model leverages Knowledge Graphs (KGs) to capture semantic relations in Arabic text, with the aim of improving vocabulary comprehension and performance on downstream tasks.
|
|
|
|
|
|
|
## Uses |
|
|
|
|
|
|
### Direct Use |
|
|
|
|
|
Filling masked tokens in Arabic text, particularly in contexts enriched with knowledge from KGs. |
|
|
|
|
|
### Downstream Use |
|
|
|
The model can be further fine-tuned for Arabic NLP tasks that require semantic understanding, such as text classification or question answering.
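As a rough illustration of downstream use, the sketch below adds a classification head on top of the pretrained encoder and fine-tunes it with the standard `transformers` `Trainer`. The number of labels, the datasets, and all hyperparameters are placeholders, not values used by the model authors:

```python
# Hedged sketch: fine-tuning the MLM-pretrained encoder for text
# classification. num_labels, datasets, and hyperparameters are
# illustrative placeholders.
from transformers import (AutoTokenizer,
                          AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

def build_classifier(model_name="AfnanTS/ARBERT_ArLAMA", num_labels=3):
    """Load the pretrained encoder with a freshly initialized
    classification head (the MLM head is discarded)."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name, num_labels=num_labels)
    return tokenizer, model

def fine_tune(model, train_dataset, eval_dataset):
    """Standard Trainer loop; hyperparameters are illustrative."""
    args = TrainingArguments(
        output_dir="arbert-classifier",
        num_train_epochs=3,
        per_device_train_batch_size=16,
        learning_rate=2e-5,
    )
    trainer = Trainer(model=model, args=args,
                      train_dataset=train_dataset,
                      eval_dataset=eval_dataset)
    trainer.train()
    return trainer
```

The datasets passed to `fine_tune` would typically be tokenized `datasets.Dataset` objects with a `labels` column.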
|
|
|
|
|
|
|
## How to Get Started with the Model |
|
|
|
```python |
|
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="AfnanTS/ARBERT_ArLAMA")

# "The [MASK] language is very important."
print(fill_mask("اللغة [MASK] مهمة جدا."))
|
``` |
|
|
|
## Training Details |
|
|
|
### Training Data |
|
|
|
Trained on the ArLAMA dataset, which is designed to represent Knowledge Graphs in natural language. |
|
|
|
|
|
|
|
### Training Procedure |
|
|
|
Continued pre-training of ArBERTv1 using Masked Language Modeling (MLM) to integrate KG-based knowledge. |
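For intuition, the MLM objective used here follows the standard BERT recipe: roughly 15% of input positions are selected, and of those, 80% are replaced with `[MASK]`, 10% with a random token, and 10% are left unchanged. The pure-Python sketch below illustrates that masking scheme (in practice this is handled by `transformers`' `DataCollatorForLanguageModeling`); `MASK_ID` and `VOCAB_SIZE` are hypothetical values, not ARBERTv2's actual ones:

```python
import random

MASK_ID = 4        # hypothetical [MASK] token id
VOCAB_SIZE = 1000  # hypothetical vocabulary size

def mask_tokens(input_ids, mask_prob=0.15, rng=None):
    """BERT-style MLM masking: select ~15% of positions; of those,
    80% become [MASK], 10% a random token, 10% stay unchanged.
    Returns (masked_ids, labels); labels are -100 at positions the
    loss should ignore."""
    rng = rng or random.Random()
    ids = list(input_ids)
    labels = [-100] * len(ids)
    n_select = max(1, round(mask_prob * len(ids)))
    for pos in rng.sample(range(len(ids)), n_select):
        labels[pos] = ids[pos]  # the model must predict the original token
        roll = rng.random()
        if roll < 0.8:
            ids[pos] = MASK_ID                     # 80%: replace with [MASK]
        elif roll < 0.9:
            ids[pos] = rng.randrange(VOCAB_SIZE)   # 10%: random token
        # else: 10% keep the original token (label is still set)
    return ids, labels
```

The loss is then computed only at the labeled positions, which is what drives the model to reconstruct the masked KG-derived facts during continued pre-training.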