---
datasets:
- AfnanTS/Final_ArLAMA_DS_tokenized_for_ARBERTv2
language:
- ar
base_model:
- UBC-NLP/ARBERTv2
pipeline_tag: fill-mask
---


<img src="./arab_icon2.png" alt="Model Logo" width="30%" height="30%" align="right"/>

**ARBERT_ArLAMA** is a pre-trained Arabic language model further trained with a Masked Language Modeling (MLM) objective. It leverages Knowledge Graphs (KGs) to capture semantic relations in Arabic text, aiming to improve vocabulary comprehension and performance on downstream tasks.



## Uses


### Direct Use


The model can be used directly to fill masked tokens in Arabic text, particularly in contexts enriched with knowledge from KGs.


### Downstream Use

The model can be further fine-tuned for Arabic NLP tasks that require semantic understanding, such as text classification or question answering, as sketched below.
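
As an illustration only, the snippet below sketches how this checkpoint could be loaded as the backbone of a downstream classifier. The task, number of labels, and input sentence are hypothetical and not part of the original card.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Hypothetical example: adapt the MLM checkpoint for a 3-class
# Arabic text-classification task (the label count is illustrative only).
tokenizer = AutoTokenizer.from_pretrained("AfnanTS/ARBERT_ArLAMA")
model = AutoModelForSequenceClassification.from_pretrained(
    "AfnanTS/ARBERT_ArLAMA",
    num_labels=3,
)

inputs = tokenizer("اللغة العربية مهمة جدا.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)  # (1, 3); logits are untrained until fine-tuning
```

The classification head is randomly initialized, so the model still needs task-specific fine-tuning before its predictions are meaningful.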



## How to Get Started with the Model

```python
from transformers import pipeline

# Load the fill-mask pipeline with this checkpoint and predict the masked token.
fill_mask = pipeline("fill-mask", model="AfnanTS/ARBERT_ArLAMA")
fill_mask("اللغة [MASK] مهمة جدا.")
```

## Training Details

### Training Data

Trained on the ArLAMA dataset, which is designed to represent Knowledge Graphs in natural language.
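
The tokenized dataset listed in the card metadata can be loaded with the `datasets` library. A minimal sketch; the split layout is an assumption, so inspect the returned object to confirm.

```python
from datasets import load_dataset

# Load the tokenized ArLAMA dataset referenced in the model card metadata.
# Available splits are an assumption; print the DatasetDict to check them.
ds = load_dataset("AfnanTS/Final_ArLAMA_DS_tokenized_for_ARBERTv2")
print(ds)
```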



### Training Procedure

Continued pre-training of ARBERTv2 using Masked Language Modeling (MLM) to integrate KG-based knowledge.
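
A minimal sketch of continued MLM pre-training with the Hugging Face `Trainer`, shown for illustration only; the hyperparameters, masking rate, and `train` split name are assumptions, not the values used by the authors.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Start from the ARBERTv2 checkpoint and continue MLM pre-training.
tokenizer = AutoTokenizer.from_pretrained("UBC-NLP/ARBERTv2")
model = AutoModelForMaskedLM.from_pretrained("UBC-NLP/ARBERTv2")

# Assumed: the dataset is already tokenized for ARBERTv2 (see card metadata).
dataset = load_dataset("AfnanTS/Final_ArLAMA_DS_tokenized_for_ARBERTv2")

# Dynamic masking; the 15% probability is an assumed, standard rate.
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

args = TrainingArguments(
    output_dir="arbert_arlama_mlm",
    per_device_train_batch_size=32,  # illustrative
    num_train_epochs=3,              # illustrative
    learning_rate=5e-5,              # illustrative
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],  # split name is an assumption
    data_collator=collator,
)
trainer.train()
```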