---
datasets:
- wikiann
language:
- ar
pipeline_tag: token-classification
---
## Model Name: BERT-base_NER-ar

### Model Description

**BERT-base_NER-ar** is a fine-tuned multilingual **BERT** base model for Named Entity Recognition (NER) in Arabic. The base model was pretrained on a diverse set of languages and then fine-tuned for NER on the **wikiann** dataset. The model is case-sensitive: it distinguishes, for example, between "english" and "English".

### Dataset

The model was fine-tuned on the **wikiann** dataset, a multilingual named entity recognition dataset. It contains Wikipedia articles annotated with three types of named entities: LOC (location), PER (person), and ORG (organization), in the IOB2 format. The dataset supports 176 of the 282 languages from the original WikiANN corpus.
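In the IOB2 scheme, `B-` marks the first token of an entity, `I-` marks continuation tokens, and `O` marks tokens outside any entity. A minimal sketch of grouping IOB2 tags into entity spans (the sentence and tags below are a hypothetical illustration, not actual model output):

```python
tokens = ["Barack", "Obama", "visited", "Abu", "Dhabi", "."]
tags   = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "O"]

# Group contiguous B-/I- tokens into (entity_type, words) spans
entities = []
for token, tag in zip(tokens, tags):
    if tag.startswith("B-"):
        entities.append([tag[2:], [token]])   # start a new entity
    elif tag.startswith("I-") and entities:
        entities[-1][1].append(token)         # extend the current entity

spans = [(etype, " ".join(words)) for etype, words in entities]
print(spans)  # [('PER', 'Barack Obama'), ('LOC', 'Abu Dhabi')]
```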
### Supported Tasks and Leaderboards

The primary supported task is named entity recognition (NER) in Arabic. The model can also be used to explore the zero-shot cross-lingual capabilities of multilingual models, enabling NER in other languages.

### Use Cases

+ **Arabic Named Entity Recognition**: *BERT-base_NER-ar* can extract named entities (names of people, locations, and organizations) from Arabic text. This is valuable for information retrieval, text summarization, and content analysis in Arabic-language applications.
+ **Multilingual NER**: The model's multilingual backbone enables NER in other languages covered by the **wikiann** dataset, making it versatile for cross-lingual NER tasks.

### Limitations

+ **Language Limitation**: While the model supports multiple languages, it may not perform equally well in all of them; performance varies with the quality and quantity of training data available for each language.
+ **Fine-Tuning Data**: The model's performance depends on the quality and representativeness of the fine-tuning data (here, the **wikiann** dataset). If that dataset is limited or biased, the model's performance will be affected.
## Usage

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer
import torch

# Load the fine-tuned model and its tokenizer
model = AutoModelForTokenClassification.from_pretrained("ayoubkirouane/BERT-base_NER-ar")
tokenizer = AutoTokenizer.from_pretrained("ayoubkirouane/BERT-base_NER-ar")

# Tokenize the input text
text = "أبو ظبي هي عاصمة دولة الإمارات العربية المتحدة."
inputs = tokenizer(text, return_tensors="pt")

# Perform NER inference
with torch.no_grad():
    outputs = model(**inputs)

# Pick the highest-scoring label ID for each token
predicted_ids = outputs.logits.argmax(dim=-1)[0].tolist()

# Map token IDs and label IDs to human-readable strings
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
labels = [model.config.id2label[label_id] for label_id in predicted_ids]

# Print each token with its predicted label
for token, label in zip(tokens, labels):
    print(f"Token: {token}, Label: {label}")
```
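Note that the per-token output keeps WordPiece subword pieces (prefixed with `##`) separate. A small post-processing sketch that merges subwords back into whole words, keeping the label of each word's first piece (the tokens and labels below are hypothetical, for illustration only):

```python
def merge_subwords(tokens, labels):
    """Merge WordPiece '##' continuation pieces back into whole words,
    keeping the label of each word's first piece."""
    words, word_labels = [], []
    for token, label in zip(tokens, labels):
        if token.startswith("##") and words:
            words[-1] += token[2:]   # glue the piece onto the previous word
        else:
            words.append(token)
            word_labels.append(label)
    return list(zip(words, word_labels))

# Hypothetical tokenized output for illustration
tokens = ["أبو", "ظب", "##ي", "هي", "عاصمة"]
labels = ["B-LOC", "I-LOC", "I-LOC", "O", "O"]
print(merge_subwords(tokens, labels))
# [('أبو', 'B-LOC'), ('ظبي', 'I-LOC'), ('هي', 'O'), ('عاصمة', 'O')]
```

For many applications, the `transformers` token-classification pipeline offers a higher-level alternative that performs this kind of aggregation internally.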