
arabnizer-xlmr-panx-ar

This model is a fine-tuned version of xlm-roberta-base for Arabic named entity recognition, trained on the PAN-X (WikiANN) Arabic subset of the XTREME benchmark. It achieves the following results on the evaluation set:

  • Loss: 0.2073
  • F1: 0.8885
  • Accuracy: 0.9533

Intended uses & limitations

Simple usage: the model can be used directly with the transformers token-classification pipeline.

from transformers import pipeline

# Load the fine-tuned checkpoint as a token-classification (NER) pipeline.
ner_tagger = pipeline("token-classification", "mohammedaly22/arabnizer-xlmr-panx-ar")

# "My name is Mohammed, I work at Orange, and I live in Cairo."
text = "اسمي محمد، اعمل في أورانج و اسكن في القاهرة."

# grouped_entities=True merges word pieces into whole entity spans
# (equivalent to aggregation_strategy="simple" in recent transformers releases).
ner_tagger(text, grouped_entities=True)

result:

[{'entity_group': 'PER',
  'score': 0.9486102,
  'word': 'محمد',
  'start': 5,
  'end': 9},
 {'entity_group': 'ORG',
  'score': 0.8212871,
  'word': 'اورانج',
  'start': 24,
  'end': 30},
 {'entity_group': 'LOC',
  'score': 0.9967932,
  'word': 'القاهرة',
  'start': 41,
  'end': 48}]
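
The grouped output above contains PER, ORG, and LOC spans. To inspect the full tag set the checkpoint predicts (presumably the standard IOB2 PAN-X tags, which is an assumption here), the label mapping can be read straight from the model config:

from transformers import AutoConfig

# Print the id-to-label mapping stored with the checkpoint.
config = AutoConfig.from_pretrained("mohammedaly22/arabnizer-xlmr-panx-ar")
print(config.id2label)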

Training and evaluation data

The model was fine-tuned and evaluated on the PAN-X (WikiANN) Arabic subset of the XTREME benchmark (the PAN-X.ar configuration), which provides the columns described below; a loading sketch follows the column table. The dataset card describes the benchmark as follows.

The Cross-lingual Natural Language Inference (XNLI) corpus is a crowd-sourced collection of 5,000 test and 2,500 dev pairs for the MultiNLI corpus. The pairs are annotated with textual entailment and translated into 14 languages: French, Spanish, German, Greek, Bulgarian, Russian, Turkish, Arabic, Vietnamese, Thai, Chinese, Hindi, Swahili and Urdu. This results in 112.5k annotated pairs. Each premise can be associated with the corresponding hypothesis in the 15 languages, summing up to more than 1.5M combinations. The corpus is designed to evaluate how well a model can perform inference in any language (including low-resource ones like Swahili or Urdu) when only English NLI data is available at training time. One solution is cross-lingual sentence encoding, for which XNLI serves as an evaluation benchmark.

The Cross-lingual TRansfer Evaluation of Multilingual Encoders (XTREME) benchmark evaluates the cross-lingual generalization ability of pre-trained multilingual models. It covers 40 typologically diverse languages (spanning 12 language families) and includes nine tasks that collectively require reasoning about different levels of syntax and semantics. The languages in XTREME are selected to maximize language diversity, coverage in existing tasks, and availability of training data. Among them are many under-studied languages, such as the Dravidian languages Tamil (spoken in southern India, Sri Lanka, and Singapore), Telugu and Malayalam (spoken mainly in southern India), and the Niger-Congo languages Swahili and Yoruba, spoken in Africa.

Column      Description
tokens      A list of tokens.
ner_tags    A list of the associated NER tags for each token.
langs       A list of the language of each token.
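
As a minimal sketch, the subset can be loaded with the datasets library, assuming the PAN-X.ar configuration of the xtreme dataset was used (as the model name suggests):

from datasets import load_dataset

# PAN-X.ar: the Arabic named-entity recognition split of the XTREME benchmark.
panx_ar = load_dataset("xtreme", name="PAN-X.ar")
print(panx_ar)              # train / validation / test splits
print(panx_ar["train"][0])  # {'tokens': [...], 'ner_tags': [...], 'langs': [...]}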

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a minimal reproduction sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 3
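
As referenced above, here is a minimal sketch of how these hyperparameters map onto a Trainer setup; the tokenized_panx_ar dataset and the label-alignment step it implies are assumptions, not details taken from this card:

from transformers import (AutoTokenizer, AutoModelForTokenClassification,
                          DataCollatorForTokenClassification, Trainer, TrainingArguments)

model_ckpt = "xlm-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_ckpt)
# 7 labels = O plus B-/I- tags for PER, ORG, LOC (assumption based on the PAN-X tag set).
model = AutoModelForTokenClassification.from_pretrained(model_ckpt, num_labels=7)

args = TrainingArguments(
    output_dir="arabnizer-xlmr-panx-ar",
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    lr_scheduler_type="linear",
    seed=42,
    evaluation_strategy="epoch",  # assumption: evaluation once per epoch, matching the results table
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized_panx_ar["train"],       # hypothetical tokenized PAN-X.ar splits
    eval_dataset=tokenized_panx_ar["validation"],
    data_collator=DataCollatorForTokenClassification(tokenizer),
    tokenizer=tokenizer,
)
trainer.train()

The Trainer default optimizer is AdamW with betas=(0.9, 0.999) and epsilon=1e-08, which matches the optimizer listed above.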

Training results

Training Loss   Epoch   Step   Validation Loss   F1       Accuracy
0.2031          1.0     1250   0.2078            0.8598   0.9426
0.1559          2.0     2500   0.2043            0.8736   0.9490
0.0992          3.0     3750   0.2073            0.8885   0.9533
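
F1 here is presumably the entity-level (seqeval) F1 that is commonly reported for PAN-X fine-tuning; the card does not name the metric implementation, so this is an assumption. A minimal sketch of computing it on IOB2 tag sequences:

import evaluate

seqeval = evaluate.load("seqeval")
predictions = [["B-PER", "I-PER", "O", "B-LOC"]]
references  = [["B-PER", "I-PER", "O", "B-ORG"]]
results = seqeval.compute(predictions=predictions, references=references)
print(results["overall_f1"], results["overall_accuracy"])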

Framework versions

  • Transformers 4.38.2
  • Pytorch 2.2.1+cu121
  • Datasets 2.18.0
  • Tokenizers 0.15.2