---
license: mit
base_model: xlm-roberta-base
datasets:
- xtreme
metrics:
- precision
- recall
- f1
- accuracy
model-index:
- name: roberta-base-NER
  results:
  - task:
      name: Token Classification
      type: token-classification
    dataset:
      name: xtreme
      type: xtreme
      config: PAN-X.en
      split: validation
      args: PAN-X.en
    metrics:
    - name: Precision
      type: precision
      value: 0.8003614625330182
    - name: Recall
      type: recall
      value: 0.8110735418427726
    - name: F1
      type: f1
      value: 0.8056818976978517
    - name: Accuracy
      type: accuracy
      value: 0.9194332683336213
language:
- en
---


# roberta-base-NER

## Model description

**multilingual-xlm-roberta-for-ner** is a **Named Entity Recognition** model based on a fine-tuned XLM-RoBERTa base model.
It has been trained to recognize three types of entities: locations (LOC), organizations (ORG), and persons (PER).
Specifically, this model is an *xlm-roberta-base* model that was fine-tuned on an aggregation of 10 high-resource languages.

## Intended uses & limitations

#### How to use

You can use this model with the Transformers *pipeline* for NER:

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline

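# Load the fine-tuned tokenizer and model from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("Tirendaz/multilingual-xlm-roberta-for-ner")
model = AutoModelForTokenClassification.from_pretrained("Tirendaz/multilingual-xlm-roberta-for-ner")

# Build an NER pipeline; by default it returns one prediction per token not labeled "O".
nlp = pipeline("ner", model=model, tokenizer=tokenizer)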
example = "My name is Wolfgang and I live in Berlin"

ner_results = nlp(example)
print(ner_results)
```

The model assigns one of the following tags to each token:

| Abbreviation | Description                   |
|:-------------|:------------------------------|
| O            | Outside of a named entity     |
| B-PER        | Beginning of a person’s name  |
| I-PER        | Inside a person’s name        |
| B-ORG        | Beginning of an organization  |
| I-ORG        | Inside an organization        |
| B-LOC        | Beginning of a location       |
| I-LOC        | Inside a location             |
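
Because tags are assigned per token, a name that is split into several subword tokens comes back as several predictions. The following is a minimal sketch of how B-/I- tags could be merged into entity spans. The `group_entities` helper is hypothetical, and the `tokens` list is hand-written for illustration (it is not real output from this model); the keys `entity`, `word`, `start`, and `end` are assumed to match the dictionaries the pipeline returns.

```python
from typing import Dict, List


def group_entities(text: str, tokens: List[Dict]) -> List[Dict]:
    """Merge token-level B-/I- predictions into entity spans."""
    spans: List[Dict] = []
    for tok in tokens:
        prefix, _, label = tok["entity"].partition("-")
        if not label:  # "O" or a tag without a type: nothing to group
            continue
        last = spans[-1] if spans else None
        # An I- tag extends the previous span only if the type matches and
        # the token is adjacent (at most one separating character).
        continues = (
            prefix == "I"
            and last is not None
            and last["label"] == label
            and tok["start"] - last["end"] <= 1
        )
        if continues:
            last["end"] = tok["end"]
        else:
            spans.append({"label": label, "start": tok["start"], "end": tok["end"]})
    # Recover the surface text from character offsets.
    for span in spans:
        span["text"] = text[span["start"]:span["end"]]
    return spans


text = "My name is Wolfgang and I live in Berlin"
# Hypothetical token-level predictions, written by hand for this example.
tokens = [
    {"entity": "B-PER", "word": "▁Wolf", "start": 11, "end": 15},
    {"entity": "I-PER", "word": "gang", "start": 15, "end": 19},
    {"entity": "B-LOC", "word": "▁Berlin", "start": 34, "end": 40},
]

entities = group_entities(text, tokens)
for entity in entities:
    print(entity["label"], entity["text"])
```

In practice, passing `aggregation_strategy="simple"` to `pipeline("ner", ...)` lets Transformers do this grouping for you; the sketch only shows the idea behind it.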

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 24
- eval_batch_size: 24
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 5
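
To make `lr_scheduler_type: linear` concrete, the sketch below computes the learning rate at a given optimizer step. It assumes no warmup (none is listed above) and a decay from 2e-05 to zero over the 2085 total steps shown in the training results table. `linear_lr` is a hypothetical helper written for this card, not a Transformers function.

```python
BASE_LR = 2e-05      # learning_rate from the list above
STEPS_PER_EPOCH = 417
TOTAL_STEPS = 2085   # 5 epochs x 417 steps per epoch
WARMUP_STEPS = 0     # assumption: no warmup is listed on this card


def linear_lr(step: int) -> float:
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / max(1, WARMUP_STEPS)
    remaining = max(0, TOTAL_STEPS - step)
    return BASE_LR * remaining / max(1, TOTAL_STEPS - WARMUP_STEPS)


# Learning rate at the start of training and at the end of each epoch.
for epoch in range(6):
    step = epoch * STEPS_PER_EPOCH
    print(f"epoch {epoch}: step {step:4d}, lr = {linear_lr(step):.2e}")
```

Under these assumptions the learning rate drops by one fifth of its initial value per epoch and reaches zero at the final step.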

### Training results

| Training Loss | Epoch | Step | Validation Loss | Precision | Recall | F1     | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:------:|:------:|:--------:|
| No log        | 1.0   | 417  | 0.3359          | 0.7286    | 0.7675 | 0.7476 | 0.8991   |
| 0.4227        | 2.0   | 834  | 0.2951          | 0.7711    | 0.7980 | 0.7843 | 0.9131   |
| 0.2818        | 3.0   | 1251 | 0.2824          | 0.7852    | 0.8076 | 0.7962 | 0.9174   |
| 0.2186        | 4.0   | 1668 | 0.2853          | 0.7934    | 0.8150 | 0.8041 | 0.9193   |
| 0.1801        | 5.0   | 2085 | 0.2935          | 0.8004    | 0.8111 | 0.8057 | 0.9194   |
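
As a sanity check on the table above, F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The sketch below recomputes it from the rounded precision and recall columns and also finds the epoch with the lowest validation loss. The `rows` list simply transcribes the table, and the tolerance is an assumption chosen to absorb four-decimal rounding.

```python
# (epoch, validation_loss, precision, recall, reported_f1), copied from the table
rows = [
    (1, 0.3359, 0.7286, 0.7675, 0.7476),
    (2, 0.2951, 0.7711, 0.7980, 0.7843),
    (3, 0.2824, 0.7852, 0.8076, 0.7962),
    (4, 0.2853, 0.7934, 0.8150, 0.8041),
    (5, 0.2935, 0.8004, 0.8111, 0.8057),
]

TOLERANCE = 2e-4  # assumption: allows for rounding of the inputs to 4 decimals


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


max_diff = 0.0
for epoch, val_loss, precision, recall, reported_f1 in rows:
    recomputed = f1_score(precision, recall)
    max_diff = max(max_diff, abs(recomputed - reported_f1))
    print(f"epoch {epoch}: reported F1 {reported_f1:.4f}, recomputed {recomputed:.4f}")

best_loss_epoch = min(rows, key=lambda row: row[1])[0]
best_f1_epoch = max(rows, key=lambda row: row[4])[0]
print("lowest validation loss at epoch", best_loss_epoch)
print("highest F1 at epoch", best_f1_epoch)
```

Validation loss is lowest at epoch 3 while F1 keeps improving through epoch 5, so which checkpoint counts as best depends on the metric used for selection.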


### Framework versions

- Transformers 4.33.0
- PyTorch 2.0.0
- Datasets 2.1.0
- Tokenizers 0.13.3