
	---
	language:
	- it
	- en
	- de
	- fr
	- es
	- multilingual
	license:
	- mit
	datasets:
	- xtreme
	metrics:
	- precision: 0.874
	- recall: 0.88
	- f1: 0.877
	- accuracy: 0.943
	inference:
	  parameters:
	    aggregation_strategy: first
	---

	# gunghio/xlm-roberta-base-finetuned-panx-ner

	This model was fine-tuned from `xlm-roberta-base` on a subset of the `xtreme` dataset.

	The `xtreme` subsets used are `PAN-X.{lang}`. The languages used for training and validation are Italian, English, German, French, and Spanish.

	Only 75% of the whole dataset was used.
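As a rough illustration of the data selection described above, the sketch below builds the `PAN-X.{lang}` configuration names and shows one possible way to keep 75% of each training split. The helper names, the random sampling scheme, and the seed are assumptions for illustration only; this card does not state how the 75% subset was drawn, and loading `xtreme` by that name may depend on the installed `datasets` version.

```python
from typing import Dict, List

# Language codes listed in this model card.
LANGS = ["it", "en", "de", "fr", "es"]


def panx_config_names(langs: List[str]) -> List[str]:
    """Build xtreme configuration names such as "PAN-X.it"."""
    return [f"PAN-X.{lang}" for lang in langs]


def load_panx_subsets(
    langs: List[str], fraction: float = 0.75, seed: int = 42
) -> Dict[str, object]:
    """Load each PAN-X training split and keep a random fraction of it.

    Hypothetical helper: the sampling scheme and seed are assumptions,
    not the procedure used to train this model.
    """
    # Imported lazily so the name helper works without `datasets` installed.
    from datasets import load_dataset

    subsets = {}
    for name in panx_config_names(langs):
        train = load_dataset("xtreme", name, split="train")
        keep = int(len(train) * fraction)
        # Shuffle with a fixed seed, then keep the first `keep` examples.
        subsets[name] = train.shuffle(seed=seed).select(range(keep))
    return subsets


print(panx_config_names(LANGS))
```

Calling `load_panx_subsets(LANGS)` downloads the data, so it is left out of the snippet's top-level code.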

	## Intended uses & limitations

	The fine-tuned model can be used for Named Entity Recognition in Italian (it), English (en), German (de), French (fr), and Spanish (es).

	## Training and evaluation data

	Training dataset: [xtreme](https://huggingface.co/datasets/xtreme)

	### Training results

	It achieves the following results on the evaluation set:

	- Precision: 0.8744154472771157
	- Recall: 0.8791424269015351
	- F1: 0.8767725659462058
	- Accuracy: 0.9432040948504613

	Details:

	| Label   | Precision | Recall | F1-Score | Support |
	|---------|-----------|--------|----------|---------|
	| PER     | 0.922     | 0.908  | 0.915    | 26639   |
	| LOC     | 0.880     | 0.906  | 0.892    | 37623   |
	| ORG     | 0.821     | 0.816  | 0.818    | 28045   |
	| Overall | 0.874     | 0.879  | 0.877    | 92307   |
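The reported figures can be sanity-checked with a few lines of arithmetic. F1 is the harmonic mean of precision and recall, and the per-label supports sum to the overall support. The last check assumes the "Overall" row is a micro average, in which case overall recall equals the support-weighted mean of the per-label recalls; because the per-label values are rounded to three decimals, that check is only approximate.

```python
# Overall scores as reported in this model card.
precision = 0.8744154472771157
recall = 0.8791424269015351

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

# Per-label (recall, support) pairs copied from the details table.
per_label = {
    "PER": (0.908, 26639),
    "LOC": (0.906, 37623),
    "ORG": (0.816, 28045),
}

total_support = sum(support for _, support in per_label.values())

# Assumption: "Overall" is a micro average, so overall recall is the
# support-weighted mean of the per-label recalls (up to rounding).
weighted_recall = (
    sum(r * support for r, support in per_label.values()) / total_support
)

print(f"F1 from precision/recall: {f1:.4f}")
print(f"Total support: {total_support}")
print(f"Support-weighted recall: {weighted_recall:.4f}")
```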

	## Usage

	Set the aggregation strategy according to the [documentation](https://huggingface.co/docs/transformers/v4.18.0/en/main_classes/pipelines#transformers.TokenClassificationPipeline).

	```python
	from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

	# Load the fine-tuned tokenizer and model from the Hub.
	tokenizer = AutoTokenizer.from_pretrained("gunghio/xlm-roberta-base-finetuned-panx-ner")
	model = AutoModelForTokenClassification.from_pretrained("gunghio/xlm-roberta-base-finetuned-panx-ner")

	# "first" labels each word with the tag predicted for its first sub-token.
	nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="first")
	example = "My name is Wolfgang and I live in Berlin"

	# Each result is a dict describing one aggregated entity span.
	ner_results = nlp(example)
	print(ner_results)
	```
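With an aggregation strategy set, the pipeline returns a list of dictionaries with the keys `entity_group`, `score`, `word`, `start`, and `end`. The sketch below shows one way to post-process that list by grouping entity text per label. `group_entities` is a hypothetical helper, not part of `transformers`, and `sample_results` is hand-written illustrative data (the scores are made up), so the snippet runs without downloading the model.

```python
from collections import defaultdict
from typing import Dict, List


def group_entities(results: List[dict], min_score: float = 0.0) -> Dict[str, List[str]]:
    """Group entity words by label, dropping entries below `min_score`."""
    grouped = defaultdict(list)
    for entity in results:
        if entity["score"] >= min_score:
            grouped[entity["entity_group"]].append(entity["word"])
    return dict(grouped)


# Hand-written stand-in for pipeline output on the example sentence;
# the scores are illustrative, not actual model predictions.
sample_results = [
    {"entity_group": "PER", "score": 0.99, "word": "Wolfgang", "start": 11, "end": 19},
    {"entity_group": "LOC", "score": 0.98, "word": "Berlin", "start": 34, "end": 40},
]

print(group_entities(sample_results))
```

In practice, `ner_results` from the snippet above can be passed in place of `sample_results`; a `min_score` threshold is one simple way to trade recall for precision.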