	---
	license:
	- mit
	datasets:
	- xtreme
	language:
	- it
	- en
	- de
	- fr
	- es
	metrics:
	- precision: 0.874
	- recall: 0.879
	- f1: 0.877
	- accuracy: 0.943
	---

	# gunghio/xlm-roberta-base-finetuned-panx-ner

	This model was fine-tuned from `xlm-roberta-base` on a subset of the `xtreme` dataset.

	The `xtreme` subsets used are `PAN-X.{lang}`. The languages used for training and validation are Italian, English, German, French, and Spanish.

	Only 75% of the whole dataset was used.
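
The `PAN-X.{lang}` template expands to one configuration name per language. A minimal sketch of how the subsets might be loaded with the `datasets` library follows. The helper names are hypothetical, and taking the first 75% of the train split is an assumption, since this card does not say how the 75% was selected.

```python
# Language codes listed in this model card.
LANGS = ["it", "en", "de", "fr", "es"]


def panx_config_names(langs=LANGS):
    """Expand the PAN-X.{lang} template into xtreme configuration names."""
    return [f"PAN-X.{lang}" for lang in langs]


def load_panx_subsets(fraction_percent=75):
    """Hypothetical loader: downloads each subset, so it needs network access.

    Taking the first N% of the train split is an assumption, not the
    documented procedure used for this model.
    """
    from datasets import load_dataset  # imported lazily; requires `datasets`

    split = f"train[:{fraction_percent}%]"
    return {
        name: load_dataset("xtreme", name=name, split=split)
        for name in panx_config_names()
    }


print(panx_config_names())
```

Calling `load_panx_subsets()` would return one dataset per language, keyed by configuration name. Depending on the installed `datasets` version, the dataset identifier or loading options may need adjusting.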

	## Intended uses & limitations

	The fine-tuned model can be used for Named Entity Recognition in Italian (it), English (en), German (de), French (fr), and Spanish (es).

	## Training and evaluation data

	Training dataset: [xtreme](https://huggingface.co/datasets/xtreme) (`PAN-X.{lang}` subsets)

	### Training results

	It achieves the following results on the evaluation set:

	- Precision: 0.8744154472771157
	- Recall: 0.8791424269015351
	- F1: 0.8767725659462058
	- Accuracy: 0.9432040948504613

	Details:

	| Label   | Precision | Recall | F1-Score | Support |
	|---------|-----------|--------|----------|---------|
	| PER     | 0.922     | 0.908  | 0.915    | 26639   |
	| LOC     | 0.880     | 0.906  | 0.892    | 37623   |
	| ORG     | 0.821     | 0.816  | 0.818    | 28045   |
	| Overall | 0.874     | 0.879  | 0.877    | 92307   |
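
The overall figures can be cross-checked against each other: F1 is the harmonic mean of precision and recall, and the overall support should equal the sum of the per-label supports. A minimal sketch of that check, using only the numbers reported in this card (the variable names are illustrative, and reading "support" as the number of reference entities per label is an assumption):

```python
# Values copied from the results reported in this card.
precision = 0.8744154472771157
recall = 0.8791424269015351
reported_f1 = 0.8767725659462058

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

# Per-label support from the table (assumed to be reference entity counts).
support = {"PER": 26639, "LOC": 37623, "ORG": 28045}
total_support = sum(support.values())

print(f"recomputed F1: {f1:.6f}, reported F1: {reported_f1:.6f}")
print(f"total support: {total_support}")
```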

	## Usage

	```python
	from transformers import AutoTokenizer, AutoModelForTokenClassification
	from transformers import pipeline

	# Load the tokenizer and the fine-tuned token classification model from the Hub.
	tokenizer = AutoTokenizer.from_pretrained("gunghio/xlm-roberta-base-finetuned-panx-ner")
	model = AutoModelForTokenClassification.from_pretrained("gunghio/xlm-roberta-base-finetuned-panx-ner")

	# "first" merges sub-word tokens into words, labeling each word by its first token.
	nlp = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="first")
	example = "My name is Wolfgang and I live in Berlin"

	# Run inference and print the list of detected entities.
	ner_results = nlp(example)
	print(ner_results)
	```
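
With an `aggregation_strategy` set, the pipeline returns a list of dictionaries, one per entity, with the keys `entity_group`, `score`, `word`, `start`, and `end`. A minimal sketch of grouping such results by label follows. `group_entities_by_label` is a hypothetical helper, and the sample list is hand-written to show the shape of the output, with placeholder scores rather than real model predictions.

```python
from collections import defaultdict


def group_entities_by_label(entities, min_score=0.0):
    """Collect entity words per label, skipping low-confidence predictions."""
    grouped = defaultdict(list)
    for entity in entities:
        if entity["score"] >= min_score:
            grouped[entity["entity_group"]].append(entity["word"])
    return dict(grouped)


def make_entity(text, word, label, score):
    """Build one pipeline-style entity dict with character offsets into text."""
    start = text.index(word)
    return {"entity_group": label, "score": score, "word": word,
            "start": start, "end": start + len(word)}


example = "My name is Wolfgang and I live in Berlin"

# Hand-written stand-in for `ner_results`; scores are placeholders, not model output.
sample_results = [
    make_entity(example, "Wolfgang", "PER", 0.99),
    make_entity(example, "Berlin", "LOC", 0.40),
]

print(group_entities_by_label(sample_results))
print(group_entities_by_label(sample_results, min_score=0.5))
```

In real use, `ner_results` from the snippet above would be passed in place of `sample_results`. The `min_score` threshold is an illustrative way to drop low-confidence entities and should be tuned on your own data.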