---
library_name: transformers
license: apache-2.0
datasets:
- conll2003
language:
- en
metrics:
- accuracy
- precision
- recall
- f1
pipeline_tag: token-classification
---

# bert-base-cased-finetuned-conll2003-ner-v2

BERT ("bert-base-cased") fine-tuned on CoNLL-2003, the shared-task dataset of the 2003 Conference on Computational Natural Language Learning.

The model performs named entity recognition (NER). It accompanies Section 2 of Chapter 7 of the Hugging Face "NLP Course" (https://huggingface.co/learn/nlp-course/chapter7/2).

It was trained using a custom PyTorch loop with Hugging Face Accelerate.

Code: https://github.com/sambitmukherjee/huggingface-notebooks/blob/main/course/en/chapter7/section2_pt.ipynb

Experiment tracking: https://wandb.ai/sadhaklal/bert-base-cased-finetuned-conll2003-ner-v2
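
For reference, a custom PyTorch loop with Accelerate has roughly the shape sketched below. This is an illustrative sketch, not the exact training code: the hyperparameters (learning rate, batch size, number of epochs) and the simplified label alignment (repeating each word's tag on all of its sub-tokens) are assumptions; the actual code is in the notebook linked above.

```
import torch
from accelerate import Accelerator
from datasets import load_dataset
from torch.optim import AdamW
from torch.utils.data import DataLoader
from transformers import (AutoModelForTokenClassification, AutoTokenizer,
                          DataCollatorForTokenClassification)

raw_datasets = load_dataset("conll2003")
label_names = raw_datasets["train"].features["ner_tags"].feature.names
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

def tokenize_and_align(examples):
    # Tokenize pre-split words and repeat each word's NER tag on its sub-tokens;
    # special tokens get -100 so the loss ignores them.
    tokenized = tokenizer(examples["tokens"], truncation=True, is_split_into_words=True)
    all_labels = []
    for i, tags in enumerate(examples["ner_tags"]):
        word_ids = tokenized.word_ids(batch_index=i)
        all_labels.append([-100 if w is None else tags[w] for w in word_ids])
    tokenized["labels"] = all_labels
    return tokenized

tokenized_datasets = raw_datasets.map(
    tokenize_and_align, batched=True, remove_columns=raw_datasets["train"].column_names
)

data_collator = DataCollatorForTokenClassification(tokenizer=tokenizer)
train_dataloader = DataLoader(
    tokenized_datasets["train"], shuffle=True, collate_fn=data_collator, batch_size=8
)

model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-cased", num_labels=len(label_names)
)
optimizer = AdamW(model.parameters(), lr=2e-5)

# Accelerate wraps the model, optimizer and dataloader for the available hardware.
accelerator = Accelerator()
model, optimizer, train_dataloader = accelerator.prepare(model, optimizer, train_dataloader)

model.train()
for epoch in range(3):
    for batch in train_dataloader:
        outputs = model(**batch)
        accelerator.backward(outputs.loss)  # replaces loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```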

## Usage

```
from transformers import pipeline

model_checkpoint = "sadhaklal/bert-base-cased-finetuned-conll2003-ner-v2"
token_classifier = pipeline("token-classification", model=model_checkpoint, aggregation_strategy="simple")

print(token_classifier("My name is Sylvain and I work at Hugging Face in Brooklyn."))
```
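
With `aggregation_strategy="simple"`, sub-word tokens are grouped back into whole entities, so the example sentence yields "Sylvain" as a person, "Hugging Face" as an organization and "Brooklyn" as a location. If you prefer to work with the raw logits rather than the pipeline, a minimal sketch using only standard `transformers` APIs looks like this:

```
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

model_checkpoint = "sadhaklal/bert-base-cased-finetuned-conll2003-ner-v2"
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
model = AutoModelForTokenClassification.from_pretrained(model_checkpoint)

inputs = tokenizer("My name is Sylvain and I work at Hugging Face in Brooklyn.",
                   return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Map each token's highest-scoring class id to its label name.
predicted_ids = logits.argmax(dim=-1)[0].tolist()
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
print([(tok, model.config.id2label[i]) for tok, i in zip(tokens, predicted_ids)])
```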

## Dataset

From the dataset page:

> The shared task of CoNLL-2003 concerns language-independent named entity recognition. We will concentrate on four types of named entities: persons, locations, organizations and names of miscellaneous entities that do not belong to the previous three groups.

Examples: https://huggingface.co/datasets/conll2003/viewer
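
To inspect the data locally, the dataset can be loaded with the `datasets` library. The sketch below assumes the standard `conll2003` dataset on the Hub, whose sentences are pre-split into words with integer IOB2 tags:

```
from datasets import load_dataset

raw_datasets = load_dataset("conll2003")
example = raw_datasets["train"][0]
print(example["tokens"])    # words of the first training sentence
print(example["ner_tags"])  # integer NER tags (IOB2 scheme)
print(raw_datasets["train"].features["ner_tags"].feature.names)
# ['O', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC', 'B-MISC', 'I-MISC']
```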

## Metrics

On the 'validation' split of CoNLL-2003:

- Accuracy: 0.9858
- Precision: 0.9243
- Recall: 0.947
- F1: 0.9355
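
Metrics of this kind are typically computed with the `seqeval` metric, as in the NLP Course notebook. A minimal sketch with toy inputs (not the actual validation outputs) is:

```
import evaluate

metric = evaluate.load("seqeval")
predictions = [["O", "B-PER", "I-PER", "O", "B-LOC"]]
references  = [["O", "B-PER", "I-PER", "O", "B-LOC"]]
results = metric.compute(predictions=predictions, references=references)
print(results["overall_accuracy"], results["overall_precision"],
      results["overall_recall"], results["overall_f1"])
```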