BERT fine-tuned for Named Entity Recognition (CoNLL-2003)

A fine-tuned version of bert-base-cased for Named Entity Recognition (NER), trained on the CoNLL-2003 English dataset as part of working through the Hugging Face LLM Course, Chapter 7. It achieves the following results on the evaluation set:

Loss: 0.0599
Precision: 0.9319
Recall: 0.9507
F1: 0.9412
Accuracy: 0.9867

Model details

Attribute	Value
Base model	`bert-base-cased`
Architecture	Transformer Encoder (BERT)
Task	Token Classification (NER)
Training dataset	CoNLL-2003 (English)
Training epochs	3
Learning rate	2e-5
Weight decay	0.01
Hardware	Google Colab (T4 GPU)

Entity types

The model recognises four entity types in IOB2 format:

Label	Description
PER	Person
ORG	Organisation
LOC	Location
MISC	Miscellaneous
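
As a minimal sketch of what the IOB2 scheme implies for the label set: each entity type gets a `B-` tag (first token of an entity) and an `I-` tag (continuation), plus a single `O` tag for tokens outside any entity, so four types yield nine labels. The label ordering and the example tags below are illustrative assumptions, not output from this model; the authoritative mapping is the `id2label` entry in the model's config.

```python
# Entity types listed in the table above.
ENTITY_TYPES = ["PER", "ORG", "LOC", "MISC"]

# IOB2: "O" for tokens outside any entity, then a B-/I- pair per type.
# The ordering here is an assumption; the model's config (id2label)
# holds the mapping that was actually used in training.
labels = ["O"] + [f"{prefix}-{t}" for t in ENTITY_TYPES for prefix in ("B", "I")]
id2label = dict(enumerate(labels))
label2id = {label: i for i, label in id2label.items()}

# Hand-written word-level tags for the sentence in the Usage section
# (illustrative only, not a prediction from the model).
words = ["Alexis", "works", "at", "CERN", "in", "Switzerland", "."]
tags = ["B-PER", "O", "O", "B-ORG", "O", "B-LOC", "O"]

for word, tag in zip(words, tags):
    print(f"{word:12s} {tag:6s} id={label2id[tag]}")
```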

Usage

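from transformers import pipeline

# Load the fine-tuned checkpoint from the Hub.
# aggregation_strategy="simple" merges word pieces into whole-entity spans.
ner = pipeline(
    "token-classification",
    model="AlexStamp/bert-finetuned-ner",
    aggregation_strategy="simple",
)

# Each result is a dict with entity_group, score, word, start, and end.
for entity in ner("Alexis works at CERN in Switzerland."):
    print(entity)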

Training procedure

Fine-tuning used the Hugging Face Trainer API with DataCollatorForTokenClassification. Evaluation used the seqeval library, which computes entity-level precision, recall, and F1. These are stricter than token-level accuracy, because an entity counts as correct only if its entire span and its type are both predicted correctly.
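
To illustrate why entity-level scoring is stricter, here is a pure-Python sketch for a single sentence. It is not seqeval's implementation; the helper names `extract_entities`, `entity_f1`, and `token_accuracy` are hypothetical, and seqeval itself aggregates counts over a whole corpus rather than one sentence.

```python
def extract_entities(tags):
    """Return the set of (type, start, end) spans in an IOB2 tag sequence."""
    entities, start, current = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing span
        # An open span ends at "O", at any "B-", or at an "I-" of another type.
        if current is not None and (
            tag == "O" or tag.startswith("B-") or tag[2:] != current
        ):
            entities.add((current, start, i))
            current = None
        # Open a new span (a stray "I-" is leniently treated like a "B-").
        if tag != "O" and current is None:
            start, current = i, tag[2:]
    return entities


def entity_f1(gold, pred):
    """F1 over exact (type, start, end) matches."""
    gold_spans, pred_spans = extract_entities(gold), extract_entities(pred)
    correct = len(gold_spans & pred_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def token_accuracy(gold, pred):
    """Fraction of tokens whose tag matches exactly."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


# The prediction truncates a three-token ORG entity by one token.
gold = ["B-ORG", "I-ORG", "I-ORG", "O", "B-LOC", "O"]
pred = ["B-ORG", "I-ORG", "O", "O", "B-LOC", "O"]

print(f"token accuracy: {token_accuracy(gold, pred):.3f}")  # 0.833
print(f"entity F1:      {entity_f1(gold, pred):.3f}")       # 0.500
```

One wrong token costs a sixth of the token accuracy but invalidates the whole ORG span, which halves both precision and recall at the entity level.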

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
optimizer: AdamW (OptimizerNames.ADAMW_TORCH_FUSED) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
num_epochs: 3
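
For readers unfamiliar with the linear scheduler, the sketch below shows how the learning rate would decay over the run. It assumes no warmup (the card lists no warmup setting) and takes the total of 5268 steps from the training results table; `linear_lr` is a hypothetical helper, not a Transformers function.

```python
def linear_lr(step, base_lr=2e-5, total_steps=5268, warmup_steps=0):
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at the start and at the end of each epoch (1756 steps per epoch).
for step in (0, 1756, 3512, 5268):
    print(f"step {step:4d}: lr = {linear_lr(step):.3e}")
```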

Training results

Training Loss	Epoch	Step	Validation Loss	Precision	Recall	F1	Accuracy
0.0759	1.0	1756	0.0651	0.8905	0.9310	0.9103	0.9812
0.0355	2.0	3512	0.0681	0.9321	0.9473	0.9397	0.9853
0.0224	3.0	5268	0.0599	0.9319	0.9507	0.9412	0.9867
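
The table can be sanity-checked with a little arithmetic. The sketch below recomputes F1 as the harmonic mean of the reported precision and recall, and derives the steps per epoch from the batch size. The training-split size of 14,041 sentences is an assumption (it is not stated in this card), as is single-device training without gradient accumulation.

```python
import math

# (epoch, step, precision, recall, reported F1) copied from the table above.
rows = [
    (1, 1756, 0.8905, 0.9310, 0.9103),
    (2, 3512, 0.9321, 0.9473, 0.9397),
    (3, 5268, 0.9319, 0.9507, 0.9412),
]

for epoch, step, precision, recall, reported_f1 in rows:
    # F1 is the harmonic mean of precision and recall. Inputs are rounded to
    # four decimals, so the last digit can differ from the reported value.
    recomputed = 2 * precision * recall / (precision + recall)
    print(f"epoch {epoch}: reported F1 {reported_f1:.4f}, recomputed {recomputed:.4f}")

# Assumption: 14,041 training sentences, batch size 8, one device,
# no gradient accumulation.
TRAIN_SENTENCES = 14_041
TRAIN_BATCH_SIZE = 8
steps_per_epoch = math.ceil(TRAIN_SENTENCES / TRAIN_BATCH_SIZE)
print(f"steps per epoch: {steps_per_epoch}")
```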

Framework versions

Transformers 5.12.0
PyTorch 2.11.0+cu128
Datasets 4.0.0
Tokenizers 0.22.2

Limitations

Trained on English newswire text (the Reuters corpus), so it may generalise poorly to other domains or languages.
bert-base-cased is case-sensitive by design. This suits NER, but input with incorrect or missing capitalisation (for example, all-lowercase text) can degrade performance.

Notes

This model was trained as a portfolio exercise. The choice of bert-base-cased over bert-base-uncased is deliberate: capitalisation is a strong signal for entity detection, so NER benefits from a case-sensitive model.


Model size: 0.1B params (Safetensors, tensor type F32)
