---
license: apache-2.0
language:
- en
metrics:
- accuracy
- precision
- recall
- f1
datasets:
- eriktks/conll2003
base_model:
- google-bert/bert-base-cased
---
# BERT Base Cased Fine-Tuned on CoNLL2003 for English Named Entity Recognition (NER)
This model is a fine-tuned version of BERT-base-cased on the CoNLL2003 dataset for Named Entity Recognition (NER) in English. The CoNLL2003 dataset contains four types of named entities: Person (PER), Location (LOC), Organization (ORG), and Miscellaneous (MISC).
## Model Details
- Model Architecture: BERT (Bidirectional Encoder Representations from Transformers)
- Pre-trained Base Model: bert-base-cased
- Dataset: CoNLL2003 (NER task)
- Languages: English
- Fine-tuned for: Named Entity Recognition (NER)
- Entities recognized:
  - PER: Person
  - LOC: Location
  - ORG: Organization
  - MISC: Miscellaneous entities
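Token-classification checkpoints fine-tuned on CoNLL2003 typically use the BIO tagging scheme (e.g. `B-PER` for the first token of a person entity, `I-PER` for the rest). The label set actually shipped with this checkpoint can be inspected from its config; the snippet below is a minimal sketch that assumes the standard nine-label CoNLL2003 scheme.

```python
from transformers import AutoConfig

# Load only the config to inspect the label mapping (no weights downloaded)
config = AutoConfig.from_pretrained(
    "MrRobson9/bert-base-cased-finetuned-conll2003-english-ner"
)

# For a standard CoNLL2003 fine-tune this is expected to contain:
# O, B-PER, I-PER, B-LOC, I-LOC, B-ORG, I-ORG, B-MISC, I-MISC
print(config.id2label)
```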
## Use Cases
This model is ideal for tasks that require identifying and classifying named entities within English text, such as:
- Information extraction from unstructured text
- Content classification and tagging
- Automated text summarization
- Question answering systems with a focus on entity recognition
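As a concrete illustration of the tagging use case above, the sketch below groups the entities returned by the pipeline by their type. The grouping helper is illustrative, not part of this model; `aggregation_strategy="simple"` is a standard pipeline option that merges subword pieces into whole entities.

```python
from collections import defaultdict
from transformers import pipeline

ner = pipeline(
    "ner",
    model="MrRobson9/bert-base-cased-finetuned-conll2003-english-ner",
    aggregation_strategy="simple",  # merge subword pieces into whole entities
)

def tag_text(text: str) -> dict:
    """Group recognized entities by type, e.g. {'PER': ['John'], 'LOC': ['New York']}."""
    tags = defaultdict(list)
    for entity in ner(text):
        tags[entity["entity_group"]].append(entity["word"])
    return dict(tags)

print(tag_text("John lives in New York and works for the United Nations."))
```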
## How to Use
To use this model in your code, you can load it via Hugging Face’s Transformers library:
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load the fine-tuned tokenizer and token-classification model
tokenizer = AutoTokenizer.from_pretrained("MrRobson9/bert-base-cased-finetuned-conll2003-english-ner")
model = AutoModelForTokenClassification.from_pretrained("MrRobson9/bert-base-cased-finetuned-conll2003-english-ner")

# Build an NER pipeline and run it on a sample sentence
nlp_ner = pipeline("ner", model=model, tokenizer=tokenizer)
result = nlp_ner("John lives in New York and works for the United Nations.")
print(result)
```
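Each item in `result` is a dictionary containing the predicted tag, a confidence score, the token text, and its character offsets. Note that without an `aggregation_strategy`, predictions are reported per subword token (e.g. a `B-PER` tag on `John`); pass `aggregation_strategy="simple"` to the pipeline to merge them into whole entity spans, as in the sketch in the Use Cases section.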
## Performance
| Accuracy | Precision | Recall | F1-score |
|---|---|---|---|
| 0.991 | 0.946 | 0.953 | 0.950 |
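Scores of this kind are conventionally computed with `seqeval`, the standard entity-level scorer for CoNLL-style evaluation. The sketch below shows the mechanics with hypothetical tag sequences; the exact split and preprocessing behind the table above are not specified here.

```python
import evaluate

# seqeval computes entity-level precision/recall/F1 over BIO-tagged sequences
seqeval = evaluate.load("seqeval")

# Hypothetical tag sequences; in practice these would come from running the
# model over the CoNLL2003 validation or test split
references = [["B-PER", "O", "O", "B-LOC", "I-LOC", "O"]]
predictions = [["B-PER", "O", "O", "B-LOC", "I-LOC", "O"]]

print(seqeval.compute(predictions=predictions, references=references))
```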
## License
This model is licensed under the same terms as the BERT-base-cased model and the CoNLL2003 dataset. Please ensure compliance with all respective licenses when using this model.