---
license: apache-2.0
language:
- en
metrics:
- accuracy
- precision
- recall
- f1
datasets:
- eriktks/conll2003
base_model:
- google-bert/bert-base-cased
---
# BERT Base Cased Fine-Tuned on CoNLL2003 for English Named Entity Recognition (NER)
This model is a fine-tuned version of BERT-base-cased on the CoNLL2003 dataset for Named Entity Recognition (NER) in English. The CoNLL2003 dataset contains four types of named entities: Person (PER), Location (LOC), Organization (ORG), and Miscellaneous (MISC).
## Model Details
- Model Architecture: BERT (Bidirectional Encoder Representations from Transformers)
- Pre-trained Base Model: bert-base-cased
- Dataset: CoNLL2003 (NER task)
- Languages: English
- Fine-tuned for: Named Entity Recognition (NER)
- Entities recognized:
  - PER: Person
  - LOC: Location
  - ORG: Organization
  - MISC: Miscellaneous entities
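Token-classification checkpoints fine-tuned on CoNLL2003 typically use the BIO tagging scheme (e.g. `B-PER` for the first token of a person entity, `I-PER` for the rest). The label set actually shipped with this checkpoint can be inspected from its config; the snippet below is a minimal sketch that assumes the standard nine-label CoNLL2003 scheme.

```python
from transformers import AutoConfig

# Load only the config to inspect the label mapping (no weights downloaded)
config = AutoConfig.from_pretrained(
    "MrRobson9/bert-base-cased-finetuned-conll2003-english-ner"
)

# For a standard CoNLL2003 fine-tune this is expected to contain:
# O, B-PER, I-PER, B-LOC, I-LOC, B-ORG, I-ORG, B-MISC, I-MISC
print(config.id2label)
```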
## Use Cases
This model is ideal for tasks that require identifying and classifying named entities within English text, such as:
- Information extraction from unstructured text
- Content classification and tagging
- Automated text summarization
- Question answering systems with a focus on entity recognition
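As a concrete illustration of the tagging use case above, the sketch below groups the entities returned by the pipeline by their type. The grouping helper is illustrative, not part of this model; `aggregation_strategy="simple"` is a standard pipeline option that merges subword pieces into whole entities.

```python
from collections import defaultdict
from transformers import pipeline

ner = pipeline(
    "ner",
    model="MrRobson9/bert-base-cased-finetuned-conll2003-english-ner",
    aggregation_strategy="simple",  # merge subword pieces into whole entities
)

def tag_text(text: str) -> dict:
    """Group recognized entities by type, e.g. {'PER': ['John'], 'LOC': ['New York']}."""
    tags = defaultdict(list)
    for entity in ner(text):
        tags[entity["entity_group"]].append(entity["word"])
    return dict(tags)

print(tag_text("John lives in New York and works for the United Nations."))
```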
## How to Use
To use this model in your code, you can load it via Hugging Face’s Transformers library:
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load the fine-tuned tokenizer and token-classification model
tokenizer = AutoTokenizer.from_pretrained("MrRobson9/bert-base-cased-finetuned-conll2003-english-ner")
model = AutoModelForTokenClassification.from_pretrained("MrRobson9/bert-base-cased-finetuned-conll2003-english-ner")

# Build an NER pipeline and run it on a sample sentence
nlp_ner = pipeline("ner", model=model, tokenizer=tokenizer)
result = nlp_ner("John lives in New York and works for the United Nations.")
print(result)
```
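Each item in `result` is a dictionary containing the predicted tag, a confidence score, the token text, and its character offsets. Note that without an `aggregation_strategy`, predictions are reported per subword token (e.g. a `B-PER` tag on `John`); pass `aggregation_strategy="simple"` to the pipeline to merge them into whole entity spans, as in the sketch in the Use Cases section.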
## Performance
| Accuracy | Precision | Recall | F1-score |
|---|---|---|---|
| 0.991 | 0.946 | 0.953 | 0.950 |
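Scores of this kind are conventionally computed with `seqeval`, the standard entity-level scorer for CoNLL-style evaluation. The sketch below shows the mechanics with hypothetical tag sequences; the exact split and preprocessing behind the table above are not specified here.

```python
import evaluate

# seqeval computes entity-level precision/recall/F1 over BIO-tagged sequences
seqeval = evaluate.load("seqeval")

# Hypothetical tag sequences; in practice these would come from running the
# model over the CoNLL2003 validation or test split
references = [["B-PER", "O", "O", "B-LOC", "I-LOC", "O"]]
predictions = [["B-PER", "O", "O", "B-LOC", "I-LOC", "O"]]

print(seqeval.compute(predictions=predictions, references=references))
```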
## License
This model is licensed under the same terms as the BERT-base-cased model and the CoNLL2003 dataset. Please ensure compliance with all respective licenses when using this model.