BERT Fine-Tuned for Named Entity Recognition (NER)
This repository contains a BERT model fine-tuned for Named Entity Recognition (NER). The model was fine-tuned with the Hugging Face transformers library and recognizes named entities such as persons, locations, and organizations in text.
Model Details
- Model Architecture: BERT-base (fine-tuned from google-bert/bert-base-cased)
- Fine-Tuning Task: Named Entity Recognition (NER)
- Dataset Used: This model was fine-tuned on the CoNLL-2003 NER dataset, which includes labeled data for persons, organizations, locations, and miscellaneous entities.
- Intended Use: The model is suitable for NER tasks in various applications, including information extraction, question answering, and chatbots.
Usage
You can use this model with the Hugging Face transformers library to get started with NER tasks quickly. Below is an example of how to load and use the model for inference.
Installation
First, make sure you have the required packages (the pipeline also needs a backend such as PyTorch):

```bash
pip install transformers torch
```
Loading the Model
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("heenamir/bert-finetuned-ner")
model = AutoModelForTokenClassification.from_pretrained("heenamir/bert-finetuned-ner")

# Initialize the NER pipeline
nlp = pipeline("ner", model=model, tokenizer=tokenizer)

# Example text
text = "John Doe is a software engineer at OpenAI in San Francisco."

# Perform NER
entities = nlp(text)
print(entities)
```
Example Output
The pipeline returns a list of token-level entity predictions. The character offsets below match the example sentence; the scores and token indices are illustrative and depend on the tokenizer:

```json
[
  {"entity": "B-PER", "score": 0.99, "index": 1, "word": "John", "start": 0, "end": 4},
  {"entity": "I-PER", "score": 0.98, "index": 2, "word": "Doe", "start": 5, "end": 8},
  {"entity": "B-ORG", "score": 0.95, "index": 8, "word": "OpenAI", "start": 35, "end": 41},
  {"entity": "B-LOC", "score": 0.97, "index": 10, "word": "San", "start": 45, "end": 48},
  {"entity": "I-LOC", "score": 0.97, "index": 11, "word": "Francisco", "start": 49, "end": 58}
]
```
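If you prefer whole entity spans rather than per-token tags, the pipeline can merge adjacent tokens for you. A minimal sketch using the aggregation_strategy option (the output shown in the comments is illustrative):

```python
from transformers import pipeline

# Merge contiguous tokens of the same entity type into single spans
grouped = pipeline(
    "ner",
    model="heenamir/bert-finetuned-ner",
    aggregation_strategy="simple",
)

print(grouped("John Doe is a software engineer at OpenAI in San Francisco."))
# Illustrative output shape:
# [{"entity_group": "PER", "word": "John Doe", "score": ..., "start": 0, "end": 8},
#  {"entity_group": "ORG", "word": "OpenAI", ...},
#  {"entity_group": "LOC", "word": "San Francisco", ...}]
```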
Entity Labels
The model is fine-tuned to detect the following entity types, using the BIO tagging scheme (each type appears as a B- or I- prefixed tag; see the sketch after this list for inspecting the exact tag set):
- PER: Person
- ORG: Organization
- LOC: Location
- MISC: Miscellaneous
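To check the exact tag set this checkpoint predicts, you can inspect the model config. A quick sketch; the mapping shown in the comment is the conventional CoNLL-2003 layout, and the actual ordering may differ for this checkpoint:

```python
from transformers import AutoModelForTokenClassification

model = AutoModelForTokenClassification.from_pretrained("heenamir/bert-finetuned-ner")

# Map from class index to BIO tag
print(model.config.id2label)
# Conventional CoNLL-2003 layout (actual ordering may differ):
# {0: "O", 1: "B-PER", 2: "I-PER", 3: "B-ORG", 4: "I-ORG",
#  5: "B-LOC", 6: "I-LOC", 7: "B-MISC", 8: "I-MISC"}
```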
Scoring
The model outputs a confidence score for each detected entity. You can use these scores to filter out low-confidence predictions, as in the sketch below.
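A minimal sketch that keeps only predictions above a chosen cutoff (the 0.90 threshold is an arbitrary assumption; tune it for your application):

```python
from transformers import pipeline

nlp = pipeline("ner", model="heenamir/bert-finetuned-ner")
entities = nlp("John Doe is a software engineer at OpenAI in San Francisco.")

# Drop low-confidence predictions
THRESHOLD = 0.90  # assumed cutoff; adjust for your use case
confident = [e for e in entities if e["score"] >= THRESHOLD]
print(confident)
```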
Model Performance
The model's performance can vary depending on the complexity and context of the input text. It performs well on structured text but may struggle with informal or highly technical language.
Evaluation Metrics
The model was evaluated on the CoNLL-2003 test set with the following metrics:
- Precision: 93.04%
- Recall: 94.98%
- F1 Score: 94.00%
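For reference, F1 is the harmonic mean of precision and recall, so these numbers are mutually consistent: F1 = 2PR / (P + R) = 2 · 0.9304 · 0.9498 / (0.9304 + 0.9498) ≈ 0.9400.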
Limitations and Considerations
- The model may not perform well on texts outside of the domains it was trained on.
- Like all NER models, it may occasionally misclassify entities or fail to recognize them, especially in cases of polysemy or ambiguity.
- It is also limited to English text, as it was fine-tuned on an English dataset.
Credits
- Fine-tuning and Model: Heena Mirchandani & Krish Murjani
- Dataset: CoNLL-2003 NER dataset
License
This model is available for use under the Apache License 2.0. See the LICENSE file for more details.
For more details on BERT and Named Entity Recognition, refer to the Hugging Face documentation.