---
license: apache-2.0
datasets:
- eriktks/conll2003
language:
- en
metrics:
- accuracy
base_model:
- google-bert/bert-base-cased
pipeline_tag: token-classification
library_name: transformers
tags:
- code
---

# BERT Fine-Tuned for Named Entity Recognition (NER)

This repository contains a BERT model fine-tuned for Named Entity Recognition (NER). The model was fine-tuned with the Hugging Face `transformers` library and recognizes named entities such as people, locations, and organizations in text.

## Model Details

- **Model Architecture**: `bert-base-cased` (BERT-base)
- **Fine-Tuning Task**: Named Entity Recognition (NER)
- **Dataset Used**: The model was fine-tuned on the [CoNLL-2003](https://www.aclweb.org/anthology/W03-0419) NER dataset, which provides labeled data for persons, organizations, locations, and miscellaneous entities.
- **Intended Use**: NER in applications such as information extraction, question answering, and chatbots.

## Usage

You can use this model with the Hugging Face `transformers` library to get started with NER quickly. Below is an example of loading the model and running inference.

### Installation

First, make sure you have the required packages (a backend such as PyTorch is needed to run the model):

```bash
pip install transformers torch
```

### Loading the Model

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("heenamir/bert-finetuned-ner")
model = AutoModelForTokenClassification.from_pretrained("heenamir/bert-finetuned-ner")

# Initialize the NER pipeline
nlp = pipeline("ner", model=model, tokenizer=tokenizer)

# Example text
text = "John Doe is a software engineer at OpenAI in San Francisco."

# Perform NER
entities = nlp(text)
print(entities)
```

### Example Output

The model returns a list of per-token predictions in the following format:

```
[
    {"entity": "B-PER", "score": 0.99, "index": 1, "word": "John", "start": 0, "end": 4},
    {"entity": "I-PER", "score": 0.98, "index": 2, "word": "Doe", "start": 5, "end": 8},
    {"entity": "B-ORG", "score": 0.95, "index": 8, "word": "OpenAI", "start": 35, "end": 41},
    {"entity": "B-LOC", "score": 0.97, "index": 10, "word": "San", "start": 45, "end": 48},
    {"entity": "I-LOC", "score": 0.96, "index": 11, "word": "Francisco", "start": 49, "end": 58},
]
```

Note that multi-word entities such as "San Francisco" come back as separate `B-`/`I-` tokens, and words outside the tokenizer's vocabulary (e.g. "OpenAI") may be split further into word pieces, so the exact indices and scores above are illustrative. See the aggregation example under Additional Examples below for merging tokens into whole entity spans.

### Entity Labels

The model is fine-tuned to detect the following entity types:

* **PER**: Person
* **ORG**: Organization
* **LOC**: Location
* **MISC**: Miscellaneous

### Scoring

The model outputs a score for each detected entity, representing its confidence. You can use these scores to filter out low-confidence predictions if needed (see the filtering sketch under Additional Examples below).

## Model Performance

The model's performance varies with the complexity and context of the input text. It performs well on structured text but may struggle with informal or highly technical language.

### Evaluation Metrics

The model was evaluated on the CoNLL-2003 test set with the following results (a sketch of one way to reproduce such an evaluation appears under Additional Examples below):

* **Precision**: 93.04%
* **Recall**: 94.98%
* **F1 Score**: 94.00%

## Limitations and Considerations

* The model may not perform well on texts outside the domains it was trained on.
* Like all NER models, it may occasionally misclassify entities or fail to recognize them, especially in cases of polysemy or ambiguity.
* It is limited to English text, as it was fine-tuned on an English dataset.
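## Additional Examples

### Aggregating Sub-Token Predictions

The raw pipeline output above tags individual tokens with `B-`/`I-` labels. To merge these into whole entity spans, the `transformers` pipeline accepts an `aggregation_strategy` argument. A minimal sketch:

```python
from transformers import pipeline

# "simple" groups consecutive tokens that share an entity type into one span
nlp = pipeline(
    "ner",
    model="heenamir/bert-finetuned-ner",
    aggregation_strategy="simple",
)

entities = nlp("John Doe is a software engineer at OpenAI in San Francisco.")
for ent in entities:
    # With aggregation, each item covers a full span, e.g. "San Francisco"
    print(ent["entity_group"], ent["word"], round(float(ent["score"]), 3))
```

With aggregation enabled, each result carries an `entity_group` key (`PER`, `ORG`, `LOC`, or `MISC`) instead of per-token `B-`/`I-` labels.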
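### Filtering by Confidence

As noted in the Scoring section, each prediction includes a confidence score. A minimal sketch for dropping low-confidence predictions; the 0.90 threshold is an assumed value to tune for your application:

```python
CONFIDENCE_THRESHOLD = 0.90  # assumed value; adjust for your use case

# `entities` is the pipeline output from the examples above
confident = [ent for ent in entities if ent["score"] >= CONFIDENCE_THRESHOLD]
print(confident)
```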
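### Reproducing the Evaluation (Sketch)

The exact evaluation script is not included in this repository. The sketch below shows one common way to compute entity-level precision, recall, and F1 with the `evaluate` library's `seqeval` metric (`pip install evaluate seqeval`); it assumes model predictions have already been aligned back to word-level BIO label sequences, which is not shown here.

```python
import evaluate

# seqeval computes entity-level precision/recall/F1 from BIO label sequences
metric = evaluate.load("seqeval")

# Toy label sequences; in practice, `predictions` comes from the model's
# outputs aligned to the words of the CoNLL-2003 test set
references = [["B-PER", "I-PER", "O", "O", "B-ORG", "O", "B-LOC", "I-LOC"]]
predictions = [["B-PER", "I-PER", "O", "O", "B-ORG", "O", "B-LOC", "O"]]

results = metric.compute(predictions=predictions, references=references)
print(results["overall_precision"], results["overall_recall"], results["overall_f1"])
```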
## Credits

* Fine-tuning and Model: [Heena Mirchandani](https://huggingface.co/heenamir) & [Krish Murjani](https://huggingface.co/krishmurjani)
* Dataset: CoNLL-2003 NER dataset

## License

This model is available for use under the Apache License 2.0. See the LICENSE file for more details.

---

For more details on BERT and Named Entity Recognition, refer to the [Hugging Face documentation](https://huggingface.co/docs/transformers).