|
--- |
|
license: apache-2.0 |
|
datasets: |
|
- eriktks/conll2003 |
|
language: |
|
- en |
|
metrics: |
|
- accuracy |
|
- precision |
|
- recall |
|
- f1 |
|
base_model: |
|
- distilbert/distilbert-base-cased |
|
--- |
|
|
|
# DistilBERT Base Cased Fine-Tuned on CoNLL2003 for English Named Entity Recognition (NER) |
|
|
|
This model is a fine-tuned version of [DistilBERT-base-cased](https://huggingface.co/distilbert/distilbert-base-cased) on the [CoNLL2003](https://huggingface.co/datasets/eriktks/conll2003) dataset for Named Entity Recognition (NER) in English. The CoNLL2003 dataset contains four types of named entities: Person (PER), Location (LOC), Organization (ORG), and Miscellaneous (MISC). |
|
|
|
## Model Details |
|
- Model Architecture: DistilBERT (a distilled, lighter version of BERT)

- Pre-trained Base Model: distilbert-base-cased
|
- Dataset: CoNLL2003 (NER task) |
|
- Languages: English |
|
- Fine-tuned for: Named Entity Recognition (NER) |
|
- Entities recognized: |
|
- PER: Person |
|
- LOC: Location |
|
- ORG: Organization |
|
- MISC: Miscellaneous entities |
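CoNLL2003 labels tokens with the IOB2 scheme (`B-` marks the beginning of an entity, `I-` a continuation, `O` no entity), so each entity type above appears as a `B-`/`I-` tag pair. As a minimal illustrative sketch (not part of this model's code), here is how IOB2 tag sequences decode into entity spans:

```python
# Minimal sketch: decode IOB2 tags (as used in CoNLL2003) into entity spans.
def decode_iob2(tokens, tags):
    """Group parallel (token, tag) sequences into (entity_text, entity_type) pairs."""
    entities, current_tokens, current_type = [], [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and current_type != tag[2:]):
            # A new entity starts; close any entity still open
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-"):
            current_tokens.append(token)  # continue the current entity
        else:  # "O" ends any open entity
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:
        entities.append((" ".join(current_tokens), current_type))
    return entities

tokens = ["John", "lives", "in", "New", "York", "."]
tags = ["B-PER", "O", "O", "B-LOC", "I-LOC", "O"]
print(decode_iob2(tokens, tags))  # [('John', 'PER'), ('New York', 'LOC')]
```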
|
|
|
## Use Cases |
|
This model is ideal for tasks that require identifying and classifying named entities within English text, such as: |
|
|
|
- Information extraction from unstructured text |
|
- Content classification and tagging |
|
- Automated text summarization |
|
- Question answering systems with a focus on entity recognition |
|
|
|
## How to Use |
|
To use this model in your code, you can load it via Hugging Face’s Transformers library: |
|
|
|
```python

from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load the fine-tuned tokenizer and model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("MrRobson9/distilbert-base-cased-finetuned-conll2003-english-ner")

model = AutoModelForTokenClassification.from_pretrained("MrRobson9/distilbert-base-cased-finetuned-conll2003-english-ner")

# Build an NER pipeline and run it on a sample sentence
nlp_ner = pipeline("ner", model=model, tokenizer=tokenizer)

result = nlp_ner("John lives in New York and works for the United Nations.")

print(result)

```
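The pipeline returns one dictionary per predicted token piece, each with an `entity` tag, a confidence `score`, the matched `word`, and character offsets. A common post-processing step is filtering low-confidence predictions. The sketch below uses an illustrative, hand-written result list in that format (not actual output of this model):

```python
# Illustrative sample in the shape the "ner" pipeline produces (not real model output)
sample_result = [
    {"entity": "B-PER", "score": 0.998, "word": "John", "start": 0, "end": 4},
    {"entity": "B-LOC", "score": 0.995, "word": "New", "start": 14, "end": 17},
    {"entity": "I-LOC", "score": 0.42, "word": "York", "start": 18, "end": 22},
]

def filter_by_confidence(results, threshold=0.9):
    """Keep only predictions whose confidence score meets the threshold."""
    return [r for r in results if r["score"] >= threshold]

confident = filter_by_confidence(sample_result)
print([r["word"] for r in confident])  # ['John', 'New']
```

If you prefer whole entities rather than subword pieces, the Transformers pipeline can merge them for you: passing `aggregation_strategy="simple"` when constructing the pipeline groups adjacent pieces into complete entity spans.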
|
|
|
## Performance |
|
|accuracy |precision |recall |f1-score| |
|
|:-------:|:--------:|:-----:|:------:| |
|
| 0.987 | 0.937 | 0.941 | 0.939 | |
|
|
|
## License |
|
This model is released under the Apache 2.0 license, the same license as the DistilBERT-base-cased base model. The CoNLL2003 dataset is distributed under its own terms of use; please ensure compliance with all respective licenses when using this model.