---
datasets:
- eriktks/conll2003
language:
- en
metrics:
- accuracy
- precision
- recall
- f1
base_model:
- google-bert/bert-base-cased
pipeline_tag: token-classification
library_name: transformers
---

## Dataset Used

This model was trained on the [CoNLL 2003 dataset](https://huggingface.co/datasets/eriktks/conll2003) for Named Entity Recognition (NER) tasks.

The dataset includes the following labels:

- `O`, `B-PER`, `I-PER`, `B-ORG`, `I-ORG`, `B-LOC`, `I-LOC`, `B-MISC`, `I-MISC`

For detailed descriptions of these labels, please refer to the [dataset card](https://huggingface.co/datasets/eriktks/conll2003).
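
If you need the label names programmatically, they can be read from the dataset features. This is a small sketch, assuming the `datasets` library and the `ner_tags` column described on the dataset card:

```python
from datasets import load_dataset

raw = load_dataset("eriktks/conll2003")

# `ner_tags` is a Sequence of ClassLabel; the integer class ids map to these names
label_names = raw["train"].features["ner_tags"].feature.names
print(label_names)
# ['O', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG', 'B-LOC', 'I-LOC', 'B-MISC', 'I-MISC']
```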

## Model Training Details

### Training Arguments

- **Model Architecture**: `bert-base-cased` for token classification
- **Learning Rate**: `2e-5`
- **Number of Epochs**: `20`
- **Weight Decay**: `0.01`
- **Evaluation Strategy**: `epoch`
- **Save Strategy**: `epoch`

*All other hyperparameters were left at the Hugging Face Transformers defaults.*
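
The exact training script is not included in this card; the following is a minimal sketch of how the configuration above could be reproduced with `Trainer`. The column names (`tokens`, `ner_tags`) follow the CoNLL 2003 dataset card, the output directory name is illustrative, and some argument names (e.g. `evaluation_strategy` vs. `eval_strategy`, `tokenizer` vs. `processing_class`) depend on your `transformers` version.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForTokenClassification,
    AutoTokenizer,
    DataCollatorForTokenClassification,
    Trainer,
    TrainingArguments,
)

raw = load_dataset("eriktks/conll2003")
label_names = raw["train"].features["ner_tags"].feature.names
id2label = dict(enumerate(label_names))
label2id = {label: i for i, label in id2label.items()}

model_name = "google-bert/bert-base-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(
    model_name, num_labels=len(label_names), id2label=id2label, label2id=label2id
)

def tokenize_and_align_labels(examples):
    # Tokenize pre-split words and copy each word's label to its first sub-token;
    # remaining sub-tokens and special tokens get -100 so the loss ignores them.
    tokenized = tokenizer(examples["tokens"], truncation=True, is_split_into_words=True)
    all_labels = []
    for i, labels in enumerate(examples["ner_tags"]):
        word_ids = tokenized.word_ids(batch_index=i)
        previous, label_ids = None, []
        for word_id in word_ids:
            if word_id is None or word_id == previous:
                label_ids.append(-100)
            else:
                label_ids.append(labels[word_id])
            previous = word_id
        all_labels.append(label_ids)
    tokenized["labels"] = all_labels
    return tokenized

tokenized = raw.map(
    tokenize_and_align_labels, batched=True, remove_columns=raw["train"].column_names
)

args = TrainingArguments(
    output_dir="bert-finetuned-ner",
    learning_rate=2e-5,
    num_train_epochs=20,
    weight_decay=0.01,
    evaluation_strategy="epoch",  # `eval_strategy` on recent transformers versions
    save_strategy="epoch",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=DataCollatorForTokenClassification(tokenizer),
    tokenizer=tokenizer,
)
trainer.train()
```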

## Evaluation Results

### Validation Set Performance

- **Overall Metrics**:
  - Precision: 94.44%
  - Recall: 95.74%
  - F1 Score: 95.09%
  - Accuracy: 98.73%

#### Per-Label Performance

| Entity Type | Precision | Recall | F1 Score |
|-------------|-----------|--------|----------|
| LOC         | 97.27%    | 97.11% | 97.19%   |
| MISC        | 87.46%    | 91.54% | 89.45%   |
| ORG         | 93.37%    | 93.44% | 93.40%   |
| PER         | 96.02%    | 98.15% | 97.07%   |

### Test Set Performance

- **Overall Metrics**:
  - Precision: 89.90%
  - Recall: 91.91%
  - F1 Score: 90.89%
  - Accuracy: 97.27%

#### Per-Label Performance

| Entity Type | Precision | Recall | F1 Score |
|-------------|-----------|--------|----------|
| LOC         | 92.87%    | 92.87% | 92.87%   |
| MISC        | 75.55%    | 82.76% | 78.99%   |
| ORG         | 88.32%    | 90.61% | 89.45%   |
| PER         | 95.28%    | 96.23% | 95.75%   |
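
These are entity-level metrics of the kind produced by `seqeval`. As a reference, here is a minimal sketch of computing such scores with the `evaluate` library (assuming the `seqeval` backend package is installed); the toy label sequences below are illustrative, not taken from this model's evaluation:

```python
import evaluate

seqeval = evaluate.load("seqeval")

# Predictions and references are lists of label-string sequences, one per sentence
predictions = [["O", "B-PER", "I-PER", "O", "B-LOC"]]
references = [["O", "B-PER", "I-PER", "O", "B-ORG"]]

results = seqeval.compute(predictions=predictions, references=references)
print(results["overall_precision"], results["overall_recall"],
      results["overall_f1"], results["overall_accuracy"])
print(results["PER"])  # per-entity precision, recall, f1, and support count
```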

## How to Use the Model

You can load the model directly from the Hugging Face Model Hub:

```python
from transformers import pipeline

# Model checkpoint on the Hugging Face Hub
model_checkpoint = "Prikshit7766/bert-finetuned-ner"
token_classifier = pipeline(
    "token-classification",
    model=model_checkpoint,
    aggregation_strategy="simple",
)

# Example usage
result = token_classifier("My name is Sylvain and I work at Hugging Face in Brooklyn.")
print(result)
```

### Example Output

```python
[
    {
        "entity_group": "PER",
        "score": 0.9999881,
        "word": "Sylvain",
        "start": 11,
        "end": 18
    },
    {
        "entity_group": "ORG",
        "score": 0.99961376,
        "word": "Hugging Face",
        "start": 33,
        "end": 45
    },
    {
        "entity_group": "LOC",
        "score": 0.99989843,
        "word": "Brooklyn",
        "start": 49,
        "end": 57
    }
]
```
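
If you prefer working with the model directly rather than through the `pipeline` helper, the following is a minimal sketch assuming PyTorch. It prints one predicted label per sub-token (including special tokens) and does not aggregate sub-tokens into entity spans:

```python
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

model_checkpoint = "Prikshit7766/bert-finetuned-ner"
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
model = AutoModelForTokenClassification.from_pretrained(model_checkpoint)

text = "My name is Sylvain and I work at Hugging Face in Brooklyn."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Pick the highest-scoring label id for each sub-token and map it to a label name
predicted_ids = logits.argmax(dim=-1)[0]
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, pred in zip(tokens, predicted_ids):
    print(token, model.config.id2label[pred.item()])
```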