# Extended BERT-base-NER

## Model Description
Extended BERT-base-NER is a fine-tuned BERT model that extends the original bert-base-NER with 10 additional entity types for comprehensive named entity recognition.
## Entity Types (14 total)

**Original (4):**
- PER (Person) - Names of people
- ORG (Organization) - Company names, institutions
- LOC (Location) - Places, cities, countries
- MISC (Miscellaneous) - Other named entities
**New (10):**
- MED (Medicine) - Medicine names, drug names
- ZIP (Zip Code) - Postal codes, ZIP codes
- COUNTRY_CODE - Country codes (US, UK, CA, etc.)
- STATE - States, provinces, regions
- ETHNICITY - Ethnic groups, cultural backgrounds
- RACE - Racial categories
- CONTINENT - Continents (North America, Europe, etc.)
- TERRITORY - Territories, dependencies
- PHONE - Phone numbers
- EMAIL - Email addresses
## Usage

### Using Transformers Pipeline
```python
from transformers import pipeline

# Load the model
nlp = pipeline("ner", model="BikashML/extended-bert-base-ner", aggregation_strategy="simple")

# Example text
text = "Dr. Maria Garcia prescribed Aspirin for the patient from California, USA. Contact her at maria.garcia@hospital.com or call 555-123-4567."

# Get predictions
results = nlp(text)

# Print results
for entity in results:
    print(f"{entity['word']} -> {entity['entity_group']} (confidence: {entity['score']:.3f})")
```
### Expected Output

```
Dr. Maria Garcia -> PER (confidence: 0.660)
Aspirin -> MED (confidence: 0.401)
California -> LOC (confidence: 0.261)
USA -> STATE (confidence: 0.372)
maria.garcia@hospital.com -> EMAIL (confidence: 0.700)
555-123-4567 -> PHONE (confidence: 0.713)
```
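Several of the confidence scores in the expected output are fairly low (e.g. 0.261 for California), so filtering predictions by score can be useful downstream. The sketch below assumes the standard pipeline output format; the sample results and the 0.4 threshold are illustrative, not values prescribed by this model:

```python
# Illustrative pipeline results (same dict format the NER pipeline returns
# with aggregation_strategy="simple"); scores mirror the expected output above.
results = [
    {"word": "Aspirin", "entity_group": "MED", "score": 0.401},
    {"word": "maria.garcia@hospital.com", "entity_group": "EMAIL", "score": 0.700},
    {"word": "California", "entity_group": "LOC", "score": 0.261},
]

# Drop low-confidence entities; the threshold is an assumption to tune per use case.
THRESHOLD = 0.4
kept = [e for e in results if e["score"] >= THRESHOLD]

for e in kept:
    print(f"{e['word']} -> {e['entity_group']}")
```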
## Model Architecture
- Base Model: bert-base-cased
- Architecture: BertForTokenClassification
- Parameters: 110M
- Total Labels: 29 (BIO tagging scheme)
- Max Sequence Length: 512 tokens
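The 29-label count follows directly from the BIO scheme: each of the 14 entity types gets a B- (begin) and an I- (inside) tag, plus a single O tag for non-entity tokens. A quick sketch (the label ordering here is an assumption; the authoritative mapping is in the model's config):

```python
# Entity types from the card: 4 original + 10 new.
ENTITY_TYPES = [
    "PER", "ORG", "LOC", "MISC",
    "MED", "ZIP", "COUNTRY_CODE", "STATE", "ETHNICITY",
    "RACE", "CONTINENT", "TERRITORY", "PHONE", "EMAIL",
]

# BIO scheme: "O" plus B-/I- variants of every entity type -> 1 + 14*2 = 29 labels.
labels = ["O"] + [f"{prefix}-{ent}" for ent in ENTITY_TYPES for prefix in ("B", "I")]
print(len(labels))  # 29
```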
## Training Data
This model was trained on:
- Base Dataset: CoNLL-2003 Named Entity Recognition dataset
- Extended Data: 69 custom annotated examples
- Entity Types: All 14 entity types with diverse examples
- Training Approach: Fine-tuning from bert-base-NER
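When fine-tuning BERT for token classification, word-level BIO labels must be aligned to WordPiece sub-tokens: conventionally only the first sub-token of each word keeps the label, and continuation pieces are set to -100 so the loss ignores them. A minimal sketch of that convention, with tokenization mocked rather than produced by a real tokenizer (the example words and label ids are hypothetical):

```python
# Word-level annotations (hypothetical example sentence).
words = ["Maria", "Garcia", "took", "Aspirin"]
word_labels = ["B-PER", "I-PER", "O", "B-MED"]
label2id = {"O": 0, "B-PER": 1, "I-PER": 2, "B-MED": 3}

# Mocked WordPiece splits; a real pipeline would get these from the tokenizer.
subtokens = [["Maria"], ["Gar", "##cia"], ["took"], ["As", "##pir", "##in"]]

# First sub-token keeps the label id; continuations get -100 (ignored by the loss).
aligned = []
for pieces, label in zip(subtokens, word_labels):
    aligned.append(label2id[label])
    aligned.extend([-100] * (len(pieces) - 1))

print(aligned)  # [1, 2, -100, 0, 3, -100, -100]
```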
## Use Cases
- Medical Records: Extract patient information, medications, contact details
- Business Documents: Identify companies, locations, contact information
- Personal Data: Extract names, addresses, phone numbers, emails
- Geographic Data: Identify locations, states, countries, territories
- Demographic Analysis: Extract ethnicity, race, geographic information
## Limitations

- Language: English only
- Domain: Performance may degrade on text that differs from the training domains
- Entity Boundaries: Entity boundaries may occasionally be misidentified
## Citation

```bibtex
@misc{extended-bert-base-ner,
  title={Extended BERT-base-NER: Multi-domain Named Entity Recognition},
  author={BikashML},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/BikashML/extended-bert-base-ner}
}
```
## License
This model is licensed under the MIT License.