Extended BERT-base-NER

Model Description

Extended BERT-base-NER is a fine-tuned BERT model that extends the original bert-base-NER with 10 additional entity types, bringing the total to 14 for broader named entity recognition coverage.

Entity Types (14 total)

Original (4):

  • PER (Person) - Names of people
  • ORG (Organization) - Company names, institutions
  • LOC (Location) - Places, cities, countries
  • MISC (Miscellaneous) - Other named entities

New (10):

  • MED (Medicine) - Medicine names, drug names
  • ZIP (Zip Code) - Postal codes, ZIP codes
  • COUNTRY_CODE - Country codes (US, UK, CA, etc.)
  • STATE - States, provinces, regions
  • ETHNICITY - Ethnic groups, cultural backgrounds
  • RACE - Racial categories
  • CONTINENT - Continents (North America, Europe, etc.)
  • TERRITORY - Territories, dependencies
  • PHONE - Phone numbers
  • EMAIL - Email addresses

Usage

Using the Transformers Pipeline

from transformers import pipeline

# Load the model
nlp = pipeline("ner", model="BikashML/extended-bert-base-ner", aggregation_strategy="simple")

# Example text
text = "Dr. Maria Garcia prescribed Aspirin for the patient from California, USA. Contact her at maria.garcia@hospital.com or call 555-123-4567."

# Get predictions
results = nlp(text)

# Print results
for entity in results:
    print(f"{entity['word']} -> {entity['entity_group']} (confidence: {entity['score']:.3f})")

Expected Output

Dr. Maria Garcia -> PER (confidence: 0.660)
Aspirin -> MED (confidence: 0.401)
California -> LOC (confidence: 0.261)
USA -> STATE (confidence: 0.372)
maria.garcia@hospital.com -> EMAIL (confidence: 0.700)
555-123-4567 -> PHONE (confidence: 0.713)
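
Using the Model Directly

The pipeline handles tokenization, label mapping, and span aggregation for you. If you need raw logits instead, the sketch below uses the standard transformers token-classification API; it assumes the checkpoint ships its label map in config.id2label, as token-classification checkpoints normally do.

from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("BikashML/extended-bert-base-ner")
model = AutoModelForTokenClassification.from_pretrained("BikashML/extended-bert-base-ner")

text = "Dr. Maria Garcia prescribed Aspirin for the patient from California, USA."
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    logits = model(**inputs).logits

# Map each subword token to its highest-scoring label and print non-O tokens
predictions = logits.argmax(dim=-1)[0]
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for token, pred in zip(tokens, predictions):
    label = model.config.id2label[pred.item()]
    if label != "O":
        print(f"{token} -> {label}")

Note that this prints per-subword predictions; the pipeline's aggregation_strategy="simple" is what merges them back into word-level spans.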

Model Architecture

  • Base Model: bert-base-cased
  • Architecture: BertForTokenClassification
  • Parameters: 110M
  • Total Labels: 29 (BIO tagging scheme; see the sketch below)
  • Max Sequence Length: 512 tokens
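
The label count follows from the BIO scheme: each of the 14 entity types gets a B- (begin) and I- (inside) tag, plus a single O tag for tokens outside any entity, giving 14 × 2 + 1 = 29. A minimal sketch of such a label set (the ordering is illustrative; the authoritative mapping lives in the model's config.id2label):

ENTITY_TYPES = [
    "PER", "ORG", "LOC", "MISC",                         # original CoNLL-2003 types
    "MED", "ZIP", "COUNTRY_CODE", "STATE", "ETHNICITY",  # extended types
    "RACE", "CONTINENT", "TERRITORY", "PHONE", "EMAIL",
]

labels = ["O"] + [f"{prefix}-{etype}" for etype in ENTITY_TYPES for prefix in ("B", "I")]
assert len(labels) == 29  # 14 types x 2 prefixes + "O"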

Training Data

This model was trained on:

  • Base Dataset: CoNLL-2003 Named Entity Recognition dataset
  • Extended Data: 69 custom annotated examples
  • Entity Types: All 14 entity types with diverse examples
  • Training Approach: Fine-tuning from bert-base-NER
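
The card does not publish the exact training recipe, but a typical way to fine-tune from bert-base-NER onto an enlarged label set is to reload the checkpoint with a re-sized classification head. A minimal sketch, assuming the base checkpoint is dslim/bert-base-NER and reusing ENTITY_TYPES from the sketch above:

from transformers import AutoModelForTokenClassification

label_list = ["O"] + [f"{p}-{t}" for t in ENTITY_TYPES for p in ("B", "I")]

model = AutoModelForTokenClassification.from_pretrained(
    "dslim/bert-base-NER",               # assumed base checkpoint
    num_labels=len(label_list),
    id2label=dict(enumerate(label_list)),
    label2id={l: i for i, l in enumerate(label_list)},
    ignore_mismatched_sizes=True,        # swap the original 9-label head for a 29-label one
)
# From here, fine-tune on the BIO-tagged extended data with the usual
# transformers Trainer token-classification recipe.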

Use Cases

  • Medical Records: Extract patient information, medications, contact details
  • Business Documents: Identify companies, locations, contact information
  • Personal Data: Extract names, addresses, phone numbers, emails (see the redaction sketch after this list)
  • Geographic Data: Identify locations, states, countries, territories
  • Demographic Analysis: Extract ethnicity, race, geographic information
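
As an illustration of the Personal Data use case, here is a hypothetical redaction helper built on the pipeline output from the Usage section (nlp and text as defined there; the helper name redact is ours, not part of the model):

def redact(text, entities):
    # Replace spans right-to-left so earlier character offsets stay valid
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        text = text[:ent["start"]] + f"[{ent['entity_group']}]" + text[ent["end"]:]
    return text

print(redact(text, nlp(text)))
# Roughly: "Dr. [PER] prescribed [MED] for the patient from [LOC], [STATE].
# Contact her at [EMAIL] or call [PHONE]."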

Limitations

  • Language: English only
  • Domain: Performs best on text similar to the training data; expect weaker results on distant domains
  • Entity Boundaries: Entity spans may occasionally be mis-segmented or mislabeled

Citation

@misc{extended-bert-base-ner,
  title={Extended BERT-base-NER: Multi-domain Named Entity Recognition},
  author={BikashML},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/BikashML/extended-bert-base-ner}
}

License

This model is licensed under the MIT License.
