---
base_model: roberta-base
datasets:
- YurtsAI/named_entity_recognition_document_context
language:
- en
library_name: span-marker
metrics:
- precision
- recall
- f1
pipeline_tag: token-classification
tags:
- span-marker
- token-classification
- ner
- named-entity-recognition
- generated_from_span_marker_trainer
widget:
- text: >-
    We have Kanye West, Beyoncé, and Taylor Swift performing at the beachside
    park on the island of Maui.
- text: >-
    This book, published by Epic Games and sponsored by the University of
    Hawaii, features recipes inspired by the popular game League of Legends
    and a foreword by renowned food scholar, Dr. Thomas Johnson, a professor
    at Harvard University.
- text: >-
    The National Institute of Technology has partnered with CafeCorp to
    provide a menu planning template for businesses in the downtown area.
- text: >-
    The marketing efforts for the Chicago Bulls basketball team in Wrigley
    Park were a huge success, with 80% of attendees speaking Spanish.
- text: >-
    The most important thing was to try using the coconut oil from a tiny
    store near the river, and a sprinkle of Japanese spices I learned from my
    friend who speaks fluent Japanese.
model-index:
- name: >-
    SpanMarker with roberta-base on
    YurtsAI/named_entity_recognition_document_context
  results:
  - task:
      type: token-classification
      name: Named Entity Recognition
    dataset:
      name: YurtsAI/named_entity_recognition_document_context
      type: YurtsAI/named_entity_recognition_document_context
      split: eval
    metrics:
    - type: f1
      value: 0.3902777777777778
      name: F1
    - type: precision
      value: 0.6189427312775331
      name: Precision
    - type: recall
      value: 0.28498985801217036
      name: Recall
---
# SpanMarker with roberta-base on YurtsAI/named_entity_recognition_document_context
This is a [SpanMarker](https://github.com/tomaarsen/SpanMarkerNER) model trained on the [YurtsAI/named_entity_recognition_document_context](https://huggingface.co/datasets/YurtsAI/named_entity_recognition_document_context) dataset that can be used for Named Entity Recognition. It uses [roberta-base](https://huggingface.co/roberta-base) as the underlying encoder.
## Model Details

### Model Description
- Model Type: SpanMarker
- Encoder: roberta-base
- Maximum Sequence Length: 256 tokens
- Maximum Entity Length: 11 words
- Training Dataset: YurtsAI/named_entity_recognition_document_context
- Language: en
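
For reference, here is a minimal sketch of how these limits map onto SpanMarker's initialization API when training a comparable model from scratch. The truncated `labels` list is illustrative only; the full label set appears under "Model Labels" below.

```python
from span_marker import SpanMarkerModel

# Sketch: initialize a fresh SpanMarker model with the same encoder and
# span limits as this card reports. The labels list is truncated for
# illustration; the full set is in the "Model Labels" table below.
labels = ["O", "person-actor", "organization-company", "location-gpe"]
model = SpanMarkerModel.from_pretrained(
    "roberta-base",
    labels=labels,
    model_max_length=256,  # maximum sequence length in tokens
    entity_max_length=11,  # maximum entity length in words
)
```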
### Model Sources
- Repository: [SpanMarker on GitHub](https://github.com/tomaarsen/SpanMarkerNER)
- Thesis: [SpanMarker For Named Entity Recognition](https://raw.githubusercontent.com/tomaarsen/SpanMarkerNER/main/thesis.pdf)
### Model Labels
Label | Examples |
---|---|
art-broadcastprogram | "television program", "Origin of the Gods", "reality show" |
art-film | "a video of a successful grant proposal", "'The Matrix '", "film crew" |
art-music | "a new album by Beyoncé", "Yesterday by The Beatles", "favorite music CD" |
art-other | "art therapy", "play", "Mona Lisa" |
art-painting | "vibrant street art scene", "through art", "painting" |
art-writtenart | "'The Lost Gods '", "Book 1", "environmental science book" |
building-airport | "airport", "major airport", "an airport" |
building-hospital | "New York hospital", "local hospital", "hospital" |
building-hotel | "hotel", "new hotel in Austin", "a giant hotel" |
building-library | "new library", "library", "new , state-of-the-art library" |
building-other | "10-story building", "headquarters building", "factory building" |
building-restaurant | "new restaurant", "our upscale restaurant", "restaurant" |
building-sportsfacility | "sports facility", "Union Park Sports Complex", "city 's sports center" |
building-theater | "the local theater", "theater in downtown", "theater" |
datetime-absolute | "January 10 , 2020", "January 17 , 2025 at 14:00", "March 25th" |
datetime-authored | "2023-02-22", "2019-04-15", "2020-02-15" |
datetime-range | "2010-2015", "Q4 2019", "Friday to Sunday" |
datetime-relative | "next week 's appointment", "last Saturday", "next week" |
event-attack/battle/war/militaryconflict | "attacks/wars", "The", "A" |
event-disaster | "My", "To", "disaster" |
event-election | "the election for the mayor", "upcoming election", "election season" |
event-other | "conference", "annual 4th of july BBQ", "charity gala" |
event-protest | "protest", "protest last saturday", "protest rally" |
event-sportsevent | "sports event", "annual tennis tournament", "biggest sports event of the year" |
location-bodiesofwater | "ocean", "Lake Como", "Lake Michigan" |
location-gpe | "Italy", "Texas", "city" |
location-island | "Island Radio", "Caribbean island", "island" |
location-mountain | "mountain terrain", "the mountain", "mountain" |
location-other | "low-lying areas of the city", "advertising hub", "backyard" |
location-park | "park", "location-park", "the park" |
location-road/railway/highway/transit | "Greyhound network", "road", "train journey" |
organization-company | "local company", "Verizon", "a company" |
organization-education | "Harvard University", "UW", "University of Arizona" |
organization-government/governmentagency | "Red Cross", "local government", "SEC" |
organization-media/newspaper | "The New York Times", "media organizations", "Army Times" |
organization-other | "Cognizant", "Better World Foundation", "conservation organization" |
organization-politicalparty | "Spaceship of Progress Party", "Libertarian Party", "Green Party" |
organization-religion | "local church", "the power of prayer", "diamatists" |
organization-showorganization | "Royal Shakespeare Company", "Earth 's Edge Theater Company", "Cosmic Theater group" |
organization-sportsleague | "International Swimming Federation", "NBA league", "NFL" |
organization-sportsteam | "soccer team", "Syracuse Orange football team", "Seattle Seahawks" |
other-astronomything | "latest discoveries in the field of astronomy", "Galactic Conference Best Recipe Award-winning recipe book", "astronomy camp" |
other-award | "other-award", "annual tech show awards", "Nobel Peace Prize" |
other-biologything | "salmon 's gene for cold adaptation", "terrain", "the forces that drive you" |
other-chemicalthing | "Overall", "The", "In" |
other-currency | "US dollars", "Japanese Yen", "$ 500,000" |
other-disease | "malaria", "type 1 diabetes", "the common cold" |
other-educationaldegree | "master 's degree", "thesis", "Ph.D in food science" |
other-god | "Peter Pan", "divine", "Zeus the god" |
other-language | "English", "Amharic", "Sanskrit" |
other-law | "legislation", "professorial separation laws", "Clean Air Act" |
other-livingthing | "We", "To", "flowers" |
other-medical | "antibiotics", "medical treatment", "necessary testing protocols" |
person-actor | "Emma Stone", "Dr. Steven Spielberg", "Jennifer Lawrence" |
person-artist/author | "Chuck Close", "artist 's new album", "Jane Smith" |
person-athlete | "athlete friend", "LeBron James", "John and Sally" |
person-director | "John Oliver", "favorite director", "Dr. Johnson" |
person-other | "your", "HR representative", "therapist or counselor" |
person-politician | "To", "At", "Secretary of State" |
person-scholar | "Dr. John Smith", "Dr. Johnson", "a scholar of comparative religion" |
person-soldier | "veterans", "the brave soldiers", "a soldier" |
product-airplane | "Cessna 172", "company 's fleet of private airplanes", "airline" |
product-car | "leased car", "your car", "car" |
product-food | "StarBites", "food truck business", "ice cream" |
product-game | "the 'Train to Nowhere ' game", "board game", "screen protector" |
product-other | "new medicine", "acting software", "table" |
product-ship | "research ship", "ship", "a ship" |
product-software | "software", "instruction manual", "pizza ordering app" |
product-train | "Universal Sonicator", "train", "the train" |
product-weapon | "Flip Flops", "Sno Blaster", "SecurityFirst" |
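
The labels follow a `<coarse>-<fine>` naming scheme, so predictions can be grouped by coarse type with a simple split. A small sketch (the `entities` list and `coarse_type` helper are illustrative, not part of the library):

```python
# Hypothetical helper: group fine-grained labels by their coarse type.
def coarse_type(label: str) -> str:
    return label.split("-", 1)[0]

entities = [
    {"span": "Kanye West", "label": "person-artist/author"},
    {"span": "Maui", "label": "location-island"},
]
people = [e for e in entities if coarse_type(e["label"]) == "person"]
print(people)  # [{'span': 'Kanye West', 'label': 'person-artist/author'}]
```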
## Evaluation

### Metrics
Label | Precision | Recall | F1 |
---|---|---|---|
all | 0.6189 | 0.2850 | 0.3903 |
art-broadcastprogram | 0.0 | 0.0 | 0.0 |
art-film | 0.0 | 0.0 | 0.0 |
art-music | 0.6667 | 0.2 | 0.3077 |
art-other | 0.0 | 0.0 | 0.0 |
art-painting | 0.0 | 0.0 | 0.0 |
art-writtenart | 0.0 | 0.0 | 0.0 |
building-airport | 0.7143 | 0.7692 | 0.7407 |
building-hospital | 0.6667 | 0.7778 | 0.7179 |
building-hotel | 0.7857 | 0.6875 | 0.7333 |
building-library | 0.8182 | 0.75 | 0.7826 |
building-other | 0.0 | 0.0 | 0.0 |
building-restaurant | 0.8571 | 0.375 | 0.5217 |
building-sportsfacility | 0.6667 | 0.5 | 0.5714 |
building-theater | 0.9 | 0.5625 | 0.6923 |
datetime-absolute | 0.3333 | 0.0769 | 0.125 |
datetime-authored | 0.55 | 0.8462 | 0.6667 |
datetime-range | 0.75 | 0.5 | 0.6 |
datetime-relative | 0.0 | 0.0 | 0.0 |
event-attack/battle/war/militaryconflict | 0.8 | 0.2857 | 0.4211 |
event-disaster | 0.5385 | 0.5 | 0.5185 |
event-election | 0.75 | 0.5 | 0.6 |
event-other | 0.0 | 0.0 | 0.0 |
event-protest | 0.5455 | 0.4615 | 0.5000 |
event-sportsevent | 0.625 | 0.3846 | 0.4762 |
location-bodiesofwater | 0.8333 | 0.3571 | 0.5 |
location-gpe | 0.375 | 0.2143 | 0.2727 |
location-island | 0.7143 | 0.3333 | 0.4545 |
location-mountain | 0.5882 | 0.625 | 0.6061 |
location-other | 0.0 | 0.0 | 0.0 |
location-park | 0.6667 | 0.5 | 0.5714 |
location-road/railway/highway/transit | 0.8 | 0.5333 | 0.64 |
organization-company | 0.0 | 0.0 | 0.0 |
organization-education | 0.3077 | 0.2857 | 0.2963 |
organization-government/governmentagency | 0.25 | 0.0909 | 0.1333 |
organization-media/newspaper | 0.5833 | 0.4667 | 0.5185 |
organization-other | 1.0 | 0.0769 | 0.1429 |
organization-politicalparty | 0.75 | 0.2727 | 0.4000 |
organization-religion | 1.0 | 0.3077 | 0.4706 |
organization-showorganization | 0.75 | 0.25 | 0.375 |
organization-sportsleague | 0.8571 | 0.4286 | 0.5714 |
organization-sportsteam | 0.4286 | 0.5 | 0.4615 |
other-astronomything | 0.0 | 0.0 | 0.0 |
other-award | 1.0 | 0.2143 | 0.3529 |
other-biologything | 0.0 | 0.0 | 0.0 |
other-chemicalthing | 0.4 | 0.3077 | 0.3478 |
other-currency | 1.0 | 0.2143 | 0.3529 |
other-disease | 0.5714 | 0.3077 | 0.4 |
other-educationaldegree | 0.5833 | 0.5833 | 0.5833 |
other-god | 0.8 | 0.2222 | 0.3478 |
other-language | 0.8 | 0.2857 | 0.4211 |
other-law | 0.6667 | 0.5 | 0.5714 |
other-livingthing | 0.0 | 0.0 | 0.0 |
other-medical | 0.0 | 0.0 | 0.0 |
person-actor | 0.3448 | 0.5 | 0.4082 |
person-artist/author | 0.6667 | 0.1429 | 0.2353 |
person-athlete | 0.6667 | 0.2353 | 0.3478 |
person-director | 0.2 | 0.0714 | 0.1053 |
person-other | 0.0 | 0.0 | 0.0 |
person-politician | 0.6667 | 0.0952 | 0.1667 |
person-scholar | 0.4118 | 0.4667 | 0.4375 |
person-soldier | 0.0 | 0.0 | 0.0 |
product-airplane | 0.75 | 0.3333 | 0.4615 |
product-car | 1.0 | 0.2143 | 0.3529 |
product-food | 0.0 | 0.0 | 0.0 |
product-game | 1.0 | 0.1333 | 0.2353 |
product-other | 0.5 | 0.0909 | 0.1538 |
product-ship | 0.75 | 0.3 | 0.4286 |
product-software | 1.0 | 0.4167 | 0.5882 |
product-train | 0.5556 | 0.3571 | 0.4348 |
product-weapon | 0.3333 | 0.0625 | 0.1053 |
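
The `all` row is the micro-average across labels; since F1 is the harmonic mean of precision and recall, the headline numbers can be checked directly:

```python
# Verify the reported micro-averaged F1 from precision and recall.
precision, recall = 0.6189, 0.2850
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))  # 0.3903
```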
## Uses

### Direct Use for Inference
```python
from span_marker import SpanMarkerModel

# Download from the 🤗 Hub
model = SpanMarkerModel.from_pretrained("YurtsAI/named_entity_recognition_document_context")
# Run inference
entities = model.predict("We have Kanye West, Beyoncé, and Taylor Swift performing at the beachside park on the island of Maui.")
```
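
Each prediction is a dict containing the span text, its fine-grained label, a confidence score, and character offsets, so the results can be inspected like this (a sketch based on the SpanMarker output format):

```python
# Each entity dict includes the matched text, label, and confidence score.
for entity in entities:
    print(entity["span"], entity["label"], round(entity["score"], 3))
```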
### Downstream Use
You can finetune this model on your own dataset.
```python
from datasets import load_dataset
from span_marker import SpanMarkerModel, Trainer

# Download from the 🤗 Hub
model = SpanMarkerModel.from_pretrained("YurtsAI/named_entity_recognition_document_context")
# Specify a Dataset with "tokens" and "ner_tags" columns
dataset = load_dataset("conll2003")  # For example CoNLL2003
# Initialize a Trainer using the pretrained model & dataset
trainer = Trainer(
    model=model,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
)
trainer.train()
trainer.save_model("YurtsAI/named_entity_recognition_document_context-finetuned")
```
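
Since the SpanMarker `Trainer` builds on the `transformers` `Trainer`, the span-level metrics reported above can be recomputed on the evaluation split after training; a minimal sketch:

```python
# Recompute evaluation metrics on the held-out split after fine-tuning.
metrics = trainer.evaluate()
print(metrics)
```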
## Training Details

### Training Set Metrics
Training set | Min | Median | Max |
---|---|---|---|
Sentence length | 1 | 18.4126 | 309 |
Entities per sentence | 0 | 0.9794 | 5 |
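
The sentence-length statistics can be recomputed from the dataset itself. A sketch, assuming the dataset exposes a `tokens` column and a `train` split:

```python
from statistics import median

from datasets import load_dataset

# Sketch: recompute the sentence-length statistics above. The split name
# and the presence of a "tokens" column are assumptions.
dataset = load_dataset("YurtsAI/named_entity_recognition_document_context", split="train")
lengths = [len(tokens) for tokens in dataset["tokens"]]
print(min(lengths), median(lengths), max(lengths))
```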
### Training Hyperparameters
- learning_rate: 1e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
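
A reconstruction (not the original training script) of how these settings map onto the standard `transformers.TrainingArguments` accepted by the SpanMarker `Trainer`:

```python
from transformers import TrainingArguments

# Reconstruction of the hyperparameters above; the output_dir is hypothetical.
args = TrainingArguments(
    output_dir="models/span-marker-roberta-base",  # hypothetical path
    learning_rate=1e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    gradient_accumulation_steps=2,  # effective train batch size of 8
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    num_train_epochs=3,
)
```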
### Training Results
Epoch | Step | Validation Loss | Validation Precision | Validation Recall | Validation F1 | Validation Accuracy |
---|---|---|---|---|---|---|
0.4322 | 500 | 0.0503 | 0.0 | 0.0 | 0.0 | 0.8898 |
0.8643 | 1000 | 0.0435 | 1.0 | 0.0010 | 0.0020 | 0.8900 |
1.2965 | 1500 | 0.0383 | 0.2841 | 0.0254 | 0.0466 | 0.8908 |
1.7286 | 2000 | 0.0326 | 0.5556 | 0.0710 | 0.1259 | 0.8951 |
2.1608 | 2500 | 0.0294 | 0.5806 | 0.1826 | 0.2778 | 0.9032 |
2.5929 | 3000 | 0.0278 | 0.6259 | 0.2698 | 0.3770 | 0.9109 |
### Framework Versions
- Python: 3.12.2
- SpanMarker: 1.5.0
- Transformers: 4.41.2
- PyTorch: 2.3.1
- Datasets: 2.20.0
- Tokenizers: 0.19.1
## Citation

### BibTeX
```bibtex
@software{Aarsen_SpanMarker,
    author = {Aarsen, Tom},
    license = {Apache-2.0},
    title = {{SpanMarker for Named Entity Recognition}},
    url = {https://github.com/tomaarsen/SpanMarkerNER}
}
```