SpanMarker with bert-base-cased on FewNERD

This is a SpanMarker model for Named Entity Recognition, trained on the FewNERD dataset with bert-base-cased as the underlying encoder.

Model Details

Model Description

  • Model Type: SpanMarker
  • Encoder: bert-base-cased
  • Maximum Sequence Length: 256 tokens
  • Maximum Entity Length: 8 words
  • Training Dataset: FewNERD
  • Language: en
  • License: cc-by-sa-4.0
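
The maximum entity length limits which spans can be predicted at all: an entity longer than 8 words cannot be returned. As a rough sketch, assuming a span-enumeration approach in which every contiguous run of up to 8 words is a candidate, the number of candidates for a sentence can be counted as follows. The helper `count_candidate_spans` is hypothetical and is not part of the SpanMarker API.

```python
def count_candidate_spans(num_words: int, max_entity_length: int = 8) -> int:
    """Count contiguous word spans of length 1..max_entity_length.

    Hypothetical helper for illustration; not part of SpanMarker.
    """
    total = 0
    # A span of `length` words can start at num_words - length + 1 positions.
    for length in range(1, min(max_entity_length, num_words) + 1):
        total += num_words - length + 1
    return total


# A 24-word sentence, close to the typical training sentence length.
print(count_candidate_spans(24))
```

Under this assumption the candidate count grows roughly linearly with sentence length once the sentence is longer than 8 words.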

Model Sources

Model Labels

Label Examples
art-broadcastprogram "Street Cents", "Corazones", "The Gale Storm Show : Oh , Susanna"
art-film "Bosch", "L'Atlantide", "Shawshank Redemption"
art-music "Atkinson , Danko and Ford ( with Brockie and Hilton )", "Champion Lover", "Hollywood Studio Symphony"
art-other "Aphrodite of Milos", "Venus de Milo", "The Today Show"
art-painting "Production/Reproduction", "Touit", "Cofiwch Dryweryn"
art-writtenart "Imelda de ' Lambertazzi", "Time", "The Seven Year Itch"
building-airport "Luton Airport", "Newark Liberty International Airport", "Sheremetyevo International Airport"
building-hospital "Hokkaido University Hospital", "Yeungnam University Hospital", "Memorial Sloan-Kettering Cancer Center"
building-hotel "The Standard Hotel", "Radisson Blu Sea Plaza Hotel", "Flamingo Hotel"
building-library "British Library", "Berlin State Library", "Bayerische Staatsbibliothek"
building-other "Communiplex", "Alpha Recording Studios", "Henry Ford Museum"
building-restaurant "Fatburger", "Carnegie Deli", "Trumbull"
building-sportsfacility "Glenn Warner Soccer Facility", "Boston Garden", "Sports Center"
building-theater "Pittsburgh Civic Light Opera", "Sanders Theatre", "National Paris Opera"
event-attack/battle/war/militaryconflict "Easter Offensive", "Vietnam War", "Jurist"
event-disaster "the 1912 North Mount Lyell Disaster", "1693 Sicily earthquake", "1990s North Korean famine"
event-election "March 1898 elections", "1982 Mitcham and Morden by-election", "Elections to the European Parliament"
event-other "Eastwood Scoring Stage", "Union for a Popular Movement", "Masaryk Democratic Movement"
event-protest "French Revolution", "Russian Revolution", "Iranian Constitutional Revolution"
event-sportsevent "National Champions", "World Cup", "Stanley Cup"
location-GPE "Mediterranean Basin", "the Republic of Croatia", "Croatian"
location-bodiesofwater "Atatürk Dam Lake", "Norfolk coast", "Arthur Kill"
location-island "Laccadives", "Staten Island", "new Samsat district"
location-mountain "Salamander Glacier", "Miteirya Ridge", "Ruweisat Ridge"
location-other "Northern City Line", "Victoria line", "Cartuther"
location-park "Gramercy Park", "Painted Desert Community Complex Historic District", "Shenandoah National Park"
location-road/railway/highway/transit "Friern Barnet Road", "Newark-Elizabeth Rail Link", "NJT"
organization-company "Dixy Chicken", "Texas Chicken", "Church 's Chicken"
organization-education "MIT", "Belfast Royal Academy and the Ulster College of Physical Education", "Barnard College"
organization-government/governmentagency "Congregazione dei Nobili", "Diet", "Supreme Court"
organization-media/newspaper "TimeOut Melbourne", "Clash", "Al Jazeera"
organization-other "Defence Sector C", "IAEA", "4th Army"
organization-politicalparty "Shimpotō", "Al Wafa ' Islamic", "Kenseitō"
organization-religion "Jewish", "Christian", "UPCUSA"
organization-showorganization "Lizzy", "Bochumer Symphoniker", "Mr. Mister"
organization-sportsleague "China League One", "First Division", "NHL"
organization-sportsteam "Tottenham", "Arsenal", "Luc Alphand Aventures"
other-astronomything "Zodiac", "Algol", "`` Caput Larvae ''"
other-award "GCON", "Order of the Republic of Guinea and Nigeria", "Grand Commander of the Order of the Niger"
other-biologything "N-terminal lipid", "BAR", "Amphiphysin"
other-chemicalthing "uranium", "carbon dioxide", "sulfur"
other-currency "$", "Travancore Rupee", "lac crore"
other-disease "French Dysentery Epidemic of 1779", "hypothyroidism", "bladder cancer"
other-educationaldegree "Master", "Bachelor", "BSc ( Hons ) in physics"
other-god "El", "Fujin", "Raijin"
other-language "Breton-speaking", "English", "Latin"
other-law "Thirty Years ' Peace", "Leahy–Smith America Invents Act ( AIA", "United States Freedom Support Act"
other-livingthing "insects", "monkeys", "patchouli"
other-medical "Pediatrics", "amitriptyline", "pediatrician"
person-actor "Ellaline Terriss", "Tchéky Karyo", "Edmund Payne"
person-artist/author "George Axelrod", "Gaetano Donizett", "Hicks"
person-athlete "Jaguar", "Neville", "Tozawa"
person-director "Bob Swaim", "Richard Quine", "Frank Darabont"
person-other "Richard Benson", "Holden", "Campbell"
person-politician "William", "Rivière", "Emeric"
person-scholar "Stedman", "Wurdack", "Stalmine"
person-soldier "Helmuth Weidling", "Krukenberg", "Joachim Ziegler"
product-airplane "Luton", "Spey-equipped FGR.2s", "EC135T2 CPDS"
product-car "100EX", "Corvettes - GT1 C6R", "Phantom"
product-food "red grape", "yakiniku", "V. labrusca"
product-game "Airforce Delta", "Hardcore RPG", "Splinter Cell"
product-other "Fairbottom Bobs", "X11", "PDP-1"
product-ship "Congress", "Essex", "HMS `` Chinkara ''"
product-software "AmiPDF", "Apdf", "Wikipedia"
product-train "High Speed Trains", "55022", "Royal Scots Grey"
product-weapon "AR-15 's", "ZU-23-2M Wróbel", "ZU-23-2MR Wróbel II"
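
Each label above has the form coarse-fine, for example person-actor. A minimal sketch of splitting a label into its two parts, in case only the coarse type is needed downstream. The helper `split_label` is hypothetical and is not part of SpanMarker.

```python
def split_label(label: str) -> tuple:
    """Split a label such as "person-actor" into ("person", "actor").

    Only the first hyphen separates the coarse and fine type, so fine types
    that contain slashes or further characters are kept intact.
    """
    coarse, _, fine = label.partition("-")
    return coarse, fine


print(split_label("person-actor"))
print(split_label("event-attack/battle/war/militaryconflict"))
```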

Uses

Direct Use

from span_marker import SpanMarkerModel

# Download from the 🤗 Hub
model = SpanMarkerModel.from_pretrained("tomaarsen/span-marker-bert-base-fewnerd-fine-super")
# Run inference
entities = model.predict("Amelia Earhart flew her single engine Lockheed Vega 5B across the Atlantic to Paris.")
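
The sketch below shows one way to post-process the predictions. It assumes `predict` returns a list of dictionaries with "span", "label" and "score" keys; the `sample` list is made up for illustration and is not actual output of this model, and `filter_entities` is a hypothetical helper.

```python
def filter_entities(entities, min_score=0.5):
    """Keep (span, label) pairs whose score is at least min_score.

    Assumes each entity is a dict with "span", "label" and "score" keys.
    """
    return [(e["span"], e["label"]) for e in entities if e["score"] >= min_score]


# Made-up predictions for illustration only, not real model output.
sample = [
    {"span": "Amelia Earhart", "label": "person-other", "score": 0.9},
    {"span": "Lockheed Vega 5B", "label": "product-airplane", "score": 0.8},
    {"span": "Atlantic", "label": "location-bodiesofwater", "score": 0.3},
]

for span, label in filter_entities(sample):
    print(f"{span}: {label}")
```

With real predictions, `sample` would be replaced by the `entities` list returned by the model.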

Downstream Use

You can finetune this model on your own dataset.

from datasets import load_dataset
from span_marker import SpanMarkerModel, Trainer

# Download from the 🤗 Hub
model = SpanMarkerModel.from_pretrained("tomaarsen/span-marker-bert-base-fewnerd-fine-super")

# Specify a Dataset with "tokens" and "ner_tags" columns
dataset = load_dataset("conll2003")  # For example CoNLL2003

# Initialize a Trainer using the pretrained model & dataset
trainer = Trainer(
    model=model,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
)
trainer.train()
trainer.save_model("tomaarsen/span-marker-bert-base-fewnerd-fine-super-finetuned")

Training Details

Training Set Metrics

Training set Min Median Max
Sentence length 1 24.4945 267
Entities per sentence 0 2.5832 88

Training Hyperparameters

  • learning_rate: 5e-05
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
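
To make the scheduler settings concrete, here is a minimal sketch of a linear schedule with a warmup ratio of 0.1: the learning rate rises linearly from 0 to 5e-05 over the first 10% of steps, then decays linearly to 0. The function `linear_schedule_lr` is a hypothetical re-implementation for illustration, and rounding the warmup step count up is an assumption; it is not the code used in training.

```python
import math


def linear_schedule_lr(step, total_steps, base_lr=5e-05, warmup_ratio=0.1):
    """Learning rate at `step` for linear warmup followed by linear decay.

    Hypothetical sketch; the warmup step count is assumed to be rounded up.
    """
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: scale linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: scale linearly from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at a few points of a 100-step run.
for step in (0, 5, 10, 55, 100):
    print(step, linear_schedule_lr(step, total_steps=100))
```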

Training Hardware

  • On Cloud: No
  • GPU Model: 1 x NVIDIA GeForce RTX 3090
  • CPU Model: 13th Gen Intel(R) Core(TM) i7-13700K
  • RAM Size: 31.78 GB

Framework Versions

  • Python: 3.9.16
  • SpanMarker: 1.3.1.dev
  • Transformers: 4.29.2
  • PyTorch: 2.0.1+cu118
  • Datasets: 2.14.3
  • Tokenizers: 0.13.2

Model Size

  • Parameters: 108M
  • Format: Safetensors
  • Tensor Types: I64, F32