metadata
language:
  - en
license: cc-by-sa-4.0
library_name: span-marker
tags:
  - span-marker
  - token-classification
  - ner
  - named-entity-recognition
  - generated_from_span_marker_trainer
datasets:
  - DFKI-SLT/few-nerd
metrics:
  - f1
  - recall
  - precision
pipeline_tag: token-classification
widget:
  - text: >-
      Amelia Earhart flew her single engine Lockheed Vega 5B across the Atlantic
      to Paris.
    example_title: Amelia Earhart
  - text: >-
      Leonardo di ser Piero da Vinci painted the Mona Lisa based on Italian
      noblewoman Lisa del Giocondo.
    example_title: Leonardo da Vinci
base_model: bert-base-cased
model-index:
  - name: >-
      SpanMarker w. bert-base-cased on finegrained, supervised FewNERD by Tom
      Aarsen
    results:
      - task:
          type: token-classification
          name: Named Entity Recognition
        dataset:
          name: finegrained, supervised FewNERD
          type: DFKI-SLT/few-nerd
          config: supervised
          split: test
          revision: 2e3e727c63604fbfa2ff4cc5055359c84fe5ef2c
        metrics:
          - type: f1
            value: 0.7053
            name: F1
          - type: precision
            value: 0.7101
            name: Precision
          - type: recall
            value: 0.7005
            name: Recall

SpanMarker with bert-base-cased on FewNERD

This is a SpanMarker model trained on the FewNERD dataset that can be used for Named Entity Recognition. This SpanMarker model uses bert-base-cased as the underlying encoder.

Model Details

Model Description

  • Model Type: SpanMarker
  • Encoder: bert-base-cased
  • Maximum Sequence Length: 256 tokens
  • Maximum Entity Length: 8 words
  • Training Dataset: FewNERD
  • Language: en
  • License: cc-by-sa-4.0
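
To make the maximum entity length concrete, the sketch below enumerates every contiguous word span of at most 8 words in a sentence. It assumes that candidates are all such spans and that words are split on whitespace; it is an illustration of the limit, not SpanMarker's internal code, and `candidate_spans` is a hypothetical helper.

```python
def candidate_spans(words, max_entity_length=8):
    """Return (start, end) word indices (end exclusive) of every contiguous
    span that is at most `max_entity_length` words long."""
    spans = []
    for start in range(len(words)):
        # The longest span starting here is capped by the limit and the sentence end.
        longest_end = min(start + max_entity_length, len(words))
        for end in range(start + 1, longest_end + 1):
            spans.append((start, end))
    return spans


sentence = "Amelia Earhart flew her single engine Lockheed Vega 5B across the Atlantic to Paris."
words = sentence.split()  # assumption: plain whitespace tokenization
spans = candidate_spans(words)
print(len(words), "words ->", len(spans), "candidate spans")
```

Under this assumption, a mention longer than 8 words is never a candidate, so it cannot be predicted as a single entity.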

Model Sources

Model Labels

Label Examples
art-broadcastprogram "Street Cents", "Corazones", "The Gale Storm Show : Oh , Susanna"
art-film "Bosch", "L'Atlantide", "Shawshank Redemption"
art-music "Atkinson , Danko and Ford ( with Brockie and Hilton )", "Champion Lover", "Hollywood Studio Symphony"
art-other "Aphrodite of Milos", "Venus de Milo", "The Today Show"
art-painting "Production/Reproduction", "Touit", "Cofiwch Dryweryn"
art-writtenart "Imelda de ' Lambertazzi", "Time", "The Seven Year Itch"
building-airport "Luton Airport", "Newark Liberty International Airport", "Sheremetyevo International Airport"
building-hospital "Hokkaido University Hospital", "Yeungnam University Hospital", "Memorial Sloan-Kettering Cancer Center"
building-hotel "The Standard Hotel", "Radisson Blu Sea Plaza Hotel", "Flamingo Hotel"
building-library "British Library", "Berlin State Library", "Bayerische Staatsbibliothek"
building-other "Communiplex", "Alpha Recording Studios", "Henry Ford Museum"
building-restaurant "Fatburger", "Carnegie Deli", "Trumbull"
building-sportsfacility "Glenn Warner Soccer Facility", "Boston Garden", "Sports Center"
building-theater "Pittsburgh Civic Light Opera", "Sanders Theatre", "National Paris Opera"
event-attack/battle/war/militaryconflict "Easter Offensive", "Vietnam War", "Jurist"
event-disaster "the 1912 North Mount Lyell Disaster", "1693 Sicily earthquake", "1990s North Korean famine"
event-election "March 1898 elections", "1982 Mitcham and Morden by-election", "Elections to the European Parliament"
event-other "Eastwood Scoring Stage", "Union for a Popular Movement", "Masaryk Democratic Movement"
event-protest "French Revolution", "Russian Revolution", "Iranian Constitutional Revolution"
event-sportsevent "National Champions", "World Cup", "Stanley Cup"
location-GPE "Mediterranean Basin", "the Republic of Croatia", "Croatian"
location-bodiesofwater "Atatürk Dam Lake", "Norfolk coast", "Arthur Kill"
location-island "Laccadives", "Staten Island", "new Samsat district"
location-mountain "Salamander Glacier", "Miteirya Ridge", "Ruweisat Ridge"
location-other "Northern City Line", "Victoria line", "Cartuther"
location-park "Gramercy Park", "Painted Desert Community Complex Historic District", "Shenandoah National Park"
location-road/railway/highway/transit "Friern Barnet Road", "Newark-Elizabeth Rail Link", "NJT"
organization-company "Dixy Chicken", "Texas Chicken", "Church 's Chicken"
organization-education "MIT", "Belfast Royal Academy and the Ulster College of Physical Education", "Barnard College"
organization-government/governmentagency "Congregazione dei Nobili", "Diet", "Supreme Court"
organization-media/newspaper "TimeOut Melbourne", "Clash", "Al Jazeera"
organization-other "Defence Sector C", "IAEA", "4th Army"
organization-politicalparty "Shimpotō", "Al Wafa ' Islamic", "Kenseitō"
organization-religion "Jewish", "Christian", "UPCUSA"
organization-showorganization "Lizzy", "Bochumer Symphoniker", "Mr. Mister"
organization-sportsleague "China League One", "First Division", "NHL"
organization-sportsteam "Tottenham", "Arsenal", "Luc Alphand Aventures"
other-astronomything "Zodiac", "Algol", "`` Caput Larvae ''"
other-award "GCON", "Order of the Republic of Guinea and Nigeria", "Grand Commander of the Order of the Niger"
other-biologything "N-terminal lipid", "BAR", "Amphiphysin"
other-chemicalthing "uranium", "carbon dioxide", "sulfur"
other-currency "$", "Travancore Rupee", "lac crore"
other-disease "French Dysentery Epidemic of 1779", "hypothyroidism", "bladder cancer"
other-educationaldegree "Master", "Bachelor", "BSc ( Hons ) in physics"
other-god "El", "Fujin", "Raijin"
other-language "Breton-speaking", "English", "Latin"
other-law "Thirty Years ' Peace", "Leahy–Smith America Invents Act ( AIA", "United States Freedom Support Act"
other-livingthing "insects", "monkeys", "patchouli"
other-medical "Pediatrics", "amitriptyline", "pediatrician"
person-actor "Ellaline Terriss", "Tchéky Karyo", "Edmund Payne"
person-artist/author "George Axelrod", "Gaetano Donizett", "Hicks"
person-athlete "Jaguar", "Neville", "Tozawa"
person-director "Bob Swaim", "Richard Quine", "Frank Darabont"
person-other "Richard Benson", "Holden", "Campbell"
person-politician "William", "Rivière", "Emeric"
person-scholar "Stedman", "Wurdack", "Stalmine"
person-soldier "Helmuth Weidling", "Krukenberg", "Joachim Ziegler"
product-airplane "Luton", "Spey-equipped FGR.2s", "EC135T2 CPDS"
product-car "100EX", "Corvettes - GT1 C6R", "Phantom"
product-food "red grape", "yakiniku", "V. labrusca"
product-game "Airforce Delta", "Hardcore RPG", "Splinter Cell"
product-other "Fairbottom Bobs", "X11", "PDP-1"
product-ship "Congress", "Essex", "HMS `` Chinkara ''"
product-software "AmiPDF", "Apdf", "Wikipedia"
product-train "High Speed Trains", "55022", "Royal Scots Grey"
product-weapon "AR-15 's", "ZU-23-2M Wróbel", "ZU-23-2MR Wróbel II"
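
Each label in the table joins a coarse type and a fine type with a hyphen (for example `person-actor`). The sketch below splits a label at its first hyphen, which may be useful when only the coarse type matters downstream. `split_label` is a hypothetical helper, not part of SpanMarker.

```python
def split_label(label):
    """Split a label such as 'person-actor' into (coarse_type, fine_type)
    at the first hyphen; the fine type may itself contain '/' characters."""
    coarse, fine = label.split("-", 1)
    return coarse, fine


labels = [
    "person-actor",
    "location-GPE",
    "event-attack/battle/war/militaryconflict",
]
for label in labels:
    print(split_label(label))
```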

Uses

Direct Use

from span_marker import SpanMarkerModel

# Download from the 🤗 Hub
model = SpanMarkerModel.from_pretrained("tomaarsen/span-marker-bert-base-fewnerd-fine-super")
# Run inference
entities = model.predict("Amelia Earhart flew her single engine Lockheed Vega 5B across the Atlantic to Paris.")
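
For a single string, `model.predict` returns a list of dictionaries, one per detected entity. The sketch below shows one way to print and filter them. It assumes each dictionary carries the `span`, `label`, and `score` keys that SpanMarker documents for string input. `format_entities` is a hypothetical helper, and `sample_entities` holds hand-written placeholder values that only illustrate the structure; they are not real output of this model.

```python
def format_entities(entities, min_score=0.0):
    """Render entity dictionaries as 'span -> label (score)' lines,
    keeping only entities whose score is at least `min_score`."""
    lines = []
    for entity in entities:
        if entity["score"] >= min_score:
            lines.append(f'{entity["span"]} -> {entity["label"]} ({entity["score"]:.2f})')
    return lines


# Placeholder values for illustration only, NOT real model output.
sample_entities = [
    {"span": "Amelia Earhart", "label": "person-other", "score": 0.90},
    {"span": "Paris", "label": "location-GPE", "score": 0.40},
]
for line in format_entities(sample_entities, min_score=0.5):
    print(line)
```

With the real model, pass the `entities` list from the snippet above instead of `sample_entities`.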

Downstream Use

You can finetune this model on your own dataset.

from datasets import load_dataset
from span_marker import SpanMarkerModel, Trainer

# Download from the 🤗 Hub
model = SpanMarkerModel.from_pretrained("tomaarsen/span-marker-bert-base-fewnerd-fine-super")

# Specify a Dataset with "tokens" and "ner_tags" columns
dataset = load_dataset("conll2003") # For example CoNLL2003

# Initialize a Trainer using the pretrained model & dataset
trainer = Trainer(
    model=model,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
)
trainer.train()
trainer.save_model("tomaarsen/span-marker-bert-base-fewnerd-fine-super-finetuned")
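
Before finetuning on your own data, it can help to confirm that each row pairs a list of words with an equally long list of integer tag ids. The check below is a minimal sketch of that shape, assuming the `tokens` and `ner_tags` column names used by CoNLL2003. `check_row` is a hypothetical helper, and the example row, including its tag ids, is a hand-written placeholder.

```python
REQUIRED_COLUMNS = ("tokens", "ner_tags")


def check_row(row):
    """Raise ValueError if a row lacks a required column or if its
    tokens and tags differ in length; return True otherwise."""
    missing = [column for column in REQUIRED_COLUMNS if column not in row]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    if len(row["tokens"]) != len(row["ner_tags"]):
        raise ValueError("tokens and ner_tags must have the same length")
    return True


# Hand-written placeholder row; the tag ids are arbitrary.
row = {"tokens": ["Paris", "is", "lovely", "."], "ner_tags": [1, 0, 0, 0]}
print(check_row(row))
```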

Training Details

Training Set Metrics

Training set            Min   Median    Max
Sentence length         1     24.4945   267
Entities per sentence   0     2.5832    88

Training Hyperparameters

  • learning_rate: 5e-05
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
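
To show what a linear scheduler with a warmup ratio of 0.1 does to the learning rate, the sketch below computes the rate at a given step: it rises linearly from 0 to 5e-05 over the first 10% of steps, then decays linearly back to 0. The total step count is a hypothetical value, since the number of training steps is not stated here, and `linear_warmup_lr` is an illustrative helper rather than the scheduler code used in training.

```python
import math


def linear_warmup_lr(step, total_steps, base_lr=5e-05, warmup_ratio=0.1):
    """Learning rate at `step` for linear warmup followed by linear decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: scale up linearly from 0 to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: scale down linearly from base_lr to 0.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


total_steps = 1000  # hypothetical; not taken from the training run
for step in (0, 50, 100, 550, 1000):
    print(step, linear_warmup_lr(step, total_steps))
```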

Training Hardware

  • On Cloud: No
  • GPU Model: 1 x NVIDIA GeForce RTX 3090
  • CPU Model: 13th Gen Intel(R) Core(TM) i7-13700K
  • RAM Size: 31.78 GB

Framework Versions

  • Python: 3.9.16
  • SpanMarker: 1.3.1.dev
  • Transformers: 4.29.2
  • PyTorch: 2.0.1+cu118
  • Datasets: 2.14.3
  • Tokenizers: 0.13.2