SpanMarker

This is a SpanMarker model for Named Entity Recognition (NER) in Arabic.

Model Details

For further details, see https://iahlt.github.io/arabic_ner/

Model Description

  • Model Type: SpanMarker
  • Maximum Sequence Length: 512 tokens
  • Maximum Entity Length: 150 words
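SpanMarker scores candidate spans of words rather than labeling tokens one at a time, so the Maximum Entity Length caps how long a candidate span can be. The sketch below is a simplified illustration, not the library's implementation. `enumerate_candidate_spans` is a hypothetical helper that lists every word span up to a given length for one sentence.

```python
from typing import List, Tuple


def enumerate_candidate_spans(words: List[str], entity_max_length: int) -> List[Tuple[int, int]]:
    """Return every (start, end) word span with 1 <= end - start <= entity_max_length.

    `end` is exclusive, so words[start:end] is the text of the span.
    (Hypothetical helper for illustration; not part of the span_marker API.)
    """
    spans = []
    for start in range(len(words)):
        # Stop at the end of the sentence or once the span would exceed the limit.
        for end in range(start + 1, min(start + entity_max_length, len(words)) + 1):
            spans.append((start, end))
    return spans


words = ["وُلد", "نجيب", "محفوظ", "في", "القاهرة"]
print(len(enumerate_candidate_spans(words, entity_max_length=2)))  # prints 9
```

With a 150-word limit, a sentence of n ≤ 150 words yields all n(n+1)/2 spans as candidates; the limit only prunes candidates in longer inputs.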

Tags

ANG - Any named language (Hebrew, Arabic, English, French, etc.)
DUC - Branded products, including objects, vehicles, medicines, foods, etc. (Apple, BMW, Coca-Cola, etc.)
EVE - Any named event (Olympics, World Cup, etc.)
FAC - Any named facility, building, airport, etc. (Eiffel Tower, Ben Gurion Airport, etc.)
GPE - Geo-political entity, nation states, counties, cities, etc.
INFORMAL - Informal language (slang)
LOC - Non-GPE locations, geographical regions, mountain ranges, bodies of water, etc.
ORG - Companies, agencies, institutions, political parties, etc.
PER - People, including fictional.
TIMEX - Time expression, absolute or relative dates or periods.
TTL - Any named title, position, profession, etc. (President, Prime Minister, etc.)
WOA - Any named work of art (books, movies, songs, etc.)
MISC - Miscellaneous entities that do not belong to any of the previous categories
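To summarize predictions by tag, the tag set can be kept in a plain dictionary. This sketch assumes the model's labels are exactly the tag codes listed above and that each prediction is a dict with a `label` key; `TAG_DESCRIPTIONS` and `count_by_tag` are hypothetical names, and the example predictions are hand-written, not real model output.

```python
from collections import Counter
from typing import Dict, List

# Short descriptions of the tag set listed above (assumed to match the model's labels).
TAG_DESCRIPTIONS: Dict[str, str] = {
    "ANG": "Named language",
    "DUC": "Branded product",
    "EVE": "Named event",
    "FAC": "Facility",
    "GPE": "Geo-political entity",
    "INFORMAL": "Informal language (slang)",
    "LOC": "Non-GPE location",
    "ORG": "Organization",
    "PER": "Person",
    "TIMEX": "Time expression",
    "TTL": "Title, position, or profession",
    "WOA": "Work of art",
    "MISC": "Miscellaneous",
}


def count_by_tag(entities: List[dict]) -> Counter:
    """Count predicted entities per tag, rejecting labels outside the tag set."""
    counts: Counter = Counter()
    for entity in entities:
        label = entity["label"]
        if label not in TAG_DESCRIPTIONS:
            raise ValueError(f"Unexpected label: {label}")
        counts[label] += 1
    return counts


# Hand-written example shaped like a list of predictions (illustrative only).
example = [{"span": "نجيب محفوظ", "label": "PER"}, {"span": "القاهرة", "label": "GPE"}]
for label, n in count_by_tag(example).items():
    print(f"{label} ({TAG_DESCRIPTIONS[label]}): {n}")
```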

Uses

Direct Use for Inference

from span_marker import SpanMarkerModel

# Download from the 🤗 Hub
model = SpanMarkerModel.from_pretrained("iahlt/xlm-roberta-base-ar-ner-flat")
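# Replace this example sentence ("Naguib Mahfouz was born in Cairo in 1911.") with your own Arabic text
text = "وُلد نجيب محفوظ في القاهرة عام 1911."
# Returns a list of dicts, one per predicted entity (with keys such as "span", "label" and "score")
entities = model.predict(text)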
print(entities)
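The SpanMarker documentation shows `predict` returning a list of dictionaries with `span`, `label` and `score` keys; assuming that shape, low-confidence predictions can be filtered in plain Python. `confident_entities` and the 0.5 threshold are illustrative choices, not part of the library, and the predictions below are hand-written placeholders rather than real model output.

```python
from typing import List


def confident_entities(entities: List[dict], min_score: float = 0.5) -> List[dict]:
    """Return entities whose score is at least `min_score`, highest score first."""
    kept = [e for e in entities if e["score"] >= min_score]
    return sorted(kept, key=lambda e: e["score"], reverse=True)


# Hypothetical predictions, written by hand for illustration (not real model output).
predictions = [
    {"span": "نجيب محفوظ", "label": "PER", "score": 0.98},
    {"span": "القاهرة", "label": "GPE", "score": 0.95},
    {"span": "1911", "label": "TIMEX", "score": 0.42},
]

for entity in confident_entities(predictions, min_score=0.5):
    print(entity["label"], entity["span"], entity["score"])
```

In practice, `predictions` would be the `entities` list returned by `model.predict(text)`; a suitable threshold depends on your precision and recall needs.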

Training Details

Framework Versions

  • Python: 3.10.12
  • SpanMarker: 1.5.0
  • Transformers: 4.35.2
  • PyTorch: 2.1.0+cu121
  • Datasets: 2.16.1
  • Tokenizers: 0.15.1
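To check a local environment against the versions listed above, the installed package versions can be read with the standard library's `importlib.metadata`. This is a sketch: the distribution names in `EXPECTED` are assumptions that may need adjusting for your installation, and the comparison ignores local version labels such as the `+cu121` CUDA build suffix.

```python
import platform
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Versions listed under "Framework Versions", keyed by (assumed) distribution name.
EXPECTED: Dict[str, str] = {
    "span_marker": "1.5.0",
    "transformers": "4.35.2",
    "torch": "2.1.0+cu121",
    "datasets": "2.16.1",
    "tokenizers": "0.15.1",
}


def strip_local(v: str) -> str:
    """Drop a PEP 440 local version label, e.g. '2.1.0+cu121' -> '2.1.0'."""
    return v.split("+", 1)[0]


def installed_version(dist: str) -> Optional[str]:
    """Return the installed version of `dist`, or None if it is not installed."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


def version_mismatches(
    expected: Dict[str, str], installed: Dict[str, Optional[str]]
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each package whose installed version differs (or is missing) to (expected, found)."""
    mismatches = {}
    for name, want in expected.items():
        have = installed.get(name)
        if have is None or strip_local(have) != strip_local(want):
            mismatches[name] = (want, have)
    return mismatches


if __name__ == "__main__":
    found = {name: installed_version(name) for name in EXPECTED}
    print("Python", platform.python_version(), "(card lists 3.10.12)")
    for name, (want, have) in version_mismatches(EXPECTED, found).items():
        print(f"{name}: card lists {want}, found {have}")
```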

Citation

BibTeX

@software{Aarsen_SpanMarker,
    author = {Aarsen, Tom},
    license = {Apache-2.0},
    title = {{SpanMarker for Named Entity Recognition}},
    url = {https://github.com/tomaarsen/SpanMarkerNER}
}
Model Weights

  • Format: Safetensors
  • Model size: 278M params
  • Tensor type: F32