SpanMarker

This is a SpanMarker model trained on the imvladikon/nemo_corpus dataset for Hebrew Named Entity Recognition.

Model Details

Model Description

Model Type: SpanMarker
Maximum Sequence Length: 512 tokens
Maximum Entity Length: 100 words
Training Dataset: imvladikon/nemo_corpus

Model Sources

Repository: SpanMarker on GitHub
Thesis: SpanMarker For Named Entity Recognition

Model Labels

Label	Description	Examples
ANG	Language	"יידיש", "גרמנית", "אנגלית"
DUC	Product	"דינמיט", "סובארו", "מרצדס"
EVE	Event	"מצדה", "הצהרת בלפור", "ה שואה"
FAC	Facility	"ברזילי", "כלא עזה", "תל - ה שומר"
GPE	Geo-political entity	"ה שטחים", "שפרעם", "רצועת עזה"
LOC	Location	"שייח רדואן", "גיבאליה", "חאן יונס"
ORG	Organization	"כך", "ה ארץ", "מרחב ה גליל"
PER	Person	"רמי רהב", "נימר חוסיין", "איברהים נימר חוסיין"
WOA	Work of art	"קיטש ו מוות", "קדיש", "ה ארץ"

Evaluation

Metrics

Label	Precision	Recall	F1
all	0.7577	0.7114	0.7338
ANG	0.0	0.0	0.0
DUC	0.0	0.0	0.0
FAC	0.0	0.0	0.0
GPE	0.7085	0.8103	0.7560
LOC	0.5714	0.1951	0.2909
ORG	0.7460	0.6912	0.7176
PER	0.8301	0.8052	0.8175
WOA	0.0	0.0	0.0
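F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short check below recomputes several rows of the metrics table above from their precision and recall columns. It is only a sketch for reading the table, not SpanMarker's evaluation code. For the rows where both precision and recall are 0 (ANG, DUC, FAC, WOA), F1 is taken to be 0.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; defined as 0.0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the metrics table
rows = {
    "all": (0.7577, 0.7114, 0.7338),
    "GPE": (0.7085, 0.8103, 0.7560),
    "LOC": (0.5714, 0.1951, 0.2909),
    "ORG": (0.7460, 0.6912, 0.7176),
    "PER": (0.8301, 0.8052, 0.8175),
    "ANG": (0.0, 0.0, 0.0),
}

for label, (p, r, reported) in rows.items():
    print(f"{label}: computed {f1_score(p, r):.4f}, reported {reported:.4f}")
```

The low LOC score comes from recall (0.1951), not precision: because the harmonic mean is pulled toward the smaller of the two values, the F1 score drops to 0.2909.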

Uses

Direct Use for Inference

from span_marker import SpanMarkerModel

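# Download the model from the 🤗 Hub
model = SpanMarkerModel.from_pretrained("iahlt/span-marker-alephbert-small-nemo-mt-he")
# Run inference. This input separates Hebrew prefixes from their host words
# (e.g. "ה נוער", "ב אירופה"), in the same style as the label examples above.
entities = model.predict("יו\"ר ועדת ה נוער נתן סלובטיק אמר ש ה שחקנים של אנחנו לא משתלבים ב אירופה.")
print(entities)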
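The predictions can be post-processed in plain Python. The sketch below assumes that each prediction is a dict with at least "span", "label" and "score" keys. To our understanding, this is what SpanMarker's predict returns for a string input. It keeps only spans above a score threshold and groups them by label. The helper name group_entities and the 0.5 threshold are hypothetical. The sample predictions are also hypothetical: the spans and labels are modelled on the example sentence, and the scores are invented.

```python
from collections import defaultdict
from typing import Dict, List


def group_entities(predictions: List[dict], min_score: float = 0.5) -> Dict[str, List[str]]:
    """Hypothetical helper: group span texts by label, dropping low-scoring spans.

    Assumes each prediction dict has "span", "label" and "score" keys.
    """
    grouped: Dict[str, List[str]] = defaultdict(list)
    for entity in predictions:
        if entity["score"] >= min_score:
            grouped[entity["label"]].append(entity["span"])
    return dict(grouped)


# Hypothetical predictions; the scores are invented for illustration only.
sample = [
    {"span": "ועדת ה נוער", "label": "ORG", "score": 0.91},
    {"span": "נתן סלובטיק", "label": "PER", "score": 0.97},
    {"span": "אירופה", "label": "GPE", "score": 0.42},
]

print(group_entities(sample))
```

With the default threshold, the low-scoring GPE span is dropped. Passing min_score=0.0 keeps every prediction.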

Using spaCy

pip install span_marker spacy_udpipe

import spacy_udpipe

# Download the public UDPipe Hebrew model; any other Hebrew spaCy pipeline can be used instead
spacy_udpipe.download("he")
nlp = spacy_udpipe.load("he")
# The "span_marker" pipe is made available by the installed span_marker package
nlp.add_pipe("span_marker", config={"model": "iahlt/span-marker-alephbert-small-nemo-mt-he"})

text = "יו\"ר ועדת הנוער נתן סלובטיק אמר שהשחקנים של אנחנו לא משתלבים באירופה."
doc = nlp(text)
print([(entity, entity.label_) for entity in doc.ents])
# [(ועדת הנוער, 'ORG'), (נתן סלובטיק, 'PER'), (אירופה, 'GPE')]

Training Details

Training Set Metrics

Training set	Min	Median	Max
Sentence length	1	25.4427	117
Entities per sentence	0	1.2472	20

Training Hyperparameters

learning_rate: 1e-05
train_batch_size: 2
eval_batch_size: 2
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 4
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.1
num_epochs: 4
mixed_precision_training: Native AMP
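The hyperparameters above can be related to one another with some arithmetic. total_train_batch_size is train_batch_size × gradient_accumulation_steps (2 × 2 = 4, assuming a single device). The linear scheduler with warmup_ratio 0.1 raises the learning rate from 0 to 1e-05 over the first 10% of optimizer steps and then decays it linearly to 0. The sketch below reproduces that shape, to our understanding matching the linear-with-warmup schedule used by Hugging Face transformers' Trainer. The dataset size (num_sentences) is hypothetical and is not taken from this card. The real step count depends on the actual number of training samples.

```python
import math


def linear_warmup_lr(step: int, total_steps: int, peak_lr: float = 1e-5, warmup_ratio: float = 0.1) -> float:
    """Learning rate at an optimizer step under linear warmup followed by linear decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup: rise linearly from 0 to peak_lr
        return peak_lr * step / max(1, warmup_steps)
    # Decay: fall linearly from peak_lr to 0 at total_steps
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


# Effective batch size: per-device batch size x accumulation steps (single device assumed)
train_batch_size = 2
gradient_accumulation_steps = 2
total_train_batch_size = train_batch_size * gradient_accumulation_steps

# Hypothetical dataset size, NOT taken from this card
num_sentences = 1000
num_epochs = 4
batches_per_epoch = math.ceil(num_sentences / train_batch_size)
steps_per_epoch = batches_per_epoch // gradient_accumulation_steps  # optimizer updates per epoch
total_steps = steps_per_epoch * num_epochs

for step in (0, 50, 100, 550, total_steps):
    print(step, linear_warmup_lr(step, total_steps))
```

Under these assumptions there are 1000 optimizer steps. The first 100 are warmup, and the peak rate of 1e-05 is reached at step 100.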

Evaluation results

Metric	Value
eval_loss	0.00487611
eval_overall_precision	0.822917
eval_overall_recall	0.791583
eval_overall_f1	0.806946
eval_overall_accuracy	0.969029

Test results

Metric	Value
test_loss	0.00652107
test_overall_precision	0.747289
test_overall_recall	0.73927
test_overall_f1	0.743258
test_overall_accuracy	0.960126

Framework Versions

Python: 3.10.12
SpanMarker: 1.5.0
Transformers: 4.35.2
PyTorch: 2.1.0+cu118
Datasets: 2.15.0
Tokenizers: 0.15.0

Citation

BibTeX

@software{Aarsen_SpanMarker,
    author = {Aarsen, Tom},
    license = {Apache-2.0},
    title = {{SpanMarker for Named Entity Recognition}},
    url = {https://github.com/tomaarsen/SpanMarkerNER}
}
