Fusion NER Models

Here you can find the NER models of the Fusion project.

NER Models:

Below is a description of each of our models. Each row lists the model nickname, a training description, the model path, the source dataset and its link, the base model, the entity types, and the trainer.

Model name	Description	Model path	Datasets	Dataset link	Base model	Entity types	Trainer
Basic	Basic training on IAHALT	FusioNER/Basic_IAHALT	IAHALT	FusioNER/Basic	HeRo	classic[4]	Etzion
Vitaly	Vitaly's training on IAHALT (with the BI-BI problem[3])	FusioNER/Vitaly_NER	IAHALT	FusioNER/Vitaly	HeRo	classic[4]	Vitaly
Name-Sentences	Training on IAHALT + Name-Sentences[1]	FusioNER/Name-Sentences	IAHALT	FusioNER/Name_Sentences	HeRo	classic[4]	Etzion
Entity-Injection	Training on IAHALT + Entity-Injection[2]	FusioNER/Entity-Injection	IAHALT	FusioNER/Entity_Injection	HeRo	classic[4]	Etzion
Smart_Injection	Training on IAHALT + Name-Sentences[1] + Entity-Injection[2]	FusioNER/Smart_Injection	IAHALT	FusioNER/Smart_Injection	HeRo	classic[4]	Etzion
NEMO	Basic training on the NEMO dataset	FusioNER/Nemo	NEMO	FusioNER/NEMO	HeRo	classic[4]	Etzion
IAHALT_and_NEMO	Basic training on IAHALT + NEMO	FusioNER/IAHALT_and_NEMO	IAHALT + NEMO	FusioNER/IAHALT_and_NEMO	HeRo	classic[4]	Etzion
IAHALT_and_NEMO_PP	Training on IAHALT + NEMO + Name-Sentences + Entity-Injection	FusioNER/IAHALT_and_NEMO_and_PP	IAHALT + NEMO	FusioNER/IAHALT_and_NEMO_PP	HeRo	classic[4]	Etzion
Animals	Training on IAHALT + Entity-Injection (of animal names as PER entities)	FusioNER/Animals	IAHALT	FusioNER/Animals	HeRo	classic[4]	Etzion
PRS-Injection	Training on IAHALT + Entity-Injection (of PRS names as PER entities)	FusioNER/PRS-Injection	IAHALT	FusioNER/PRS_locations	HeRo	classic[4]	Etzion
DICTA_Basic	Training the DICTA model on the basic IAHALT dataset	FusioNER/Dicta_Small_Basic	IAHALT	FusioNER/Smart_Injection	DICTA	classic[4]	Etzion
DICTA_Small_Smart	Training the DICTA model on the IAHALT + Name-Sentences + Entity-Injection dataset	FusioNER/Dicta_Small_Smart	IAHALT	FusioNER/Smart_Injection	DICTA	classic[4]	Etzion
DICTA_basic_NER	Training the DICTA-ner model on the basic IAHALT dataset	FusioNER/DICTA_basic	IAHALT	FusioNER/Basic	DICTA-ner	classic[4]	Etzion
DICTA_smart_NER	Training the DICTA-ner model on the IAHALT + Name-Sentences + Entity-Injection dataset	FusioNER/DICTA_Smart	IAHALT	FusioNER/Smart_Injection	DICTA-ner	classic[4]	Etzion
DICTA_Large_Smart	Training the DICTA Large model on the IAHALT + Name-Sentences + Entity-Injection dataset	FusioNER/Dicta_Large_Smart	IAHALT	FusioNER/Smart_Injection	DICTA Large	classic[4]	Etzion
TEC_NER	Basic technology NER model	FusioNER/tec_ner	TEC_NER	FusioNER/tec_ner	base model	TEC	Yehoshua

Results

We tested our models on the IAHALT test set, and also evaluated other models, such as DictaBert and HeBert. The results, sorted by ascending F1 score, are:

Model name	Precision	Recall	F1 score	Time (s)
IAHALT_and_NEMO_PP	0.714	0.353	0.461	83.128
HeBert	0.574	0.474	0.494	86.483
NEMO	0.553	0.51	0.525	81.422
IAHALT_and_NEMO	0.692	0.678	0.684	83.702
Vitaly	0.883	0.794	0.836	83.773
DictaBert	0.916	0.834	0.872	70.465
DICTA_large	0.917	0.845	0.879	206.251
Name-Sentences	0.895	0.865	0.879	82.674
Basic	0.897	0.866	0.881	84.479
Smart_Injection	0.898	0.867	0.881	82.253
DICTA_Basic	0.903	0.875	0.888	69.419
DICTA_Large_Smart	0.904	0.875	0.889	204.324
DICTA_Small_Smart	0.904	0.875	0.889	70.29
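The README does not state exactly how precision, recall and F1 were computed. A common convention for NER (implemented, for example, by the seqeval library) is exact-match, entity-level scoring: a predicted entity counts as correct only if both its span and its type match a gold entity. The sketch below assumes that convention; the `Span` tuple layout and the `entity_prf` helper are hypothetical and not taken from the Fusion code.

```python
from typing import Iterable, Set, Tuple

# Hypothetical span representation: (sentence_id, start_token, end_token_exclusive, label)
Span = Tuple[int, int, int, str]


def entity_prf(gold: Iterable[Span], pred: Iterable[Span]) -> Tuple[float, float, float]:
    """Micro-averaged, exact-match entity-level precision, recall and F1."""
    gold_set: Set[Span] = set(gold)
    pred_set: Set[Span] = set(pred)
    # A prediction is a true positive only if boundaries AND type match exactly.
    tp = len(gold_set & pred_set)
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = [(0, 0, 2, "PER"), (0, 3, 4, "GPE"), (1, 0, 1, "ORG")]
pred = [(0, 0, 2, "PER"), (0, 3, 4, "LOC"), (1, 0, 1, "ORG"), (1, 2, 3, "TIMEX")]
p, r, f = entity_prf(gold, pred)
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")
```

Under this scheme a type error (GPE predicted as LOC) costs both precision and recall, and a spurious entity (the extra TIMEX) costs precision only.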

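Based on the results table, we recommend the DICTA_Small_Smart model: it ties with DICTA_Large_Smart for the highest F1 score (0.889) while running about three times faster (70.29 s vs. 204.324 s).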
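The recommendation amounts to ranking models by F1 and breaking ties by run time. A minimal sketch of that ranking, using the F1 and time values copied by hand from the results table (the `RESULTS` list is illustrative only, not a file in the repository):

```python
# (model name, F1 score, time in seconds), copied from the results table
RESULTS = [
    ("IAHALT_and_NEMO_PP", 0.461, 83.128),
    ("HeBert", 0.494, 86.483),
    ("NEMO", 0.525, 81.422),
    ("IAHALT_and_NEMO", 0.684, 83.702),
    ("Vitaly", 0.836, 83.773),
    ("DictaBert", 0.872, 70.465),
    ("DICTA_large", 0.879, 206.251),
    ("Name-Sentences", 0.879, 82.674),
    ("Basic", 0.881, 84.479),
    ("Smart_Injection", 0.881, 82.253),
    ("DICTA_Basic", 0.888, 69.419),
    ("DICTA_Large_Smart", 0.889, 204.324),
    ("DICTA_Small_Smart", 0.889, 70.29),
]

# Highest F1 first; among equal F1 scores, the faster model ranks higher.
ranked = sorted(RESULTS, key=lambda row: (-row[1], row[2]))

for name, f1, seconds in ranked[:3]:
    print(f"{name:<20} F1={f1:.3f} time={seconds:.1f}s")
```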

Hebrew NLP models

The following table lists the Hebrew NLP models mentioned above:

Model name	Link	Creator
HeNLP/HeRo	https://huggingface.co/HeNLP/HeRo	Vitaly Shalumov and Harel Haskey
dicta-il/dictabert	https://huggingface.co/dicta-il/dictabert	Shaltiel Shmidman and Avi Shmidman and Moshe Koppel
dicta-il/dictabert-large	https://huggingface.co/dicta-il/dictabert-large	Shaltiel Shmidman and Avi Shmidman and Moshe Koppel
avichr/heBERT	https://huggingface.co/avichr/heBERT	Avihay Chriqui and Inbal Yahav

Footnotes

[1] Name-Sentences:

Adding sentences to the training corpus that consist solely of the entity we want the model to learn.
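A minimal sketch of how such sentences could be generated, assuming the corpus is stored as parallel token/BIO-tag lists and that entity names are split on whitespace. The `make_name_sentence` and `add_name_sentences` helpers and the example entities are hypothetical; this is not the project's actual generation code.

```python
from typing import List, Tuple

# One training example: parallel lists of tokens and BIO tags (assumed corpus format).
Example = Tuple[List[str], List[str]]


def make_name_sentence(entity: str, label: str) -> Example:
    """Build a 'sentence' that consists only of the entity, tagged as a single span."""
    tokens = entity.split()  # assumption: whitespace tokenization
    if not tokens:
        raise ValueError("entity must contain at least one token")
    # The first token opens the entity (B-), the remaining tokens continue it (I-).
    tags = [f"B-{label}"] + [f"I-{label}"] * (len(tokens) - 1)
    return tokens, tags


def add_name_sentences(corpus: List[Example], entities: List[Tuple[str, str]]) -> List[Example]:
    """Return a new corpus with one Name-Sentence appended per (entity, label) pair."""
    return corpus + [make_name_sentence(text, label) for text, label in entities]


# "ברלין היא בירת גרמניה" ("Berlin is the capital of Germany")
corpus: List[Example] = [
    (["ברלין", "היא", "בירת", "גרמניה"], ["B-GPE", "O", "O", "B-GPE"]),
]
augmented = add_name_sentences(corpus, [("מרדכי אנילביץ", "PER"), ("פולין", "GPE")])
for tokens, tags in augmented[1:]:
    print(list(zip(tokens, tags)))
```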

[2] Entity-Injection:

Replacing a tagged entity in the original corpus with a new entity that keeps the original label. With this method, the model can learn to recognize new entities (not new labels!) that it did not extract before.
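A minimal sketch of the replacement step, assuming BIO-tagged, whitespace-tokenized sentences. The `find_spans` and `inject_entity` helpers are hypothetical illustrations, not the project's code; the key point is that tokens and tags are spliced together so they stay aligned even when the new entity has a different length.

```python
from typing import List, Optional, Tuple

# One training example: parallel lists of tokens and BIO tags (assumed corpus format).
Example = Tuple[List[str], List[str]]


def find_spans(tags: List[str], label: str) -> List[Tuple[int, int]]:
    """Return (start, end_exclusive) token spans of entities with the given label."""
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a span at the end
        if start is not None and tag != f"I-{label}":
            spans.append((start, i))
            start = None
        if tag == f"B-{label}":
            start = i
    return spans


def inject_entity(example: Example, label: str, new_entity: str, which: int = 0) -> Example:
    """Replace the `which`-th entity of type `label` with `new_entity`, keeping its label."""
    tokens, tags = example
    spans = find_spans(tags, label)
    if which >= len(spans):
        return example  # no such entity: leave the sentence unchanged
    start, end = spans[which]
    new_tokens = new_entity.split()  # assumption: whitespace tokenization
    if not new_tokens:
        raise ValueError("new_entity must contain at least one token")
    new_tags = [f"B-{label}"] + [f"I-{label}"] * (len(new_tokens) - 1)
    # Splice the new entity in; tokens and tags stay aligned even if lengths differ.
    return tokens[:start] + new_tokens + tokens[end:], tags[:start] + new_tags + tags[end:]


# "אדולף היטלר מינה את רודולף הס" ("Adolf Hitler appointed Rudolf Hess")
sentence: Example = (
    ["אדולף", "היטלר", "מינה", "את", "רודולף", "הס"],
    ["B-PER", "I-PER", "O", "O", "B-PER", "I-PER"],
)
injected = inject_entity(sentence, "PER", "משה", which=0)
print(injected)
```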

[3] BI-BI Problem:

An error in building the training corpus: when entities of the same type appear in sequence, they are labeled as continuations of one another. For example, the text "הארי פוטר ורון וויזלי" ("Harry Potter and Ron Weasley") would be tagged as a SINGLE entity. This problem prevents the model from extracting entities correctly.
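A small sketch of why this matters, assuming BIO tags over whitespace tokens (so the conjunction prefix ו stays attached to "ורון"; a morphologically segmented corpus might split it off). The hypothetical `decode_entities` helper groups tagged tokens into entities: with the BI-BI tagging the two names collapse into one entity, while starting the second name with a B- tag yields two.

```python
from typing import List, Tuple


def decode_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_text, label) pairs."""
    entities: List[Tuple[str, str]] = []
    current: List[str] = []
    label = ""
    for token, tag in zip(tokens, tags):
        if tag.startswith("I-") and current and tag[2:] == label:
            current.append(token)  # continuation of the open entity
            continue
        if current:  # any other tag closes the open entity
            entities.append((" ".join(current), label))
            current = []
        if tag.startswith(("B-", "I-")):
            # A B- tag (or a stray I- tag) opens a new entity.
            current, label = [token], tag[2:]
    if current:
        entities.append((" ".join(current), label))
    return entities


tokens = ["הארי", "פוטר", "ורון", "וויזלי"]
bibi_tags = ["B-PER", "I-PER", "I-PER", "I-PER"]   # BI-BI problem: one merged entity
fixed_tags = ["B-PER", "I-PER", "B-PER", "I-PER"]  # second name opens a new entity

print(decode_entities(tokens, bibi_tags))
print(decode_entities(tokens, fixed_tags))
```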

[4] Classic:

The classic NER types:

entity type	full name	examples
PER	Person	אדולף היטלר, רודולף הס, מרדכי אנילביץ
GPE	Geopolitical Entity	גרמניה, פולין, ברלין, וורשה
LOC	Location	מזרח אירופה, אגן הים התיכון, הגליל
FAC	Facility	אוושוויץ, מגדלי התאומים, נתב"ג 2000, רחוב קפלן
ORG	Organization	המפלגה הנאצית, חברת גוגל, ממשלת חוף השנהב
TIMEX	Time Expression	1945, שנת 1993, יום השואה, שנות ה-90
EVE	Event	השואה, מלחמת העולם השנייה, שלטון האפרטהייד
TTL	Title	פיהרר, קיסר, מנכ"ל
ANG	Language	עברית, ערבית, גרמנית
DUC	Product	פייסבוק, F-16, תנובה
WOA	Work of Art	דו"ח מבקר המדינה, עיתון הארץ, הארי פוטר, תיק 2000
MISC	Miscellaneous	קורונה, התו הירוק, מדלית זהב, ביטקוין
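A token-classification model predicts one tag per token. Assuming the BIO scheme implied by the BI-BI footnote, the 12 classic types expand into 25 labels: "O" plus a B- and an I- tag for each type. The sketch below builds `label2id`/`id2label` dictionaries in the shape that Hugging Face `transformers` model configs use; the label order shown is an assumption and may differ from the one stored in the FusioNER checkpoints.

```python
from typing import Dict, List

# The 12 classic entity types from the table above.
CLASSIC_TYPES: List[str] = [
    "PER", "GPE", "LOC", "FAC", "ORG", "TIMEX",
    "EVE", "TTL", "ANG", "DUC", "WOA", "MISC",
]


def build_bio_labels(types: List[str]) -> List[str]:
    """Expand entity types into BIO tags: 'O' first, then B-/I- per type."""
    labels = ["O"]
    for entity_type in types:
        labels += [f"B-{entity_type}", f"I-{entity_type}"]
    return labels


LABELS = build_bio_labels(CLASSIC_TYPES)
label2id: Dict[str, int] = {label: i for i, label in enumerate(LABELS)}
id2label: Dict[int, str] = {i: label for label, i in label2id.items()}

print(len(LABELS))
```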

Datasets for English NER (for cleaning incorrect entities from English texts):

MIT License