flair-uk-ner

Model description

flair-uk-pos is a Flair model that is ready to use for part-of-speech (upos) tagging. It is based on flair embeddings, that I've trained for Ukrainian language (available here and here) and has superior performance and a very small size (just 72mb!).

Results:

F-score (micro) 0.9793
F-score (macro) 0.9275
Accuracy 0.9793

By class:	precision	recall	f1-score	support
NOUN	0.9857	0.9851	0.9854	4549
PUNCT	0.9984	1.0000	0.9992	3097
ADJ	0.9772	0.9852	0.9812	1959
ADP	0.9956	0.9968	0.9962	1584
VERB	0.9891	0.9910	0.9900	1552
ADV	0.9630	0.9118	0.9367	714
CCONJ	0.9685	0.9746	0.9715	630
PROPN	0.9279	0.9472	0.9375	625
DET	0.9729	0.9698	0.9713	629
PRON	0.9706	0.9631	0.9669	515
PART	0.9235	0.8693	0.8956	375
NUM	0.9722	0.9804	0.9763	357
SCONJ	0.8768	0.9577	0.9154	260
AUX	0.8906	0.9500	0.9194	120
X	0.9833	0.9593	0.9712	123
SYM	1.0000	0.7059	0.8276	17
INTJ	0.5556	0.5000	0.5263	10
accuracy			0.9793	17116
macro avg	0.9383	0.9204	0.9275	17116
weighted avg	0.9794	0.9793	0.9792	17116

The model was fine-tuned on the Ukrainian (UD) corpus, released by the non-profit organization Institute for Ukrainian. Training code is also available here.

Usage demo

from flair.data import Sentence
from flair.models import SequenceTagger

from pprint import pprint

tagger = SequenceTagger.load("dchaplinsky/flair-uk-pos")

sentence = Sentence("Я люблю Україну. Моє імʼя Марія Шевченко, я навчаюся в Київській політехніці.")

tagger.predict(sentence)

print(sentence)
print('---')

print('The following POS tags are found:')

pos_items = []
for label in sentence.get_labels():
    all_labels = []
    keys = label.data_point.annotation_layers.keys()
    
    for key in keys:
        all_labels.extend(
            [
                {'label': label.value, 'score': round(label.score, 4)}
                for label in label.data_point.get_labels(key)
                if label.data_point == label
            ]
        )

    pos_items.append({
        'text': label.data_point.text,
        'all_labels': all_labels,
    })

pprint(pos_items)

# Result:
"""
Sentence: "Я люблю Україну . Моє імʼя Марія Шевченко , я навчаюся в Київській політехніці ." → ["Я"/PRON, "люблю"/VERB, "Україну"/PROPN, "."/PUNCT, "Моє"/DET, "імʼя"/NOUN, "Марія"/PROPN, "Шевченко"/PROPN, ","/PUNCT, "я"/PRON, "навчаюся"/VERB, "в"/ADP, "Київській"/ADJ, "політехніці"/NOUN, "."/PUNCT]
---
The following POS tags are found:
[{'all_labels': [{'label': 'PRON', 'score': 1.0}], 'text': 'Я'},
 {'all_labels': [{'label': 'VERB', 'score': 1.0}], 'text': 'люблю'},
 {'all_labels': [{'label': 'PROPN', 'score': 1.0}], 'text': 'Україну'},
 {'all_labels': [{'label': 'PUNCT', 'score': 1.0}], 'text': '.'},
 {'all_labels': [{'label': 'DET', 'score': 0.9999}], 'text': 'Моє'},
 {'all_labels': [{'label': 'NOUN', 'score': 1.0}], 'text': 'імʼя'},
 {'all_labels': [{'label': 'PROPN', 'score': 1.0}], 'text': 'Марія'},
 {'all_labels': [{'label': 'PROPN', 'score': 1.0}], 'text': 'Шевченко'},
 {'all_labels': [{'label': 'PUNCT', 'score': 1.0}], 'text': ','},
 {'all_labels': [{'label': 'PRON', 'score': 1.0}], 'text': 'я'},
 {'all_labels': [{'label': 'VERB', 'score': 1.0}], 'text': 'навчаюся'},
 {'all_labels': [{'label': 'ADP', 'score': 1.0}], 'text': 'в'},
 {'all_labels': [{'label': 'ADJ', 'score': 1.0}], 'text': 'Київській'},
 {'all_labels': [{'label': 'NOUN', 'score': 1.0}], 'text': 'політехніці'},
 {'all_labels': [{'label': 'PUNCT', 'score': 1.0}], 'text': '.'}]
"""

lang-uk
/

flair-uk-pos

flair-uk-ner

Model description

Usage demo

Evaluation results