Aleksi Sahala

Neo-Assyrian model for BabyLemmatizer

The data set totals ca. 330k words (including lacunae) and consists of all Oracc texts labeled as Neo-Assyrian.

Evaluation results

Neural Net Evaluation
COMPONENT       AVG     CI       MODEL0
POS-tagger      97.49   ±0.00    97.49
Lemmatizer      95.38   ±0.00    95.38
Combined        94.28   ±0.00    94.28
POS-tagger OOV  90.45   ±0.00    90.45
Lemmatizer OOV  71.21   ±0.00    71.21
Combined   OOV  69.64   ±0.00    69.64
-----------------------------------------------
OOV input rate  9.51             9.51



Post-correct Evaluation
COMPONENT       AVG     CI       MODEL0
POS-tagger      97.49   ±0.00    97.49
Lemmatizer      95.44   ±0.00    95.44
Combined        94.34   ±0.00    94.34
POS-tagger OOV  90.45   ±0.00    90.45
Lemmatizer OOV  71.21   ±0.00    71.21
Combined   OOV  69.64   ±0.00    69.64
-----------------------------------------------
OOV input rate  9.51             9.51
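
The gap between the overall and OOV rows can be made more concrete by estimating the accuracy on in-vocabulary words. The following is a minimal sketch, not part of BabyLemmatizer: it assumes the scores are token-level accuracies in percent, that the OOV input rate is the share of OOV tokens in the same evaluation set, and that overall accuracy is therefore the weighted mean of the in-vocabulary and OOV accuracies. The function name `in_vocabulary_accuracy` and the variable names are hypothetical.

```python
def in_vocabulary_accuracy(overall: float, oov: float, oov_rate: float) -> float:
    """Solve overall = (1 - r) * iv + r * oov for iv.

    All arguments and the result are percentages; r is oov_rate / 100.
    """
    r = oov_rate / 100.0
    return (overall - r * oov) / (1.0 - r)


# Figures copied from the Neural Net Evaluation table: (overall, OOV).
OOV_RATE = 9.51
SCORES = {
    "POS-tagger": (97.49, 90.45),
    "Lemmatizer": (95.38, 71.21),
    "Combined": (94.28, 69.64),
}

for name, (overall, oov) in SCORES.items():
    iv = in_vocabulary_accuracy(overall, oov, OOV_RATE)
    print(f"{name:<12}{iv:6.2f}")
```

Under these assumptions the estimated in-vocabulary accuracy is higher than the overall figure for every component, with the largest difference for the lemmatizer, whose OOV accuracy is the lowest relative to its overall score.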