Second Millennium Babylonian model for BabyLemmatizer
The total data set contains ca. 120k words (including lacunae). It consists of all Oracc texts labeled as any variant of Babylonian or Akkadian and dated to the second millennium BCE.
See the Babylonian-1st model for first-millennium Babylonian.
Evaluation results
Neural Net Evaluation
COMPONENT          AVG     CI      MODEL0
POS-tagger         97.85   ±0.00   97.85
Lemmatizer         94.58   ±0.00   94.58
Combined           93.87   ±0.00   93.87
POS-tagger OOV     91.94   ±0.00   91.94
Lemmatizer OOV     71.33   ±0.00   71.33
Combined OOV       69.91   ±0.00   69.91
-----------------------------------------------
OOV input rate     13.04           13.04
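The tables report token-level accuracies. As a rough guide to reading them, here is a minimal sketch of one plausible way such figures could be computed. It assumes that a token is OOV when its input form does not occur in the training data, that "Combined" counts a token as correct only when both its POS tag and its lemma match, and that the OOV input rate is the share of evaluation tokens that are OOV. The `Token` structure, the `evaluate` function and the toy data are hypothetical illustrations, not BabyLemmatizer's own code or definitions.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """One evaluation token (hypothetical structure, not BabyLemmatizer's)."""
    form: str        # transliterated input form
    gold_pos: str    # reference POS tag
    gold_lemma: str  # reference lemma
    pred_pos: str    # predicted POS tag
    pred_lemma: str  # predicted lemma


def pct(hits: int, total: int) -> float:
    """Percentage of hits, returning 0.0 for an empty subset."""
    return 100.0 * hits / total if total else 0.0


def evaluate(tokens: list[Token], train_forms: set[str]) -> dict[str, float]:
    """Token-level accuracies under the assumed definitions described above."""
    # Assumption: a token is OOV if its input form never occurred in training.
    oov = [t for t in tokens if t.form not in train_forms]
    scores: dict[str, float] = {}
    for suffix, subset in (("", tokens), (" OOV", oov)):
        pos_ok = sum(t.pred_pos == t.gold_pos for t in subset)
        lem_ok = sum(t.pred_lemma == t.gold_lemma for t in subset)
        # "Combined": the token counts only if both POS and lemma are correct.
        both_ok = sum(
            t.pred_pos == t.gold_pos and t.pred_lemma == t.gold_lemma
            for t in subset
        )
        scores["POS-tagger" + suffix] = pct(pos_ok, len(subset))
        scores["Lemmatizer" + suffix] = pct(lem_ok, len(subset))
        scores["Combined" + suffix] = pct(both_ok, len(subset))
    scores["OOV input rate"] = pct(len(oov), len(tokens))
    return scores


# Hypothetical toy data: two known forms, two OOV forms.
train = {"šarrum", "ana"}
toks = [
    Token("šarrum", "N", "šarru[king]N", "N", "šarru[king]N"),
    Token("ana", "PRP", "ana[to]PRP", "PRP", "ana[to]PRP"),
    Token("iddin", "V", "nadānu[give]V", "V", "nadû[throw]V"),     # wrong lemma
    Token("bītim", "N", "bītu[house]N", "AJ", "bītu[house]N"),     # wrong POS
]
print(evaluate(toks, train))
```

In this toy case each OOV token gets one of its two labels wrong, so the OOV POS and lemma scores are both 50 while the OOV combined score is 0. This is why the "Combined" rows can never exceed either component row.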
Post-correct Evaluation
COMPONENT          AVG     CI      MODEL0
POS-tagger         97.85   ±0.00   97.85
Lemmatizer         94.59   ±0.00   94.59
Combined           93.88   ±0.00   93.88
POS-tagger OOV     91.94   ±0.00   91.94
Lemmatizer OOV     71.33   ±0.00   71.33
Combined OOV       69.91   ±0.00   69.91
-----------------------------------------------
OOV input rate     13.04           13.04
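The tables give overall and OOV accuracies but not the accuracy on in-vocabulary tokens. Assuming each overall score is a token-weighted mix of the OOV and in-vocabulary scores, with the OOV input rate r as the OOV share, overall = r·OOV + (1 − r)·IV, so IV = (overall − r·OOV) / (1 − r). The sketch below backs out IV from the post-correct figures. The helper name `implied_in_vocab` is hypothetical, and because the inputs are rounded to two decimals, the results are approximate.

```python
def implied_in_vocab(overall: float, oov_acc: float, oov_rate: float) -> float:
    """Solve overall = r*oov + (1 - r)*iv for iv (all values in percent).

    Assumes the overall score is a token-weighted average of the OOV and
    in-vocabulary scores, with r = oov_rate / 100 as the OOV share.
    """
    r = oov_rate / 100.0
    return (overall - r * oov_acc) / (1.0 - r)


# Post-correct figures from the table above; OOV input rate 13.04 %.
OOV_RATE = 13.04
for name, overall, oov_acc in (
    ("POS-tagger", 97.85, 91.94),
    ("Lemmatizer", 94.59, 71.33),
    ("Combined", 93.88, 69.91),
):
    iv = implied_in_vocab(overall, oov_acc, OOV_RATE)
    print(f"{name:<11} in-vocabulary ~ {iv:.2f}")
```

Under these assumptions, in-vocabulary accuracy comes out at roughly 98.7 (POS), 98.1 (lemma) and 97.5 (combined) percent. Most of the remaining error therefore sits in the OOV tokens.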