Second Millennium Babylonian model for BabyLemmatizer

Total data set size: ca. 120k words (including lacunae). The data set consists of all Oracc texts labeled as any variant of Babylonian or Akkadian and dated to the second millennium BCE.

See the model Babylonian-1st for first-millennium Babylonian.

Evaluation results

Neural Net Evaluation
COMPONENT       AVG     CI       MODEL0
POS-tagger      97.85   ±0.00    97.85
Lemmatizer      94.58   ±0.00    94.58
Combined        93.87   ±0.00    93.87
POS-tagger OOV  91.94   ±0.00    91.94
Lemmatizer OOV  71.33   ±0.00    71.33
Combined   OOV  69.91   ±0.00    69.91
-----------------------------------------------
OOV input rate  13.04            13.04

Post-correct Evaluation
COMPONENT       AVG     CI       MODEL0
POS-tagger      97.85   ±0.00    97.85
Lemmatizer      94.59   ±0.00    94.59
Combined        93.88   ±0.00    93.88
POS-tagger OOV  91.94   ±0.00    91.94
Lemmatizer OOV  71.33   ±0.00    71.33
Combined   OOV  69.91   ±0.00    69.91
-----------------------------------------------
OOV input rate  13.04            13.04
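The tables above report accuracy overall and on out-of-vocabulary (OOV) input, plus the share of OOV input. The following is a minimal sketch of how such figures could be computed. It is not BabyLemmatizer's own evaluation code, and it rests on these assumptions: a token counts as OOV when its word form does not occur in the training data; "Combined" means both lemma and POS tag are correct; all scores are percentages of tokens. The `Token` class, `evaluate` function and the sample tokens are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Token:
    # Hypothetical record for one evaluated word.
    form: str        # transliterated word form
    gold_lemma: str
    gold_pos: str
    pred_lemma: str
    pred_pos: str


def pct(hits: int, total: int) -> float:
    """Percentage rounded to two decimals, as in the tables above."""
    return round(100.0 * hits / total, 2) if total else 0.0


def scores(subset):
    """Return (POS accuracy, lemma accuracy, combined accuracy) for a token subset."""
    n = len(subset)
    pos = sum(t.pred_pos == t.gold_pos for t in subset)
    lem = sum(t.pred_lemma == t.gold_lemma for t in subset)
    # Assumption: "Combined" requires both lemma and POS tag to be correct.
    both = sum(t.pred_pos == t.gold_pos and t.pred_lemma == t.gold_lemma
               for t in subset)
    return pct(pos, n), pct(lem, n), pct(both, n)


def evaluate(tokens, train_forms):
    # Assumption: OOV = word form never seen in the training data.
    oov = [t for t in tokens if t.form not in train_forms]
    pos, lem, comb = scores(tokens)
    pos_oov, lem_oov, comb_oov = scores(oov)
    return {
        "POS-tagger": pos,
        "Lemmatizer": lem,
        "Combined": comb,
        "POS-tagger OOV": pos_oov,
        "Lemmatizer OOV": lem_oov,
        "Combined OOV": comb_oov,
        "OOV input rate": pct(len(oov), len(tokens)),
    }


# Illustrative toy data only; not taken from the actual test set.
train_forms = {"šarrum", "ana"}
tokens = [
    Token("šarrum", "šarru", "N", "šarru", "N"),
    Token("ana", "ana", "PRP", "ana", "PRP"),
    Token("ilim", "ilu", "N", "ilu", "AJ"),      # OOV, wrong POS
    Token("bītim", "bītu", "N", "bētu", "N"),    # OOV, wrong lemma
]
results = evaluate(tokens, train_forms)
for name, value in results.items():
    print(f"{name:<15} {value:6.2f}")
```

With only one model (MODEL0), the AVG column simply equals that model's score, which is consistent with the ±0.00 values in the CI column.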