Aleksi Sahala
# Second Millennium Babylonian model for [BabyLemmatizer](https://github.com/asahala/BabyLemmatizer)
The total data set contains ca. 120k words (including lacunae). It consists of all Oracc texts labeled as any variant of Babylonian or Akkadian and dated to the second millennium BCE.
For first-millennium Babylonian, see the model Babylonian-1st.
## Evaluation results
```
Neural Net Evaluation
COMPONENT AVG CI MODEL0
POS-tagger 97.85 ±0.00 97.85
Lemmatizer 94.58 ±0.00 94.58
Combined 93.87 ±0.00 93.87
POS-tagger OOV 91.94 ±0.00 91.94
Lemmatizer OOV 71.33 ±0.00 71.33
Combined OOV 69.91 ±0.00 69.91
-----------------------------------------------
OOV input rate 13.04 13.04
Post-correct Evaluation
COMPONENT AVG CI MODEL0
POS-tagger 97.85 ±0.00 97.85
Lemmatizer 94.59 ±0.00 94.59
Combined 93.88 ±0.00 93.88
POS-tagger OOV 91.94 ±0.00 91.94
Lemmatizer OOV 71.33 ±0.00 71.33
Combined OOV 69.91 ±0.00 69.91
-----------------------------------------------
OOV input rate 13.04 13.04
```
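The tables report overall scores alongside scores on out-of-vocabulary (OOV) input, with an OOV input rate of 13.04. The sketch below backs out the implied in-vocabulary (IV) score from these figures. It assumes that each overall score is a token-weighted mix of the IV and OOV scores, and that "OOV input rate" is the percentage of test tokens that are OOV. Neither assumption is stated above, and the function names are hypothetical helpers, not part of BabyLemmatizer.

```python
# Hypothetical sketch: recover the implied in-vocabulary (IV) score from the
# evaluation table. ASSUMPTIONS: overall = (1 - r) * IV + r * OOV, where r is
# the OOV input rate expressed as a fraction of test tokens.

def implied_iv_accuracy(overall: float, oov_acc: float, oov_rate: float) -> float:
    """Solve overall = (1 - r) * iv + r * oov_acc for iv (all values in percent)."""
    r = oov_rate / 100.0
    if not 0.0 <= r < 1.0:
        raise ValueError("OOV rate must be in [0, 100)")
    return (overall - r * oov_acc) / (1.0 - r)


def mixed_accuracy(iv_acc: float, oov_acc: float, oov_rate: float) -> float:
    """Forward direction: combine IV and OOV scores weighted by the OOV rate."""
    r = oov_rate / 100.0
    return (1.0 - r) * iv_acc + r * oov_acc


# Figures copied from the "Neural Net Evaluation" block (percent).
OOV_RATE = 13.04
RESULTS = {
    "POS-tagger": (97.85, 91.94),
    "Lemmatizer": (94.58, 71.33),
    "Combined": (93.87, 69.91),
}

if __name__ == "__main__":
    for name, (overall, oov) in RESULTS.items():
        iv = implied_iv_accuracy(overall, oov, OOV_RATE)
        print(f"{name:<12} implied IV score ~ {iv:.2f}")
```

Under these assumptions, all three implied IV scores land above 97, which would suggest that most of the remaining error comes from OOV words.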