Aleksi Sahala
# First Millennium Babylonian model for [BabyLemmatizer](https://github.com/asahala/BabyLemmatizer)
The total data set comprises ca. 1.3M words (including lacunae). It consists of all Oracc texts labeled as any variant of Babylonian or Akkadian from the first millennium BCE, excluding Neo-Assyrian. The out-of-vocabulary (OOV) rate is fairly low (6.63% of input tokens in the evaluation), but the data set is highly varied and covers all text genres.
For Middle Babylonian (and second-millennium Babylonian in general), see the Babylonian-2nd model.
## Evaluation results
```
Neural Net Evaluation
COMPONENT          AVG      CI      MODEL0
POS-tagger         96.84   ±0.00    96.84
Lemmatizer         95.23   ±0.00    95.23
Combined           93.91   ±0.00    93.91
POS-tagger OOV     87.41   ±0.00    87.41
Lemmatizer OOV     71.78   ±0.00    71.78
Combined OOV       69.63   ±0.00    69.63
-----------------------------------------------
OOV input rate      6.63             6.63

Post-correct Evaluation
COMPONENT          AVG      CI      MODEL0
POS-tagger         96.84   ±0.00    96.84
Lemmatizer         95.36   ±0.00    95.36
Combined           94.04   ±0.00    94.04
POS-tagger OOV     87.41   ±0.00    87.41
Lemmatizer OOV     71.78   ±0.00    71.78
Combined OOV       69.63   ±0.00    69.63
-----------------------------------------------
OOV input rate      6.63             6.63
```
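To illustrate what the rows of the evaluation table measure, here is a minimal sketch of how such figures could be computed from token-level predictions. It assumes (this card does not confirm it) that "Combined" counts a token as correct only when both its lemma and its POS tag match the gold annotation, and that a token is OOV when its word form never occurs in the training data. The `Token` class, the `scores` and `evaluate` functions, and the example tokens are hypothetical and are not part of BabyLemmatizer's API. With a single model (MODEL0), AVG simply equals that model's score, which is presumably why the CI column reads ±0.00.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """One annotated word (hypothetical structure for illustration)."""
    form: str        # transliterated word form as it appears in the text
    gold_lemma: str  # reference lemma
    gold_pos: str    # reference POS tag
    pred_lemma: str  # model's predicted lemma
    pred_pos: str    # model's predicted POS tag


def pct(hits: int, total: int) -> float:
    """Percentage, returning 0.0 for an empty subset."""
    return 100.0 * hits / total if total else 0.0


def scores(subset: list[Token]) -> tuple[float, float, float]:
    """Return (POS, lemma, combined) accuracy for a list of tokens."""
    n = len(subset)
    pos_ok = sum(t.pred_pos == t.gold_pos for t in subset)
    lem_ok = sum(t.pred_lemma == t.gold_lemma for t in subset)
    # Assumption: "Combined" means both lemma and POS tag are correct.
    both_ok = sum(
        t.pred_pos == t.gold_pos and t.pred_lemma == t.gold_lemma
        for t in subset
    )
    return pct(pos_ok, n), pct(lem_ok, n), pct(both_ok, n)


def evaluate(tokens: list[Token], train_forms: set[str]) -> dict[str, float]:
    """Compute figures analogous to the rows of the evaluation table."""
    # Assumption: a token is OOV if its form never occurred in training data.
    oov = [t for t in tokens if t.form not in train_forms]
    pos, lem, comb = scores(tokens)
    pos_oov, lem_oov, comb_oov = scores(oov)
    return {
        "POS-tagger": pos,
        "Lemmatizer": lem,
        "Combined": comb,
        "POS-tagger OOV": pos_oov,
        "Lemmatizer OOV": lem_oov,
        "Combined OOV": comb_oov,
        "OOV input rate": pct(len(oov), len(tokens)),
    }


# Toy example: one OOV form ("iš-pur") whose POS tag is predicted wrongly.
tokens = [
    Token("šar-ru", "šarru", "N", "šarru", "N"),
    Token("i-lu", "ilu", "N", "ilu", "N"),
    Token("iš-pur", "šapāru", "V", "šapāru", "N"),
    Token("a-na", "ana", "PRP", "ana", "PRP"),
]
train_forms = {"šar-ru", "i-lu", "a-na"}

for name, value in evaluate(tokens, train_forms).items():
    print(f"{name:<16}{value:6.2f}")
```

In this toy example the single OOV token gets its lemma right but its POS tag wrong, so Lemmatizer OOV is 100.00 while POS-tagger OOV and Combined OOV are 0.00. Under this definition the Combined score can never exceed either component score, which is consistent with the table above.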