# First Millennium Babylonian model for [BabyLemmatizer](https://github.com/asahala/BabyLemmatizer)

Total data set size is ca. 1.3M words (including lacunae). The data set consists of all Oracc texts labeled as any variant of Babylonian or Akkadian from the first millennium BCE. Neo-Assyrian is excluded.

The OOV rate is fairly low (6.63% in the evaluation below), but the data set is very varied and covers all text genres.

See the model Babylonian-2nd for Middle Babylonian (and second-millennium Babylonian in general).

## Evaluation results

```
Neural Net Evaluation
COMPONENT         AVG     CI      MODEL0
POS-tagger        96.84   ±0.00   96.84
Lemmatizer        95.23   ±0.00   95.23
Combined          93.91   ±0.00   93.91
POS-tagger OOV    87.41   ±0.00   87.41
Lemmatizer OOV    71.78   ±0.00   71.78
Combined OOV      69.63   ±0.00   69.63
-----------------------------------------------
OOV input rate    6.63            6.63

Post-correct Evaluation
COMPONENT         AVG     CI      MODEL0
POS-tagger        96.84   ±0.00   96.84
Lemmatizer        95.36   ±0.00   95.36
Combined          94.04   ±0.00   94.04
POS-tagger OOV    87.41   ±0.00   87.41
Lemmatizer OOV    71.78   ±0.00   71.78
Combined OOV      69.63   ±0.00   69.63
-----------------------------------------------
OOV input rate    6.63            6.63
```
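
The evaluation reports overall and OOV scores but not in-vocabulary scores. As a rough sketch, these can be estimated from the Neural Net figures, assuming (this is not stated in the evaluation output) that the overall score is a token-weighted average of in-vocabulary and OOV accuracy and that the OOV input rate is a percentage of evaluated tokens. The function name `in_vocabulary_accuracy` is hypothetical and not part of BabyLemmatizer.

```python
def in_vocabulary_accuracy(overall, oov, oov_rate):
    """Estimate in-vocabulary accuracy (in percent).

    Assumes overall = (1 - r) * in_vocab + r * oov, where r is the
    OOV input rate as a fraction. This weighting is an assumption,
    not something confirmed by the evaluation output.
    """
    r = oov_rate / 100.0
    return (overall - r * oov) / (1.0 - r)


OOV_RATE = 6.63  # "OOV input rate" from the evaluation results

# (overall, OOV) pairs copied from the Neural Net Evaluation block
scores = {
    "POS-tagger": (96.84, 87.41),
    "Lemmatizer": (95.23, 71.78),
    "Combined": (93.91, 69.63),
}

for name, (overall, oov) in scores.items():
    estimate = in_vocabulary_accuracy(overall, oov, OOV_RATE)
    print(f"{name:<12} in-vocabulary ~ {estimate:.2f}")
```

Under these assumptions the POS-tagger's in-vocabulary accuracy comes out at roughly 97.5, which suggests that most of the gap between overall and OOV scores is carried by the small share of unseen word forms.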