# Sumerian models for [BabyLemmatizer](https://github.com/asahala/BabyLemmatizer)

These models use indexed logo-syllabic tokenization and require BabyLemmatizer 2.1. Two models are provided: `sumerian-lit` for literary Sumerian and `sumerian-adm` for administrative Sumerian.

`sumerian-adm` is based on all Sumerian Early Dynastic, Old Babylonian, Old Akkadian, Ebla and Lagaš II administrative texts in Oracc's ePSD2 corpus, comprising 570k words.

`sumerian-lit` is based on all Sumerian literary texts from Oracc, comprising 268k words.

## Evaluation results for administrative model

```
Neural Net Evaluation
COMPONENT            AVG     CI      MODEL0
POS-tagger           96.48   ±0.00   96.48
Lemmatizer           95.39   ±0.00   95.39
Combined             94.42   ±0.00   94.42
POS-tagger OOV       82.03   ±0.00   82.03
Lemmatizer OOV       71.87   ±0.00   71.87
Combined OOV         68.00   ±0.00   68.00
-----------------------------------------------
OOV input rate       5.44            5.44

Post-correct Evaluation
COMPONENT            AVG     CI      MODEL0
POS-tagger           96.48   ±0.00   96.48
Lemmatizer           95.42   ±0.00   95.42
Combined             94.44   ±0.00   94.44
POS-tagger OOV       82.03   ±0.00   82.03
Lemmatizer OOV       71.87   ±0.00   71.87
Combined OOV         68.00   ±0.00   68.00
-----------------------------------------------
OOV input rate       5.44            5.44
```

## Evaluation results for literary model

```
Neural Net Evaluation
COMPONENT            AVG     CI      MODEL0
POS-tagger           94.00   ±0.00   94.00
Lemmatizer           93.71   ±0.00   93.71
Combined             91.37   ±0.00   91.37
POS-tagger OOV       82.61   ±0.00   82.61
Lemmatizer OOV       80.87   ±0.00   80.87
Combined OOV         74.54   ±0.00   74.54
-----------------------------------------------
OOV input rate       19.04           19.04

Post-correct Evaluation
COMPONENT            AVG     CI      MODEL0
POS-tagger           94.00   ±0.00   94.00
Lemmatizer           93.70   ±0.00   93.70
Combined             91.36   ±0.00   91.36
POS-tagger OOV       82.61   ±0.00   82.61
Lemmatizer OOV       80.87   ±0.00   80.87
Combined OOV         74.54   ±0.00   74.54
-----------------------------------------------
OOV input rate       19.04           19.04
```
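
To compare the reports programmatically, the sketch below parses one evaluation report into a dictionary and computes how far each score drops on out-of-vocabulary (OOV) input. It is not part of BabyLemmatizer: `parse_report` and `oov_gap` are hypothetical helper names, and the parser assumes a report with one component per line, in the `COMPONENT AVG CI MODEL0` column layout. The numbers are the Neural Net Evaluation figures for the administrative model.

```python
import re

# Neural Net Evaluation of the administrative model, one component per line.
REPORT = """\
COMPONENT            AVG     CI      MODEL0
POS-tagger           96.48   ±0.00   96.48
Lemmatizer           95.39   ±0.00   95.39
Combined             94.42   ±0.00   94.42
POS-tagger OOV       82.03   ±0.00   82.03
Lemmatizer OOV       71.87   ±0.00   71.87
Combined OOV         68.00   ±0.00   68.00
-----------------------------------------------
OOV input rate       5.44            5.44
"""

# A data row is a component name followed by its AVG value.
# The header row and the dashed separator do not match this pattern.
ROW = re.compile(r"^(?P<name>[A-Za-z][A-Za-z\- ]*?)\s+(?P<avg>\d+\.\d+)\b")


def parse_report(text):
    """Map each component name to its AVG score (hypothetical helper)."""
    scores = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            scores[match.group("name")] = float(match.group("avg"))
    return scores


def oov_gap(scores):
    """Percentage-point drop from the overall score to the OOV score."""
    return {
        name: round(scores[name] - scores[name + " OOV"], 2)
        for name in ("POS-tagger", "Lemmatizer", "Combined")
    }


scores = parse_report(REPORT)
for component, gap in oov_gap(scores).items():
    print(f"{component}: {scores[component]:.2f} overall, {gap:.2f} points lower on OOV")
```

Pasting the literary model's report into `REPORT` gives the corresponding gaps for `sumerian-lit`.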