Aleksi Sahala
# Sumerian models for [BabyLemmatizer](https://github.com/asahala/BabyLemmatizer)
These models use indexed logo-syllabic tokenization and require BabyLemmatizer 2.1. The repository contains two models: ```sumerian-lit``` for literary Sumerian and ```sumerian-adm``` for administrative Sumerian.
```sumerian-adm``` covers all Sumerian Early Dynastic, Old Babylonian, Old Akkadian, Ebla and Lagaš II administrative texts in Oracc's ePSD2 corpus, totalling 570k words.
```sumerian-lit``` covers all Sumerian literary texts from Oracc, totalling 268k words.
## Evaluation results for administrative model
```
Neural Net Evaluation
COMPONENT          AVG     CI      MODEL0
POS-tagger         96.48   ±0.00   96.48
Lemmatizer         95.39   ±0.00   95.39
Combined           94.42   ±0.00   94.42
POS-tagger OOV     82.03   ±0.00   82.03
Lemmatizer OOV     71.87   ±0.00   71.87
Combined OOV       68.00   ±0.00   68.00
-----------------------------------------------
OOV input rate      5.44            5.44

Post-correct Evaluation
COMPONENT          AVG     CI      MODEL0
POS-tagger         96.48   ±0.00   96.48
Lemmatizer         95.42   ±0.00   95.42
Combined           94.44   ±0.00   94.44
POS-tagger OOV     82.03   ±0.00   82.03
Lemmatizer OOV     71.87   ±0.00   71.87
Combined OOV       68.00   ±0.00   68.00
-----------------------------------------------
OOV input rate      5.44            5.44
```
## Evaluation results for literary model
```
Neural Net Evaluation
COMPONENT          AVG     CI      MODEL0
POS-tagger         94.00   ±0.00   94.00
Lemmatizer         93.71   ±0.00   93.71
Combined           91.37   ±0.00   91.37
POS-tagger OOV     82.61   ±0.00   82.61
Lemmatizer OOV     80.87   ±0.00   80.87
Combined OOV       74.54   ±0.00   74.54
-----------------------------------------------
OOV input rate     19.04           19.04

Post-correct Evaluation
COMPONENT          AVG     CI      MODEL0
POS-tagger         94.00   ±0.00   94.00
Lemmatizer         93.70   ±0.00   93.70
Combined           91.36   ±0.00   91.36
POS-tagger OOV     82.61   ±0.00   82.61
Lemmatizer OOV     80.87   ±0.00   80.87
Combined OOV       74.54   ±0.00   74.54
-----------------------------------------------
OOV input rate     19.04           19.04
```
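The tables report overall and out-of-vocabulary (OOV) accuracy but not in-vocabulary accuracy. The sketch below estimates it under an assumption that this README does not confirm: that each overall score is a token-weighted average of in-vocabulary and OOV accuracy, with the OOV input rate as the weight. The function name `in_vocabulary_accuracy` and the `neural_net_results` dictionary are illustrative and not part of BabyLemmatizer; the numbers are copied from the neural net evaluation tables.

```python
def in_vocabulary_accuracy(overall: float, oov: float, oov_rate: float) -> float:
    """Solve overall = (1 - r) * iv + r * oov for iv.

    All arguments are percentages. ASSUMPTION: the overall score is a
    token-weighted average of in-vocabulary and OOV accuracy.
    """
    r = oov_rate / 100.0
    if not 0.0 <= r < 1.0:
        raise ValueError("oov_rate must be in the range [0, 100)")
    return (overall - r * oov) / (1.0 - r)


# (overall accuracy, OOV accuracy) per component, from the tables above.
neural_net_results = {
    "sumerian-adm": {
        "oov_rate": 5.44,
        "scores": {
            "POS-tagger": (96.48, 82.03),
            "Lemmatizer": (95.39, 71.87),
            "Combined": (94.42, 68.00),
        },
    },
    "sumerian-lit": {
        "oov_rate": 19.04,
        "scores": {
            "POS-tagger": (94.00, 82.61),
            "Lemmatizer": (93.71, 80.87),
            "Combined": (91.37, 74.54),
        },
    },
}

for model, data in neural_net_results.items():
    for component, (overall, oov) in data["scores"].items():
        iv = in_vocabulary_accuracy(overall, oov, data["oov_rate"])
        print(f"{model:13s} {component:11s} estimated in-vocabulary: {iv:.2f}")
```

If the assumption holds, the estimates give a rough way to compare the two models on seen word forms despite their very different OOV input rates (5.44 versus 19.04).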