# Second Millennium Babylonian model for [BabyLemmatizer](https://github.com/asahala/BabyLemmatizer)
The data set totals ca. 120k words (including lacunae) and consists of all Oracc texts labeled as any variant of Babylonian or Akkadian from the second millennium BCE.
For first-millennium Babylonian, see the Babylonian-1st model.
## Evaluation results
```
Neural Net Evaluation
COMPONENT            AVG     CI      MODEL0
POS-tagger           97.85   ±0.00   97.85
Lemmatizer           94.58   ±0.00   94.58
Combined             93.87   ±0.00   93.87
POS-tagger OOV       91.94   ±0.00   91.94
Lemmatizer OOV       71.33   ±0.00   71.33
Combined OOV         69.91   ±0.00   69.91
-----------------------------------------------
OOV input rate       13.04           13.04

Post-correct Evaluation
COMPONENT            AVG     CI      MODEL0
POS-tagger           97.85   ±0.00   97.85
Lemmatizer           94.59   ±0.00   94.59
Combined             93.88   ±0.00   93.88
POS-tagger OOV       91.94   ±0.00   91.94
Lemmatizer OOV       71.33   ±0.00   71.33
Combined OOV         69.91   ±0.00   69.91
-----------------------------------------------
OOV input rate       13.04           13.04
```
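
The overall and OOV scores above can be combined to estimate accuracy on in-vocabulary words only. The sketch below is an illustration, not part of BabyLemmatizer: it assumes that each overall score is a token-weighted average of in-vocabulary and OOV accuracy, and that the OOV input rate refers to the same evaluation set. The function name `in_vocabulary_accuracy` is hypothetical.

```python
# Assumption: overall = r * oov + (1 - r) * in_vocab, where r is the OOV rate.
# Solving for in_vocab gives the formula used below.

def in_vocabulary_accuracy(overall: float, oov: float, oov_rate: float) -> float:
    """Estimate in-vocabulary accuracy; all arguments are percentages."""
    r = oov_rate / 100.0
    return (overall - r * oov) / (1.0 - r)


# Values copied from the "Neural Net Evaluation" table: (overall, OOV).
SCORES = {
    "POS-tagger": (97.85, 91.94),
    "Lemmatizer": (94.58, 71.33),
    "Combined": (93.87, 69.91),
}
OOV_RATE = 13.04

for name, (overall, oov) in SCORES.items():
    estimate = in_vocabulary_accuracy(overall, oov, OOV_RATE)
    print(f"{name:<11} in-vocabulary estimate: {estimate:.2f}")
```

Under these assumptions the estimates come out at roughly 98.7 (POS-tagger), 98.1 (Lemmatizer), and 97.5 (Combined), which suggests that most remaining errors are concentrated in the 13% of input words not seen in training.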