# First Millennium Babylonian model for [BabyLemmatizer](https://github.com/asahala/BabyLemmatizer)
The total data set size is ca. 1.3M words (including lacunae). It consists of all Oracc texts labeled as any variant of Babylonian or Akkadian from the first millennium BCE; Neo-Assyrian is excluded. The OOV rate is fairly low (6.63% in the evaluation below), but the data set is highly varied and covers all text genres.
See the Babylonian-2nd model for Middle Babylonian (and second-millennium Babylonian in general).
## Evaluation results
```
Neural Net Evaluation
COMPONENT           AVG     CI      MODEL0
POS-tagger          96.84   ±0.00   96.84
Lemmatizer          95.23   ±0.00   95.23
Combined            93.91   ±0.00   93.91
POS-tagger OOV      87.41   ±0.00   87.41
Lemmatizer OOV      71.78   ±0.00   71.78
Combined OOV        69.63   ±0.00   69.63
-----------------------------------------------
OOV input rate      6.63            6.63

Post-correct Evaluation
COMPONENT           AVG     CI      MODEL0
POS-tagger          96.84   ±0.00   96.84
Lemmatizer          95.36   ±0.00   95.36
Combined            94.04   ±0.00   94.04
POS-tagger OOV      87.41   ±0.00   87.41
Lemmatizer OOV      71.78   ±0.00   71.78
Combined OOV        69.63   ±0.00   69.63
-----------------------------------------------
OOV input rate      6.63            6.63
```
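
To put the OOV figures in context, the following is a minimal Python sketch that estimates the implied accuracy on in-vocabulary words and the relative error reduction from post-correction. It assumes that the overall scores are computed over all evaluated words and the OOV scores over OOV words only, so that overall = (1 - r) * in-vocabulary + r * OOV, where r is the OOV input rate. The function names are hypothetical helpers for this illustration and are not part of BabyLemmatizer.

```python
def in_vocab_accuracy(overall: float, oov: float, oov_rate: float) -> float:
    """Estimate accuracy on in-vocabulary words; all arguments are percentages.

    Assumption: overall = (1 - r) * in_vocab + r * oov, with r = oov_rate / 100.
    """
    r = oov_rate / 100.0
    return (overall - r * oov) / (1.0 - r)


def relative_error_reduction(before: float, after: float) -> float:
    """Percentage of the remaining errors removed when accuracy rises from before to after."""
    return 100.0 * (after - before) / (100.0 - before)


# Figures copied from the evaluation tables above.
OOV_RATE = 6.63
neural_net = {
    "POS-tagger": (96.84, 87.41),
    "Lemmatizer": (95.23, 71.78),
    "Combined": (93.91, 69.63),
}
post_correct_overall = {"POS-tagger": 96.84, "Lemmatizer": 95.36, "Combined": 94.04}

for component, (overall, oov) in neural_net.items():
    in_vocab = in_vocab_accuracy(overall, oov, OOV_RATE)
    gain = relative_error_reduction(overall, post_correct_overall[component])
    print(f"{component:<12} in-vocab ~ {in_vocab:.2f}  post-correct error reduction ~ {gain:.2f}%")
```

Under these assumptions, most of the remaining errors fall on the small share of OOV words, and post-correction changes only the overall lemmatizer and combined scores, not the OOV ones.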