# First Millennium Babylonian model for [BabyLemmatizer](https://github.com/asahala/BabyLemmatizer)

The data set totals ca. 1.3M words (including lacunae). It consists of all Oracc texts labeled as any variant of Babylonian or Akkadian from the first millennium BCE; Neo-Assyrian is excluded. The OOV rate is fairly low (6.63% in the evaluation below), even though the data set is very varied and covers all text genres.

For Middle Babylonian, and second millennium Babylonian in general, see the model Babylonian-2nd.

## Evaluation results

```
Neural Net Evaluation
COMPONENT            AVG     CI      MODEL0
POS-tagger           96.84   ±0.00   96.84
Lemmatizer           95.23   ±0.00   95.23
Combined             93.91   ±0.00   93.91
POS-tagger OOV       87.41   ±0.00   87.41
Lemmatizer OOV       71.78   ±0.00   71.78
Combined OOV         69.63   ±0.00   69.63
-----------------------------------------------
OOV input rate       6.63            6.63

Post-correct Evaluation
COMPONENT            AVG     CI      MODEL0
POS-tagger           96.84   ±0.00   96.84
Lemmatizer           95.36   ±0.00   95.36
Combined             94.04   ±0.00   94.04
POS-tagger OOV       87.41   ±0.00   87.41
Lemmatizer OOV       71.78   ±0.00   71.78
Combined OOV         69.63   ±0.00   69.63
-----------------------------------------------
OOV input rate       6.63            6.63
```
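
To make the rows of the table easier to read, here is a minimal sketch of how figures of this kind are commonly computed. It is not BabyLemmatizer's own evaluation code. It assumes that "Combined" counts a token as correct only when both its POS tag and its lemma are correct, and that a token is OOV when its written form does not occur in the training data. The `Token` class, the `evaluate` function, and the toy tokens are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One evaluated token (hypothetical structure, for illustration only)."""
    form: str
    gold_pos: str
    gold_lemma: str
    pred_pos: str
    pred_lemma: str


def percent(hits: int, total: int) -> float:
    """Return hits/total as a percentage rounded to two decimals."""
    return round(100.0 * hits / total, 2) if total else 0.0


def scores(subset: list[Token]) -> dict[str, float]:
    """POS, lemma, and combined accuracy over a list of tokens."""
    pos_ok = sum(t.pred_pos == t.gold_pos for t in subset)
    lemma_ok = sum(t.pred_lemma == t.gold_lemma for t in subset)
    # Assumption: "combined" requires both the tag and the lemma to be right.
    both_ok = sum(
        t.pred_pos == t.gold_pos and t.pred_lemma == t.gold_lemma
        for t in subset
    )
    n = len(subset)
    return {
        "pos": percent(pos_ok, n),
        "lemma": percent(lemma_ok, n),
        "combined": percent(both_ok, n),
    }


def evaluate(tokens: list[Token], train_vocab: set[str]) -> dict:
    """Score all tokens, then only those whose form is unseen in training."""
    oov = [t for t in tokens if t.form not in train_vocab]
    return {
        "all": scores(tokens),
        "oov": scores(oov),
        "oov_rate": percent(len(oov), len(tokens)),
    }


# Toy data, invented for this example; not taken from the model's test set.
tokens = [
    Token("šar-ru", "N", "šarru", "N", "šarru"),   # tag and lemma correct
    Token("i-na", "PRP", "ina", "PRP", "ina"),     # tag and lemma correct
    Token("il-lik", "V", "alāku", "V", "elû"),     # OOV form, lemma wrong
    Token("É", "N", "bītu", "V", "bītu"),          # tag wrong
]
train_vocab = {"šar-ru", "i-na", "É"}

result = evaluate(tokens, train_vocab)
print(result)
```

Under these assumptions the combined score can never exceed the lower of the two component scores, which matches the pattern in the table, where Combined sits below both POS-tagger and Lemmatizer.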