# Second Millennium Babylonian model for [BabyLemmatizer](https://github.com/asahala/BabyLemmatizer)
The data set totals ca. 120k words (including lacunae) and consists of all Oracc texts from the second millennium BCE that are labeled as any variant of Babylonian or Akkadian.

For first millennium Babylonian, see the Babylonian-1st model.

## Evaluation results

```
Neural Net Evaluation
COMPONENT       AVG     CI       MODEL0
POS-tagger      97.85   ±0.00    97.85
Lemmatizer      94.58   ±0.00    94.58
Combined        93.87   ±0.00    93.87
POS-tagger OOV  91.94   ±0.00    91.94
Lemmatizer OOV  71.33   ±0.00    71.33
Combined   OOV  69.91   ±0.00    69.91
-----------------------------------------------
OOV input rate  13.04            13.04


Post-correct Evaluation
COMPONENT       AVG     CI       MODEL0
POS-tagger      97.85   ±0.00    97.85
Lemmatizer      94.59   ±0.00    94.59
Combined        93.88   ±0.00    93.88
POS-tagger OOV  91.94   ±0.00    91.94
Lemmatizer OOV  71.33   ±0.00    71.33
Combined   OOV  69.91   ±0.00    69.91
-----------------------------------------------
OOV input rate  13.04            13.04
```
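
The gap between the overall and OOV rows can be made concrete by estimating the accuracy on in-vocabulary words. The sketch below is only a back-of-the-envelope calculation, not part of BabyLemmatizer: it assumes that each overall score is a token-weighted average of the OOV and in-vocabulary scores and that the "OOV input rate" is the fraction of evaluated tokens that are OOV. The function name `in_vocabulary_accuracy` is hypothetical, and the inputs are the Neural Net Evaluation figures from the table above.

```python
def in_vocabulary_accuracy(overall: float, oov: float, oov_rate: float) -> float:
    """Estimate in-vocabulary accuracy (in percent).

    Assumption (not confirmed by the model card):
        overall = (1 - r) * in_vocab + r * oov,  where r = oov_rate / 100
    """
    r = oov_rate / 100.0
    if not 0.0 <= r < 1.0:
        raise ValueError("oov_rate must be in the range [0, 100)")
    return (overall - r * oov) / (1.0 - r)


# Figures copied from the Neural Net Evaluation table: (overall, OOV).
OOV_RATE = 13.04
SCORES = {
    "POS-tagger": (97.85, 91.94),
    "Lemmatizer": (94.58, 71.33),
    "Combined": (93.87, 69.91),
}

# Estimated in-vocabulary accuracy per component, under the stated assumption.
in_vocab = {
    name: in_vocabulary_accuracy(overall, oov, OOV_RATE)
    for name, (overall, oov) in SCORES.items()
}

for name, value in in_vocab.items():
    print(f"{name:<12}{value:6.2f}")
```

If the weighted-average assumption holds, the estimates indicate how much of the remaining error is concentrated in unseen word forms, which is where the lemmatizer score drops most sharply.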