# Second Millennium Babylonian model for [BabyLemmatizer](https://github.com/asahala/BabyLemmatizer)
The data set contains ca. 120k words in total (including lacunae). It consists of all Oracc texts labeled as any variant of Babylonian or Akkadian and dated to the second millennium BCE.

For first-millennium Babylonian, see the Babylonian-1st model.

## Evaluation results

```
Neural Net Evaluation
COMPONENT       AVG     CI       MODEL0
POS-tagger      97.85   ±0.00    97.85
Lemmatizer      94.58   ±0.00    94.58
Combined        93.87   ±0.00    93.87
POS-tagger OOV  91.94   ±0.00    91.94
Lemmatizer OOV  71.33   ±0.00    71.33
Combined   OOV  69.91   ±0.00    69.91
-----------------------------------------------
OOV input rate  13.04            13.04



Post-correct Evaluation
COMPONENT       AVG     CI       MODEL0
POS-tagger      97.85   ±0.00    97.85
Lemmatizer      94.59   ±0.00    94.59
Combined        93.88   ±0.00    93.88
POS-tagger OOV  91.94   ±0.00    91.94
Lemmatizer OOV  71.33   ±0.00    71.33
Combined   OOV  69.91   ±0.00    69.91
-----------------------------------------------
OOV input rate  13.04            13.04
```
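
The tables report overall and out-of-vocabulary (OOV) accuracy together with the OOV input rate, but not in-vocabulary accuracy. As a rough sketch, the in-vocabulary figure can be estimated from the three reported numbers, assuming the overall score is a token-weighted average of the in-vocabulary and OOV scores. That assumption, and the helper name `in_vocab_accuracy`, are illustrative and not part of BabyLemmatizer; the result is an estimate, not a reported evaluation result.

```python
def in_vocab_accuracy(overall: float, oov: float, oov_rate: float) -> float:
    """Estimate in-vocabulary accuracy (percent) from reported scores.

    Assumes (not confirmed by the evaluation output) that
        overall = (1 - r) * in_vocab + r * oov
    where r is the OOV input rate as a fraction.
    """
    r = oov_rate / 100.0
    return (overall - r * oov) / (1.0 - r)


# Neural net scores copied from the table above: (overall, OOV), in percent.
SCORES = {
    "POS-tagger": (97.85, 91.94),
    "Lemmatizer": (94.58, 71.33),
    "Combined": (93.87, 69.91),
}
OOV_RATE = 13.04  # percent of input tokens that are OOV

for component, (overall, oov) in SCORES.items():
    estimate = in_vocab_accuracy(overall, oov, OOV_RATE)
    print(f"{component:<12}{estimate:6.2f}")
```

Under this assumption, most of the gap between the overall and OOV lemmatizer scores comes from the roughly 13% of input tokens that are unseen in training.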