# First Millennium Babylonian model for [BabyLemmatizer](https://github.com/asahala/BabyLemmatizer)
The full data set contains ca. 1.3M words (including lacunae). It consists of all Oracc texts labeled as any variant of Babylonian or Akkadian from the first millennium BCE; Neo-Assyrian texts are excluded. The OOV rate is fairly low (6.63% in the evaluation below), but the data set is very varied and covers all text genres.

For Middle Babylonian, and second-millennium Babylonian in general, see the model Babylonian-2nd.

## Evaluation results

```
Neural Net Evaluation
COMPONENT       AVG     CI       MODEL0
POS-tagger      96.84   ±0.00    96.84
Lemmatizer      95.23   ±0.00    95.23
Combined        93.91   ±0.00    93.91
POS-tagger OOV  87.41   ±0.00    87.41
Lemmatizer OOV  71.78   ±0.00    71.78
Combined   OOV  69.63   ±0.00    69.63
-----------------------------------------------
OOV input rate  6.63             6.63

Post-correct Evaluation
COMPONENT       AVG     CI       MODEL0
POS-tagger      96.84   ±0.00    96.84
Lemmatizer      95.36   ±0.00    95.36
Combined        94.04   ±0.00    94.04
POS-tagger OOV  87.41   ±0.00    87.41
Lemmatizer OOV  71.78   ±0.00    71.78
Combined   OOV  69.63   ±0.00    69.63
-----------------------------------------------
OOV input rate  6.63             6.63
```
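
As a reading aid, the following Python sketch derives two figures from the tables above: the gain from post-correction, and the in-vocabulary accuracy implied by the overall and OOV scores. It is an illustration only, not part of BabyLemmatizer. The helper names are hypothetical, and the second calculation assumes that the OOV input rate is a percentage of the same tokens over which the overall scores were computed, which the tables do not state explicitly.

```python
# Scores copied from the evaluation tables above (percentages).
NEURAL = {"POS-tagger": 96.84, "Lemmatizer": 95.23, "Combined": 93.91}
NEURAL_OOV = {"POS-tagger": 87.41, "Lemmatizer": 71.78, "Combined": 69.63}
POST = {"POS-tagger": 96.84, "Lemmatizer": 95.36, "Combined": 94.04}

# ASSUMPTION: the OOV input rate is a percentage of evaluated tokens.
OOV_RATE = 6.63


def post_correction_gain(neural, post):
    """Percentage-point difference between post-corrected and raw scores."""
    return {name: round(post[name] - neural[name], 2) for name in neural}


def implied_in_vocabulary_accuracy(overall, oov, oov_rate_pct):
    """Solve overall = (1 - r) * in_vocab + r * oov for in_vocab."""
    r = oov_rate_pct / 100.0
    return (overall - r * oov) / (1.0 - r)


gains = post_correction_gain(NEURAL, POST)
in_vocab = {
    name: round(
        implied_in_vocabulary_accuracy(NEURAL[name], NEURAL_OOV[name], OOV_RATE), 2
    )
    for name in NEURAL
}

print("Post-correction gain (points):", gains)
print("Implied in-vocabulary accuracy:", in_vocab)
```

Under that assumption, post-correction adds 0.13 points to the lemmatizer and combined scores and leaves the POS-tagger score unchanged, while the implied in-vocabulary scores are somewhat higher than the overall averages.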