Aleksi Sahala committed
Commit 17bc223
Parent(s): c1e3982
init model
- README.md +59 -3
- sumerian-adm.tar.gz +3 -0
- sumerian-lit.tar.gz +3 -0
README.md
CHANGED
@@ -1,3 +1,59 @@
# Sumerian models for [BabyLemmatizer](https://github.com/asahala/BabyLemmatizer)

These models use indexed logo-syllabic tokenization and require BabyLemmatizer 2.1. There are two models: sumerian-lit for literary Sumerian and sumerian-adm for administrative Sumerian.

```sumerian-adm``` covers all Sumerian Early Dynastic, Old Babylonian, Old Akkadian, Ebla and Lagaš II administrative texts in Oracc's ePSD2 corpus, a total of 570k words.

```sumerian-lit``` covers all Sumerian literary texts from Oracc, a total of 268k words.

## Evaluation results for administrative model

```
Neural Net Evaluation
COMPONENT            AVG      CI       MODEL0
POS-tagger           96.48    ±0.00    96.48
Lemmatizer           95.39    ±0.00    95.39
Combined             94.42    ±0.00    94.42
POS-tagger OOV       82.03    ±0.00    82.03
Lemmatizer OOV       71.87    ±0.00    71.87
Combined OOV         68.00    ±0.00    68.00
-----------------------------------------------
OOV input rate       5.44              5.44

Post-correct Evaluation
COMPONENT            AVG      CI       MODEL0
POS-tagger           96.48    ±0.00    96.48
Lemmatizer           95.42    ±0.00    95.42
Combined             94.44    ±0.00    94.44
POS-tagger OOV       82.03    ±0.00    82.03
Lemmatizer OOV       71.87    ±0.00    71.87
Combined OOV         68.00    ±0.00    68.00
-----------------------------------------------
OOV input rate       5.44              5.44
```

## Evaluation results for literary model

```
Neural Net Evaluation
COMPONENT            AVG      CI       MODEL0
POS-tagger           94.00    ±0.00    94.00
Lemmatizer           93.71    ±0.00    93.71
Combined             91.37    ±0.00    91.37
POS-tagger OOV       82.61    ±0.00    82.61
Lemmatizer OOV       80.87    ±0.00    80.87
Combined OOV         74.54    ±0.00    74.54
-----------------------------------------------
OOV input rate       19.04             19.04

Post-correct Evaluation
COMPONENT            AVG      CI       MODEL0
POS-tagger           94.00    ±0.00    94.00
Lemmatizer           93.70    ±0.00    93.70
Combined             91.36    ±0.00    91.36
POS-tagger OOV       82.61    ±0.00    82.61
Lemmatizer OOV       80.87    ±0.00    80.87
Combined OOV         74.54    ±0.00    74.54
-----------------------------------------------
OOV input rate       19.04             19.04
```
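
As a rough reading aid, the overall and OOV rows in the evaluation tables can be combined to estimate accuracy on in-vocabulary tokens. The sketch below assumes that each overall score is a token-weighted average of in-vocabulary and OOV accuracy, with the OOV input rate as the weight. That relationship is an assumption, not something stated in the evaluation output, and the helper name `in_vocabulary_accuracy` is hypothetical.

```python
def in_vocabulary_accuracy(overall: float, oov: float, oov_rate: float) -> float:
    """Solve overall = (1 - r) * iv + r * oov for iv.

    All arguments are percentages, as printed in the evaluation tables.
    Assumes (not confirmed by the source) that the overall score is a
    token-weighted mix of in-vocabulary and OOV accuracy.
    """
    r = oov_rate / 100.0
    if not 0.0 <= r < 1.0:
        raise ValueError("oov_rate must be in the range [0, 100)")
    return (overall - r * oov) / (1.0 - r)


# Neural-net rows copied from the tables: (overall, OOV, OOV input rate).
RESULTS = {
    "sumerian-adm": {
        "POS-tagger": (96.48, 82.03, 5.44),
        "Lemmatizer": (95.39, 71.87, 5.44),
        "Combined": (94.42, 68.00, 5.44),
    },
    "sumerian-lit": {
        "POS-tagger": (94.00, 82.61, 19.04),
        "Lemmatizer": (93.71, 80.87, 19.04),
        "Combined": (91.37, 74.54, 19.04),
    },
}

if __name__ == "__main__":
    for model, rows in RESULTS.items():
        for component, (overall, oov, rate) in rows.items():
            estimate = in_vocabulary_accuracy(overall, oov, rate)
            print(f"{model:13s} {component:11s} in-vocabulary estimate: {estimate:.2f}")
```

Under this assumption, the much higher OOV input rate of the literary test data (19.04 versus 5.44) would account for part of the gap between the two models' overall scores.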
sumerian-adm.tar.gz
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:486123fdb936f82ee19020eb251ee5d9414cd66a6dff642232745a766162f471
size 225529795
sumerian-lit.tar.gz
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c3c2fc27fcbc3696bcce1004b44c36fadcb52dda52c65301561853fd13790f9f
size 215265248
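
The two Git LFS pointers above record the SHA-256 digest and byte size of each archive, so a downloaded copy can be checked against them. The following is a minimal sketch, assuming the archives were saved under their original file names in the current directory. The helper names `sha256_of` and `verify` are hypothetical and not part of BabyLemmatizer or Git LFS.

```python
import hashlib
import os

# Digest and size values copied from the Git LFS pointers in this commit.
EXPECTED = {
    "sumerian-adm.tar.gz": (
        "486123fdb936f82ee19020eb251ee5d9414cd66a6dff642232745a766162f471",
        225529795,
    ),
    "sumerian-lit.tar.gz": (
        "c3c2fc27fcbc3696bcce1004b44c36fadcb52dda52c65301561853fd13790f9f",
        215265248,
    ),
}


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str) -> bool:
    """Check a file's size and digest against the pointer values."""
    expected_oid, expected_size = EXPECTED[os.path.basename(path)]
    if os.path.getsize(path) != expected_size:
        return False  # cheap check first; skips hashing a truncated download
    return sha256_of(path) == expected_oid


if __name__ == "__main__":
    for name in EXPECTED:
        if os.path.exists(name):
            print(name, "OK" if verify(name) else "MISMATCH")
        else:
            print(name, "not found, skipping")
```

Comparing the size before hashing avoids reading a roughly 200 MB file when the download is obviously incomplete.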