Commit 17bc223
Author: Aleksi Sahala
Parent: c1e3982

init model

Files changed (3)
  1. README.md +59 -3
  2. sumerian-adm.tar.gz +3 -0
  3. sumerian-lit.tar.gz +3 -0
README.md CHANGED
@@ -1,3 +1,59 @@
- ---
- license: cc-by-4.0
- ---
+ # Sumerian models for [BabyLemmatizer](https://github.com/asahala/BabyLemmatizer)
+
+ These models use indexed logo-syllabic tokenization and require BabyLemmatizer 2.1. The repository contains two models: sumerian-lit for literary Sumerian and sumerian-adm for administrative Sumerian.
+
+ ```sumerian-adm``` is based on all Sumerian Early Dynastic, Old Babylonian, Old Akkadian, Ebla and Lagaš II administrative texts in Oracc's ePSD2 corpus, comprising 570k words.
+
+ ```sumerian-lit``` is based on all Sumerian literary texts from Oracc, comprising 268k words.
+
+ ## Evaluation results for the administrative model
+
+ ```
+ Neural Net Evaluation
+ COMPONENT           AVG     CI      MODEL0
+ POS-tagger          96.48   ±0.00   96.48
+ Lemmatizer          95.39   ±0.00   95.39
+ Combined            94.42   ±0.00   94.42
+ POS-tagger OOV      82.03   ±0.00   82.03
+ Lemmatizer OOV      71.87   ±0.00   71.87
+ Combined OOV        68.00   ±0.00   68.00
+ -----------------------------------------------
+ OOV input rate       5.44            5.44
+
+ Post-correct Evaluation
+ COMPONENT           AVG     CI      MODEL0
+ POS-tagger          96.48   ±0.00   96.48
+ Lemmatizer          95.42   ±0.00   95.42
+ Combined            94.44   ±0.00   94.44
+ POS-tagger OOV      82.03   ±0.00   82.03
+ Lemmatizer OOV      71.87   ±0.00   71.87
+ Combined OOV        68.00   ±0.00   68.00
+ -----------------------------------------------
+ OOV input rate       5.44            5.44
+ ```
+
+ ## Evaluation results for the literary model
+
+ ```
+ Neural Net Evaluation
+ COMPONENT           AVG     CI      MODEL0
+ POS-tagger          94.00   ±0.00   94.00
+ Lemmatizer          93.71   ±0.00   93.71
+ Combined            91.37   ±0.00   91.37
+ POS-tagger OOV      82.61   ±0.00   82.61
+ Lemmatizer OOV      80.87   ±0.00   80.87
+ Combined OOV        74.54   ±0.00   74.54
+ -----------------------------------------------
+ OOV input rate      19.04           19.04
+
+ Post-correct Evaluation
+ COMPONENT           AVG     CI      MODEL0
+ POS-tagger          94.00   ±0.00   94.00
+ Lemmatizer          93.70   ±0.00   93.70
+ Combined            91.36   ±0.00   91.36
+ POS-tagger OOV      82.61   ±0.00   82.61
+ Lemmatizer OOV      80.87   ±0.00   80.87
+ Combined OOV        74.54   ±0.00   74.54
+ -----------------------------------------------
+ OOV input rate      19.04           19.04
+ ```
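
The two evaluation stages in the README tables differ only slightly, and the direction of the change is easy to miss by eye. The following is a minimal Python sketch that computes the post-correction delta per component. The scores are transcribed by hand from the tables above; the names `SCORES`, `ROWS` and `post_correction_delta` are illustrative and not part of BabyLemmatizer.

```python
# Row labels as they appear in the README evaluation tables.
ROWS = (
    "POS-tagger", "Lemmatizer", "Combined",
    "POS-tagger OOV", "Lemmatizer OOV", "Combined OOV",
)

# Accuracy (%) transcribed from the README: (neural net, post-correct) per model.
SCORES = {
    "sumerian-adm": (
        (96.48, 95.39, 94.42, 82.03, 71.87, 68.00),
        (96.48, 95.42, 94.44, 82.03, 71.87, 68.00),
    ),
    "sumerian-lit": (
        (94.00, 93.71, 91.37, 82.61, 80.87, 74.54),
        (94.00, 93.70, 91.36, 82.61, 80.87, 74.54),
    ),
}


def post_correction_delta(model):
    """Return post-correct minus neural-net accuracy for each table row."""
    neural, post = SCORES[model]
    # Round to two decimals to match the precision of the reported scores.
    return {row: round(p - n, 2) for row, n, p in zip(ROWS, neural, post)}


if __name__ == "__main__":
    for model in SCORES:
        print(model)
        for row, delta in post_correction_delta(model).items():
            print(f"  {row:<16}{delta:+.2f}")
```

On the reported figures, post-correction raises the administrative model's lemmatizer and combined scores by 0.03 and 0.02 points, lowers the literary model's by 0.01 each, and leaves all POS-tagger and OOV rows unchanged.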
sumerian-adm.tar.gz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:486123fdb936f82ee19020eb251ee5d9414cd66a6dff642232745a766162f471
+ size 225529795
sumerian-lit.tar.gz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c3c2fc27fcbc3696bcce1004b44c36fadcb52dda52c65301561853fd13790f9f
+ size 215265248
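
The Git LFS pointer files above record a SHA-256 digest and a byte size for each archive, which can be used to check a download. The following is a minimal Python sketch of that check, assuming the archives have been saved to the current directory under their original file names. The helper names `sha256_of` and `verify` are illustrative and not part of Git LFS or BabyLemmatizer.

```python
import hashlib
from pathlib import Path

# Digest and size copied from the Git LFS pointer files in this commit.
EXPECTED = {
    "sumerian-adm.tar.gz": (
        "486123fdb936f82ee19020eb251ee5d9414cd66a6dff642232745a766162f471",
        225529795,
    ),
    "sumerian-lit.tar.gz": (
        "c3c2fc27fcbc3696bcce1004b44c36fadcb52dda52c65301561853fd13790f9f",
        215265248,
    ),
}


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_oid, expected_size):
    """Return True if the file matches both the recorded size and digest."""
    path = Path(path)
    # Compare the cheap size first; hash only when the size already matches.
    return path.stat().st_size == expected_size and sha256_of(path) == expected_oid


if __name__ == "__main__":
    for name, (oid, size) in EXPECTED.items():
        if not Path(name).exists():
            print(f"{name}: not found")
        else:
            print(f"{name}: {'OK' if verify(name, oid, size) else 'MISMATCH'}")
```

A mismatch usually means an incomplete download, or that the pointer file itself was fetched instead of the archive it refers to.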