Aleksi Sahala committed on
Commit
fc29d6d
1 Parent(s): cdde2d2

init model

Files changed (2)
  1. README.md +30 -3
  2. babylonian-1st.tar.gz +3 -0
README.md CHANGED
@@ -1,3 +1,30 @@
- ---
- license: cc-by-4.0
- ---
+ # First Millennium Babylonian model for [BabyLemmatizer](https://github.com/asahala/BabyLemmatizer)
+ The data set totals ca. 1.3M words (including lacunae) and consists of all Oracc texts labeled as any variant of Babylonian or Akkadian from the first millennium BCE; Neo-Assyrian is excluded. The OOV rate is fairly low, but the data set is very varied and covers all text genres.
+
+ See the Babylonian-2nd model for Middle Babylonian (and second-millennium Babylonian in general).
+
+ ## Evaluation results
+
+ ```
+ Neural Net Evaluation
+ COMPONENT         AVG     CI      MODEL0
+ POS-tagger        96.84   ±0.00   96.84
+ Lemmatizer        95.23   ±0.00   95.23
+ Combined          93.91   ±0.00   93.91
+ POS-tagger OOV    87.41   ±0.00   87.41
+ Lemmatizer OOV    71.78   ±0.00   71.78
+ Combined OOV      69.63   ±0.00   69.63
+ -----------------------------------------------
+ OOV input rate    6.63            6.63
+
+ Post-correct Evaluation
+ COMPONENT         AVG     CI      MODEL0
+ POS-tagger        96.84   ±0.00   96.84
+ Lemmatizer        95.36   ±0.00   95.36
+ Combined          94.04   ±0.00   94.04
+ POS-tagger OOV    87.41   ±0.00   87.41
+ Lemmatizer OOV    71.78   ±0.00   71.78
+ Combined OOV      69.63   ±0.00   69.63
+ -----------------------------------------------
+ OOV input rate    6.63            6.63
+ ```
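
As a reading aid for the two evaluation tables in this README, the following sketch computes how much post-correction changes each overall score, and the corresponding relative reduction in error rate. The scores are copied from the tables. The dictionary names and the `error_reduction` helper are illustrative only and are not part of BabyLemmatizer.

```python
# Overall accuracy scores (%) copied from the evaluation tables in README.md.
neural = {"POS-tagger": 96.84, "Lemmatizer": 95.23, "Combined": 93.91}
post_corrected = {"POS-tagger": 96.84, "Lemmatizer": 95.36, "Combined": 94.04}


def error_reduction(before: float, after: float) -> float:
    """Relative reduction of the error rate (100 - accuracy), in percent."""
    error_before = 100.0 - before
    error_after = 100.0 - after
    return 100.0 * (error_before - error_after) / error_before


# Absolute gain in percentage points for each component.
gains = {name: round(post_corrected[name] - neural[name], 2) for name in neural}

# Relative error reduction for each component, rounded to one decimal.
reductions = {
    name: round(error_reduction(neural[name], post_corrected[name]), 1)
    for name in neural
}

if __name__ == "__main__":
    for name in neural:
        print(f"{name:<12} gain {gains[name]:+.2f} points, "
              f"error reduction {reductions[name]:.1f}%")
```

On these figures, post-correction leaves the POS-tagger score unchanged and raises the lemmatizer and combined scores by 0.13 points each. The OOV rows are identical in both tables, so they are omitted here.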
babylonian-1st.tar.gz ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6f4ee4dd2e551aece7d3a24ad6adc4a48ff729a7e6431970e6c270d90c620ac4
+ size 256815651
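
The three lines added for `babylonian-1st.tar.gz` are a Git LFS pointer rather than the archive itself: they record the SHA-256 digest and the size in bytes of the real file. A minimal sketch of checking a downloaded copy against that pointer follows. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the script assumes the archive has been saved to the current directory under its original name.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def matches_pointer(path: Path, pointer: dict) -> bool:
    """Check a file's size and SHA-256 digest against a parsed LFS pointer."""
    algorithm, _, expected_digest = pointer["oid"].partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported hash algorithm: {algorithm}")
    # Cheap check first: a size mismatch means hashing is unnecessary.
    if path.stat().st_size != int(pointer["size"]):
        return False
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in 1 MiB chunks so the ~257 MB archive is never fully in memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_digest


# Pointer contents as added in this commit.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:6f4ee4dd2e551aece7d3a24ad6adc4a48ff729a7e6431970e6c270d90c620ac4
size 256815651
"""

if __name__ == "__main__":
    # Assumption: the archive was downloaded to the current directory.
    archive = Path("babylonian-1st.tar.gz")
    if archive.exists():
        ok = matches_pointer(archive, parse_lfs_pointer(POINTER))
        print("OK" if ok else "MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

Comparing the size before hashing avoids reading the whole file when a download was truncated.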