asahala/BabyLemmatizer-Babylonian-1st

Aleksi Sahala committed on Sep 7, 2023

Commit fc29d6d • 1 Parent(s): cdde2d2

init model

Files changed (2)

README.md +30 -3
babylonian-1st.tar.gz +3 -0

README.md CHANGED

@@ -1,3 +1,30 @@
----
-license: cc-by-4.0
----

+# First Millennium Babylonian model for [BabyLemmatizer](https://github.com/asahala/BabyLemmatizer)
+Total data set size ca. 1.3M words (including lacunae). Consists of all Oracc texts labeled as any variant of Babylonian or Akkadian in the first millennium BCE. Neo-Assyrian excluded. OOV rate is fairly low but the data set is very varied and comprises all different text genres.
+See model Babylonian-2nd for Middle Babylonian (and in general second millennium Babylonian).
+## Evaluation results
+```
+Neural Net Evaluation
+COMPONENT       AVG     CI       MODEL0
+POS-tagger      96.84   ±0.00    96.84
+Lemmatizer      95.23   ±0.00    95.23
+Combined        93.91   ±0.00    93.91
+POS-tagger OOV  87.41   ±0.00    87.41
+Lemmatizer OOV  71.78   ±0.00    71.78
+Combined   OOV  69.63   ±0.00    69.63
+-----------------------------------------------
+OOV input rate  6.63             6.63
+Post-correct Evaluation
+COMPONENT       AVG     CI       MODEL0
+POS-tagger      96.84   ±0.00    96.84
+Lemmatizer      95.36   ±0.00    95.36
+Combined        94.04   ±0.00    94.04
+POS-tagger OOV  87.41   ±0.00    87.41
+Lemmatizer OOV  71.78   ±0.00    71.78
+Combined   OOV  69.63   ±0.00    69.63
+-----------------------------------------------
+OOV input rate  6.63             6.63
+```

babylonian-1st.tar.gz ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:6f4ee4dd2e551aece7d3a24ad6adc4a48ff729a7e6431970e6c270d90c620ac4
+size 256815651
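
The three added lines are a Git LFS pointer: the repository stores only the SHA-256 digest and byte size of `babylonian-1st.tar.gz`, and the archive itself lives in LFS storage. As a minimal sketch, a downloaded copy could be checked against the pointer as shown below. The helper names `parse_lfs_pointer` and `file_matches_pointer` and the local path `babylonian-1st.tar.gz` are assumptions made for illustration; they are not part of BabyLemmatizer or Git LFS.

```python
import hashlib
import os

# Pointer contents copied from the diff above (without the leading "+").
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:6f4ee4dd2e551aece7d3a24ad6adc4a48ff729a7e6431970e6c270d90c620ac4
size 256815651
"""


def parse_lfs_pointer(text):
    """Parse 'key value' lines of a Git LFS pointer into a small dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    # The oid field has the form '<hash algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def file_matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the pointer's size and digest."""
    # Cheap check first: a size mismatch means there is no need to hash.
    if os.path.getsize(path) != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with open(path, "rb") as handle:
        # Read in chunks so a ~257 MB archive is never held in memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


if __name__ == "__main__":
    pointer = parse_lfs_pointer(POINTER_TEXT)
    archive = "babylonian-1st.tar.gz"  # hypothetical local download path
    if os.path.exists(archive):
        print("archive OK" if file_matches_pointer(archive, pointer) else "archive mismatch")
    else:
        print(f"{archive} not found; expected {pointer['size']} bytes")
```

A failed check usually means a truncated download or that the pointer file itself was fetched instead of the archive, which happens when a repository is cloned without Git LFS installed.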