asahala
/

BabyLemmatizer-Babylonian-1st


Aleksi Sahala

init model

fc29d6d 10 months ago


	# First Millennium Babylonian model for [BabyLemmatizer](https://github.com/asahala/BabyLemmatizer)
	The data set totals ca. 1.3M words (including lacunae). It consists of all Oracc texts labeled as any variant of Babylonian or Akkadian from the first millennium BCE; Neo-Assyrian is excluded. The out-of-vocabulary (OOV) rate is fairly low (6.63% in the evaluation below), even though the data set is highly varied and covers all text genres.

	For Middle Babylonian, and second-millennium Babylonian in general, see the Babylonian-2nd model.

	## Evaluation results

	```
	Neural Net Evaluation
	COMPONENT          AVG     CI      MODEL0
	POS-tagger         96.84   ±0.00   96.84
	Lemmatizer         95.23   ±0.00   95.23
	Combined           93.91   ±0.00   93.91
	POS-tagger OOV     87.41   ±0.00   87.41
	Lemmatizer OOV     71.78   ±0.00   71.78
	Combined OOV       69.63   ±0.00   69.63
	-----------------------------------------------
	OOV input rate     6.63            6.63

	Post-correct Evaluation
	COMPONENT          AVG     CI      MODEL0
	POS-tagger         96.84   ±0.00   96.84
	Lemmatizer         95.36   ±0.00   95.36
	Combined           94.04   ±0.00   94.04
	POS-tagger OOV     87.41   ±0.00   87.41
	Lemmatizer OOV     71.78   ±0.00   71.78
	Combined OOV       69.63   ±0.00   69.63
	-----------------------------------------------
	OOV input rate     6.63            6.63
	```
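
The figures above can be read a little further with some simple arithmetic. The following Python sketch is not part of BabyLemmatizer; the function names are hypothetical. It computes the gain from post-correction for each component and, under the assumption that the OOV input rate applies to the same tokens on which the overall accuracy was measured, estimates the implied accuracy on in-vocabulary words by solving `overall = (1 - r) * iv + r * oov` for `iv`. The estimate is an illustration only, not a reported result.

```python
# Figures copied from the evaluation tables above (all in percent).
OOV_RATE = 6.63

# component -> (overall accuracy, accuracy on OOV words)
NEURAL = {
    "POS-tagger": (96.84, 87.41),
    "Lemmatizer": (95.23, 71.78),
    "Combined": (93.91, 69.63),
}
POST_CORRECT = {
    "POS-tagger": (96.84, 87.41),
    "Lemmatizer": (95.36, 71.78),
    "Combined": (94.04, 69.63),
}


def post_correction_gain(component):
    """Percentage-point change in overall accuracy after post-correction."""
    return round(POST_CORRECT[component][0] - NEURAL[component][0], 2)


def in_vocabulary_accuracy(overall, oov, oov_rate=OOV_RATE):
    """Estimate in-vocabulary accuracy from overall and OOV accuracy.

    ASSUMPTION: the OOV rate is measured over the same tokens as the
    overall accuracy, so overall = (1 - r) * iv + r * oov.
    """
    r = oov_rate / 100.0
    return (overall - r * oov) / (1.0 - r)


for name, (overall, oov) in NEURAL.items():
    iv = in_vocabulary_accuracy(overall, oov)
    gain = post_correction_gain(name)
    print(f"{name}: post-correction gain {gain:+.2f} pp, "
          f"estimated in-vocabulary accuracy {iv:.2f}")
```

In these tables post-correction changes only the overall lemmatizer and combined scores; the POS-tagger and all OOV rows are identical before and after.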