kredor/punctuate-all

Token Classification

	---
	license: mit
	datasets:
	- wmt/europarl
	metrics:
	- f1
	- recall
	- precision
	---
	This model is based on [Oliver Guhr's work](https://huggingface.co/oliverguhr/fullstop-punctuation-multilang-large). It differs in two ways: it is a fine-tuned xlm-roberta-base rather than an xlm-roberta-large, and it covers twelve languages instead of four. The languages are: English, German, French, Spanish, Bulgarian, Italian, Polish, Dutch, Czech, Portuguese, Slovak, and Slovenian.
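The card does not include a usage snippet, so the following is only a minimal sketch of how the model might be called. It assumes the checkpoint loads through the `transformers` token-classification pipeline under the id `kredor/punctuate-all`, that its labels are the six shown in the report (`0`, `.`, `,`, `?`, `-`, `:`), and that the tokenizer returns character offsets. The helper names (`word_labels`, `restore_punctuation`, `punctuate`) are hypothetical, and letting the last sub-token of each word decide the label is a choice made for this sketch, not something the card specifies.

```python
from typing import Dict, List, Tuple

NO_PUNCT = "0"  # assumption: "0" is the "no punctuation" label, as in the report


def word_labels(text: str, token_preds: List[Dict]) -> Tuple[List[str], List[str]]:
    """Map per-token predictions onto whitespace-separated words.

    Each prediction needs an "entity" (label) and an "end" (character offset).
    The last sub-token that falls inside a word decides that word's label.
    """
    words, spans, pos = text.split(), [], 0
    for w in words:
        start = text.index(w, pos)
        pos = start + len(w)
        spans.append((start, pos))
    labels = [NO_PUNCT] * len(words)
    i = 0
    for tok in token_preds:
        last_char = tok["end"] - 1
        while i < len(spans) - 1 and last_char >= spans[i][1]:
            i += 1
        labels[i] = tok["entity"]  # later sub-tokens overwrite earlier ones
    return words, labels


def restore_punctuation(words: List[str], labels: List[str]) -> str:
    """Append the predicted mark to each word; "0" leaves the word unchanged."""
    return " ".join(w if l == NO_PUNCT else w + l for w, l in zip(words, labels))


def punctuate(text: str, model_name: str = "kredor/punctuate-all") -> str:
    """Hypothetical end-to-end helper; downloads the model on first use."""
    from transformers import pipeline  # imported lazily, needs `transformers`

    tagger = pipeline("token-classification", model=model_name)
    words, labels = word_labels(text, tagger(text))
    return restore_punctuation(words, labels)
```

With the assumptions above, `punctuate("hello world how are you")` would return the input with the predicted marks attached to the words. Input is expected to be unpunctuated text that is short enough to fit in a single model pass; longer text would need to be split into chunks first.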

	----- report -----

	              precision    recall  f1-score   support

	           0       0.99      0.99      0.99  73317475
	           .       0.94      0.95      0.95   4484845
	           ,       0.86      0.86      0.86   6100650
	           ?       0.88      0.85      0.86    136479
	           -       0.60      0.29      0.39    233630
	           :       0.71      0.49      0.58    152424

	    accuracy                           0.98  84425503
	   macro avg       0.83      0.74      0.77  84425503
	weighted avg       0.98      0.98      0.98  84425503
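As a reading aid, the summary rows can be recomputed from the per-label rows. The sketch below copies the rounded figures from the report and checks the macro averages, the total support, and the F1 score for `-`. Because the inputs are already rounded to two decimals, a recomputed value can differ from the reported one by 0.01 for some labels.

```python
# (precision, recall, f1, support) per label, copied from the report above.
REPORT = {
    "0": (0.99, 0.99, 0.99, 73317475),
    ".": (0.94, 0.95, 0.95, 4484845),
    ",": (0.86, 0.86, 0.86, 6100650),
    "?": (0.88, 0.85, 0.86, 136479),
    "-": (0.60, 0.29, 0.39, 233630),
    ":": (0.71, 0.49, 0.58, 152424),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def macro_avg(column: int) -> float:
    """Unweighted mean of one column (0=precision, 1=recall, 2=f1)."""
    return sum(row[column] for row in REPORT.values()) / len(REPORT)


total_support = sum(row[3] for row in REPORT.values())

print(round(macro_avg(0), 2), round(macro_avg(1), 2), round(macro_avg(2), 2))
print(total_support)
print(round(f1(0.60, 0.29), 2))
```

The macro average weights all six labels equally, so the weak `-` and `:` rows pull it well below the weighted average, which is dominated by the 73 million `0` tokens.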


	----- confusion matrix -----

	t/p    0    .    ,    ?    -    :
	0    1.0  0.0  0.0  0.0  0.0  0.0
	.    0.0  1.0  0.0  0.0  0.0  0.0
	,    0.1  0.0  0.9  0.0  0.0  0.0
	?    0.0  0.1  0.0  0.8  0.0  0.0
	-    0.1  0.1  0.5  0.0  0.3  0.0
	:    0.0  0.3  0.1  0.0  0.0  0.5