uvegesistvan / Hun_RoBERTa_large_Plain

	---
	license: cc-by-nc-4.0
	language:
	- hu
	metrics:
	- accuracy
	- f1
	model-index:
	- name: Hun_RoBERTa_large_Plain
	  results:
	  - task:
	      type: text-classification
	    metrics:
	    - type: accuracy
	      value: 0.79
	    - type: f1
	      value: 0.79
	widget:
	- text: "A tanúsítvány meghatározott adatainak a 2008/118/EK irányelv IV. fejezete szerinti szállításához szükséges adminisztratív okmányban..."
	  example_title: "Incomprehensible"
	- text: "Az AEO-engedély birtokosainak listáján – keresésre – megjelenő információk: az engedélyes neve, az engedélyt kibocsátó ország..."
	  example_title: "Comprehensible"

	---

	## Model description

	A cased XLM-RoBERTa-large model fine-tuned for Hungarian that classifies sentences by their plain-language comprehensibility.

	## Intended uses & limitations

	The model is designed to classify sentences as either "comprehensible" or "not comprehensible" (according to Plain Language guidelines):
	* Label_0 - "comprehensible"
	* Label_1 - "not comprehensible"

	## Training

	The model is a fine-tuned version of the original `xlm-roberta-large` model, trained on a dataset of Hungarian legal and administrative texts.

	## Eval results

	| Class | Precision | Recall | F-Score |
	| ----- | --------- | ------ | ------- |
	| Comprehensible / Label_0 | 0.76 | 0.86 | 0.81 |
	| Not comprehensible / Label_1 | 0.83 | 0.72 | 0.77 |
	| accuracy | | | 0.79 |
	| macro avg | 0.80 | 0.79 | 0.79 |
	| weighted avg | 0.79 | 0.79 | 0.79 |
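
As a quick sanity check, the per-class F-scores and the macro-averaged F-score can be recomputed from the precision and recall columns. This is only a sketch: the inputs are the two-decimal values reported in the table, so the results agree with the table only up to rounding, and the names `REPORTED`, `f1`, and `macro_f1` are illustrative rather than part of the model's code.

```python
# Precision, recall, and reported F-score per class, copied from the table.
REPORTED = {
    "comprehensible": (0.76, 0.86, 0.81),
    "not comprehensible": (0.83, 0.72, 0.77),
}


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Recompute each class's F-score from its precision and recall.
per_class_f1 = {name: f1(p, r) for name, (p, r, _) in REPORTED.items()}

# The macro average is the unweighted mean over the classes.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)

for name, value in per_class_f1.items():
    print(f"{name}: recomputed F1 = {value:.4f}, reported = {REPORTED[name][2]}")
print(f"macro F1 = {macro_f1:.4f}")
```
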

	## Usage

	```py
	from transformers import AutoTokenizer, AutoModelForSequenceClassification

	# Load the tokenizer and the fine-tuned classification model from the Hub.
	tokenizer = AutoTokenizer.from_pretrained("uvegesistvan/Hun_RoBERTa_large_Plain")
	model = AutoModelForSequenceClassification.from_pretrained("uvegesistvan/Hun_RoBERTa_large_Plain")
	```
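
Loading the model is only the first step. A minimal inference sketch might look like the following. It assumes PyTorch is installed, that output index 0 corresponds to Label_0 and index 1 to Label_1 as listed under "Intended uses & limitations", and that truncating long inputs is acceptable. The helper names `softmax`, `decode`, and `classify` are hypothetical and not part of the repository.

```python
import math

MODEL_ID = "uvegesistvan/Hun_RoBERTa_large_Plain"

# Assumed mapping from output index to class name, following the model card.
ID2NAME = {0: "comprehensible", 1: "not comprehensible"}


def softmax(logits):
    """Turn raw logits into probabilities (numerically stable form)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Return (class name, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2NAME[best], probs[best]


def classify(sentences):
    """Classify a list of Hungarian sentences; downloads the model on first use."""
    # Imported here so the pure-Python helpers above work without these packages.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
    model.eval()

    # Pad to a common length and truncate inputs that exceed the model limit.
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**batch).logits
    return [decode(row.tolist()) for row in logits]
```

Calling `classify(["..."])` with one or more sentences returns a list of `(class name, probability)` pairs, one per input sentence. The probability is the softmax score of the predicted class and should not be read as a calibrated confidence.
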