poltextlab / HunEmBERT8

	---
	license: apache-2.0
	language:
	- hu
	metrics:
	- accuracy
	model-index:
	- name: huBERTPlain
	  results:
	  - task:
	      type: text-classification
	    metrics:
	    - type: f1
	      value: 0.77
	---

	## Model description

	Cased, fine-tuned BERT model for Hungarian, trained on manually annotated parliamentary pre-agenda speeches scraped from `parlament.hu`.

	## Intended uses & limitations

	The model can be used like any other (cased) BERT model. It has been tested on recognizing emotions at the sentence level in (parliamentary) pre-agenda speeches, with the following labels:
	* 'Label_0': Neutral
	* 'Label_1': Fear
	* 'Label_2': Sadness
	* 'Label_3': Anger
	* 'Label_4': Disgust
	* 'Label_5': Success
	* 'Label_6': Joy
	* 'Label_7': Trust

	## Training

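	The model is a fine-tuned version of the original huBERT model (`SZTAKI-HLT/hubert-base-cc`), trained on the HunEmPoli corpus. The table below shows the label distribution, by emotion category and by sentiment: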

	| Category | Count | Ratio | Sentiment | Count | Ratio |
	| -------- | ----- | ------ | --------- | ----- | ------ |
	| Neutral | 351 | 1.85% | Neutral | 351 | 1.85% |
	| Fear | 162 | 0.85% | Negative | 11180 | 58.84% |
	| Sadness | 4258 | 22.41% | | | |
	| Anger | 643 | 3.38% | | | |
	| Disgust | 6117 | 32.19% | | | |
	| Success | 6602 | 34.74% | Positive | 7471 | 39.32% |
	| Joy | 441 | 2.32% | | | |
	| Trust | 428 | 2.25% | | | |
	| Sum | 19002 | | | | |
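
The ratios and sentiment totals in the table can be reproduced from the raw counts. The sketch below is illustrative only: the grouping of emotions into sentiments is an assumption read off the table layout (Fear through Disgust as Negative, Success through Trust as Positive), and the names `COUNTS`, `SENTIMENT`, `ratios`, and `sentiment_counts` are hypothetical helpers, not part of the model or corpus release.

```python
# Emotion counts copied from the table above.
COUNTS = {
    "Neutral": 351, "Fear": 162, "Sadness": 4258, "Anger": 643,
    "Disgust": 6117, "Success": 6602, "Joy": 441, "Trust": 428,
}

# Assumed emotion-to-sentiment grouping, inferred from the table layout.
SENTIMENT = {
    "Neutral": ["Neutral"],
    "Negative": ["Fear", "Sadness", "Anger", "Disgust"],
    "Positive": ["Success", "Joy", "Trust"],
}


def ratios(counts):
    """Return each count as a percentage of the total, rounded to 2 decimals."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 2) for name, n in counts.items()}


def sentiment_counts(counts, groups=SENTIMENT):
    """Sum the emotion counts that belong to each sentiment group."""
    return {s: sum(counts[e] for e in members) for s, members in groups.items()}


TOTAL = sum(COUNTS.values())
EMOTION_RATIOS = ratios(COUNTS)
SENTIMENT_COUNTS = sentiment_counts(COUNTS)
SENTIMENT_RATIOS = ratios(SENTIMENT_COUNTS)
```

Under this grouping the recomputed totals and percentages match the table. The distribution is heavily skewed: Success, Disgust, and Sadness together account for roughly 89% of the sentences, which may be worth keeping in mind when reading the per-class scores.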

	## Eval results

	| Class | Precision | Recall | F-Score |
	| ----- | --------- | ------ | ------- |
	| Fear | 0.625 | 0.625 | 0.625 |
	| Sadness | 0.8535 | 0.6291 | 0.7243 |
	| Anger | 0.7857 | 0.3437 | 0.4782 |
	| Disgust | 0.7154 | 0.8790 | 0.7888 |
	| Success | 0.8579 | 0.8683 | 0.8631 |
	| Joy | 0.549 | 0.6363 | 0.5894 |
	| Trust | 0.4705 | 0.5581 | 0.5106 |
	| Macro AVG | 0.7134 | 0.6281 | 0.6497 |
	| Weighted AVG | 0.791 | 0.7791 | 0.7743 |
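
As a reading aid, the per-class F-Score column can be recomputed from the precision and recall columns. This is a minimal sketch that assumes the F-Score is the standard F1 (the harmonic mean of precision and recall); the names `REPORTED`, `f1`, and `MAX_DIFF` are hypothetical and only serve the illustration.

```python
# (precision, recall, reported F-score) per class, copied from the table above.
REPORTED = {
    "Fear": (0.625, 0.625, 0.625),
    "Sadness": (0.8535, 0.6291, 0.7243),
    "Anger": (0.7857, 0.3437, 0.4782),
    "Disgust": (0.7154, 0.8790, 0.7888),
    "Success": (0.8579, 0.8683, 0.8631),
    "Joy": (0.549, 0.6363, 0.5894),
    "Trust": (0.4705, 0.5581, 0.5106),
}


def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Recompute F1 per class and measure the largest gap to the reported value.
RECOMPUTED = {name: f1(p, r) for name, (p, r, _) in REPORTED.items()}
MAX_DIFF = max(abs(RECOMPUTED[name] - REPORTED[name][2]) for name in REPORTED)
```

Under that assumption the recomputed values agree with the table to within rounding of the fourth decimal place.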


	## Usage

	```py
	from transformers import AutoTokenizer, AutoModelForSequenceClassification

	# Download the tokenizer and the fine-tuned classifier from the Hugging Face Hub.
	tokenizer = AutoTokenizer.from_pretrained("poltextlab/HunEmBERT8")
	model = AutoModelForSequenceClassification.from_pretrained("poltextlab/HunEmBERT8")
	```
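
Once the tokenizer and model are loaded, a single sentence could be classified roughly as follows. This is a sketch rather than the authors' reference pipeline: it assumes PyTorch is installed, that output index `i` corresponds to `Label_i` in the list under "Intended uses & limitations", and the names `ID2EMOTION`, `softmax`, `decode`, and `classify` are hypothetical helpers introduced here.

```python
import math

# Assumed mapping from output index to emotion, taken from the label list above.
ID2EMOTION = {
    0: "Neutral", 1: "Fear", 2: "Sadness", 3: "Anger",
    4: "Disgust", 5: "Success", 6: "Joy", 7: "Trust",
}


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Return the most likely emotion and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2EMOTION[best], probs[best]


def classify(sentence, tokenizer, model):
    """Run one sentence through the model and decode the prediction."""
    import torch  # imported lazily so decode() works without PyTorch

    inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return decode(logits)
```

With `tokenizer` and `model` loaded as above, `classify("...", tokenizer, model)` returns an `(emotion, probability)` pair for a Hungarian input sentence. Since the model was evaluated at the sentence level, longer texts are probably best split into sentences first.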

	## BibTeX entry and citation info

	If you use the model, please cite the following paper:
	```bibtex
	@ARTICLE{10149341,
	author={{\"U}veges, Istv{\'a}n and Ring, Orsolya},
	journal={IEEE Access},
	title={HunEmBERT: a fine-tuned BERT-model for classifying sentiment and emotion in political communication},
	year={2023},
	volume={11},
	number={},
	pages={60267-60278},
	doi={10.1109/ACCESS.2023.3285536}
	}
	```