erikve

Updated model card

4c29217 verified 6 months ago


---
license: apache-2.0
datasets:
- ltg/norec
language:
- 'no'
pipeline_tag: token-classification

model-index:
- name: SSA-Perin
  results:
  - task:
      type: structured sentiment analysis
    dataset:
      name: NoReC
      type: NoReC
    metrics:
    - name: Unlabeled sentiment tuple F1
      type: Unlabeled sentiment tuple F1
      value: 44.12%
    - name: Target F1
      type: Target F1
      value: 56.44%
    - name: Relative polarity precision
      type: Relative polarity precision
      value: 93.19%
---

	# Model Card for SSA-PERIN for Norwegian


	## Model Details

We release here a pretrained model (and an easy-to-run wrapper) for structured sentiment analysis (SSA) of Norwegian text, trained on the [NoReC_fine](https://github.com/ltgoslo/norec_fine) dataset. It implements the method described in the paper [Direct parsing to sentiment graphs](https://aclanthology.org/2022.acl-short.51/) by Samuel et al. (2022), which demonstrated how a graph-based semantic parser (PERIN) can be applied to structured sentiment analysis, predicting sentiment graphs directly from text.


	### Model Description

	- Developed by: The [SANT](https://www.mn.uio.no/ifi/english/research/projects/sant/) project (Sentiment Analysis for Norwegian Text) at [the Language Technology Group](https://www.mn.uio.no/ifi/english/research/groups/ltg/) (LTG) at the University of Oslo.
	- Funded by: [SANT](https://www.mn.uio.no/ifi/english/research/projects/sant/) is funded by the Research Council of Norway
	- Language(s): Norwegian (Bokmål/Nynorsk)
	- License: [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)

	### Model Sources

- Paper: [Direct parsing to sentiment graphs](https://aclanthology.org/2022.acl-short.51/) by Samuel et al., published at ACL 2022.
- Repository: The scripts used for training can be found in the [GitHub repository](https://github.com/jerbarnes/direct_parsing_to_sent_graph) accompanying the paper by Samuel et al. (2022) above.
- Demo: To see how it works, you can try the model in our [Hugging Face Space](https://huggingface.co/spaces/ltg/ssa-perin).
- Limitations: The training data is based on professional reviews covering multiple domains, so the model may not generalize well to other text types or domains.


	## How to Get Started with the Model


For each sentence it deems to be sentiment-bearing, the model attempts to identify the following components: _source expressions_ (the opinion holder), _target expressions_ (what the opinion is directed towards), _polar expressions_ (the part of the text indicating that an opinion is expressed), and finally the _polarity_ (positive or negative). For more information about how these categories are defined in the training data, please see the paper [A Fine-grained Sentiment Dataset for Norwegian](https://aclanthology.org/2020.lrec-1.618/) by Øvrelid et al. (2020). For each identified expression, the model also provides its character offsets in the text.

	Here is an example showing how to use the model for predicting such sentiment tuples:

```python
>>> import model_wrapper
>>> model = model_wrapper.PredictionModel()
>>> model.predict(['vi liker svart kaffe'])
[{'sent_id': '0',
  'text': 'vi liker svart kaffe',
  'opinions': [{'Source': [['vi'], ['0:2']],
                'Target': [['svart', 'kaffe'], ['9:14', '15:20']],
                'Polar_expression': [['liker'], ['3:8']],
                'Polarity': 'Positive'}]}]
```
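Each expression in the output is a pair of lists: the token strings and their `start:end` character offsets. The following minimal sketch flattens such output into plain tuples and checks every offset against the text. It works only on the dictionary format shown above, and the helper names `span_text` and `to_tuples` are hypothetical, not part of `model_wrapper`. It assumes the offsets are end-exclusive, as the example suggests (`'vi'` is `0:2`), and that an expression with no match, such as an implicit source, is given as `[[], []]`.

```python
from typing import Dict, List, Optional, Tuple

# Prediction copied from the example above (the format returned by model.predict).
prediction = {
    'sent_id': '0',
    'text': 'vi liker svart kaffe',
    'opinions': [{'Source': [['vi'], ['0:2']],
                  'Target': [['svart', 'kaffe'], ['9:14', '15:20']],
                  'Polar_expression': [['liker'], ['3:8']],
                  'Polarity': 'Positive'}],
}


def span_text(text: str, expression: List[List[str]]) -> Optional[str]:
    """Rebuild an expression from its offsets, checking each part against the reported string."""
    strings, offsets = expression
    if not strings:  # assumption: a missing expression is encoded as [[], []]
        return None
    parts = []
    for string, offset in zip(strings, offsets):
        start, end = (int(x) for x in offset.split(':'))
        # Offsets are assumed end-exclusive, so they can be used directly as a slice.
        if text[start:end] != string:
            raise ValueError(f'offset {offset} does not match {string!r}')
        parts.append(string)
    return ' '.join(parts)  # multi-token expressions are joined with a space for display


def to_tuples(sentence: Dict) -> List[Tuple[Optional[str], Optional[str], Optional[str], str]]:
    """Flatten each opinion into a (source, target, polar expression, polarity) tuple."""
    text = sentence['text']
    return [
        (span_text(text, op['Source']),
         span_text(text, op['Target']),
         span_text(text, op['Polar_expression']),
         op['Polarity'])
        for op in sentence['opinions']
    ]


print(to_tuples(prediction))
# [('vi', 'svart kaffe', 'liker', 'Positive')]
```

Validating the offsets this way is a cheap sanity check before using them, for example, to highlight expressions in a user interface.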

	## Training Details

	### Training Data

The model is trained on [NoReC_fine](https://github.com/ltgoslo/norec_fine), a dataset for fine-grained sentiment analysis in Norwegian based on a subset of documents from the [Norwegian Review Corpus](https://huggingface.co/datasets/ltg/norec) (NoReC), which consists of professionally authored reviews from multiple news sources across a wide variety of domains, including literature, games, music, products, movies, and more.

- Paper: [A Fine-grained Sentiment Dataset for Norwegian](https://aclanthology.org/2020.lrec-1.618/) by L. Øvrelid, P. Mæhlum, J. Barnes, and E. Velldal, in Proceedings of the 12th Language Resources and Evaluation Conference (LREC), Marseille, 2020.
	- Repository: [https://github.com/ltgoslo/norec_fine](https://github.com/ltgoslo/norec_fine)


	### Model Configuration and Training Hyperparameters

Samuel et al. (2022) propose three different ways to encode sentiment graphs: "node-centric", "labeled-edge", and "opinion-tuple".
The model released here uses the following configuration:
- "labeled-edge" graph encoding,
- no character-level embeddings,
- all other hyperparameters set to their [default values](https://github.com/jerbarnes/direct_parsing_to_sent_graph/blob/main/perin/config/edge_norec.yaml),
- trained on top of the masked language model [NorBERT 2](https://huggingface.co/ltg/norbert2).

	## Evaluation

The model achieves the following results on the held-out test set of NoReC_fine (see the paper for a description of the metrics):

	- Unlabeled sentiment tuple F1: 0.434
	- Target F1: 0.541
	- Relative polarity precision: 0.926
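To give some intuition for the span-level scores, here is a simplified sketch of a token-level F1 of the kind used for Target F1, assuming each span is split into tokens identified by sentence id and `start:end` character offset. It is for illustration only and is not the official evaluation script from the training repository. The example spans are hypothetical, and the paper remains the reference for how each metric (including the sentiment tuple F1 and relative polarity precision) is defined.

```python
from typing import Iterable, Set, Tuple

# A token is identified by (sentence id, "start:end" character offset).
Token = Tuple[str, str]


def token_f1(gold: Iterable[Token], predicted: Iterable[Token]) -> float:
    """Token-level F1 between gold and predicted span tokens (simplified sketch)."""
    gold_set: Set[Token] = set(gold)
    pred_set: Set[Token] = set(predicted)
    overlap = len(gold_set & pred_set)
    if overlap == 0:  # also covers empty gold or predicted sets
        return 0.0
    precision = overlap / len(pred_set)
    recall = overlap / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the gold target is "svart kaffe", the model predicts only "kaffe".
gold = [('0', '9:14'), ('0', '15:20')]
pred = [('0', '15:20')]
print(round(token_f1(gold, pred), 3))  # 0.667
```

Here precision is 1.0 and recall is 0.5, so partial credit is given for a partially correct span rather than scoring it as a complete miss.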


	## Citation

If you use this model in your academic work, please cite the following paper:
	```bibtex
	@inproceedings{samuel2022,
	title={Direct parsing to sentiment graphs},
	author={David Samuel and Jeremy Barnes and Robin Kurtz and
	Stephan Oepen and Lilja Øvrelid and Erik Velldal},
	year={2022},
	booktitle = "Proceedings of the 60th Annual Meeting of
	the Association for Computational Linguistics",
	address = "Dublin, Ireland"
	}
	```

	## Model Card Authors
	Erik Velldal and Larisa Kolesnichenko