Utilitarian Deberta 01

Model description

This is a DeBERTa model fine-tuned to compute utility estimates of experiences described in first-person sentences. It was trained on human-annotated pairwise utility comparisons from the ETHICS dataset.

Intended use

The main use case is the computation of utility estimates of first-person text scenarios.

Limitations

The model was trained on a limited number of scenarios, all written as first-person sentences. It may not interpret highly complex or unusual scenarios reliably, and there are no hard guarantees about the range of inputs on which its estimates are accurate.

How to use

The model takes a sentence describing a scenario in the first person and outputs a scalar utility estimate.
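
The following is a minimal sketch of that input/output contract using the Hugging Face transformers library. It rests on assumptions not stated in this card: that the checkpoint loads with AutoModelForSequenceClassification and has a single output logit, and that MODEL_ID (a hypothetical placeholder) is replaced with the real Hub id or local path. The helper names load_scorer and rank_by_utility are illustrative only.

```python
from typing import Callable, List, Tuple

# Hypothetical placeholder: replace with the real Hub id or local path of this checkpoint.
MODEL_ID = "path/to/utilitarian-deberta-01"


def load_scorer(model_id: str = MODEL_ID) -> Callable[[List[str]], List[float]]:
    """Return a function mapping first-person sentences to utility estimates."""
    # Imported lazily so rank_by_utility can be used without these packages installed.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    def score(sentences: List[str]) -> List[float]:
        inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
        with torch.no_grad():
            logits = model(**inputs).logits  # assumed shape: (batch, 1)
        # Assumption: one logit per sentence, read as the utility estimate.
        return logits.squeeze(-1).tolist()

    return score


def rank_by_utility(
    sentences: List[str], score: Callable[[List[str]], List[float]]
) -> List[Tuple[str, float]]:
    """Pair each sentence with its utility and sort from highest to lowest."""
    utilities = score(sentences)
    return sorted(zip(sentences, utilities), key=lambda pair: pair[1], reverse=True)
```

With the placeholder replaced, rank_by_utility(["I ate my favorite meal.", "I missed my flight."], load_scorer()) would return the sentences ordered by estimated utility. The scalar values are most meaningful relative to each other, since the model was trained on comparisons rather than absolute ratings.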

Training data

The training data is the train split from the Utilitarianism part of the ETHICS dataset.

Training procedure

Training can be reproduced by running tune.py as follows:

python tune.py --ngpus 1 --model microsoft/deberta-v3-large --learning_rate 1e-5 --batch_size 16 --nepochs 2
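
For readers unfamiliar with learning from pairwise comparisons, one common objective is a logistic loss on the difference between the two utility estimates. The sketch below illustrates that general idea in plain Python. It is not taken from tune.py, and the function name pairwise_logistic_loss is hypothetical.

```python
import math


def pairwise_logistic_loss(u_preferred: float, u_other: float) -> float:
    """Return -log(sigmoid(u_preferred - u_other)).

    The loss is small when the scenario annotators judged more pleasant
    receives the higher utility, and grows as the ordering is violated.
    """
    diff = u_preferred - u_other
    # Numerically stable form of log(1 + exp(-diff)).
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


# Equal utilities give log(2); a correct ordering with a wide margin gives a loss near 0.
loss_tied = pairwise_logistic_loss(0.0, 0.0)
loss_correct = pairwise_logistic_loss(4.0, -1.0)
loss_wrong = pairwise_logistic_loss(-1.0, 4.0)
```

Because only the difference enters the loss, utilities learned this way are identified up to an additive constant, which is why the outputs are best read comparatively.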

Evaluation results

The model achieves 92.2% accuracy on the Moral Uncertainty Research Competition, which consists of a subset of the ETHICS dataset.
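
As an illustration of how accuracy is typically computed for pairwise utility comparisons, the sketch below counts a pair as correct when the scenario labeled as preferable receives the strictly higher estimate. The exact scoring rules of the competition are not stated here, so this is an assumption, and the function name pairwise_accuracy is hypothetical.

```python
from typing import Iterable, Tuple


def pairwise_accuracy(pairs: Iterable[Tuple[float, float]]) -> float:
    """Fraction of (u_preferred, u_other) pairs ranked in the annotated order.

    Ties are counted as incorrect (an assumption of this sketch).
    """
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one pair is required")
    correct = sum(1 for u_preferred, u_other in pairs if u_preferred > u_other)
    return correct / len(pairs)


# Made-up utility estimates for four comparison pairs, for illustration only.
example_pairs = [(1.0, 0.0), (0.2, 0.5), (3.0, -1.0), (0.0, 0.0)]
accuracy = pairwise_accuracy(example_pairs)  # 2 of 4 pairs are ordered correctly
```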