This is a DeBERTa model fine-tuned to compute utility estimates of experiences described in first-person sentences. It was trained on human-annotated pairwise utility comparisons from the ETHICS dataset.
The main use case is the computation of utility estimates of first-person text scenarios.
The model was trained on a limited number of scenarios, all written in the first person. It cannot reliably interpret highly complex or unusual scenarios, and there are no hard guarantees about the domain in which its estimates are accurate.
The model receives a sentence describing a scenario in the first person and outputs a scalar representing a utility estimate.
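The input/output contract above can be sketched with the Hugging Face transformers library. This is a minimal sketch, not the authors' inference code: it assumes the checkpoint loads through AutoModelForSequenceClassification with a single output logit, and the names load_scorer and rank_by_utility are hypothetical helpers introduced only for illustration.

```python
def load_scorer(model_id):
    """Return a function mapping a sentence to a float utility estimate."""
    # Imported lazily so rank_by_utility below works without these packages.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    def score(sentence):
        inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
        with torch.no_grad():
            logits = model(**inputs).logits
        # Assumption: one output logit per sentence, read as the utility.
        return logits[0, 0].item()

    return score


def rank_by_utility(sentences, score):
    """Sort sentences from highest to lowest estimated utility."""
    return sorted(sentences, key=score, reverse=True)
```

To try it, pass this model's repository id or a local checkpoint directory to load_scorer, then call the returned function on a first-person sentence. Only the ordering and differences between scores are likely to be meaningful, since the model was trained on comparisons rather than absolute values.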
The training data is the train split from the Utilitarianism part of the ETHICS dataset.
Training can be reproduced by executing the training procedure from tune.py as follows:
python tune.py --ngpus 1 --model microsoft/deberta-v3-large --learning_rate 1e-5 --batch_size 16 --nepochs 2
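The exact training objective is defined in tune.py and is not restated here. As an illustration only, one common way to learn a scalar score from pairwise comparisons is a logistic loss on the difference between the two scores; the function name below is hypothetical, and whether tune.py uses exactly this form is an assumption.

```python
import math


def pairwise_logistic_loss(u_preferred, u_other):
    """Return -log(sigmoid(u_preferred - u_other)), computed stably."""
    diff = u_preferred - u_other
    # The loss equals log(1 + exp(-diff)); branch on the sign to avoid overflow.
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))
```

The loss shrinks toward zero as the preferred scenario's score exceeds the other by a wider margin, and grows roughly linearly when the ordering is wrong.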
The model achieves 92.2% accuracy on the Moral Uncertainty Research Competition, which consists of a subset of the ETHICS dataset.
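The card does not say how accuracy is computed. Assuming it is the fraction of comparison pairs in which the scenario annotated as better receives the higher utility, a minimal sketch looks like this; pairwise_accuracy is a hypothetical helper, and score can be any function from a sentence to a float.

```python
def pairwise_accuracy(pairs, score):
    """Fraction of (better, worse) pairs where score(better) > score(worse)."""
    if not pairs:
        raise ValueError("pairs must be non-empty")
    correct = sum(1 for better, worse in pairs if score(better) > score(worse))
    return correct / len(pairs)
```

Ties count as errors under this definition, which is a design choice of the sketch rather than something the source specifies.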