Conditional Utilitarian Deberta 01

Model description

Intended use

The main use case is computing utility estimates for text scenarios written in the first or third person, conditioned on additional context. The person whose utility is evaluated can be specified in the context.

Limitations

The model was trained on only ~10000 general utility examples and ~800 conditional utility examples, so it should be expected to have limited performance.

It is not able to interpret highly complex or unusual scenarios, and it comes with no hard guarantees about the domain over which it is accurate.

How to use

Given a scenario S under a context C, and the model U, one computes the estimated conditional utility with U(f'{C} | {S}') - U(C).

In addition, you should specify, as part of the context, the person whose utility is evaluated. The model was trained using the phrases f"I care only about {person}'s utility." and "I care only about my own utility."
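As a rough illustration of the formula above, the sketch below builds the two input strings and takes their difference. It assumes a hypothetical scoring callable `U` that maps a string to a float (how the model is loaded and called is not covered here). It also assumes the person phrase is simply prepended to the context with a single space; the exact concatenation used in training is not stated in this card, so treat that detail as a guess.

```python
from typing import Callable, Optional


def care_phrase(person: Optional[str] = None) -> str:
    """Return the training phrase that names whose utility is evaluated."""
    if person is None:
        return "I care only about my own utility."
    return f"I care only about {person}'s utility."


def conditional_utility(
    U: Callable[[str], float],
    scenario: str,
    context: str,
    person: Optional[str] = None,
) -> float:
    """Estimate the utility of `scenario` given `context`.

    `U` is a hypothetical scorer (text -> float) wrapping the model.
    """
    # ASSUMPTION: the phrase is prepended to the context, separated by a space.
    full_context = f"{care_phrase(person)} {context}".strip()
    # U(f'{C} | {S}') - U(C), as given in the text above.
    return U(f"{full_context} | {scenario}") - U(full_context)
```

Passing `person=None` selects the "my own utility" phrase, and passing a name selects the third-person phrase. Any callable can stand in for `U` while experimenting, which makes the string construction easy to check before a real model is plugged in.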

Training data

The first training set is the train split of the Utilitarianism part of the ETHICS dataset.

The second training set consists of ~800 crowdsourced triples (S, C0, C1), each made up of one scenario and two possible contexts, where U(S | C0) > U(S | C1).

Both sets were converted from the first person to the third person using GPT-3.

Training procedure

DeBERTa-v3-large was fine-tuned on the training data with a learning rate of 1e-5 and a batch size of 16, for 1 epoch.

The training procedure generally follows tune.py. In addition to the ranked pairs of first-person and third-person scenarios, extra examples were included to enforce the following constraints:

First-person examples where you care about your own utility should have the same utility as the corresponding third-person examples where you care about the subject's utility.
Third-person examples where you care about your own utility, and first-person examples where you care about the utility of a random person not in the scenario, should each have zero utility.
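To make the ranked pairs and the two constraints above more concrete, here is a minimal sketch of how they could be turned into per-example loss terms. This is an assumption for illustration only: the contents of tune.py are not reproduced in this card, and the function names and the choice of squared penalties are hypothetical. The ranking term is a standard pairwise logistic loss on the difference of two utility scores.

```python
import math


def ranking_loss(u_better: float, u_worse: float) -> float:
    """Pairwise logistic loss: log(1 + exp(-(u_better - u_worse)))."""
    diff = u_better - u_worse
    # Numerically stable softplus(-diff).
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


def equality_penalty(u_first_person: float, u_third_person: float) -> float:
    """HYPOTHETICAL penalty pushing matched first/third-person scores together."""
    return (u_first_person - u_third_person) ** 2


def zero_penalty(u: float) -> float:
    """HYPOTHETICAL penalty pushing a score toward zero utility."""
    return u ** 2
```

The ranking term shrinks as the preferred example is scored further above the other one, while the two penalties vanish exactly when the corresponding constraint holds.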

Evaluation results

The model achieves ~80% accuracy on the ETHICS test set, which comes from the same distribution as the training data.
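For ranked pairs, accuracy is naturally read as the fraction of pairs in which the preferred scenario receives the higher score. The sketch below computes that quantity under the assumption that each test item is a (better, worse) pair of strings and that `U` is a hypothetical scorer mapping text to a float; the evaluation script actually used for the figure above is not shown in this card.

```python
from typing import Callable, Iterable, Tuple


def pairwise_accuracy(
    U: Callable[[str], float],
    pairs: Iterable[Tuple[str, str]],
) -> float:
    """Fraction of (better, worse) pairs where U ranks `better` strictly higher."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("pairs must not be empty")
    # Ties count as incorrect, since the pair is not ranked as labeled.
    correct = sum(1 for better, worse in pairs if U(better) > U(worse))
    return correct / len(pairs)
```

Counting ties as errors is a design choice made here for simplicity; a different convention would change the number slightly on data with many equal scores.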