Conditional Utilitarian Roberta 01

Model description

This is a RoBERTa-based model. It was first fine-tuned to compute utility estimates of experiences (see utilitarian-roberta-01). It was then further fine-tuned on 160 examples of pairwise comparisons of conditional utilities.

Intended use

The main use case is computing utility estimates of first-person text scenarios given extra contextual information.

Limitations

The model was fine-tuned on only 160 examples, so it should be expected to have limited performance.

Further, although the base model was trained on ~10,000 examples, these cover a restricted range of situations and consist only of first-person sentences. The model cannot be expected to interpret highly complex or unusual scenarios, and it offers no hard guarantees on the domain in which it is accurate.

How to use

Given a scenario S, a context C, and the model U, the estimated conditional utility is U(f'{C} {S}') - U(C): the utility of the context followed by the scenario, minus the utility of the context alone.
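The formula above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: `conditional_utility` and `toy_utility` are hypothetical names, and `toy_utility` is a word-score stand-in for the real model so that the example runs without downloading anything. In practice, U would be a function that passes a string through the fine-tuned model and returns its scalar output.

```python
from typing import Callable


def conditional_utility(U: Callable[[str], float], scenario: str, context: str) -> float:
    """Estimate U(S | C) as U(f'{C} {S}') - U(C)."""
    return U(f"{context} {scenario}") - U(context)


def toy_utility(text: str) -> float:
    """Hypothetical stand-in for the model: sums hand-picked word scores."""
    scores = {"won": 2.0, "lost": -2.0}
    return sum(scores.get(word.strip(".,!?").lower(), 0.0) for word in text.split())


context = "I lost my keys."
scenario = "I won a prize."
estimate = conditional_utility(toy_utility, scenario, context)
print(estimate)  # 2.0
```

Passing U in as a callable keeps the subtraction independent of how the model is loaded, so the same helper works with any scoring function that maps a string to a number.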

Training data

The first-stage training data is the train split from the Utilitarianism part of the ETHICS dataset.

The second-stage training data consists of 160 crowdsourced triples (S, C0, C1), each pairing one scenario S with two possible contexts C0 and C1, such that U(S | C0) > U(S | C1).

Training procedure

Starting from utilitarian-roberta-01, we fine-tune the model on the 160 training examples for 2 epochs, with a learning rate of 1e-5 and a batch size of 8.
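To give a sense of how short this fine-tuning run is, the hyperparameters above can be turned into a count of optimizer updates. The sketch assumes that each of the 160 triples counts as one example and that there is one update per batch with no gradient accumulation; neither detail is stated here, and the variable names are illustrative.

```python
import math

# Hyperparameters as stated in the training procedure.
num_examples = 160
batch_size = 8
epochs = 2
learning_rate = 1e-5

# Assumption: one optimizer update per batch, no gradient accumulation.
steps_per_epoch = math.ceil(num_examples / batch_size)
total_steps = steps_per_epoch * epochs

print(steps_per_epoch)  # 20
print(total_steps)      # 40
```

Under these assumptions the model receives only 40 updates, which is consistent with the limited-performance caveat in the Limitations section.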

Evaluation results

The model achieves ~70% accuracy on 40 crowdsourced examples drawn from the same distribution as the training data.
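The exact metric is not spelled out here. One natural reading, given the (S, C0, C1) triples described under Training data, is the fraction of triples for which the model ranks C0 above C1. The sketch below implements that reading as an assumption; `pairwise_accuracy`, `toy_U`, and the example sentences are hypothetical and stand in for the real model and data.

```python
from typing import Callable, Iterable, Tuple

Triple = Tuple[str, str, str]  # (scenario S, preferred context C0, other context C1)


def conditional_utility(U: Callable[[str], float], scenario: str, context: str) -> float:
    """Estimate U(S | C) as U(f'{C} {S}') - U(C)."""
    return U(f"{context} {scenario}") - U(context)


def pairwise_accuracy(U: Callable[[str], float], triples: Iterable[Triple]) -> float:
    """Fraction of triples where the estimate satisfies U(S | C0) > U(S | C1)."""
    triples = list(triples)
    correct = sum(
        conditional_utility(U, s, c0) > conditional_utility(U, s, c1)
        for s, c0, c1 in triples
    )
    return correct / len(triples)


# Hypothetical lookup table standing in for the model's outputs.
toy_scores = {
    "I was thirsty.": -1.0,
    "I was thirsty. I drank a glass of water.": 1.0,
    "I had just drunk a litre of water.": 0.0,
    "I had just drunk a litre of water. I drank a glass of water.": -0.5,
}


def toy_U(text: str) -> float:
    return toy_scores[text]


triples = [
    ("I drank a glass of water.", "I was thirsty.", "I had just drunk a litre of water."),
]
print(pairwise_accuracy(toy_U, triples))  # 1.0
```

With 40 evaluation examples, each triple contributes 2.5 percentage points to this score, so the reported ~70% corresponds to roughly 28 correctly ranked triples.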