Show, Don't Tell: Aligning Language Models with Demonstrated Feedback

Published on Jun 2
· Submitted by akhaliq on Jun 4
#2 Paper of the day


Language models are aligned to emulate the collective voice of many, resulting in outputs that align with no one in particular. Steering LLMs away from generic output is possible through supervised finetuning or RLHF, but requires prohibitively large datasets for new ad-hoc tasks. We argue that it is instead possible to align an LLM to a specific setting by leveraging a very small number (<10) of demonstrations as feedback. Our method, Demonstration ITerated Task Optimization (DITTO), directly aligns language model outputs to a user's demonstrated behaviors. Derived using ideas from online imitation learning, DITTO cheaply generates online comparison data by treating users' demonstrations as preferred over output from the LLM and its intermediate checkpoints. We evaluate DITTO's ability to learn fine-grained style and task alignment across domains such as news articles, emails, and blog posts. Additionally, we conduct a user study soliciting a range of demonstrations from participants (N=16). Across our benchmarks and user study, we find that win-rates for DITTO outperform few-shot prompting, supervised fine-tuning, and other self-play methods by an average of 19% points. By using demonstrations as feedback directly, DITTO offers a novel method for effective customization of LLMs.


There's a plain english summary of the paper here - feedback from the authors is welcome!

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite in a model to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite in a dataset to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite in a Space to link it from this page.

Collections including this paper 14