arxiv:2305.11206

LIMA: Less Is More for Alignment

Published on May 18, 2023

Submitted by akhaliq on May 21, 2023

Authors:

Pengfei Liu, Jiao Sun, Xuezhe Ma, Omer Levy

Abstract

Large language models are trained in two stages: (1) unsupervised pretraining from raw text, to learn general-purpose representations, and (2) large scale instruction tuning and reinforcement learning, to better align to end tasks and user preferences. We measure the relative importance of these two stages by training LIMA, a 65B parameter LLaMa language model fine-tuned with the standard supervised loss on only 1,000 carefully curated prompts and responses, without any reinforcement learning or human preference modeling. LIMA demonstrates remarkably strong performance, learning to follow specific response formats from only a handful of examples in the training data, including complex queries that range from planning trip itineraries to speculating about alternate history. Moreover, the model tends to generalize well to unseen tasks that did not appear in the training data. In a controlled human study, responses from LIMA are either equivalent or strictly preferred to GPT-4 in 43% of cases; this statistic is as high as 58% when compared to Bard and 65% versus DaVinci003, which was trained with human feedback. Taken together, these results strongly suggest that almost all knowledge in large language models is learned during pretraining, and only limited instruction tuning data is necessary to teach models to produce high quality output.
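
The abstract names "the standard supervised loss" without spelling it out. The sketch below shows one common form of it: token-level cross-entropy (negative log-likelihood), averaged only over response tokens. Masking out the prompt tokens is an assumption here, since the abstract does not say whether LIMA does this. The helper names `masked_nll` and `build_loss_mask` are hypothetical, and the targets are assumed to be already shifted so that position t holds the next token to predict.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary distribution."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum_exp for x in logits]


def build_loss_mask(prompt_len: int, total_len: int) -> List[int]:
    """Hypothetical helper: 0 for prompt positions, 1 for response positions."""
    return [0] * prompt_len + [1] * (total_len - prompt_len)


def masked_nll(
    logits: Sequence[Sequence[float]],
    targets: Sequence[int],
    loss_mask: Sequence[int],
) -> float:
    """Mean negative log-likelihood over the positions where loss_mask == 1."""
    if not (len(logits) == len(targets) == len(loss_mask)):
        raise ValueError("logits, targets and loss_mask must have the same length")
    total, count = 0.0, 0
    for step_logits, target, keep in zip(logits, targets, loss_mask):
        if keep:
            # Negative log-probability that the model assigns to the reference token.
            total -= log_softmax(step_logits)[target]
            count += 1
    if count == 0:
        raise ValueError("loss_mask selects no positions")
    return total / count
```

Consider uniform logits over a 4-token vocabulary. With the first position masked as prompt, the loss is `log(4)` regardless of the targets. No reinforcement learning or preference model is involved in this objective, which matches the setup the abstract describes.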

Community

NabeelZ

May 23, 2023

What is a distraction, and why is it happening again and again?

znsoft

May 23, 2023

Where can we download the 1,000 examples?
Or do you plan to release the trained model?

bair330

May 26, 2023

How is this different from few-shot prompting, except that it's 1000-shot?

saratchinni

Jun 9, 2023

They are actually fine-tuning (supervised learning) on this small curated dataset of 1,000 examples, which is different from k-shot prompting, where we put a few examples in the prompt.
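
A minimal sketch of the distinction drawn in this reply, assuming a simple hypothetical "Q:/A:" text template (not the format LIMA uses). In k-shot prompting, the examples are concatenated into the input at inference time and the weights never change. In fine-tuning, each example becomes a separate (input, target) training pair, and a supervised loss on the target updates the weights. The function names are illustrative only.

```python
from typing import List, Tuple

Example = Tuple[str, str]  # (prompt, response)


def build_k_shot_prompt(examples: List[Example], query: str) -> str:
    """k-shot prompting: every example is placed inside a single prompt."""
    parts = [f"Q: {prompt}\nA: {response}" for prompt, response in examples]
    parts.append(f"Q: {query}\nA:")  # the model continues from here
    return "\n\n".join(parts)


def to_sft_pairs(examples: List[Example]) -> List[Tuple[str, str]]:
    """Fine-tuning: each example becomes its own (input, target) training pair."""
    return [(f"Q: {prompt}\nA:", f" {response}") for prompt, response in examples]


demo = [("1+1?", "2"), ("Capital of France?", "Paris")]
print(build_k_shot_prompt(demo, "2+2?"))
print(to_sft_pairs(demo))
```

With 1,000 examples, the k-shot prompt would have to hold all of them at once and would quickly exceed a typical context window. The fine-tuning pairs are instead consumed batch by batch during training, and at inference time the prompt contains only the new query.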

saratchinni

Jun 9, 2023

The link to the dataset is: https://huggingface.co/datasets/GAIR/lima/

taeshahn

Jun 29, 2023

You can also find a Korean-translated version of the LIMA dataset here!

davodavo

Jul 3, 2023

Thanks for sharing the paper! If I'm not mistaken, the model described in the paper is not public yet. Will it be made public in the near future?