From Loops to Oops: Fallback Behaviors of Language Models Under Uncertainty
Abstract
Large language models (LLMs) often exhibit undesirable behaviors, such as hallucinations and sequence repetitions. We propose to view these behaviors as fallbacks that models exhibit under uncertainty, and investigate the connection between them. We categorize fallback behaviors -- sequence repetitions, degenerate text, and hallucinations -- and extensively analyze them in models from the same family that differ by the amount of pretraining tokens, parameter count, or the inclusion of instruction-following training. Our experiments reveal a clear and consistent ordering of fallback behaviors across all these axes: as an LLM becomes more advanced (i.e., trained on more tokens, given more parameters, or instruction-tuned), its fallback behavior shifts from sequence repetitions to degenerate text and then to hallucinations. Moreover, the same ordering is observed throughout a single generation, even for the best-performing models; as uncertainty increases, models shift from generating hallucinations to producing degenerate text and then sequence repetitions. Lastly, we demonstrate that while common decoding techniques, such as random sampling, may alleviate some unwanted behaviors like sequence repetitions, they increase harder-to-detect hallucinations.
Community
What do LLMs do when they are uncertain? We found that the stronger the LLM, the more it hallucinates and the less it loops! This pattern extends to sampling methods and instruction tuning.
We categorize fallback behaviors into types: sequence repetitions, degenerate text, and hallucinations. By pushing models towards uncertainty, we analyze their emergence across different model sizes, architectures, pretraining token counts, and instruction-following training.
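As a rough illustration (not the paper's protocol), the sequence-repetition type can be flagged with a simple repeated n-gram heuristic over the generated token IDs; the function names, n-gram size, and threshold below are hypothetical choices:

```python
# Minimal sketch (illustrative only): flag looping generations by measuring
# how many n-grams in the output duplicate an earlier n-gram.
from collections import Counter

def repeated_ngram_fraction(token_ids, n=4):
    """Fraction of n-grams that repeat an n-gram seen earlier in the sequence."""
    ngrams = [tuple(token_ids[i:i + n]) for i in range(len(token_ids) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    repeated = sum(c - 1 for c in counts.values())
    return repeated / len(ngrams)

def looks_like_sequence_repetition(token_ids, n=4, threshold=0.5):
    # A large share of duplicate n-grams suggests the model is stuck in a loop.
    return repeated_ngram_fraction(token_ids, n) > threshold
```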
We find that the more advanced an LLM is (more parameters, longer pretraining, or instruction-tuning), the more complex its fallback behaviors, shifting from sequence repetitions to degenerate text and then to hallucinations.
Even the best-performing models show this order within a single generation. As they try to recall more facts about a topic, they move from generating hallucinations to degenerate text, then to sequence repetitions.
Interestingly, common decoding techniques like random temperature sampling can reduce some behaviors (like sequence repetitions) but increase harder-to-detect hallucinations.
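For reference, the greedy-vs.-sampling contrast can be reproduced with Hugging Face transformers as in the sketch below; the model name, prompt, and generation settings are placeholders, not the paper's exact setup:

```python
# Illustrative comparison of greedy decoding vs. random temperature sampling.
# Model name and hyperparameters are examples only.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; the paper studies other model families
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("List facts about Marie Curie:", return_tensors="pt")

# Greedy decoding: under uncertainty, more prone to loops (sequence repetitions).
greedy = model.generate(**inputs, max_new_tokens=100, do_sample=False)

# Temperature sampling: fewer loops, but errors tend to surface as fluent,
# harder-to-detect hallucinations instead.
sampled = model.generate(**inputs, max_new_tokens=100, do_sample=True,
                         temperature=0.8, top_p=0.95)

print(tokenizer.decode(greedy[0], skip_special_tokens=True))
print(tokenizer.decode(sampled[0], skip_special_tokens=True))
```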
We also find evidence that this shift is continuous: models become more degenerate as generation length grows, as measured by the proportion of unique tokens in the sequence and compared to a human baseline on the same topics.
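One simple way to track this drift (a sketch, not necessarily the paper's exact metric) is to compute the proportion of unique tokens over successively longer prefixes of the generation:

```python
# Sketch: unique-token proportion as a function of generation length.
# A steadily falling curve indicates increasingly degenerate, repetitive text.
def unique_token_ratio(token_ids):
    return len(set(token_ids)) / max(len(token_ids), 1)

def ratio_curve(token_ids, step=10):
    # Ratio over prefixes of length step, 2*step, ... up to the full generation.
    return [unique_token_ratio(token_ids[:i])
            for i in range(step, len(token_ids) + 1, step)]
```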
Congrats @Mivg on this work! Are you planning to upload the dataset to the hub?
If so, here's how to link it to this paper: https://huggingface.co/docs/hub/en/datasets-cards#linking-a-paper