From Loops to Oops: Fallback Behaviors of Language Models Under Uncertainty
Abstract
Large language models (LLMs) often exhibit undesirable behaviors, such as hallucinations and sequence repetitions. We propose to view these behaviors as fallbacks that models exhibit under uncertainty, and investigate the connection between them. We categorize fallback behaviors -- sequence repetitions, degenerate text, and hallucinations -- and extensively analyze them in models from the same family that differ by the amount of pretraining tokens, parameter count, or the inclusion of instruction-following training. Our experiments reveal a clear and consistent ordering of fallback behaviors across all these axes: as an LLM becomes more advanced (i.e., trained on more tokens, given more parameters, or instruction-tuned), its fallback behavior shifts from sequence repetitions to degenerate text and then to hallucinations. Moreover, the same ordering is observed throughout a single generation, even for the best-performing models; as uncertainty increases, models shift from generating hallucinations to producing degenerate text and then sequence repetitions. Lastly, we demonstrate that while common decoding techniques, such as random sampling, may alleviate some unwanted behaviors like sequence repetitions, they increase harder-to-detect hallucinations.
Community
What do LLMs do when they are uncertain? We found that the stronger the LLM, the more it hallucinates and the less it loops! This pattern extends to sampling methods and instruction tuning.
We categorize fallback behaviors into types: sequence repetitions, degenerate text, and hallucinations. By pushing models towards uncertainty, we analyze their emergence across different model sizes, architectures, pretraining token counts, and instruction-following training.
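As a rough illustration (not the paper's protocol), the sequence-repetition type can be flagged with a simple repeated n-gram heuristic over the generated token IDs; the function names, n-gram size, and threshold below are hypothetical choices:

```python
# Minimal sketch (illustrative only): flag looping generations by measuring
# how many n-grams in the output duplicate an earlier n-gram.
from collections import Counter

def repeated_ngram_fraction(token_ids, n=4):
    """Fraction of n-grams that repeat an n-gram seen earlier in the sequence."""
    ngrams = [tuple(token_ids[i:i + n]) for i in range(len(token_ids) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    repeated = sum(c - 1 for c in counts.values())
    return repeated / len(ngrams)

def looks_like_sequence_repetition(token_ids, n=4, threshold=0.5):
    # A large share of duplicate n-grams suggests the model is stuck in a loop.
    return repeated_ngram_fraction(token_ids, n) > threshold
```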
We find that the more advanced an LLM is (more parameters, longer pretraining, or instruction-tuning), the more complex its fallback behaviors, shifting from sequence repetitions to degenerate text and then to hallucinations.
Even the best-performing models show this order within a single generation. As they try to recall more facts about a topic, they move from generating hallucinations to degenerate text, then to sequence repetitions.
Interestingly, common decoding techniques like random temperature sampling can reduce some behaviors (like sequence repetitions) but increase harder-to-detect hallucinations.
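For reference, the greedy-vs.-sampling contrast can be reproduced with Hugging Face transformers as in the sketch below; the model name, prompt, and generation settings are placeholders, not the paper's exact setup:

```python
# Illustrative comparison of greedy decoding vs. random temperature sampling.
# Model name and hyperparameters are examples only.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; the paper studies other model families
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("List facts about Marie Curie:", return_tensors="pt")

# Greedy decoding: under uncertainty, more prone to loops (sequence repetitions).
greedy = model.generate(**inputs, max_new_tokens=100, do_sample=False)

# Temperature sampling: fewer loops, but errors tend to surface as fluent,
# harder-to-detect hallucinations instead.
sampled = model.generate(**inputs, max_new_tokens=100, do_sample=True,
                         temperature=0.8, top_p=0.95)

print(tokenizer.decode(greedy[0], skip_special_tokens=True))
print(tokenizer.decode(sampled[0], skip_special_tokens=True))
```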
We also find evidence that this shift is continuous: models become more degenerate as generation length grows, as measured by the proportion of unique tokens in the sequence and compared to a human baseline on the same topics.
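One simple way to track this drift (a sketch, not necessarily the paper's exact metric) is to compute the proportion of unique tokens over successively longer prefixes of the generation:

```python
# Sketch: unique-token proportion as a function of generation length.
# A steadily falling curve indicates increasingly degenerate, repetitive text.
def unique_token_ratio(token_ids):
    return len(set(token_ids)) / max(len(token_ids), 1)

def ratio_curve(token_ids, step=10):
    # Ratio over prefixes of length step, 2*step, ... up to the full generation.
    return [unique_token_ratio(token_ids[:i])
            for i in range(step, len(token_ids) + 1, step)]
```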
Congrats @Mivg on this work! Are you planning to upload the dataset to the hub?
If so, here's how to link it to this paper: https://huggingface.co/docs/hub/en/datasets-cards#linking-a-paper