Papers
arxiv:2402.10790

In Search of Needles in a 10M Haystack: Recurrent Memory Finds What LLMs Miss

Published on Feb 16
· Featured in Daily Papers on Feb 19

Abstract

This paper addresses the challenge of processing long documents using generative transformer models. To evaluate different approaches, we introduce BABILong, a new benchmark designed to assess model capabilities in extracting and processing distributed facts within extensive texts. Our evaluation, which includes benchmarks for GPT-4 and RAG, reveals that common methods are effective only for sequences up to 10^4 elements. In contrast, fine-tuning GPT-2 with recurrent memory augmentations enables it to handle tasks involving up to 10^7 elements. This achievement marks a substantial leap, as it is by far the longest input processed by any open neural network model to date, demonstrating a significant improvement in the processing capabilities for long sequences.

Community

Great paper

Why does "QA3: Three Supporting Facts" have between 2 and 320 facts?

With such a large variance, the description is misleading; it would help to break it into ranges so the distribution is easier to see.

Or am I missing something?

·
Paper author

Thank you for the feedback!
All QA* tasks are based on the bAbI dataset. Some rare qa3 samples do indeed have a large total number of facts, but most have fewer than 100.
However, for qa3 only 3 supporting facts are needed to answer the question; the rest act as distractors. The supporting facts in the task context are still like needles in a haystack.
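For illustration, that construction can be sketched roughly as follows: a handful of supporting facts scattered among many distractor facts, followed by the question. This is a hypothetical helper to show the idea, not the authors' actual BABILong code; the function and its parameters are made up for this sketch.

```python
import random

def make_haystack_sample(supporting_facts, distractor_facts, question, total_facts):
    """Sketch of a BABILong-style needle-in-a-haystack sample: the few
    supporting facts ("needles") are scattered among many distractors
    (the "haystack"), so the model must locate them to answer the question.
    Illustrative only; names and signature are hypothetical."""
    # Pad with distractors up to the requested total number of facts
    fillers = random.choices(distractor_facts, k=total_facts - len(supporting_facts))
    facts = fillers + list(supporting_facts)
    random.shuffle(facts)  # scatter the needles through the haystack
    return " ".join(facts) + " " + question

sample = make_haystack_sample(
    supporting_facts=[
        "John picked up the apple.",
        "John went to the office.",
        "John dropped the apple.",
    ],
    distractor_facts=["Mary went to the garden.", "Sandra took the milk."],
    question="Where was the apple before the office?",
    total_facts=50,
)
```

Growing `total_facts` while keeping only 3 supporting facts is what stretches the context from ~10^4 toward 10^7 elements without changing the underlying question.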


Just curious, are the authors planning on releasing the code this time? It's been 2 years since the first RMT paper and there are still no working implementations in the community.

·

I am also interested in the code


Models citing this paper 0


Datasets citing this paper 1

Spaces citing this paper 1

Collections including this paper 21