arxiv:2402.10790

In Search of Needles in a 10M Haystack: Recurrent Memory Finds What LLMs Miss

Published on Feb 16

· Submitted by

akhaliq on Feb 19

#2 Paper of the day

Upvote

Authors:

Yuri Kuratov ,

Aydar Bulatov ,

Petr Anokhin ,

Dmitry Sorokin ,

Artyom Sorokin ,

Mikhail Burtsev

Abstract

This paper addresses the challenge of processing long documents using generative transformer models. To evaluate different approaches, we introduce BABILong, a new benchmark designed to assess model capabilities in extracting and processing distributed facts within extensive texts. Our evaluation, which includes benchmarks for GPT-4 and RAG, reveals that common methods are effective only for sequences up to 10^4 elements. In contrast, fine-tuning GPT-2 with recurrent memory augmentations enables it to handle tasks involving up to 10^7 elements. This achievement marks a substantial leap, as it is by far the longest input processed by any open neural network model to date, demonstrating a significant improvement in the processing capabilities for long sequences.

View arXiv page View PDF Add to collection

Community

MichaelBarryUK

Feb 19

Great paper

Why does "QA3: Three Supporting Facts" have between 2 and 320 facts?

For such a large variance, the description is misleading, and it should be broken up into ranges to make it easier to see the distribution.

Or am I missing something?

booydar

Paper author Feb 20

Thank you for the feedback!
All QA* tasks are based on the bAbI dataset. Some rare samples of qa3 do indeed have a large total number of facts, but most of them have less than 100.
However for qa3 only 3 supporting facts are needed to answer the question, other ones act as distractors. Supporting facts in the task context are still like a needle in a haystack.

librarian-bot

Feb 20

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any Paper on Hugging Face checkout this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend

paisleypark

Feb 21

just curious are the authors planning on releasing the code this time? It's been 2 years since the first RMT paper and their's still no working implmentations in the community.