Abstract
Processing long contexts remains a challenge for large language models (LLMs) due to the quadratic computational and memory overhead of the self-attention mechanism and the substantial KV cache sizes during generation. We propose a novel approach to address this problem by learning contexts offline through context compression and in-domain parameter-efficient finetuning. Our method enables an LLM to create a concise representation of the original context and efficiently retrieve relevant information to answer questions accurately. We introduce LLoCO, a technique that combines context compression, retrieval, and parameter-efficient finetuning using LoRA. Our approach extends the effective context window of a 4k token LLaMA2-7B model to handle up to 128k tokens. We evaluate our approach on several long-context question-answering datasets, demonstrating that LLoCO significantly outperforms in-context learning while using 30× fewer tokens during inference. LLoCO achieves up to 7.62× speed-up and substantially reduces the cost of long document question answering, making it a promising solution for efficient long context processing. Our code is publicly available at https://github.com/jeffreysijuntan/lloco.
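To make the two-stage flow described in the abstract concrete, here is a minimal sketch of the idea: an offline pass that compresses a long document into short summary representations and stores them, and an online pass that retrieves those summaries and answers with a LoRA-adapted model. This is not the authors' implementation; `SummaryStore`, `compress_context`, and `answer_with_lora` are illustrative stand-ins, and the "summaries" here are plain text placeholders rather than learned embeddings.

```python
# Sketch of the LLoCO-style flow: offline compression + storage, then
# retrieval + LoRA-adapted answering. All names and the placeholder
# compression below are illustrative assumptions, not the paper's code.

from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SummaryStore:
    """Toy stand-in for the vector database that holds compressed contexts."""
    entries: Dict[str, List[str]] = field(default_factory=dict)

    def add(self, doc_id: str, summaries: List[str]) -> None:
        self.entries[doc_id] = summaries

    def get(self, doc_id: str) -> List[str]:
        return self.entries[doc_id]


def compress_context(document: str, chunk_size: int = 4096) -> List[str]:
    """Offline step: split the document into chunks and map each chunk to a
    short summary representation (faked here as a chunk prefix)."""
    chunks = [document[i:i + chunk_size] for i in range(0, len(document), chunk_size)]
    return [chunk[:64] for chunk in chunks]


def answer_with_lora(question: str, summaries: List[str]) -> str:
    """Online step: prepend the retrieved compressed context to the question
    and decode with a LoRA-adapted model (decoding stubbed out here)."""
    prompt = "\n".join(summaries) + "\n\nQuestion: " + question
    return f"<decode with a LoRA-adapted LLaMA2-7B over a {len(prompt)}-char prompt>"


if __name__ == "__main__":
    store = SummaryStore()
    store.add("book-1", compress_context("some very long document " * 10_000))
    print(answer_with_lora("Who is the narrator?", store.get("book-1")))
```

The point of the design is that the expensive work (reading and compressing the full document, finetuning the adapter) happens once offline, so each question at inference time only touches the short compressed representation.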
Community
So to do QA on a book:
- Summarise/Compress the book using a separate LLM
- Store it in a vector database
- Generate the answers to all the questions that you want to ask
- Finetune it (see the LoRA sketch after this list)
- Voila. You can now ask it questions...
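For the "finetune it" step, this is roughly what attaching a LoRA adapter to LLaMA2-7B with the Hugging Face peft library looks like. The rank, alpha, and target modules below are placeholder choices rather than the paper's reported configuration, and the QA-pair dataset is assumed to already exist.

```python
# Illustrative only: attach a LoRA adapter to LLaMA2-7B before finetuning on
# (question, answer) pairs. Hyperparameters are placeholders, not the paper's.

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "meta-llama/Llama-2-7b-hf"  # gated checkpoint; requires access
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

lora_cfg = LoraConfig(
    r=8,                                  # adapter rank (placeholder)
    lora_alpha=16,                        # scaling factor (placeholder)
    target_modules=["q_proj", "v_proj"],  # which projections get adapters
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # only the adapter weights are trainable
# ...then train with a standard Trainer loop on the generated QA pairs.
```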
It's a bit cumbersome, and for the use case described, it defeats its own purpose (you have to generate the QA pairs! In the real world, these don't exist yet, hence the reason for doing the QA in the first place).
I'm sure what you've built works great in certain circumstances (books like the Bible), but for real-world, on-the-fly use cases (newly released books, legal texts, confidential data, etc.) this is cracking a nut with a sledgehammer, only to find you already had a pocketful of cracked nuts.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Long-Context Language Modeling with Parallel Context Encoding (2024)
- Grounding Language Model with Chunking-Free In-Context Retrieval (2024)
- Training-Free Long-Context Scaling of Large Language Models (2024)
- Improving Retrieval Augmented Open-Domain Question-Answering with Vectorized Contexts (2024)
- BGE Landmark Embedding: A Chunking-Free Embedding Method For Retrieval Augmented Long-Context Large Language Models (2024)