arxiv:2407.12077

GoldFinch: High Performance RWKV/Transformer Hybrid with Linear Pre-Fill and Extreme KV-Cache Compression

Published on Jul 16

Submitted by Fareso on Jul 18

#3 Paper of the day

Authors:

Fares Obeid, Eric Alcaide

Abstract

We introduce GoldFinch, a hybrid Linear Attention/Transformer sequence model that uses a new technique to efficiently generate a highly compressed and reusable KV-Cache in linear time and space with respect to sequence length. GoldFinch stacks our new GOLD transformer on top of an enhanced version of the Finch (RWKV-6) architecture. We train up to 1.5B parameter class models of the Finch, Llama, and GoldFinch architectures, and find dramatically improved modeling performance relative to both Finch and Llama. Our cache size savings increase linearly with model layer count, ranging from 756-2550 times smaller than the traditional transformer cache for common sizes, enabling inference of extremely large context lengths even on limited hardware. Although autoregressive generation has O(n) time complexity per token because of attention, pre-fill computation of the entire initial cache state for a submitted context costs only O(1) time per token due to the use of a recurrent neural network (RNN) to generate this cache. We release our trained weights and training code under the Apache 2.0 license for community use.
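The claim that cache savings grow linearly with layer count can be illustrated with simple arithmetic. The sketch below is not taken from the paper: it assumes a standard transformer stores keys and values for every layer, and it assumes a hypothetical alternative that keeps one cache shared by all layers with a fixed per-token width. The function names and the example sizes are illustrative assumptions and are not meant to reproduce the 756-2550 range quoted above.

```python
def standard_kv_cache_elems(n_layers: int, d_model: int, seq_len: int) -> int:
    """Elements in a conventional KV-cache: keys and values per layer per token."""
    return 2 * n_layers * d_model * seq_len


def shared_cache_elems(per_token_width: int, seq_len: int) -> int:
    """Elements in a single cache reused by all layers (hypothetical layout)."""
    return per_token_width * seq_len


def savings_ratio(n_layers: int, d_model: int, per_token_width: int,
                  seq_len: int = 1) -> float:
    """How many times smaller the shared cache is than the conventional one."""
    return (standard_kv_cache_elems(n_layers, d_model, seq_len)
            / shared_cache_elems(per_token_width, seq_len))


if __name__ == "__main__":
    # Illustrative sizes only (assumed, not the paper's configurations).
    for layers in (12, 24, 48):
        print(layers, savings_ratio(layers, d_model=2048, per_token_width=128))
```

Under these assumptions the ratio simplifies to 2 * n_layers * d_model / per_token_width. Sequence length cancels out, and doubling the layer count doubles the savings.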

Community

Fareso

Paper author Paper submitter 4 days ago

https://github.com/recursal/GoldFinch-paper

nielsr

2 days ago

Hi @Fareso congrats on this work!

Would you be able to link the model to this paper page? I opened a PR here: https://huggingface.co/recursal/GoldFinch-paper/discussions/1.

Also, note that we recommend pushing each checkpoint to a separate model repository, so that things like download stats work. Read more here:

uploading models: https://huggingface.co/docs/hub/models-uploading
making download stats work: https://huggingface.co/docs/hub/models-download-stats

Let me know if you need any help!

Cheers,
Niels

ZhangRC

3 days ago

Apparently, this model receives more attention than Eagle/Finch
