This repo contains only the AttnGate weights for the Qwen2.5-7B-Instruct model.
SeerAttention introduces learnable AttnGate modules to accelerate the computationally intensive prefill stage of long-context large language models (LLMs) via dynamic block-level sparsity. The AttnGates are trained in a parameter-efficient self-distillation framework: they learn to mimic the 2D max-pooled attention patterns of the original frozen model, preserving its outputs while avoiding costly retraining. During inference, the gates generate block-sparse binary masks by applying a threshold or TopK selection to their learned soft scores, enabling efficient computation through a custom block-sparse FlashAttention kernel.
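The mask-generation step can be sketched in plain NumPy. This is illustrative only: the block size, density, and function name below are assumptions, the real system uses the AttnGate's learned soft scores rather than raw attention scores, and the sparse attention itself runs in a fused block-sparse FlashAttention kernel.

```python
import numpy as np

def block_sparse_mask(scores, block_size, density):
    """Illustrative sketch: 2D max-pool a (seq x seq) score map into blocks,
    then keep the top `density` fraction of causal blocks as a binary mask."""
    n = scores.shape[0]
    nb = n // block_size  # number of blocks per side (assumes n divisible)
    # 2D max-pooling: each (block_size x block_size) tile -> its max score
    pooled = scores[:nb * block_size, :nb * block_size].reshape(
        nb, block_size, nb, block_size).max(axis=(1, 3))
    # TopK over the causal (lower-triangular) blocks only
    tril = np.tril_indices(nb)
    vals = pooled[tril]
    k = max(1, int(round(density * len(vals))))
    thresh = np.sort(vals)[-k]  # k-th largest pooled score
    mask = np.zeros((nb, nb), dtype=bool)
    mask[tril] = vals >= thresh
    return mask

# Toy example: 8x8 score map, 2x2 blocks, keep 50% of causal blocks
scores = np.arange(64, dtype=float).reshape(8, 8)
mask = block_sparse_mask(scores, block_size=2, density=0.5)
```

At density 1.00 every causal block is kept, which is why the dense rows in the tables below coincide with the sparse path's upper bound. During training, the 2D max-pooled attention map of the frozen model serves as the distillation target for the gate scores.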
Original GitHub Repo
https://github.com/microsoft/SeerAttention
Evaluation Results
PG19 PPL
Perplexity on PG19 at different block-sparsity densities (density 1.00 is the dense baseline; lower is better).
Density | 8192 tokens (PPL) | 16384 tokens (PPL) | 32768 tokens (PPL) |
---|---|---|---|
0.10 | 10.19 | 9.73 | 9.59 |
0.20 | 9.78 | 9.53 | 9.46 |
0.30 | 9.67 | 9.46 | 9.41 |
0.40 | 9.63 | 9.43 | 9.39 |
0.50 | 9.60 | 9.42 | 9.38 |
1.00 | 9.58 | 9.41 | 9.38 |
LongBench
Task | 0-4k (Dense / Sparse) | 4-8k (Dense / Sparse) | 8k+ (Dense / Sparse) |
---|---|---|---|
hotpotqa | 56.86 / 55.65 | 52.74 / 52.14 | 55.59 / 55.65 |
trec | 61.00 / 61.00 | 73.00 / 73.00 | 70.00 / 71.00 |
2wikimqa | 50.74 / 50.57 | 48.59 / 48.51 | 31.51 / 31.66 |
multi_news | 23.72 / 25.84 | 21.93 / 22.03 | 20.78 / 22.01 |
lcc | 60.94 / 62.08 | 64.99 / 66.71 | 58.84 / 62.83 |
qasper | 44.45 / 46.00 | 33.69 / 33.26 | 29.21 / 29.90 |
passage_count | 20.00 / 19.00 | 7.00 / 7.00 | 8.00 / 7.00 |
passage_retrieval_en | 97.00 / 97.00 | 89.00 / 88.00 | 81.14 / 81.83 |
triviaqa | 88.02 / 86.02 | 87.82 / 87.99 | 88.98 / 88.27 |
samsum | 41.38 / 41.97 | 39.00 / 39.85 | 45.72 / 45.34 |
gov_report | 31.44 / 34.43 | 31.34 / 32.60 | 29.68 / 31.54 |
repobench-p | 65.34 / 65.58 | 61.06 / 62.66 | 57.17 / 57.07 |
multifieldqa_en | 57.50 / 56.02 | 46.61 / 46.33 | 50.16 / 49.34 |
average score | 53.72 / 53.94 | 50.52 / 50.78 | 48.21 / 48.73 |
average density (sparse) | 0.842 | 0.624 | 0.379 |