This repo contains only the AttnGate weights for the Qwen2.5-7B-Instruct model.
SeerAttention introduces learnable AttnGate modules to accelerate the computationally intensive prefill stage of long-context large language models (LLMs) via dynamic block-level sparsity. The AttnGates are trained in a parameter-efficient self-distillation framework: they learn to mimic the 2D max-pooled attention patterns of the original frozen model, preserving its outputs while avoiding costly retraining. During inference, the gates generate block-sparse binary masks by applying a threshold or TopK selection to their learned soft scores, enabling efficient computation through a custom block-sparse FlashAttention kernel.
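The mask-generation step can be sketched in plain NumPy. This is illustrative only: the block size, density, and function name below are assumptions, the real system uses the AttnGate's learned soft scores rather than raw attention scores, and the sparse attention itself runs in a fused block-sparse FlashAttention kernel.

```python
import numpy as np

def block_sparse_mask(scores, block_size, density):
    """Illustrative sketch: 2D max-pool a (seq x seq) score map into blocks,
    then keep the top `density` fraction of causal blocks as a binary mask."""
    n = scores.shape[0]
    nb = n // block_size  # number of blocks per side (assumes n divisible)
    # 2D max-pooling: each (block_size x block_size) tile -> its max score
    pooled = scores[:nb * block_size, :nb * block_size].reshape(
        nb, block_size, nb, block_size).max(axis=(1, 3))
    # TopK over the causal (lower-triangular) blocks only
    tril = np.tril_indices(nb)
    vals = pooled[tril]
    k = max(1, int(round(density * len(vals))))
    thresh = np.sort(vals)[-k]  # k-th largest pooled score
    mask = np.zeros((nb, nb), dtype=bool)
    mask[tril] = vals >= thresh
    return mask

# Toy example: 8x8 score map, 2x2 blocks, keep 50% of causal blocks
scores = np.arange(64, dtype=float).reshape(8, 8)
mask = block_sparse_mask(scores, block_size=2, density=0.5)
```

At density 1.00 every causal block is kept, which is why the dense rows in the tables below coincide with the sparse path's upper bound. During training, the 2D max-pooled attention map of the frozen model serves as the distillation target for the gate scores.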
Original GitHub Repo
https://github.com/microsoft/SeerAttention
Evaluation Results
PG19 PPL
Perplexity on PG19 at different block-sparsity densities (density 1.00 is the dense baseline; lower is better).
Density | 8192 tokens (PPL) | 16384 tokens (PPL) | 32768 tokens (PPL) |
---|---|---|---|
0.10 | 10.19 | 9.73 | 9.59 |
0.20 | 9.78 | 9.53 | 9.46 |
0.30 | 9.67 | 9.46 | 9.41 |
0.40 | 9.63 | 9.43 | 9.39 |
0.50 | 9.60 | 9.42 | 9.38 |
1.00 | 9.58 | 9.41 | 9.38 |
LongBench
Task | 0-4k (Dense / Sparse) | 4-8k (Dense / Sparse) | 8k+ (Dense / Sparse) |
---|---|---|---|
hotpotqa | 56.86 / 55.65 | 52.74 / 52.14 | 55.59 / 55.65 |
trec | 61.00 / 61.00 | 73.00 / 73.00 | 70.00 / 71.00 |
2wikimqa | 50.74 / 50.57 | 48.59 / 48.51 | 31.51 / 31.66 |
multi_news | 23.72 / 25.84 | 21.93 / 22.03 | 20.78 / 22.01 |
lcc | 60.94 / 62.08 | 64.99 / 66.71 | 58.84 / 62.83 |
qasper | 44.45 / 46.00 | 33.69 / 33.26 | 29.21 / 29.90 |
passage_count | 20.00 / 19.00 | 7.00 / 7.00 | 8.00 / 7.00 |
passage_retrieval_en | 97.00 / 97.00 | 89.00 / 88.00 | 81.14 / 81.83 |
triviaqa | 88.02 / 86.02 | 87.82 / 87.99 | 88.98 / 88.27 |
samsum | 41.38 / 41.97 | 39.00 / 39.85 | 45.72 / 45.34 |
gov_report | 31.44 / 34.43 | 31.34 / 32.60 | 29.68 / 31.54 |
repobench-p | 65.34 / 65.58 | 61.06 / 62.66 | 57.17 / 57.07 |
multifieldqa_en | 57.50 / 56.02 | 46.61 / 46.33 | 50.16 / 49.34 |
average score | 53.72 / 53.94 | 50.52 / 50.78 | 48.21 / 48.73 |
average density (sparse) | 0.842 | 0.624 | 0.379 |