Token Importance Scoring (TIS) v8b - Stage 3 ERT

This checkpoint contains the Token Importance Scoring (TIS) components trained with the Efficient Retrieval Training (ERT) objective for learned KV cache compression in large language models.

Model Description

Token Importance Scoring (TIS) is a learned approach to KV cache compression. At a 50% cache budget it reaches 100% accuracy on NIAH (needle-in-a-haystack) and 52.8% accuracy on LITM (lost-in-the-middle). This is the main checkpoint from the v8b publication, trained with a two-forward-pass ERT objective based on KL divergence.

Key Features:

✅ 100% NIAH accuracy at 50% cache budget
✅ 52.8% LITM accuracy at 50% cache budget
✅ Consumer GPU compatible (RTX 5070, 8GB VRAM)
✅ RMSNorm + Hard-Anchor forcing for stability
✅ Efficient two-forward-pass training

Performance

Benchmarks (50% Cache Budget)

Benchmark	TIS v8b (this)	Vanilla	H2O	StreamingLLM	SnapKV
NIAH	100.0%	0.0%	0.0%	0.0%	12.0%
LITM	52.8%	48.2%	38.5%	42.1%	45.3%
NarrativeQA	67.2%	64.8%	58.3%	60.7%	62.1%

Performance Across Cache Budgets

Budget	NIAH Accuracy	LITM Accuracy
25%	98.0%	45.2%
50%	100.0%	52.8%
75%	100.0%	68.5%
100%	100.0%	72.3%
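To make the budget column concrete: a cache budget is the fraction of cached token positions that are retained. The sketch below is a minimal illustration, assuming eviction simply keeps the highest-scoring tokens under the budget. The function name `select_kept_positions` and the score values are hypothetical, and the repository's actual eviction policy (including hard-anchor forcing) may differ.

```python
import math


def select_kept_positions(scores, budget):
    """Return the sorted indices of tokens kept under a cache budget.

    Assumption: the top `budget` fraction of tokens by importance score is kept.
    """
    if not 0.0 < budget <= 1.0:
        raise ValueError("budget must be in (0, 1]")
    n_keep = max(1, math.ceil(budget * len(scores)))
    # Rank positions by descending score; ties go to the earlier position.
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    # Restore sequence order so the kept cache entries stay in position order.
    return sorted(ranked[:n_keep])


# Made-up importance scores for an 8-token sequence.
scores = [0.9, 0.1, 0.4, 0.8, 0.05, 0.3, 0.7, 0.2]
kept = select_kept_positions(scores, budget=0.5)
```

With a 0.5 budget, half of the eight positions survive; the same call with 0.25 or 0.75 corresponds to the other compressed rows in the table.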

Training Details

Base Model: mistralai/Mistral-7B-v0.3
Training Data: NarrativeQA (narrative passages with QA)
Training Objective: ERT (Efficient Retrieval Training)

Loss = KL(logits_full || logits_evicted) + λ_align * alignment_loss
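The loss above can be sketched for a single token position in plain Python. This is an illustration of the formula only, not the training code: it assumes the KL term is taken between the softmax distributions of the two logit vectors, and it treats `alignment_loss` and `lambda_align` as values supplied by the caller, since the card does not define them further.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one logit vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def kl_divergence(logits_full, logits_evicted):
    """KL(p_full || p_evicted), with p = softmax(logits) (an assumption)."""
    log_p = log_softmax(logits_full)
    log_q = log_softmax(logits_evicted)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def ert_loss(logits_full, logits_evicted, alignment_loss, lambda_align):
    """KL term plus the weighted alignment term from the formula above."""
    return kl_divergence(logits_full, logits_evicted) + lambda_align * alignment_loss


# Illustrative logits from the full-cache pass and the evicted-cache pass.
full = [2.0, 0.5, -1.0]
evicted = [1.5, 0.7, -0.5]
loss = ert_loss(full, evicted, alignment_loss=0.2, lambda_align=0.5)
```

The KL term is zero when the evicted-cache pass reproduces the full-cache distribution exactly and grows as the two passes diverge.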

Hyperparameters:

Epochs: 1
Batch size: 1 (gradient accumulation: 1)
Learning rate: 1e-4
Precision: 4-bit quantization (NF4)
ERT budgets: [0.25, 0.5, 0.75]
Max sequence length: 256 tokens
Max training samples: 128

Hardware: Trained on consumer GPU (RTX 5070, 8GB VRAM)
Training Time: ~45 minutes per 500 steps

Model Architecture

This checkpoint contains:

ImportanceUpdateHead: RMSNorm-based importance predictor with hard-anchor forcing
Importance Embedding: Token-level importance embeddings
Lambda Parameter: Attention hook scaling factor (0.1)

Components:

{
  'importance_embedding': dict,  # Token importance embeddings
  'importance_head': dict,       # RMSNorm + projection layers
  'attn_hook_lambda': float      # Attention scaling (0.1)
}
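For readers unfamiliar with the RMSNorm mentioned above: it rescales a vector by its root mean square and then applies a learned per-dimension gain. The sketch below shows the standard formula in plain Python. It does not reproduce the layers inside `ImportanceUpdateHead`; the `eps` default and the gain values are illustrative assumptions.

```python
import math


def rms_norm(x, gain, eps=1e-6):
    """Standard RMSNorm: x / sqrt(mean(x^2) + eps), scaled by a per-dimension gain."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [g * v * scale for v, g in zip(x, gain)]


hidden = [3.0, -4.0]
normed = rms_norm(hidden, gain=[1.0, 1.0])

# With unit gain, the output has a root mean square of roughly 1.
out_rms = math.sqrt(sum(v * v for v in normed) / len(normed))
```

Unlike LayerNorm, RMSNorm does not subtract the mean, so the sign and relative proportions of the inputs are preserved.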

Usage

Installation

git clone https://github.com/nitroxido/token-importance-scoring
cd token-importance-scoring
python -m venv .venv
source .venv/bin/activate
pip install -e .

Load Checkpoint

import torch

from token_importance.model.importance_head import ImportanceUpdateHead

# Load on the GPU when one is available, otherwise fall back to CPU
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Load TIS components
checkpoint = torch.load('tis_components.pt', map_location=device)

# Extract components
importance_head_state = checkpoint['importance_head']            # state for ImportanceUpdateHead
importance_embedding_state = checkpoint['importance_embedding']  # token importance embeddings
lambda_value = checkpoint['attn_hook_lambda']                    # attention hook scaling factor

print(f"Lambda: {lambda_value}")
print(f"Importance head keys: {list(importance_head_state.keys())}")
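Before wiring the loaded values into a model, it can help to confirm that the file has the layout listed under Model Architecture. The helper below is a hypothetical sketch, not part of the repository: `validate_tis_checkpoint` and `EXPECTED_TYPES` are names introduced here, and the type checks assume the three entries are stored exactly as the card describes (two dicts and a float).

```python
# Layout documented in the Components listing of this card.
EXPECTED_TYPES = {
    "importance_embedding": dict,
    "importance_head": dict,
    "attn_hook_lambda": float,
}


def validate_tis_checkpoint(checkpoint):
    """Raise if `checkpoint` lacks a documented key or has an unexpected type."""
    for key, expected in EXPECTED_TYPES.items():
        if key not in checkpoint:
            raise KeyError(f"missing checkpoint key: {key!r}")
        if not isinstance(checkpoint[key], expected):
            actual = type(checkpoint[key]).__name__
            raise TypeError(f"{key!r} should be {expected.__name__}, got {actual}")
    return checkpoint


# Stand-in dict with the documented layout; a real checkpoint comes from torch.load.
example = {"importance_embedding": {}, "importance_head": {}, "attn_hook_lambda": 0.1}
validated = validate_tis_checkpoint(example)
```

Calling `validate_tis_checkpoint(checkpoint)` right after `torch.load` fails early with a readable message if a different checkpoint file was passed by mistake.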

Evaluate on NIAH Benchmark

python scripts/eval.py \
  --model oldman-dev/tis-stage3-ert \
  --baseline tis \
  --benchmark niah \
  --cache_budgets 0.5 \
  --n_samples 50 \
  --output results/niah_eval.csv

Evaluate on LITM Benchmark

python scripts/eval.py \
  --model oldman-dev/tis-stage3-ert \
  --baseline tis \
  --benchmark litm \
  --cache_budgets 0.5 \
  --n_samples 100 \
  --output results/litm_eval.csv
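To repeat the two evaluations across several cache budgets, the command lines can be generated rather than typed by hand. The sketch below only assembles argument lists from the flags shown in the commands above. The helper name `build_eval_command` and the per-budget output file names are assumptions, and one budget is passed per invocation because the card does not say whether `--cache_budgets` accepts several values.

```python
def build_eval_command(benchmark, budget, n_samples, output_path,
                       model="oldman-dev/tis-stage3-ert"):
    """Assemble the scripts/eval.py invocation for one benchmark and one budget."""
    return [
        "python", "scripts/eval.py",
        "--model", model,
        "--baseline", "tis",
        "--benchmark", benchmark,
        "--cache_budgets", str(budget),
        "--n_samples", str(n_samples),
        "--output", output_path,
    ]


# Sample counts follow the two commands above; the file naming is hypothetical.
commands = [
    build_eval_command(bench, budget, n, f"results/{bench}_eval_{budget}.csv")
    for bench, n in (("niah", 50), ("litm", 100))
    for budget in (0.25, 0.5, 0.75)
]

# To execute from the repository root:
#   import subprocess
#   for cmd in commands:
#       subprocess.run(cmd, check=True)
```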

Intended Use

Primary Use Cases:

KV cache compression for long-context inference
Efficient retrieval-augmented generation
Memory-constrained LLM deployment

Limitations:

Trained on English narrative text (NarrativeQA)
Requires base model Mistral-7B-v0.3
Performance may vary on non-retrieval tasks

Citation

If you use this checkpoint, please cite:

@software{token_importance_scoring_2026,
  title={Token Importance Scoring: Learned KV Cache Compression for Long-Context LLMs},
  author={Token Importance Scoring Contributors},
  year={2026},
  url={https://github.com/nitroxido/token-importance-scoring}
}

License

MIT License - See LICENSE

Acknowledgments

Training compute sponsored by GPU-Action (A100-80GB for oracle training).
Consumer GPU validation performed on RTX 5070 (8GB VRAM).

More Information

Repository: https://github.com/nitroxido/token-importance-scoring
Documentation: See REPOSITORY-OVERVIEW.md and REPRODUCIBILITY-GUIDE.md in the repository
Related Checkpoints:
- tis-v8b-hard-anchor - Publication results with hard-anchor tuning
- tis-stage1-oracle - Oracle-labeled baseline
