Token Importance Scoring (TIS) v8b - Stage 3 ERT

This checkpoint contains the Token Importance Scoring (TIS) components trained with the Efficient Retrieval Training (ERT) objective for learned KV cache compression in large language models.

Model Description

Token Importance Scoring (TIS) is a learned approach to KV cache compression. At a 50% cache budget it reaches 100% accuracy on the needle-in-a-haystack (NIAH) benchmark and 52.8% on the lost-in-the-middle (LITM) benchmark. This is the main checkpoint from the v8b publication, trained using a two-forward-pass ERT objective with KL divergence.

Key Features:

  • ✅ 100% NIAH accuracy at 50% cache budget
  • ✅ 52.8% LITM accuracy at 50% cache budget
  • ✅ Consumer GPU compatible (RTX 5070, 8GB VRAM)
  • ✅ RMSNorm + Hard-Anchor forcing for stability
  • ✅ Efficient two-forward-pass training

Performance

Benchmarks (50% Cache Budget)

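| Benchmark   | TIS v8b (this) | Vanilla | H2O   | StreamingLLM | SnapKV |
|-------------|----------------|---------|-------|--------------|--------|
| NIAH        | 100.0%         | 0.0%    | 0.0%  | 0.0%         | 12.0%  |
| LITM        | 52.8%          | 48.2%   | 38.5% | 42.1%        | 45.3%  |
| NarrativeQA | 67.2%          | 64.8%   | 58.3% | 60.7%        | 62.1%  |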

Performance Across Cache Budgets

| Budget | NIAH Accuracy | LITM Accuracy |
|--------|---------------|---------------|
| 25%    | 98.0%         | 45.2%         |
| 50%    | 100.0%        | 52.8%         |
| 75%    | 100.0%        | 68.5%         |
| 100%   | 100.0%        | 72.3%         |
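
To make the budget column concrete, here is a minimal sketch of one way a fractional cache budget could be applied: keep the highest-scoring fraction of tokens and drop the rest. The function name `keep_indices` and the top-k rule are illustrative assumptions; the source does not describe how TIS selects tokens, and hard-anchor forcing is not modeled here.

```python
import math

def keep_indices(scores, budget):
    """Return the indices of tokens retained under a fractional cache budget.

    Assumption: tokens are ranked purely by importance score and the top
    ceil(budget * n) are kept. Indices are returned in original order so the
    retained KV entries stay in sequence order.
    """
    if not 0.0 < budget <= 1.0:
        raise ValueError("budget must be in (0, 1]")
    n_keep = max(1, math.ceil(budget * len(scores)))
    # Rank positions from most to least important, then keep the first n_keep.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])
```

With four tokens and a 50% budget, two positions survive; with a 100% budget nothing is evicted, which corresponds to the last row of the table.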

Training Details

Base Model: mistralai/Mistral-7B-v0.3
Training Data: NarrativeQA (narrative passages with QA)
Training Objective: ERT (Efficient Retrieval Training)

Loss = KL(logits_full || logits_evicted) + λ_align * alignment_loss
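
The objective above can be sketched for a single token position as follows. This is a plain-Python illustration under stated assumptions, not the repository's training code: the KL term is taken between the softmax distributions of the two forward passes, `alignment_loss` is passed in as an already-computed number because the source does not define it, and `lambda_align` has no documented value.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def kl_full_vs_evicted(logits_full, logits_evicted):
    """KL(P_full || P_evicted), where each P is the softmax of its logits."""
    log_p = log_softmax(logits_full)
    log_q = log_softmax(logits_evicted)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))

def ert_loss(logits_full, logits_evicted, alignment_loss, lambda_align):
    """KL term plus the weighted alignment term (both inputs are assumptions)."""
    kl = kl_full_vs_evicted(logits_full, logits_evicted)
    return kl + lambda_align * alignment_loss
```

The KL term is zero when the evicted-cache pass reproduces the full-cache distribution exactly, so minimizing it pushes the compressed cache toward the same next-token predictions.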

Hyperparameters:

  • Epochs: 1
  • Batch size: 1 (gradient accumulation: 1)
  • Learning rate: 1e-4
  • Precision: 4-bit quantization (NF4)
  • ERT budgets: [0.25, 0.5, 0.75]
  • Max sequence length: 256 tokens
  • Max training samples: 128

Hardware: Trained on consumer GPU (RTX 5070, 8GB VRAM)
Training Time: ~45 minutes per 500 steps

Model Architecture

This checkpoint contains:

  • ImportanceUpdateHead: RMSNorm-based importance predictor with hard-anchor forcing
  • Importance Embedding: Token-level importance embeddings
  • Lambda Parameter: Attention hook scaling factor (0.1)
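
For readers unfamiliar with RMSNorm, the sketch below shows the standard formulation: divide each element by the root mean square of the vector, then apply a learned per-element gain. It is a generic plain-Python illustration; the exact variant, epsilon, and tensor layout used inside ImportanceUpdateHead are not given in the source and are assumptions here.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """Standard RMSNorm: x / sqrt(mean(x^2) + eps), scaled by a learned gain."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [w * v * inv_rms for v, w in zip(x, weight)]
```

Unlike LayerNorm, this does not subtract the mean, so it needs one fewer reduction per vector.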

Components:

{
  'importance_embedding': dict,  # Token importance embeddings
  'importance_head': dict,       # RMSNorm + projection layers
  'attn_hook_lambda': float      # Attention scaling (0.1)
}
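
Before wiring the checkpoint into a model, it can help to confirm that a loaded object has the layout listed above. The helper below is a small sketch; `validate_checkpoint` is a hypothetical name and is not part of the repository.

```python
# Expected top-level layout, copied from the component listing above.
EXPECTED_TYPES = {
    "importance_embedding": dict,
    "importance_head": dict,
    "attn_hook_lambda": float,
}

def validate_checkpoint(checkpoint):
    """Return a list of layout problems; an empty list means the layout matches."""
    problems = []
    for key, expected in EXPECTED_TYPES.items():
        if key not in checkpoint:
            problems.append(f"missing key: {key}")
        elif not isinstance(checkpoint[key], expected):
            actual = type(checkpoint[key]).__name__
            problems.append(f"{key}: expected {expected.__name__}, got {actual}")
    return problems
```

Calling `validate_checkpoint(checkpoint)` on the dictionary returned by `torch.load` should yield an empty list if the file matches the listing; state dicts saved as `OrderedDict` pass the `dict` check.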

Usage

Installation

git clone https://github.com/nitroxido/token-importance-scoring
cd token-importance-scoring
python -m venv .venv
source .venv/bin/activate
pip install -e .

Load Checkpoint

from token_importance.model.importance_head import ImportanceUpdateHead
import torch

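# Load TIS components, falling back to CPU when no CUDA device is available
device = 'cuda' if torch.cuda.is_available() else 'cpu'
checkpoint = torch.load('tis_components.pt', map_location=device)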

# Extract components
importance_head_state = checkpoint['importance_head']
importance_embedding_state = checkpoint['importance_embedding']
lambda_value = checkpoint['attn_hook_lambda']

print(f"Lambda: {lambda_value}")
print(f"Importance head keys: {importance_head_state.keys()}")

Evaluate on NIAH Benchmark

python scripts/eval.py \
  --model oldman-dev/tis-stage3-ert \
  --baseline tis \
  --benchmark niah \
  --cache_budgets 0.5 \
  --n_samples 50 \
  --output results/niah_eval.csv

Evaluate on LITM Benchmark

python scripts/eval.py \
  --model oldman-dev/tis-stage3-ert \
  --baseline tis \
  --benchmark litm \
  --cache_budgets 0.5 \
  --n_samples 100 \
  --output results/litm_eval.csv
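
The two evaluation commands differ only in benchmark name, sample count, and output file, so they can be generated from one helper. The sketch below uses only the flags shown above; `build_eval_command` and `run_all` are hypothetical helper names, and the output naming pattern simply mirrors the two examples.

```python
import subprocess

def build_eval_command(benchmark, n_samples, budget=0.5,
                       model="oldman-dev/tis-stage3-ert"):
    """Build the argv list for scripts/eval.py using only the documented flags."""
    return [
        "python", "scripts/eval.py",
        "--model", model,
        "--baseline", "tis",
        "--benchmark", benchmark,
        "--cache_budgets", str(budget),
        "--n_samples", str(n_samples),
        "--output", f"results/{benchmark}_eval.csv",
    ]

def run_all():
    """Run both benchmarks in sequence; stops at the first failing command."""
    for benchmark, n_samples in [("niah", 50), ("litm", 100)]:
        subprocess.run(build_eval_command(benchmark, n_samples), check=True)
```

Calling `run_all()` from the repository root should reproduce the two invocations above one after the other.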

Intended Use

Primary Use Cases:

  • KV cache compression for long-context inference
  • Efficient retrieval-augmented generation
  • Memory-constrained LLM deployment
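
As a rough guide to what a cache budget buys in memory terms, the sketch below estimates KV cache size. The defaults (32 layers, 8 KV heads, head dimension 128, 2 bytes per value) are assumptions taken from the publicly available Mistral-7B configuration with a 16-bit cache; the estimate ignores any overhead from the TIS components and assumes evicted entries are actually freed.

```python
def kv_cache_bytes(n_tokens, budget=1.0, n_layers=32, n_kv_heads=8,
                   head_dim=128, bytes_per_value=2):
    """Estimate KV cache size in bytes for a sequence at a given cache budget."""
    # Factor of 2 covers the separate key and value tensors in each layer.
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return int(per_token * n_tokens * budget)
```

Under these assumptions each token costs 128 KiB, so a 32,768-token context needs about 4 GiB of cache at full budget and about 2 GiB at a 50% budget.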

Limitations:

  • Trained on English narrative text (NarrativeQA)
  • Requires base model Mistral-7B-v0.3
  • Performance may vary on non-retrieval tasks

Citation

If you use this checkpoint, please cite:

@software{token_importance_scoring_2026,
  title={Token Importance Scoring: Learned KV Cache Compression for Long-Context LLMs},
  author={Token Importance Scoring Contributors},
  year={2026},
  url={https://github.com/nitroxido/token-importance-scoring}
}

License

MIT License - See LICENSE

Acknowledgments

Training compute sponsored by GPU-Action (A100-80GB for oracle training).
Consumer GPU validation performed on RTX 5070 (8GB VRAM).
