Token Importance Scoring (TIS) v8b - Stage 3 ERT
This checkpoint contains the Token Importance Scoring (TIS) components trained with the Efficient Retrieval Training (ERT) objective for learned KV cache compression in large language models.
Model Description
Token Importance Scoring (TIS) is a learned approach to KV cache compression. At a 50% cache budget it reaches 100% accuracy on NIAH (needle-in-a-haystack) and 52.8% on LITM (lost-in-the-middle), so exact retrieval is preserved while semantic retrieval stays above the listed baselines. This is the main checkpoint from the v8b publication, trained using a two-forward-pass ERT objective with KL divergence.
Key Features:
- 100% NIAH accuracy at 50% cache budget
- 52.8% LITM accuracy at 50% cache budget
- Consumer GPU compatible (RTX 5070, 8GB VRAM)
- RMSNorm + Hard-Anchor forcing for stability
- Efficient two-forward-pass training
Performance
Benchmarks (50% Cache Budget)
| Benchmark | TIS v8b (this) | Vanilla | H2O | StreamingLLM | SnapKV |
|---|---|---|---|---|---|
| NIAH | 100.0% | 0.0% | 0.0% | 0.0% | 12.0% |
| LITM | 52.8% | 48.2% | 38.5% | 42.1% | 45.3% |
| NarrativeQA | 67.2% | 64.8% | 58.3% | 60.7% | 62.1% |
Performance Across Cache Budgets
| Budget | NIAH Accuracy | LITM Accuracy |
|---|---|---|
| 25% | 98.0% | 45.2% |
| 50% | 100.0% | 52.8% |
| 75% | 100.0% | 68.5% |
| 100% | 100.0% | 72.3% |
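To make the budget column concrete, here is a minimal sketch of what keeping a fraction of the cache could look like: rank cached tokens by an importance score and keep the top `ceil(budget * n)` positions in their original order. The function name, the tie-breaking rule, and the example scores are assumptions for illustration. The sketch does not model the hard-anchor forcing used by this checkpoint and is not the repository's eviction code.

```python
import math

def select_kept_positions(scores, budget):
    """Return sorted positions of the tokens kept under a cache budget.

    scores: one importance score per cached token (higher = more important).
    budget: fraction of the cache to keep, in (0, 1].
    """
    if not 0.0 < budget <= 1.0:
        raise ValueError("budget must be in (0, 1]")
    n_keep = max(1, math.ceil(budget * len(scores)))
    # Rank positions by score, highest first; ties keep the earlier token.
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    # Restore sequence order so the kept keys/values stay aligned.
    return sorted(ranked[:n_keep])

# Hypothetical scores for an 8-token cache.
scores = [0.9, 0.1, 0.4, 0.8, 0.2, 0.7, 0.3, 0.05]
print(select_kept_positions(scores, 0.5))   # [0, 2, 3, 5]
print(select_kept_positions(scores, 0.25))  # [0, 3]
```

Under this reading, a 50% budget on an 8-token cache keeps 4 entries and a 25% budget keeps 2.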
Training Details
Base Model: mistralai/Mistral-7B-v0.3
Training Data: NarrativeQA (narrative passages with QA)
Training Objective: ERT (Efficient Retrieval Training)
Loss = KL(logits_full || logits_evicted) + λ_align * alignment_loss
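The loss above can be sketched for a single token position in plain Python. This is an illustration under stated assumptions, not the repository's training code: the logits are assumed to be turned into distributions with a softmax before the KL term, and `alignment_loss` and `lambda_align` are passed in as plain numbers because the card gives neither the form of the alignment term nor the value of its weight.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0.0
    )

def ert_loss(logits_full, logits_evicted, alignment_loss, lambda_align):
    """KL between full-cache and evicted-cache predictions plus a weighted alignment term."""
    p_full = softmax(logits_full)        # first forward pass: full KV cache
    p_evicted = softmax(logits_evicted)  # second forward pass: evicted KV cache
    return kl_divergence(p_full, p_evicted) + lambda_align * alignment_loss

# Hypothetical values: the evicted pass shifts probability mass away from the top token.
print(ert_loss([2.0, 1.0, 0.1], [1.2, 1.0, 0.6], alignment_loss=0.5, lambda_align=0.1))
```

When both passes produce the same logits the KL term is zero, so only the weighted alignment term remains.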
Hyperparameters:
- Epochs: 1
- Batch size: 1 (gradient accumulation: 1)
- Learning rate: 1e-4
- Precision: 4-bit quantization (NF4)
- ERT budgets: [0.25, 0.5, 0.75]
- Max sequence length: 256 tokens
- Max training samples: 128
Hardware: Trained on consumer GPU (RTX 5070, 8GB VRAM)
Training Time: ~45 minutes per 500 steps
Model Architecture
This checkpoint contains:
- ImportanceUpdateHead: RMSNorm-based importance predictor with hard-anchor forcing
- Importance Embedding: Token-level importance embeddings
- Lambda Parameter: Attention hook scaling factor (0.1)
Components:
{
'importance_embedding': dict, # Token importance embeddings
'importance_head': dict, # RMSNorm + projection layers
'attn_hook_lambda': float # Attention scaling (0.1)
}
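For readers unfamiliar with the normalization named above, the following is a minimal sketch of standard RMSNorm on a single vector: divide by the root mean square of the input, then multiply by a learned per-feature gain. The `eps` value and the function name are assumptions. How `ImportanceUpdateHead` wires this layer into its projections is not described in this card and is not shown here.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """Scale x by the reciprocal of its root mean square, then by a per-feature gain."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]

# With unit gains, the output has a root mean square of approximately 1.
out = rms_norm([3.0, 4.0], weight=[1.0, 1.0])
print(out)
```

Unlike LayerNorm, RMSNorm does not subtract the mean, so it only rescales the input.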
Usage
Installation
git clone https://github.com/nitroxido/token-importance-scoring
cd token-importance-scoring
python -m venv .venv
source .venv/bin/activate
pip install -e .
Load Checkpoint
from token_importance.model.importance_head import ImportanceUpdateHead
import torch
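# Load TIS components (falls back to CPU when no GPU is available)
device = 'cuda' if torch.cuda.is_available() else 'cpu'
checkpoint = torch.load('tis_components.pt', map_location=device)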
# Extract components
importance_head_state = checkpoint['importance_head']
importance_embedding_state = checkpoint['importance_embedding']
lambda_value = checkpoint['attn_hook_lambda']
print(f"Lambda: {lambda_value}")
print(f"Importance head keys: {importance_head_state.keys()}")
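Before using the extracted components, it can help to check that the loaded object matches the layout listed under Components. The helper below is a hypothetical sketch, not part of the repository: it assumes the three keys and Python types shown in this card, and it would need adjusting if `attn_hook_lambda` turns out to be stored as a tensor rather than a plain float.

```python
from collections.abc import Mapping

# Layout taken from the Components listing in this card.
EXPECTED_ENTRIES = {
    "importance_embedding": Mapping,
    "importance_head": Mapping,
    "attn_hook_lambda": float,
}

def validate_tis_checkpoint(checkpoint):
    """Raise if the checkpoint does not match the documented layout; return it otherwise."""
    for key, expected_type in EXPECTED_ENTRIES.items():
        if key not in checkpoint:
            raise KeyError(f"missing checkpoint entry: {key!r}")
        if not isinstance(checkpoint[key], expected_type):
            raise TypeError(
                f"{key!r} should be {expected_type.__name__}, "
                f"got {type(checkpoint[key]).__name__}"
            )
    return checkpoint

# Stand-in for the object returned by torch.load, with empty state dicts.
example = {"importance_embedding": {}, "importance_head": {}, "attn_hook_lambda": 0.1}
print(validate_tis_checkpoint(example)["attn_hook_lambda"])  # 0.1
```

Calling the helper right after `torch.load` turns a mismatched or truncated file into an immediate, readable error.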
Evaluate on NIAH Benchmark
python scripts/eval.py \
--model oldman-dev/tis-stage3-ert \
--baseline tis \
--benchmark niah \
--cache_budgets 0.5 \
--n_samples 50 \
--output results/niah_eval.csv
Evaluate on LITM Benchmark
python scripts/eval.py \
--model oldman-dev/tis-stage3-ert \
--baseline tis \
--benchmark litm \
--cache_budgets 0.5 \
--n_samples 100 \
--output results/litm_eval.csv
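To evaluate more than one budget, the commands above can be generated in a loop. The sketch below only builds and prints one command per budget, using the flags shown in this card. The per-budget output file names are an assumption, and the `subprocess.run` call is left commented out so nothing is executed until you run it from the repository root.

```python
import shlex
# import subprocess  # needed only if the run line below is enabled

def build_eval_command(benchmark, budget, n_samples, output_path):
    """Assemble one scripts/eval.py invocation using the flags documented above."""
    return [
        "python", "scripts/eval.py",
        "--model", "oldman-dev/tis-stage3-ert",
        "--baseline", "tis",
        "--benchmark", benchmark,
        "--cache_budgets", str(budget),
        "--n_samples", str(n_samples),
        "--output", output_path,
    ]

# Budgets taken from the "Performance Across Cache Budgets" table.
commands = [
    build_eval_command("niah", budget, 50, f"results/niah_eval_{budget}.csv")
    for budget in (0.25, 0.5, 0.75)
]
for cmd in commands:
    print(shlex.join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment to execute
```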
Intended Use
Primary Use Cases:
- KV cache compression for long-context inference
- Efficient retrieval-augmented generation
- Memory-constrained LLM deployment
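As a rough guide to what a cache budget saves in memory-constrained deployment, the arithmetic below estimates KV cache size for the base model. The architecture defaults (32 layers, 8 key/value heads, head dimension 128) are taken from the published Mistral-7B configuration and should be checked against the model's `config.json`. The estimate assumes 16-bit cache entries and ignores any extra memory used by the TIS components themselves.

```python
def kv_cache_bytes(n_tokens, budget=1.0, n_layers=32, n_kv_heads=8,
                   head_dim=128, bytes_per_value=2):
    """Estimate KV cache size in bytes when a fraction `budget` of tokens is kept."""
    kept_tokens = int(n_tokens * budget)
    # Each kept token stores one key and one value vector per layer and KV head.
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return kept_tokens * per_token

GIB = 1024 ** 3
full = kv_cache_bytes(32_768)
half = kv_cache_bytes(32_768, budget=0.5)
print(full / GIB, half / GIB)  # 4.0 2.0
```

Under these assumptions each cached token costs 128 KiB, so a 32,768-token context needs about 4 GiB at full budget and about 2 GiB at a 50% budget.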
Limitations:
- Trained on English narrative text (NarrativeQA)
- Requires base model Mistral-7B-v0.3
- Performance may vary on non-retrieval tasks
Citation
If you use this checkpoint, please cite:
@software{token_importance_scoring_2026,
title={Token Importance Scoring: Learned KV Cache Compression for Long-Context LLMs},
author={Token Importance Scoring Contributors},
year={2026},
url={https://github.com/nitroxido/token-importance-scoring}
}
License
MIT License - See LICENSE
Acknowledgments
Training compute sponsored by GPU-Action (A100-80GB for oracle training).
Consumer GPU validation performed on RTX 5070 (8GB VRAM).
More Information
- Repository: https://github.com/nitroxido/token-importance-scoring
- Documentation: See REPOSITORY-OVERVIEW.md and REPRODUCIBILITY-GUIDE.md in the repository
- Related Checkpoints:
  - tis-v8b-hard-anchor: Publication results with hard-anchor tuning
  - tis-stage1-oracle: Oracle-labeled baseline