Token Importance Scoring (TIS) - Stage 1 Oracle

This checkpoint contains the oracle-labeled Token Importance Scoring (TIS) components trained with ground-truth importance labels from full-context model outputs.

Model Description

This is the Stage 1 Oracle checkpoint, which demonstrates the theoretical performance ceiling for TIS. It is trained on oracle labels (ground truth from full-context runs) and serves as a reference upper bound for learned importance scoring.

Key Features:

✅ Oracle training with ground-truth labels
✅ 100% NIAH accuracy at all cache budgets (with oracle labels)
✅ Theoretical upper bound for TIS performance
✅ Useful for ablation studies and understanding TIS limits

Performance

Oracle Performance (with ground-truth labels):

NIAH @ all budgets: 100% (by definition)
Provides upper bound for learned methods

Note: Because this checkpoint is trained on oracle labels, its results represent the best performance achievable if importance scores were predicted perfectly.
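To illustrate why oracle labels yield 100% NIAH at every budget, here is a minimal sketch of budget-based token selection. It assumes a cache budget means "keep the top fraction of tokens ranked by importance score"; the function name keep_indices and the toy scores are hypothetical and not taken from the repository.

```python
import math

def keep_indices(scores, budget):
    """Return the sorted positions of tokens retained under a cache budget.

    scores: one importance score per token (higher means more important).
    budget: fraction of tokens to keep, in (0, 1].
    """
    if not 0.0 < budget <= 1.0:
        raise ValueError("budget must be in (0, 1]")
    n_keep = max(1, math.ceil(budget * len(scores)))
    # Rank positions by descending score; ties go to the earlier token.
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return sorted(ranked[:n_keep])

# Toy example (hypothetical values): position 5 holds the "needle",
# and the oracle assigns it the highest score.
oracle_scores = [0.1, 0.0, 0.2, 0.1, 0.0, 1.0, 0.1, 0.0]
for budget in (0.125, 0.25, 0.5):
    kept = keep_indices(oracle_scores, budget)
    print(f"budget={budget}: kept={kept}, needle kept={5 in kept}")
```

Under this assumption the needle survives at any budget because the oracle always ranks it first; a learned scorer only matches this if it ranks the needle equally high.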

Training Details

Base Model: mistralai/Mistral-7B-v0.3
Training Data: NarrativeQA with oracle importance labels
Training Stage: 1 (Supervised oracle training)

Hyperparameters:

Epochs: 2
Batch size: 4 (gradient accumulation: 8)
Learning rate: 1e-4
Precision: BFloat16
LoRA: r=16, alpha=32
Max sequence length: 2,048 tokens
Alignment loss weight (λ_align): 0.1
Robustness loss weight: 0.0 (disabled in this stage)
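The quantities implied by these settings can be worked out as follows. This is a sketch that assumes single-device training (num_devices is an assumption, not stated in this card) and the standard LoRA scaling of alpha / r.

```python
# Values taken from the hyperparameter list above.
per_device_batch_size = 4
gradient_accumulation_steps = 8
max_sequence_length = 2048
lora_r = 16
lora_alpha = 32

# Assumption: one device; multiply accordingly for multi-GPU runs.
num_devices = 1

# Sequences contributing to each optimizer step.
effective_batch_size = per_device_batch_size * gradient_accumulation_steps * num_devices

# Upper bound on tokens per optimizer step (sequences may be shorter than the maximum).
max_tokens_per_step = effective_batch_size * max_sequence_length

# Standard LoRA scales the low-rank update by alpha / r.
lora_scaling = lora_alpha / lora_r

print(effective_batch_size, max_tokens_per_step, lora_scaling)
```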

Training Objective:

Loss = LM_loss + λ_align * alignment_loss

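Here LM_loss is the language-modeling loss, λ_align is the alignment weight (0.1, see Hyperparameters), and alignment_loss compares predicted importance against oracle labels derived from full-context forward passes.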
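The objective can be sketched in plain Python as below. The exact form of alignment_loss is not specified in this card, so mean squared error is used purely as a stand-in assumption; the function names are hypothetical, and the robustness term is included only to show that a weight of 0.0 removes it.

```python
def mse(predicted, target):
    """Mean squared error; a stand-in for the unspecified alignment loss."""
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)

def total_loss(lm_loss, alignment_loss, lambda_align=0.1,
               robustness_loss=0.0, lambda_robust=0.0):
    """Loss = LM_loss + lambda_align * alignment_loss (+ optional robustness term)."""
    return lm_loss + lambda_align * alignment_loss + lambda_robust * robustness_loss

# Hypothetical values: predicted importance scores vs. oracle labels for two tokens.
predicted_scores = [0.9, 0.1]
oracle_labels = [1.0, 0.0]
align = mse(predicted_scores, oracle_labels)
print(total_loss(lm_loss=2.0, alignment_loss=align))
```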

Model Architecture

This checkpoint contains:

ImportanceUpdateHead: Supervised importance predictor
Importance Embedding: Token-level importance embeddings
Lambda Parameter: Attention hook scaling factor (0.1)

Components:

{
  'importance_embedding': dict,  # Token importance embeddings
  'importance_head': dict,       # Supervised predictor (6 keys)
  'attn_hook_lambda': float      # Attention scaling (0.1)
}
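A quick way to confirm that a loaded checkpoint matches this layout is sketched below. The helper check_layout is hypothetical (it is not part of the repository), and the stand-in dictionary only mimics the structure rather than holding real weights.

```python
# Expected top-level keys and their Python types, as listed above.
EXPECTED_LAYOUT = {
    "importance_embedding": dict,
    "importance_head": dict,
    "attn_hook_lambda": float,
}

def check_layout(checkpoint):
    """Return a list of problems; an empty list means the layout matches."""
    problems = []
    for key, expected_type in EXPECTED_LAYOUT.items():
        if key not in checkpoint:
            problems.append(f"missing key: {key}")
        elif not isinstance(checkpoint[key], expected_type):
            found = type(checkpoint[key]).__name__
            problems.append(f"{key}: expected {expected_type.__name__}, got {found}")
    return problems

# Stand-in object with the same structure (no real tensors).
fake_checkpoint = {
    "importance_embedding": {},
    "importance_head": {},
    "attn_hook_lambda": 0.1,
}
print(check_layout(fake_checkpoint))
```

An OrderedDict state dict passes the dict check because it is a dict subclass.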

Usage

Installation

git clone https://github.com/nitroxido/token-importance-scoring
cd token-importance-scoring
python -m venv .venv
source .venv/bin/activate
pip install -e .

Load Checkpoint

from token_importance.model.importance_head import ImportanceUpdateHead
import torch

# Load TIS components on the CPU so this also works on machines without a GPU;
# move tensors to the GPU afterwards if needed
checkpoint = torch.load('tis_components.pt', map_location='cpu')

# Extract components
importance_head_state = checkpoint['importance_head']
importance_embedding_state = checkpoint['importance_embedding']
lambda_value = checkpoint['attn_hook_lambda']

print(f"Lambda: {lambda_value}")
print(f"Importance head keys: {list(importance_head_state.keys())}")

Evaluate Oracle Performance

# Note: Oracle evaluation requires running full-context passes to generate labels
python scripts/eval.py \
  --model oldman-dev/tis-stage1-oracle \
  --baseline tis \
  --benchmark niah \
  --cache_budgets 0.5 \
  --n_samples 50 \
  --output results/oracle_eval.csv

Intended Use

Primary Use Cases:

Understanding theoretical performance ceiling for TIS
Ablation studies comparing oracle vs. learned methods
Research reference for importance scoring limits

Not Recommended For:

Production deployment (use tis-stage3-ert instead)
Real-world applications (requires oracle labels at inference time)

Limitations:

Trained on oracle labels (not practical for real inference)
Serves as research baseline, not production model
Performance ceiling depends on oracle label quality

Comparison with Learned Methods

Checkpoint                 Training        NIAH @ 50%      LITM @ 50%   Practical?
tis-stage1-oracle (this)   Oracle labels   100% (oracle)   -            ❌ Research only
tis-stage3-ert             ERT learned     100%            52.8%        ✅ Production
tis-v8b-hard-anchor        Hard-anchor     68%             -            ✅ Production

Key Insight: At a 50% cache budget, Stage 3 ERT matches the oracle's 100% NIAH accuracy without requiring oracle labels, making it suitable for production use.

Citation

If you use this checkpoint, please cite:

@software{token_importance_scoring_2026,
  title={Token Importance Scoring: Learned KV Cache Compression for Long-Context LLMs},
  author={Token Importance Scoring Contributors},
  year={2026},
  url={https://github.com/nitroxido/token-importance-scoring}
}

License

MIT License - See LICENSE

Acknowledgments

Training compute sponsored by GPU-Action.

More Information

Repository: https://github.com/nitroxido/token-importance-scoring
Documentation: See REPOSITORY-OVERVIEW.md and REPRODUCIBILITY-GUIDE.md in the repository
Related Checkpoints:
- tis-stage3-ert - Main production checkpoint (100% NIAH, no oracle needed)
- tis-v8b-hard-anchor - Hard-anchor tuned checkpoint
