Token Importance Scoring (TIS) - Stage 1 Oracle

This checkpoint contains the Token Importance Scoring (TIS) components trained on oracle importance labels, that is, ground-truth labels derived from full-context model outputs.

Model Description

This is the Stage 1 Oracle checkpoint that demonstrates the theoretical performance ceiling for TIS. It uses oracle labels (ground truth from full-context runs) for training, providing a reference baseline for learned importance scoring.

Key Features:

  • ✅ Oracle training with ground-truth labels
  • ✅ 100% NIAH accuracy at all cache budgets (with oracle labels)
  • ✅ Theoretical upper bound for TIS performance
  • ✅ Useful for ablation studies and understanding TIS limits

Performance

Oracle Performance (with ground-truth labels):

  • NIAH @ all budgets: 100% (by definition)
  • Provides upper bound for learned methods

Note: This checkpoint is trained on oracle labels, so it represents the best possible performance achievable if importance scores were perfectly predicted.
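
To make the cache-budget idea concrete, here is a minimal sketch of keeping only the highest-scoring fraction of token positions. The top-k selection rule, the helper name `select_tokens_for_budget`, and the example scores are assumptions for illustration; the eviction policy actually used by TIS may differ.

```python
import math

def select_tokens_for_budget(importance, budget):
    """Return the sorted positions kept under a cache budget.

    Hypothetical helper: keeps the ceil(budget * n) highest-scoring
    positions. Ties are broken in favor of earlier positions.
    """
    if not 0.0 < budget <= 1.0:
        raise ValueError("budget must be in (0, 1]")
    n = len(importance)
    keep = max(1, math.ceil(budget * n))
    # Rank positions by descending score, then by ascending index.
    ranked = sorted(range(n), key=lambda i: (-importance[i], i))
    return sorted(ranked[:keep])

# Made-up oracle scores: position 5 stands in for the "needle".
oracle_scores = [0.1, 0.0, 0.2, 0.0, 0.1, 1.0, 0.0, 0.3]
kept = select_tokens_for_budget(oracle_scores, budget=0.25)
print(kept)
```

Under this assumed rule, a token that the oracle scores highest survives even at a tight budget, which is one way to read the "by definition" qualifier above.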

Training Details

Base Model: mistralai/Mistral-7B-v0.3
Training Data: NarrativeQA with oracle importance labels
Training Stage: 1 (Supervised oracle training)

Hyperparameters:

  • Epochs: 2
  • Batch size: 4 (gradient accumulation: 8)
  • Learning rate: 1e-4
  • Precision: BFloat16
  • LoRA: r=16, alpha=32
  • Max sequence length: 2,048 tokens
  • Weight alignment: 0.1
  • Weight robustness: 0.0
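
Two derived quantities follow from these values and can be checked with a small sketch. The `Stage1Config` class is a hypothetical container, not part of the repository; it assumes the batch size is per device (the device count is not stated) and that the standard LoRA scaling of alpha / r applies.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Stage1Config:
    """Hypothetical container mirroring the listed hyperparameters."""
    epochs: int = 2
    batch_size: int = 4
    grad_accum_steps: int = 8
    learning_rate: float = 1e-4
    lora_r: int = 16
    lora_alpha: int = 32
    max_seq_len: int = 2048
    weight_alignment: float = 0.1
    weight_robustness: float = 0.0

    @property
    def effective_batch_size(self) -> int:
        # Sequences per optimizer step on a single device.
        return self.batch_size * self.grad_accum_steps

    @property
    def lora_scaling(self) -> float:
        # Standard LoRA scales the low-rank update by alpha / r.
        return self.lora_alpha / self.lora_r

cfg = Stage1Config()
print(cfg.effective_batch_size, cfg.lora_scaling)
```

With the values listed, this gives 32 sequences per optimizer step per device and a LoRA scaling factor of 2.0.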

Training Objective:

Loss = LM_loss + λ_align * alignment_loss

Where oracle labels are derived from full-context forward passes.
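
The objective can be sketched as a plain function. This assumes the alignment weight in the formula equals the "Weight alignment" hyperparameter (0.1) and that a robustness term, if present, is switched off by its 0.0 weight; the function name `stage1_loss` and the sample loss values are illustrative only.

```python
def stage1_loss(lm_loss, alignment_loss, lambda_align=0.1,
                robustness_loss=0.0, lambda_robust=0.0):
    """Combine the language-modeling and alignment losses.

    Hypothetical helper. The robustness term is an assumption: with
    lambda_robust = 0.0 it contributes nothing, which matches the
    two-term objective stated above.
    """
    return (lm_loss
            + lambda_align * alignment_loss
            + lambda_robust * robustness_loss)

# Made-up loss values for a single training step.
total = stage1_loss(lm_loss=2.0, alignment_loss=0.5)
print(round(total, 4))
```

In a real training loop the inputs would be tensors rather than floats; the arithmetic is the same.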

Model Architecture

This checkpoint contains:

  • ImportanceUpdateHead: Supervised importance predictor
  • Importance Embedding: Token-level importance embeddings
  • Lambda Parameter: Attention hook scaling factor (0.1)

Components:

{
  'importance_embedding': dict,  # Token importance embeddings
  'importance_head': dict,       # Supervised predictor (6 keys)
  'attn_hook_lambda': float      # Attention scaling (0.1)
}
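
Before wiring the components into a model, it can help to confirm that a loaded checkpoint matches this layout. The following is a minimal sketch; `validate_tis_checkpoint` is a hypothetical helper, and the stand-in dictionary replaces a real `torch.load` result so the example runs without the checkpoint file.

```python
EXPECTED_TYPES = {
    "importance_embedding": dict,
    "importance_head": dict,
    "attn_hook_lambda": float,
}

def validate_tis_checkpoint(checkpoint):
    """Return a list of layout problems; an empty list means a match."""
    problems = []
    for key, expected in EXPECTED_TYPES.items():
        if key not in checkpoint:
            problems.append(f"missing key: {key}")
        elif not isinstance(checkpoint[key], expected):
            actual = type(checkpoint[key]).__name__
            problems.append(
                f"{key}: expected {expected.__name__}, got {actual}"
            )
    return problems

# Stand-in for a loaded checkpoint (assumed contents, not real weights).
fake_checkpoint = {
    "importance_embedding": {},
    "importance_head": {},
    "attn_hook_lambda": 0.1,
}
print(validate_tis_checkpoint(fake_checkpoint))
```

State dicts produced by PyTorch are `OrderedDict` instances, which pass the `dict` check.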

Usage

Installation

git clone https://github.com/nitroxido/token-importance-scoring
cd token-importance-scoring
python -m venv .venv
source .venv/bin/activate
pip install -e .

Load Checkpoint

import torch

# Class that consumes the 'importance_head' state dict (not instantiated here)
from token_importance.model.importance_head import ImportanceUpdateHead

# Load TIS components onto the GPU when available, otherwise onto the CPU
device = 'cuda' if torch.cuda.is_available() else 'cpu'
checkpoint = torch.load('tis_components.pt', map_location=device)

# Extract components
importance_head_state = checkpoint['importance_head']
importance_embedding_state = checkpoint['importance_embedding']
lambda_value = checkpoint['attn_hook_lambda']

print(f"Lambda: {lambda_value}")
print(f"Importance head keys: {list(importance_head_state.keys())}")

Evaluate Oracle Performance

# Note: Oracle evaluation requires running full-context passes to generate labels
python scripts/eval.py \
  --model oldman-dev/tis-stage1-oracle \
  --baseline tis \
  --benchmark niah \
  --cache_budgets 0.5 \
  --n_samples 50 \
  --output results/oracle_eval.csv
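
To evaluate several cache budgets, one option is to generate one command per budget. This is a sketch only: it reuses the flags shown above, but the budget values, the per-budget output filename pattern, and the helper name `build_eval_command` are assumptions, and it is not confirmed here whether `--cache_budgets` accepts several values at once.

```python
import shlex

def build_eval_command(budget, n_samples=50,
                       model="oldman-dev/tis-stage1-oracle"):
    """Build the argument list for one evaluation run (hypothetical helper)."""
    return [
        "python", "scripts/eval.py",
        "--model", model,
        "--baseline", "tis",
        "--benchmark", "niah",
        "--cache_budgets", str(budget),
        "--n_samples", str(n_samples),
        # Assumed naming scheme: one CSV per budget.
        "--output", f"results/oracle_eval_{budget}.csv",
    ]

# Print the commands instead of running them.
for budget in (0.25, 0.5, 0.75):
    print(shlex.join(build_eval_command(budget)))
```

Passing each list to `subprocess.run(cmd, check=True)` from the repository root would execute the runs.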

Intended Use

Primary Use Cases:

  • Understanding theoretical performance ceiling for TIS
  • Ablation studies comparing oracle vs. learned methods
  • Research reference for importance scoring limits

Not Recommended For:

  • Production deployment (use tis-stage3-ert instead)
  • Real-world applications (requires oracle labels at inference time)

Limitations:

  • Trained on oracle labels (not practical for real inference)
  • Serves as research baseline, not production model
  • Performance ceiling depends on oracle label quality

Comparison with Learned Methods

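| Checkpoint               | Training      | NIAH @ 50%    | LITM @ 50% | Practical?       |
|--------------------------|---------------|---------------|------------|------------------|
| tis-stage1-oracle (this) | Oracle labels | 100% (oracle) | -          | ❌ Research only |
| tis-stage3-ert           | ERT learned   | 100%          | 52.8%      | ✅ Production    |
| tis-v8b-hard-anchor      | Hard-anchor   | 68%           | -          | ✅ Production    |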

Key Insight: Stage 3 ERT matches the oracle's 100% NIAH accuracy at a 50% cache budget without requiring oracle labels, making it suitable for production use.

Citation

If you use this checkpoint, please cite:

@software{token_importance_scoring_2026,
  title={Token Importance Scoring: Learned KV Cache Compression for Long-Context LLMs},
  author={Token Importance Scoring Contributors},
  year={2026},
  url={https://github.com/nitroxido/token-importance-scoring}
}

License

MIT License - See LICENSE

Acknowledgments

Training compute sponsored by GPU-Action.
