Token Importance Scoring (TIS) - Stage 1 Oracle
This checkpoint contains the Token Importance Scoring (TIS) components trained on oracle importance labels, i.e., ground-truth labels derived from full-context model outputs.
Model Description
This is the Stage 1 Oracle checkpoint that demonstrates the theoretical performance ceiling for TIS. It uses oracle labels (ground truth from full-context runs) for training, providing a reference baseline for learned importance scoring.
Key Features:
- Oracle training with ground-truth labels
- 100% NIAH (needle-in-a-haystack) accuracy at all cache budgets (with oracle labels)
- Theoretical upper bound for TIS performance
- Useful for ablation studies and for understanding the limits of TIS
Performance
Oracle Performance (with ground-truth labels):
- NIAH @ all budgets: 100% (by definition)
- Provides an upper bound for learned methods
Note: This checkpoint is trained on oracle labels, so it represents the best performance achievable when importance scores are predicted perfectly.
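Why the oracle reaches 100% NIAH by construction can be illustrated with top-k retention: if the cache keeps the highest-scoring fraction of tokens and the oracle assigns the needle the highest score, the needle always survives eviction. The sketch below assumes that simple retention rule; `oracle_keep_indices` is a hypothetical helper, not code from the TIS repository, and the actual eviction policy may differ.

```python
import math


def oracle_keep_indices(importance, budget):
    """Return the positions kept in the KV cache under a fractional budget.

    importance: per-token oracle scores (higher = more important), assumed
        here to come from a full-context forward pass.
    budget: fraction of the cache to keep, e.g. 0.5 for a 50% budget.
    """
    # Always keep at least one token, even for very small budgets.
    n_keep = max(1, math.floor(len(importance) * budget))
    # Rank positions by score (descending); ties go to the earlier position.
    ranked = sorted(range(len(importance)), key=lambda i: (-importance[i], i))
    # Return kept positions in original order so the cache stays position-sorted.
    return sorted(ranked[:n_keep])


# Toy 8-token context where position 5 holds the "needle".
scores = [0.01, 0.02, 0.05, 0.03, 0.04, 0.90, 0.02, 0.01]
kept = oracle_keep_indices(scores, budget=0.25)
print(kept)       # [2, 5]
print(5 in kept)  # True
```

Because the needle's score dominates under oracle labels, it is retained at any budget that keeps at least one token; a learned scorer only matches this when its predictions rank the needle just as highly.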
Training Details
Base Model: mistralai/Mistral-7B-v0.3
Training Data: NarrativeQA with oracle importance labels
Training Stage: 1 (Supervised oracle training)
Hyperparameters:
- Epochs: 2
- Batch size: 4 (gradient accumulation: 8 steps; effective batch size 32 on a single device)
- Learning rate: 1e-4
- Precision: BFloat16
- LoRA: r=16, alpha=32
- Max sequence length: 2,048 tokens
- Alignment loss weight (λ_align): 0.1
- Robustness loss weight: 0.0 (robustness term disabled)
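The batch and LoRA settings above imply a few derived quantities. This sketch assumes the batch size of 4 is per step on a single device and uses the standard LoRA scaling of alpha / r; `Stage1Config` is a hypothetical name, not a class from the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage1Config:
    # Values copied from the hyperparameter list in this card.
    per_device_batch_size: int = 4
    grad_accum_steps: int = 8
    max_seq_len: int = 2048
    lora_r: int = 16
    lora_alpha: int = 32

    @property
    def effective_batch_size(self) -> int:
        # Sequences contributing to each optimizer step (single device assumed).
        return self.per_device_batch_size * self.grad_accum_steps

    @property
    def max_tokens_per_step(self) -> int:
        # Upper bound: sequences shorter than max_seq_len lower the real count.
        return self.effective_batch_size * self.max_seq_len

    @property
    def lora_scale(self) -> float:
        # Standard LoRA scaling applied to the low-rank update: alpha / r.
        return self.lora_alpha / self.lora_r


cfg = Stage1Config()
print(cfg.effective_batch_size)  # 32
print(cfg.max_tokens_per_step)   # 65536
print(cfg.lora_scale)            # 2.0
```

With data parallelism across several GPUs, the effective batch size would be multiplied by the number of devices.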
Training Objective:
Loss = LM_loss + λ_align * alignment_loss
where λ_align = 0.1 and the oracle labels are derived from full-context forward passes.
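A minimal sketch of the combined objective, assuming `alignment_loss` is a mean squared error between predicted and oracle per-token importance (the card does not state its exact form). A robustness term is included only to show that its 0.0 weight removes it from the total. Function names are hypothetical.

```python
def mse(pred, target):
    # Mean squared error between predicted and oracle per-token importance.
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def stage1_loss(lm_loss, pred_importance, oracle_importance,
                lambda_align=0.1, lambda_robust=0.0, robustness_loss=0.0):
    # Total = LM loss + weighted alignment term (+ robustness term, which
    # contributes nothing with this checkpoint's weight of 0.0).
    alignment_loss = mse(pred_importance, oracle_importance)
    return lm_loss + lambda_align * alignment_loss + lambda_robust * robustness_loss


total = stage1_loss(lm_loss=2.0,
                    pred_importance=[0.2, 0.8],
                    oracle_importance=[0.0, 1.0])
print(round(total, 6))  # 2.004
```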
Model Architecture
This checkpoint contains:
- ImportanceUpdateHead: Supervised importance predictor
- Importance Embedding: Token-level importance embeddings
- Lambda Parameter: Attention hook scaling factor (0.1)
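The card does not describe how the λ = 0.1 factor enters attention. Purely as a hypothetical illustration, one way a scalar hook factor could act is as the weight on an additive per-key importance bias applied before the softmax. Nothing below is taken from the TIS code, and `biased_attention_weights` is a made-up name.

```python
import math


def biased_attention_weights(logits, importance, lam=0.1):
    """Softmax attention weights over keys after adding lam * importance.

    HYPOTHETICAL illustration of a scalar hook factor; not the TIS implementation.
    logits: raw query-key scores for one query, one per key.
    importance: per-key importance scores, same length as logits.
    lam: scaling factor for the importance bias (0.1 in this checkpoint).
    """
    if len(logits) != len(importance) or not logits:
        raise ValueError("logits and importance must be non-empty and equal length")
    biased = [score + lam * imp for score, imp in zip(logits, importance)]
    m = max(biased)  # subtract the max for numerical stability
    exps = [math.exp(b - m) for b in biased]
    total = sum(exps)
    return [e / total for e in exps]


# Equal raw scores: the higher-importance key receives more attention.
weights = biased_attention_weights([0.0, 0.0, 0.0], [0.0, 5.0, 0.0], lam=0.1)
print([round(w, 3) for w in weights])  # [0.274, 0.452, 0.274]
```

Setting `lam=0.0` recovers ordinary softmax attention, which is one way to reason about how strongly such a hook would steer the model.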
Components:
```python
{
    'importance_embedding': dict,  # Token importance embeddings
    'importance_head': dict,       # Supervised predictor (6 keys)
    'attn_hook_lambda': float      # Attention scaling (0.1)
}
```
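Before using a downloaded checkpoint, it can help to confirm that it matches the layout above. This is a small check, assuming the component dict is loaded as a plain Python mapping (for example via `torch.load`) and that λ is stored as a Python float; `check_components` is a hypothetical helper, and if your copy stores λ as a 0-d tensor, relax that type check.

```python
from collections.abc import Mapping

# Expected top-level layout, as described in this card.
EXPECTED_TYPES = {
    "importance_embedding": Mapping,  # state dict for the importance embedding
    "importance_head": Mapping,       # state dict for ImportanceUpdateHead
    "attn_hook_lambda": float,        # scalar attention hook factor
}


def check_components(checkpoint):
    """Raise if the checkpoint does not match the expected layout; return lambda."""
    missing = [key for key in EXPECTED_TYPES if key not in checkpoint]
    if missing:
        raise KeyError(f"missing components: {missing}")
    for key, expected in EXPECTED_TYPES.items():
        if not isinstance(checkpoint[key], expected):
            raise TypeError(f"{key}: expected {expected.__name__}, "
                            f"got {type(checkpoint[key]).__name__}")
    return checkpoint["attn_hook_lambda"]
```

For example, call `check_components(checkpoint)` right after `torch.load` and before extracting the individual state dicts.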
Usage
Installation
```bash
git clone https://github.com/nitroxido/token-importance-scoring
cd token-importance-scoring
python -m venv .venv
source .venv/bin/activate
pip install -e .
```
Load Checkpoint
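```python
import torch

# Provided by the repository installed above; needed when rebuilding the head module
from token_importance.model.importance_head import ImportanceUpdateHead  # noqa: F401

# Load TIS components from the downloaded tis_components.pt (CPU fallback if no GPU)
device = "cuda" if torch.cuda.is_available() else "cpu"
checkpoint = torch.load("tis_components.pt", map_location=device)

# Extract components
importance_head_state = checkpoint["importance_head"]
importance_embedding_state = checkpoint["importance_embedding"]
lambda_value = checkpoint["attn_hook_lambda"]

print(f"Lambda: {lambda_value}")
print(f"Importance head keys: {list(importance_head_state.keys())}")
```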
Evaluate Oracle Performance
```bash
# Note: Oracle evaluation requires running full-context passes to generate labels
python scripts/eval.py \
  --model oldman-dev/tis-stage1-oracle \
  --baseline tis \
  --benchmark niah \
  --cache_budgets 0.5 \
  --n_samples 50 \
  --output results/oracle_eval.csv
```
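NIAH accuracy is the fraction of samples in which the model retrieves the needle. The sketch below assumes case-insensitive substring matching; `scripts/eval.py` may score answers differently, and `niah_accuracy` is a hypothetical helper. With `--n_samples 50`, each sample is worth 2 percentage points.

```python
def niah_accuracy(predictions, needles):
    """Fraction of samples whose generated answer contains the needle.

    Assumes case-insensitive substring matching (an illustrative choice).
    """
    if len(predictions) != len(needles) or not predictions:
        raise ValueError("need equal, non-zero numbers of predictions and needles")
    hits = sum(n.lower() in p.lower() for p, n in zip(predictions, needles))
    return hits / len(predictions)


# 50 samples, one miss: 49 / 50.
preds = ["The secret code is 7421."] * 49 + ["I could not find it."]
needles = ["7421"] * 50
print(niah_accuracy(preds, needles))  # 0.98
```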
Intended Use
Primary Use Cases:
- Understanding the theoretical performance ceiling of TIS
- Ablation studies comparing oracle vs. learned methods
- Research reference for importance scoring limits
Not Recommended For:
- Production deployment (use tis-stage3-ert instead)
- Real-world applications (requires oracle labels at inference time)
Limitations:
- Trained on oracle labels (not practical for real inference)
- Serves as a research baseline, not a production model
- Performance ceiling depends on oracle label quality
Comparison with Learned Methods
| Checkpoint | Training | NIAH @ 50% | LITM @ 50% | Practical? |
|---|---|---|---|---|
| tis-stage1-oracle (this) | Oracle labels | 100% (oracle) | - | Research only |
| tis-stage3-ert | ERT learned | 100% | 52.8% | Production |
| tis-v8b-hard-anchor | Hard-anchor | 68% | - | Production |
Key Insight: Stage 3 ERT matches the oracle's 100% NIAH accuracy at a 50% cache budget without requiring oracle labels, making it suitable for production use.
Citation
If you use this checkpoint, please cite:
```bibtex
@software{token_importance_scoring_2026,
  title={Token Importance Scoring: Learned KV Cache Compression for Long-Context LLMs},
  author={Token Importance Scoring Contributors},
  year={2026},
  url={https://github.com/nitroxido/token-importance-scoring}
}
```
License
MIT License - See LICENSE
Acknowledgments
Training compute sponsored by GPU-Action.
More Information
- Repository: https://github.com/nitroxido/token-importance-scoring
- Documentation: See REPOSITORY-OVERVIEW.md and REPRODUCIBILITY-GUIDE.md in the repository
- Related Checkpoints:
  - tis-stage3-ert - Main production checkpoint (100% NIAH, no oracle needed)
  - tis-v8b-hard-anchor - Hard-anchor tuned checkpoint