ContextCrumb-32M

ContextCrumb-32M is a 32M-parameter token-classification model for deletion-only context compression. It predicts whether each input token should be kept or deleted, so text can be shortened before it is sent to LLMs or agents.

This repository is private while packaging and documentation are being stabilized.

Labels

DELETE
KEEP

Usage

Recommended usage is through the contextcrumb Python package:

from contextcrumb import ContextCompressor

compressor = ContextCompressor()
result = compressor.compress(
    "ContextCrumb deletes low-value words while preserving useful context."
)
print(result.text)

The package loads the ONNX artifacts in onnx/ by default, so users do not need PyTorch or Transformers for normal inference. The original model.safetensors checkpoint remains available for Torch/Transformers workflows.

Golden adaptive cutoff mode is the default:

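# Reuses the compressor created above; golden mode needs no extra arguments.
text = "ContextCrumb deletes low-value words while preserving useful context."
result = compressor.compress(text)
print(result.text)
print(result.stats["golden_cutoff"])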

By default, golden mode keeps at least one third of word-like tokens, so an extreme probability gap cannot delete nearly all of the context. Use target_keep_ratio to set an explicit fixed budget when a lower keep rate is needed.
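
The interaction between an adaptive cutoff and a minimum-keep floor can be illustrated with a small sketch. This is not the contextcrumb implementation: it assumes the cutoff is placed at the largest gap between sorted keep probabilities, treats every entry as a word-like token, and uses hypothetical names (choose_keep_mask, keep_probs, min_keep_ratio).

```python
import math
from typing import List, Sequence


def choose_keep_mask(
    keep_probs: Sequence[float], min_keep_ratio: float = 1 / 3
) -> List[bool]:
    """Hypothetical largest-gap cutoff with a minimum-keep floor."""
    n = len(keep_probs)
    if n == 0:
        return []
    # Rank token indices from most to least likely to be kept.
    order = sorted(range(n), key=lambda i: keep_probs[i], reverse=True)
    ranked = [keep_probs[i] for i in order]
    # Assumed rule: cut at the largest drop between neighbouring probabilities.
    gaps = [ranked[i] - ranked[i + 1] for i in range(n - 1)]
    n_keep = gaps.index(max(gaps)) + 1 if gaps else 1
    # Floor: never keep fewer than min_keep_ratio of the tokens.
    # The small epsilon guards against float round-up before ceil().
    floor = math.ceil(n * min_keep_ratio - 1e-9)
    n_keep = max(n_keep, floor)
    kept = set(order[:n_keep])
    return [i in kept for i in range(n)]
```

With probabilities such as [0.99, 0.10, 0.08, 0.07, 0.05, 0.04], the largest gap alone would keep a single token; the floor raises that to two of six.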

Raw Transformers loading also works:

from transformers import AutoModelForTokenClassification, AutoTokenizer

model_id = "ymao20/contextcrumb-32m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)
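
Once the model is loaded, per-token predictions still have to be turned back into text. The sketch below shows one possible post-processing step, not the package's own logic. It assumes the character offsets come from calling the tokenizer with return_offsets_mapping=True (which requires a fast tokenizer), the label ids come from an argmax over the model logits, and the id-to-label mapping is read from model.config.id2label. The helper name compress_from_offsets is hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def compress_from_offsets(
    text: str,
    offsets: Sequence[Tuple[int, int]],
    label_ids: Sequence[int],
    id2label: Dict[int, str],
) -> str:
    """Rebuild shortened text from per-token KEEP/DELETE predictions."""
    pieces: List[str] = []
    prev_end = None  # end offset of the last kept token
    for (start, end), label_id in zip(offsets, label_ids):
        if start == end:
            # Special tokens map to an empty character span; skip them.
            continue
        if id2label[label_id] != "KEEP":
            continue
        if pieces and start == prev_end:
            # Adjacent kept pieces (e.g. subwords) are glued back together.
            pieces[-1] += text[start:end]
        else:
            pieces.append(text[start:end])
        prev_end = end
    # Normalise whitespace left behind by deleted tokens.
    return " ".join(" ".join(pieces).split())
```

Reading the mapping from model.config.id2label rather than hard-coding it avoids depending on the order in which the labels happen to be listed.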

Intended Use

Use this model for experimental context compression, prompt shortening, and agent memory preprocessing. Review outputs before using it in high-stakes settings because deletion can remove important nuance.
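
One lightweight way to review an output is to list the words that were dropped. The sketch below uses Python's standard difflib module for this; deleted_words is a hypothetical helper and is not part of the contextcrumb package. It compares whitespace-separated words, which is only an approximation of the model's token-level decisions.

```python
import difflib
from typing import List


def deleted_words(original: str, compressed: str) -> List[str]:
    """List words present in the original but missing from the compressed text."""
    before = original.split()
    after = compressed.split()
    matcher = difflib.SequenceMatcher(a=before, b=after, autojunk=False)
    removed: List[str] = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # "delete" covers dropped words; "replace" is included defensively.
        if tag in ("delete", "replace"):
            removed.extend(before[i1:i2])
    return removed
```

Passing the input string and result.text to this helper gives a quick list to scan for negations, numbers, or names that should not have been removed.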

Base Model

Fine-tuned from jhu-clsp/ettin-encoder-32m.

Safetensors checkpoint: 32M params, tensor type F32.
