chronos-residual

A foundation-model forecast-residual maneuver detector for maneuver-detect. It forecasts a satellite's mean-element series with the pretrained amazon/chronos-bolt-small forecaster (the chronos backend), standardises the forecast residual, and flags the inter-elset gaps where the observed series departs from the forecast by more than a per-orbit-class threshold. For each detection, the same vis-viva / Gauss physics inversion used by the classical detector then recovers the Δv magnitude and maneuver type (the model forecasts, the physics inverts).
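The physics inversion step can be illustrated with its simplest case. This is a minimal sketch, not the package's inversion: it assumes a circular orbit and a single tangential impulse, so that vis-viva evaluated at the burn radius gives Δv directly from the change in semi-major axis. It does not handle non-tangential burns or classify maneuver type, and `tangential_dv_circular` is a hypothetical helper.

```python
import math

MU_EARTH_KM3_S2 = 398600.4418  # Earth's standard gravitational parameter, km^3/s^2

def tangential_dv_circular(a_before_km, a_after_km, mu=MU_EARTH_KM3_S2):
    """Signed Δv (km/s) of one tangential burn at r = a_before_km on a circular orbit.

    Assumption (hypothetical helper): the pre-burn orbit is circular and the burn is
    purely along-track, so it changes the semi-major axis from a_before to a_after.
    """
    r = a_before_km
    v_before = math.sqrt(mu / r)                            # vis-viva with a = r
    v_after = math.sqrt(mu * (2.0 / r - 1.0 / a_after_km))  # vis-viva on the new orbit, same r
    return v_after - v_before

# a 1 km semi-major-axis raise at roughly 400 km altitude
dv = tangential_dv_circular(6778.0, 6779.0)
print(f"{dv * 1000:.3f} m/s")  # ≈ 0.566 m/s
```

Positive values raise the orbit and negative values lower it. For small changes this reduces to the familiar first-order Gauss result Δv ≈ v·Δa / (2a).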

How to use

The bundle is fetched from this repo automatically on first use and cached on disk; no weights are downloaded at install time. Inference runs on CPU and uses a GPU when one is present. The forecaster needs the optional [foundation] extra:

# pip install "maneuver-detect[foundation]"
from maneuver_detect import detect, datasets

history = datasets.tle_history(norad_id=25544, start="2024-01-01")
maneuvers = detect(history, model="chronos-residual")
# DataFrame columns: epoch, confidence, type, delta_v_estimate, plus provenance

Model description

Recipe: forecast-residual thresholding. A pretrained time-series foundation model replaces the classical detector's hand-built quiet-dynamics prior with a learned conditional forecast; the standardised residual is thresholded per orbit class (the detectability floor, expressed in residual units), non-maximum-suppressed, and inverted for Δv and maneuver type.
Forecaster: amazon/chronos-bolt-small (chronos backend), revision main, rolling one-step-ahead forecasts with a context length of 64. Mode: zero-shot.
Licence: the forecaster checkpoint is Apache-2.0, confirmed at the pinned revision; a fine-tune inherits that licence.
Inference: CPU-capable; the forecaster is fetched from the Hub at runtime, not vendored.
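The recipe and forecaster settings above amount to a short loop: forecast each point from the preceding 64, standardise the residual, threshold it, and keep one detection per neighbourhood. This is a minimal NumPy sketch under assumptions, not the package's code. `forecast_fn` stands in for amazon/chronos-bolt-small and must return a (median, scale) pair for the next point; the `persistence` forecaster, dividing by that scale to standardise, and the `nms_window` of 2 gaps are all hypothetical choices. It works on one element series and stops before the Δv inversion.

```python
import numpy as np

def rolling_one_step(series, forecast_fn, context=64):
    """Forecast series[t] from series[t-context:t] for every t >= context."""
    y = np.asarray(series, dtype=float)
    med, scale = [], []
    for t in range(context, len(y)):
        m, s = forecast_fn(y[t - context:t])  # stand-in for the foundation model
        med.append(m)
        scale.append(s)
    return np.array(med), np.array(scale)

def flag_gaps(series, forecast_fn, threshold=3.0, context=64, nms_window=2):
    """Indices of points whose standardised residual exceeds threshold, after NMS."""
    y = np.asarray(series, dtype=float)
    med, scale = rolling_one_step(y, forecast_fn, context)
    z = np.abs(y[context:] - med) / scale           # standardised residual
    candidates = np.flatnonzero(z > threshold)
    kept = []
    # non-maximum suppression: strongest residual first, one detection per window
    for i in candidates[np.argsort(-z[candidates], kind="stable")]:
        if all(abs(i - k) > nms_window for k in kept):
            kept.append(int(i))
    return [context + i for i in sorted(kept)]

def persistence(window):
    """Hypothetical stand-in forecaster: last value, scale from first differences."""
    return window[-1], max(float(np.std(np.diff(window))), 1e-9)

# synthetic series: alternating tracking noise plus a 1-unit step at index 80
y = [7000.0 + 0.01 * (-1) ** t + (1.0 if t >= 80 else 0.0) for t in range(200)]
print(flag_gaps(y, persistence))  # [80]
```

Swapping `persistence` for a wrapper around the real forecaster leaves the thresholding and suppression logic unchanged, which is the point of the recipe: only the quiet-dynamics prior changes.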

Calibrated operating point

Per-orbit-class detection thresholds, in standardised-residual (z) units:

Class	Residual-z threshold
LEO	3.00
MEO	3.00
GEO	3.00
IGSO	3.00
HEO	3.00

Training data

Calibrated and scored on the maneuver-detect labelled dataset v0.3.0 (astro-tools/maneuver-detect), versioned in lockstep with this bundle. The dataset is distributed recipe-first (operator labels + a pinned reconstruction recipe + a content-hash manifest; the raw Space-Track element history is never redistributed) and partitioned by the frozen, leak-free temporal-holdout splits — novel satellites scored in novel eras. Zero-shot uses no training data beyond the forecaster's own pretraining; a fine-tuned variant specialises the quiet-dynamics prior on the training split only.

Evaluation

Held-out test split: recall and precision at 1 false alarm per satellite-year over the above-floor population (95% CI). The operating point is the per-class confidence cutoff admitted within that false-alarm budget, in the detector's calibrated confidence units. Type accuracy is the share of above-floor true positives whose maneuver type is correct.

Class	Recall	Precision	Operating point	Above-floor labels	Type accuracy
LEO	0.35	0.89	1.00	71	0.88
MEO	0.43	0.24	1.00	14	—
GEO	0.00	0.00	1.00	71	—
IGSO	1.00	0.75	0.96	3	—
HEO	—	—	—	0	—
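The operating-point definition above can be sketched as a sweep over confidence cutoffs, loosening the cut until the false-alarm budget would be exceeded. This is a minimal sketch under assumptions, not the deterministic scorer: it assumes each detection has already been matched to at most one above-floor label (`is_true`) and that the budget is 1 false alarm per satellite-year times the satellite-years scored. `operating_point` is a hypothetical helper.

```python
import numpy as np

def operating_point(confidence, is_true, n_labels, satellite_years, fa_per_sat_year=1.0):
    """Loosest confidence cut within the false-alarm budget -> (cut, recall, precision).

    Returns None if even the strictest cut exceeds the budget.
    """
    conf = np.asarray(confidence, dtype=float)
    true = np.asarray(is_true, dtype=bool)
    budget = fa_per_sat_year * satellite_years
    best = None
    for cut in np.unique(conf)[::-1]:          # strictest cut first, then loosen
        keep = conf >= cut
        false_alarms = int(np.sum(keep & ~true))
        if false_alarms > budget:              # false alarms only grow as the cut loosens
            break
        tp = int(np.sum(keep & true))
        best = (float(cut), tp / n_labels, tp / int(keep.sum()))
    return best

# five detections, four above-floor labels, one satellite-year scored
print(operating_point([0.9, 0.8, 0.7, 0.6, 0.5], [1, 1, 0, 1, 0], n_labels=4, satellite_years=1.0))
# (0.6, 0.75, 0.75)
```

Recall is taken over all above-floor labels, so missed maneuvers that never produced a detection still count against it.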

This detector emits calibrated confidence: the raw score is mapped through a temperature (T = 1.000, which leaves the score unchanged) fit on the validation split only, so that a confidence of p should mean that about a fraction p of detections at that confidence are real. A split-conformal predictor with 90% marginal coverage accompanies it for prediction sets. Per-orbit-class expected calibration error (ECE) of the calibrated confidence, which measures how far each class departs from that ideal:

Class	ECE
LEO	0.687
MEO	0.743
GEO	0.516
IGSO	0.571
HEO	0.000
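The calibration quantities described above (temperature scaling, ECE, and split-conformal prediction sets) can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the package's calibration code: it assumes the raw score is a logit mapped through sigmoid(score / T), that ECE uses 10 equal-width bins, and that the conformal nonconformity score is one minus the confidence assigned to the observed label. All function names are hypothetical.

```python
import numpy as np

def temperature_scale(raw_logit, T=1.0):
    """Confidence = sigmoid(raw_logit / T); with T = 1.0 this is the plain sigmoid."""
    return 1.0 / (1.0 + np.exp(-np.asarray(raw_logit, dtype=float) / T))

def expected_calibration_error(confidence, is_true, n_bins=10):
    """Bin-weighted mean of |observed hit rate - mean confidence| over equal-width bins."""
    conf = np.asarray(confidence, dtype=float)
    hit = np.asarray(is_true, dtype=float)
    # bin index 0..n_bins-1; right=True keeps a confidence of exactly 1.0 in the top bin
    bins = np.digitize(conf, np.linspace(0.0, 1.0, n_bins + 1)[1:-1], right=True)
    ece = 0.0
    for b in range(n_bins):
        in_bin = bins == b
        if in_bin.any():
            ece += in_bin.mean() * abs(hit[in_bin].mean() - conf[in_bin].mean())
    return float(ece)

def split_conformal_qhat(cal_confidence, cal_is_true, alpha=0.1):
    """Nonconformity threshold giving (1 - alpha) marginal coverage on exchangeable data."""
    conf = np.asarray(cal_confidence, dtype=float)
    p_observed = np.where(np.asarray(cal_is_true, dtype=bool), conf, 1.0 - conf)
    scores = np.sort(1.0 - p_observed)
    n = len(scores)
    k = int(np.ceil((n + 1) * (1.0 - alpha)))   # finite-sample corrected rank
    return float(scores[k - 1]) if k <= n else float("inf")

def prediction_set(confidence, qhat):
    """Labels (1 = real maneuver, 0 = false alarm) whose nonconformity is within qhat."""
    return {label for label, p in ((1, confidence), (0, 1.0 - confidence)) if 1.0 - p <= qhat}

# overconfident detections: 90% confidence, but only 3 of 5 are real
print(expected_calibration_error([0.9] * 5, [1, 1, 1, 0, 0]))  # about 0.3
```

An empty prediction set flags a detection whose confidence is too ambiguous to commit to either label at the requested coverage.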

The per-class reliability diagrams and the calibrated per-class operating points are published in the benchmark documentation.

The benchmark scores precision/recall at a fixed false-alarm rate per orbit class over the above-floor population, with per-class type confusion, via the deterministic scorer. Performance is sharply data-quality-stratified: well-tracked modern satellites reach literature-level recall, while noisy historical series are bounded by the TLE detectability floor.

Intended use and limitations

Use: post-hoc detection of orbital maneuvers from public TLE history for space situational awareness research and as a reproducible benchmark baseline.
Not a maneuver predictor (it detects maneuvers that already happened), not real-time, and not an orbit-determination engine.
Detectability floor: maneuvers below the per-object TLE detectability floor are not reported; recall on noisy historical series is fundamentally limited by TLE data quality, not the model.
MEO/GEO labels are epoch-only (no Δv), so the Δv estimate is most meaningful on the Δv-labelled LEO core.

Provenance

Dataset version (lockstep): v0.3.0
Forecaster: amazon/chronos-bolt-small @ main

field	value
`calibration`	{'recall': 0.7238805970149254, 'by_threshold': {1.0: 0.6940298507462687, 1.5: 0.7089552238805971, 2.0: 0.7089552238805971, 2.5: 0.7089552238805971, 3.0: 0.7238805970149254, 3.5: 0.6567164179104478, 4.0: 0.6567164179104478, 4.5: 0.6567164179104478, 5.0: 0.6567164179104478, 6.0: 0.6343283582089553}}
`dataset_version`	0.3.0
`mode`	zero-shot
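The `calibration` field above records a validation recall for each candidate residual-z threshold, and its top-level `recall` equals the entry at 3.0, the threshold listed for every class under Calibrated operating point. That is consistent with choosing the threshold that maximises validation recall. The sketch below assumes exactly that rule, with ties going to the stricter threshold; the card does not state the rule, and `pick_threshold` is hypothetical.

```python
# recall-by-threshold values copied from the `calibration` provenance field
BY_THRESHOLD = {1.0: 0.6940298507462687, 1.5: 0.7089552238805971, 2.0: 0.7089552238805971,
                2.5: 0.7089552238805971, 3.0: 0.7238805970149254, 3.5: 0.6567164179104478,
                4.0: 0.6567164179104478, 4.5: 0.6567164179104478, 5.0: 0.6567164179104478,
                6.0: 0.6343283582089553}

def pick_threshold(by_threshold):
    """Threshold with the highest validation recall; ties go to the larger (stricter) one."""
    return max(by_threshold, key=lambda thr: (by_threshold[thr], thr))

best = pick_threshold(BY_THRESHOLD)
print(best, BY_THRESHOLD[best])  # 3.0 0.7238805970149254
```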

License

Detector artifacts (thresholds, fine-tune): MIT. The forecaster checkpoint: Apache-2.0 (its own terms). The dataset and authored artifacts are CC-BY-4.0; the raw Space-Track element history is never redistributed. See the repository for the full source terms, and CITATION.cff to cite.

