chronos-residual
A foundation-model forecast-residual maneuver detector for maneuver-detect. It forecasts a
satellite's mean-element series with the pretrained amazon/chronos-bolt-small (the chronos backend),
standardises the forecast residual, and flags the inter-elset gaps where the realised series departs
from the forecast beyond a per-orbit-class threshold. The same vis-viva / Gauss physics inversion the
classical detector uses then recovers the Δv magnitude and maneuver type for each detection (the
model forecasts, the physics inverts).
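As a rough illustration of the "physics inverts" step, the sketch below uses the vis-viva equation to turn a change in semi-major axis across an inter-elset gap into an in-track Δv. It assumes a single impulsive tangential burn on a near-circular orbit (the burn radius is taken equal to the pre-maneuver semi-major axis). The function names are hypothetical, and the package's actual inversion, which also relies on Gauss's equations to assign a maneuver type, may differ.

```python
import math

MU_EARTH_KM3_S2 = 398600.4418  # Earth's gravitational parameter, km^3/s^2


def semi_major_axis_km(mean_motion_rev_per_day: float) -> float:
    """Kepler's third law: a = (mu / n^2)^(1/3), with n in rad/s."""
    n = mean_motion_rev_per_day * 2.0 * math.pi / 86400.0
    return (MU_EARTH_KM3_S2 / n**2) ** (1.0 / 3.0)


def tangential_delta_v_m_s(a_before_km: float, a_after_km: float) -> float:
    """Vis-viva speed difference at a fixed radius (assumed r = a_before)."""
    r = a_before_km  # assumption: near-circular orbit, burn at r = a
    v_before = math.sqrt(MU_EARTH_KM3_S2 * (2.0 / r - 1.0 / a_before_km))
    v_after = math.sqrt(MU_EARTH_KM3_S2 * (2.0 / r - 1.0 / a_after_km))
    return 1000.0 * (v_after - v_before)  # km/s -> m/s


# Illustrative numbers: an ISS-like mean motion and a 1 km raise.
a0 = semi_major_axis_km(15.5)
dv = tangential_delta_v_m_s(a0, a0 + 1.0)
print(f"a = {a0:.1f} km, delta-v = {dv:.3f} m/s")
```

A positive result corresponds to an orbit raise and a negative one to a lowering burn; out-of-plane maneuvers are not covered by this simplification.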
How to use
The bundle is fetched from this repo automatically on first use (cached on disk, no weights at
install time); inference is CPU-capable and uses a GPU when one is present. The forecaster needs the
optional [foundation] extra:
# pip install "maneuver-detect[foundation]"
from maneuver_detect import detect, datasets
history = datasets.tle_history(norad_id=25544, start="2024-01-01")
maneuvers = detect(history, model="chronos-residual")
# DataFrame columns: epoch, confidence, type, delta_v_estimate, plus provenance
Model description
- Recipe: forecast-residual thresholding. A pretrained time-series foundation model replaces the classical detector's hand-built quiet-dynamics prior with a learned conditional forecast; the standardised residual is thresholded per orbit class (the detectability floor in residual units), non-maximum-suppressed, and inverted for Δv/type.
- Forecaster: amazon/chronos-bolt-small (chronos), revision main, rolling one-step context 64. Mode: zero-shot.
- Licence: the forecaster checkpoint is Apache-2.0, confirmed at the pinned revision; a fine-tune inherits that licence.
- Inference: CPU-capable; the forecaster is fetched from the Hub at runtime, not vendored.
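The "rolling one-step context 64" setting can be pictured as the loop below, in which each element is forecast from at most the 64 values that precede it. This is only a sketch: `rolling_one_step` and the `last_value` stand-in are hypothetical names, and a persistence forecast takes the place of the Chronos model so that the example runs without downloading weights.

```python
from typing import Callable, List, Sequence


def rolling_one_step(
    series: Sequence[float],
    forecaster: Callable[[Sequence[float]], float],
    context: int = 64,
) -> List[float]:
    """Forecast series[t] from series[max(0, t - context):t] for each t >= 1."""
    forecasts = []
    for t in range(1, len(series)):
        window = series[max(0, t - context):t]  # at most `context` past values
        forecasts.append(forecaster(window))
    return forecasts


def last_value(window: Sequence[float]) -> float:
    """Stand-in forecaster (assumption): persistence, not the Chronos model."""
    return window[-1]


series = [7000.00, 7000.01, 6999.99, 7000.00, 7000.80]
forecasts = rolling_one_step(series, last_value)
# Residual i belongs to the gap between element i and element i + 1.
residuals = [obs - pred for obs, pred in zip(series[1:], forecasts)]
```

Swapping `last_value` for a call into a real forecaster would leave the loop unchanged; only the quality of the quiet-dynamics prediction differs.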
Calibrated operating point
Per-orbit-class detection threshold in standardised-residual units:
| Class | Residual-z threshold |
|---|---|
| LEO | 3.00 |
| MEO | 3.00 |
| GEO | 3.00 |
| IGSO | 3.00 |
| HEO | 3.00 |
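To make the role of these thresholds concrete, the sketch below standardises a residual series, keeps the gaps whose absolute z-score reaches the class threshold, and applies non-maximum suppression so that a cluster of neighbouring exceedances yields one detection. The threshold values are copied from the table above; the median/MAD standardisation, the suppression window, and all function names are assumptions made for illustration, not the package's documented behaviour.

```python
import statistics

# Values copied from the table above.
RESIDUAL_Z_THRESHOLD = {"LEO": 3.0, "MEO": 3.0, "GEO": 3.0, "IGSO": 3.0, "HEO": 3.0}


def standardise(residuals):
    """Robust z-score (assumption): centre on the median, scale by 1.4826 * MAD."""
    med = statistics.median(residuals)
    mad = statistics.median([abs(r - med) for r in residuals])
    scale = 1.4826 * mad if mad > 0 else 1.0
    return [(r - med) / scale for r in residuals]


def non_max_suppress(z, threshold, window):
    """Keep the strongest exceedance; drop weaker ones within `window` gaps of it."""
    candidates = [i for i, v in enumerate(z) if abs(v) >= threshold]
    candidates.sort(key=lambda i: -abs(z[i]))  # strongest first
    kept = []
    for i in candidates:
        if all(abs(i - j) > window for j in kept):
            kept.append(i)
    return sorted(kept)


def detect_gaps(residuals, orbit_class, window=2):
    z = standardise(residuals)
    return non_max_suppress(z, RESIDUAL_Z_THRESHOLD[orbit_class], window)


# Synthetic residuals (illustrative only): quiet noise plus one jump at index 7.
residuals = [0.01, -0.02, 0.01, 0.02, -0.02, -0.02, 0.03, 0.79, 0.01, -0.02, 0.01]
flagged = detect_gaps(residuals, "LEO")
```

A robust scale estimate is used here so that the maneuver itself does not inflate the noise estimate; a plain standard deviation would be another reasonable choice.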
Training data
Calibrated and scored on the maneuver-detect labelled dataset v0.3.0
(astro-tools/maneuver-detect), versioned in lockstep
with this bundle. The dataset is distributed recipe-first (operator labels + a pinned reconstruction
recipe + a content-hash manifest; the raw Space-Track element history is never redistributed) and
partitioned by the frozen, leak-free temporal-holdout splits: novel satellites scored in novel
eras.
Zero-shot uses no training data beyond the forecaster's own pretraining; a fine-tuned variant
specialises the quiet-dynamics prior on the training split only.
Evaluation
Held-out test split: recall/precision at 1 false alarm per satellite-year over the above-floor population (95% CI). Operating pt is the per-class confidence cut admitted within that false-alarm budget (in the detector's calibrated confidence units). Type acc is the share of above-floor true positives whose maneuver type is correct.
| Class | Recall | Precision | Operating pt | Above-floor labels | Type acc |
|---|---|---|---|---|---|
| LEO | 0.35 | 0.89 | 1.00 | 71 | 0.88 |
| MEO | 0.43 | 0.24 | 1.00 | 14 | n/a |
| GEO | 0.00 | 0.00 | 1.00 | 71 | n/a |
| IGSO | 1.00 | 0.75 | 0.96 | 3 | n/a |
| HEO | n/a | n/a | n/a | 0 | n/a |
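One way to read the "Operating pt" column is as the lowest confidence cut whose false-alarm rate still fits the budget. The sketch below illustrates that selection on made-up detections; `operating_point` is a hypothetical helper, the matching of detections to labels is assumed to have been done already, and the benchmark's deterministic scorer may differ in its details.

```python
def operating_point(detections, n_labels, satellite_years, budget=1.0):
    """Lowest confidence cut with false alarms per satellite-year <= budget.

    detections: list of (confidence, is_true_positive) pairs.
    Returns None when no cut fits the budget.
    """
    best = None
    # Lowering the cut can only add detections, so false alarms never decrease.
    for cut in sorted({conf for conf, _ in detections}, reverse=True):
        kept = [hit for conf, hit in detections if conf >= cut]
        false_alarms = sum(1 for hit in kept if not hit)
        if false_alarms / satellite_years > budget:
            break
        true_positives = sum(1 for hit in kept if hit)
        best = {
            "cut": cut,
            "recall": true_positives / n_labels if n_labels else None,
            "precision": true_positives / len(kept),
        }
    return best


# Made-up detections for illustration; not taken from the benchmark.
detections = [(0.99, True), (0.97, True), (0.95, False),
              (0.90, True), (0.80, False), (0.70, False)]
point = operating_point(detections, n_labels=4, satellite_years=2.0)
```

In this toy case two false alarms over two satellite-years exactly meet the budget, so the cut settles at 0.80; admitting the detection at 0.70 would exceed it.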
This detector emits calibrated confidence: the raw score is mapped through a temperature (T = 1.000) fit on the val split only, so a confidence of p means about a fraction p of detections at that confidence are real. A split-conformal predictor (marginal coverage 90%) accompanies it for prediction sets. Per-orbit-class expected calibration error (ECE) of the calibrated confidence:
| Class | ECE |
|---|---|
| LEO | 0.687 |
| MEO | 0.743 |
| GEO | 0.516 |
| IGSO | 0.571 |
| HEO | 0.000 |
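For readers unfamiliar with the two quantities, the sketch below shows a common form of temperature scaling and of binned expected calibration error. It assumes the raw score is a probability that is rescaled in logit space and that ECE uses ten equal-width bins; both are assumptions, since the card does not state the exact definitions, and the function names are hypothetical.

```python
import math


def apply_temperature(raw_score: float, temperature: float = 1.0) -> float:
    """Rescale a probability-like score in logit space (assumed form)."""
    logit = math.log(raw_score / (1.0 - raw_score))
    return 1.0 / (1.0 + math.exp(-logit / temperature))


def expected_calibration_error(confidences, outcomes, n_bins=10):
    """Count-weighted mean of |accuracy - mean confidence| over equal-width bins."""
    bins = [[] for _ in range(n_bins)]
    for conf, outcome in zip(confidences, outcomes):
        index = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 joins the top bin
        bins[index].append((conf, outcome))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(outcome for _, outcome in members) / len(members)
        ece += len(members) / total * abs(accuracy - mean_conf)
    return ece


# Illustrative values only: four detections at confidence 0.9, three of them real.
ece = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0])
```

Under this form a temperature of 1.000 leaves every score unchanged, while a temperature above 1 pulls confidences toward 0.5.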
The per-class reliability diagrams and the calibrated per-class operating points are published in the benchmark documentation.
The benchmark scores precision/recall at a fixed false-alarm rate per orbit class over the above-floor population, with per-class type confusion, via the deterministic scorer. Performance is sharply data-quality-stratified: well-tracked modern satellites reach literature-level recall, while noisy historical series are bounded by the TLE detectability floor.
Intended use and limitations
- Use: post-hoc detection of orbital maneuvers from public TLE history for space-situational-awareness research and as a reproducible benchmark baseline.
- Not a maneuver predictor (it detects maneuvers that already happened), not real-time, and not an orbit-determination engine.
- Detectability floor: maneuvers below the per-object TLE detectability floor are not reported; recall on noisy historical series is fundamentally limited by TLE data quality, not the model.
- MEO/GEO labels are epoch-only (no Δv), so the Δv estimate is most meaningful on the Δv-labelled LEO core.
Provenance
- Dataset version (lockstep): v0.3.0
- Forecaster: amazon/chronos-bolt-small @ main
| field | value |
|---|---|
| calibration | {'recall': 0.7238805970149254, 'by_threshold': {1.0: 0.6940298507462687, 1.5: 0.7089552238805971, 2.0: 0.7089552238805971, 2.5: 0.7089552238805971, 3.0: 0.7238805970149254, 3.5: 0.6567164179104478, 4.0: 0.6567164179104478, 4.5: 0.6567164179104478, 5.0: 0.6567164179104478, 6.0: 0.6343283582089553} } |
| dataset_version | 0.3.0 |
| mode | zero-shot |
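The calibration record above can be read as a threshold sweep: each key of by_threshold is a candidate residual-z threshold and each value is the score reached at that candidate (labelled recall in the record). The snippet below is a reading aid rather than the project's calibration code; it picks the best-scoring candidate, which coincides with the 3.00 threshold published for every orbit class.

```python
# Values copied verbatim from the provenance record above.
calibration = {
    "recall": 0.7238805970149254,
    "by_threshold": {
        1.0: 0.6940298507462687,
        1.5: 0.7089552238805971,
        2.0: 0.7089552238805971,
        2.5: 0.7089552238805971,
        3.0: 0.7238805970149254,
        3.5: 0.6567164179104478,
        4.0: 0.6567164179104478,
        4.5: 0.6567164179104478,
        5.0: 0.6567164179104478,
        6.0: 0.6343283582089553,
    },
}

sweep = calibration["by_threshold"]
best_threshold = max(sweep, key=sweep.get)  # candidate with the highest score
print(best_threshold, sweep[best_threshold])
```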
License
Detector artifacts (thresholds, fine-tune): MIT. The forecaster checkpoint: Apache-2.0 (its
own terms). The dataset and authored artifacts are CC-BY-4.0; the raw Space-Track element history
is never redistributed. See the repository for the full source terms, and CITATION.cff to cite.