Time-Anchor ModernBERT 32M

A compact time-series model built on a 32M-parameter ModernBERT backbone (33.4M parameters in total with the time-series input/output heads). One call returns three things:

📈 Quantile forecast: Probabilistic forecasts at any requested quantile levels
🧭 Variable / time impact: Normalized contribution of every input variable at every time step
📌 Anchor forecast: Forecasts conditioned on known future points you specify

Installation

pip install time-anchor

predict_time_anchor accepts a Hugging Face Hub model id or a local checkpoint directory. Source code for the package lives at K-Iwa/time-anchor.

Quantile Forecast

Figure: Quantile forecast

Hourly temperature in Tokyo (Open-Meteo archive), 64-hour holdout: median MAE 0.57 °C, with every actual value inside the q10–q90 band.

import pandas as pd
from time_anchor import predict_time_anchor

url = (
    "https://archive-api.open-meteo.com/v1/archive"
    "?latitude=35.69&longitude=139.69&start_date=2024-10-01&end_date=2024-12-31"
    "&hourly=temperature_2m,relative_humidity_2m,surface_pressure,wind_speed_10m&format=csv"
)
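# Skip the location metadata block that precedes the hourly table in the CSV
weather = pd.read_csv(url, skiprows=3)
# Column 0 is the timestamp; column 1 is temperature_2m
temperature = weather.iloc[:, 1].astype("float32")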

result = predict_time_anchor(
    "K-Iwa/time-anchor-modernbert-32m",
    target_context=temperature[:1440],
    prediction_length=64,
    quantile_levels=(0.1, 0.5, 0.9),
)
print(pd.DataFrame(result.forecast_rows))
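The two holdout figures quoted above (median MAE and q10–q90 coverage) can be reproduced from any set of quantile forecasts. The following is a minimal sketch on plain arrays; the helper names `median_mae` and `band_coverage` are hypothetical, the numbers are toy values rather than model output, and mapping the columns of `result.forecast_rows` onto these arrays is left to the reader because that layout is not documented here.

```python
import numpy as np


def median_mae(actual, q50):
    """Mean absolute error of the median (q0.5) forecast."""
    actual = np.asarray(actual, dtype=float)
    q50 = np.asarray(q50, dtype=float)
    return float(np.mean(np.abs(actual - q50)))


def band_coverage(actual, lower, upper):
    """Fraction of actual values that fall inside [lower, upper]."""
    actual = np.asarray(actual, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    inside = (actual >= lower) & (actual <= upper)
    return float(np.mean(inside))


# Toy values for illustration only (not produced by the model)
actual = [10.0, 11.0, 12.0, 13.0]
q10 = [9.0, 10.0, 11.5, 13.5]
q50 = [10.5, 11.0, 12.5, 14.0]
q90 = [11.0, 12.0, 13.0, 15.0]

print(median_mae(actual, q50))          # 0.5
print(band_coverage(actual, q10, q90))  # 0.75
```

A coverage of 1.0 corresponds to the "every actual value inside the band" statement; for a q10–q90 band the nominal coverage is 0.8.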

Variable / Time-Step Impact

Figure: Normalized variable impact by time step

Per-hour contribution of each weather variable over a one-week Tokyo context. For each time_index, impact across all variables sums to 1.

result = predict_time_anchor(
    "K-Iwa/time-anchor-modernbert-32m",
    target_context=temperature[:168],
    explanatory_contexts=[weather.iloc[:168, i].astype("float32") for i in (2, 3, 4)],
    gaf={"enabled": True, "topk_time_steps": 0},
)
print(pd.DataFrame(result.variable_impact_rows))
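The "sums to 1 per time_index" property can be illustrated with a small normalization sketch. This is not the package's attribution code; it only shows what per-time-step normalization means for a hypothetical matrix of non-negative raw scores with one row per time step and one column per variable.

```python
import numpy as np


def normalize_per_time_step(raw):
    """Scale each row (one time step) so its variable scores sum to 1.

    raw: array of shape (n_time_steps, n_variables), non-negative.
    Rows that are all zero are left as zeros to avoid dividing by zero.
    """
    raw = np.asarray(raw, dtype=float)
    totals = raw.sum(axis=1, keepdims=True)
    return np.divide(raw, totals, out=np.zeros_like(raw), where=totals > 0)


# Hypothetical raw scores: 3 time steps x 3 variables
raw_scores = np.array([
    [2.0, 1.0, 1.0],
    [0.5, 0.5, 1.0],
    [3.0, 0.0, 1.0],
])
shares = normalize_per_time_step(raw_scores)
print(shares)
print(shares.sum(axis=1))  # each row sums to 1
```

Because the normalization is per time step, shares are comparable across variables within one step, but a share of 0.5 at two different steps does not imply the same absolute contribution.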

Impact Validation

The impact scores were checked against synthetic data where the correct answer is known in advance.

Test 1: do the shares match known mixing weights? The target is built as a weighted sum of three sine waves, and the same three waves are passed in as the explanatory series:

target(t) = w1*f1(t) + w2*f2(t) + w3*f3(t)

A wave with twice the weight contributes twice as much to the target, so its measured impact share should be about twice as large. The measured shares track the weights, most closely in the first row:

True weights (w1 / w2 / w3) | Measured impact shares (f1 / f2 / f3)
0.60 / 0.30 / 0.10 | 0.55 / 0.33 / 0.12
0.10 / 0.30 / 0.60 | 0.26 / 0.30 / 0.44
0.33 / 0.33 / 0.33 | 0.33 / 0.29 / 0.39

The first two rows use the same three waves with the weights swapped, and the ranking of the measured shares swaps with them: the score reflects how much each series contributes, not which series it happens to be.
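A minimal sketch of how a Test 1 input could be constructed, followed by a check of the reported shares against the true weights. The series length and the three periods are assumptions chosen for illustration (the source does not state them); the weights and measured shares are copied from the table above.

```python
import numpy as np

# Assumed setup: 256 steps and three arbitrary periods (not from the source)
t = np.arange(256)
f1 = np.sin(2 * np.pi * t / 16)
f2 = np.sin(2 * np.pi * t / 37)
f3 = np.sin(2 * np.pi * t / 90)

weights = np.array([0.60, 0.30, 0.10])
target = weights[0] * f1 + weights[1] * f2 + weights[2] * f3

# Values as reported in the table above
true_w = np.array([
    [0.60, 0.30, 0.10],
    [0.10, 0.30, 0.60],
    [0.33, 0.33, 0.33],
])
measured = np.array([
    [0.55, 0.33, 0.12],
    [0.26, 0.30, 0.44],
    [0.33, 0.29, 0.39],
])

# Largest absolute gap between weight and measured share, per row
max_abs_err = np.abs(measured - true_w).max(axis=1)
print(np.round(max_abs_err, 2))  # [0.05 0.16 0.06]

# For the two unequal-weight rows, does the ranking of shares match the weights?
same_order = [
    np.array_equal(np.argsort(m), np.argsort(w))
    for m, w in zip(measured[:2], true_w[:2])
]
print(same_order)  # [True, True]
```

The series `f1`, `f2`, `f3` would be passed as `explanatory_contexts` and `target` as `target_context`, as in the impact example earlier.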

Test 2: does the per-time-step impact follow changes over time? Here the target switches drivers mid-context: wave A alone drives the first half, wave B alone drives the second half. Each wave's per-time impact share is higher in the half it drives, and swapping A and B mirrors the result. One reading note: B's overall share is larger, which is expected, because impact explains the forecast, and the forecast continues from the end of the context, where B is the active driver.
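The Test 2 setup can be sketched the same way. The length, the two periods, and the helper `share_by_half` are assumptions for illustration, and the `impact` array below is a hand-written stand-in showing the expected pattern, not model output.

```python
import numpy as np

# Assumed setup: 200 steps, two arbitrary periods (not from the source)
n = 200
t = np.arange(n)
wave_a = np.sin(2 * np.pi * t / 20)
wave_b = np.sin(2 * np.pi * t / 33)
half = n // 2

# Wave A alone drives the first half, wave B alone drives the second half
target = np.where(t < half, wave_a, wave_b)


def share_by_half(impact, half):
    """Mean per-variable share in the first and second half of the context.

    impact: array of shape (n_steps, n_variables) with per-time-step shares.
    """
    impact = np.asarray(impact, dtype=float)
    return impact[:half].mean(axis=0), impact[half:].mean(axis=0)


# Hand-written stand-in for per-time-step shares of (A, B); not model output
impact = np.where((t < half)[:, None], [0.7, 0.3], [0.2, 0.8])
first, second = share_by_half(impact, half)

print(first, second)
print(first[0] > second[0])  # A's share is higher in the half it drives
print(second[1] > first[1])  # B's share is higher in the half it drives
```

With real output, `impact` would be built from `result.variable_impact_rows`, pivoted to one row per time_index and one column per variable.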

Anchor Forecast

Figure: Forecast with user-specified anchors

Monthly airline passengers (Box & Jenkins), 24-month holdout: pinning six known months cuts the median forecast MAE from 46 to 10 thousand passengers.

url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/airline-passengers.csv"
passengers = pd.read_csv(url)["Passengers"].astype("float32")

result = predict_time_anchor(
    "K-Iwa/time-anchor-modernbert-32m",
    target_context=passengers[:-24],
    prediction_length=24,
    anchor={
        "mode": "observed",
        "positions": [4, 8, 12, 16, 20, 24],
        "values": [396, 559, 405, 461, 606, 432],
    },
)
print(pd.DataFrame(result.forecast_rows))

positions are 1-based forecast horizon steps. Anchor forecasts use the target history only; disable anchors when passing explanatory_contexts.
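Because positions are 1-based while Python sequences are 0-based, it is easy to be off by one when pulling anchor values from a known series. The helper below is a hypothetical convenience (`anchor_from_holdout` is not part of the package) that builds a dict with the same keys as the example above; the holdout list is the last 24 months of the airline passengers series (1959 and 1960).

```python
def anchor_from_holdout(holdout, positions, mode="observed"):
    """Build an anchor dict from known future values.

    positions are 1-based forecast horizon steps, so step p maps to
    holdout[p - 1].
    """
    for p in positions:
        if not 1 <= p <= len(holdout):
            raise ValueError(f"position {p} is outside 1..{len(holdout)}")
    return {
        "mode": mode,
        "positions": list(positions),
        "values": [holdout[p - 1] for p in positions],
    }


# Last 24 months of the airline passengers series (thousands of passengers)
holdout = [
    360, 342, 406, 396, 420, 472, 548, 559, 463, 407, 362, 405,
    417, 391, 419, 461, 472, 535, 622, 606, 508, 461, 390, 432,
]

anchor = anchor_from_holdout(holdout, [4, 8, 12, 16, 20, 24])
print(anchor["values"])  # [396, 559, 405, 461, 606, 432]
```

The resulting dict matches the `anchor` argument used in the example above and can be passed in its place.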

CLI

time-anchor-infer --checkpoint K-Iwa/time-anchor-modernbert-32m \
    --input data.csv --target-column Target \
    --exogenous-columns Feature1,Feature2,Feature3

Writes output/forecast.csv, output/variable_impact.csv, and output/result.json. Add --no-impact for forecast-only runs, or --anchor-mode observed --anchor-positions 12,24 --anchor-values 0.2,0.4 for anchor forecasts.
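For trying the command, an input file with the column names used above can be generated as follows. This is a sketch under assumptions: only the column names Target, Feature1, Feature2, and Feature3 come from the command line shown; the row count, the synthetic values, and the absence of a timestamp column are guesses about what the CLI accepts.

```python
import numpy as np
import pandas as pd

# Synthetic data for illustration only
rng = np.random.default_rng(0)
n_rows = 200
features = rng.normal(size=(n_rows, 3))

frame = pd.DataFrame(features, columns=["Feature1", "Feature2", "Feature3"])
# Target is a fixed weighted sum of the features, so the drivers are known
frame.insert(0, "Target", features @ np.array([0.6, 0.3, 0.1]))

frame.to_csv("data.csv", index=False)
print(frame.head())
```

Running the command above against this file should then write its results under output/, as described.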

Model Details

Backbone: ModernBERT encoder (10 layers, hidden 384, 6 heads)
Parameters: 33.4M total = 31.9M ModernBERT backbone (the "32M" in the name) + 1.5M time-series heads (float32, safetensors)
Max context length: 3,463 steps
Forecast horizon: 64 steps
Trained quantiles: 0.1 to 0.9 (nine levels)
Attribution: Generalized Attention Flow (GAF) via barrier-method max-flow
Anchor modes: observed, auto, forward, inverted, sparse, hierarchical, hybrid
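The limits in this table translate into a few simple checks before calling the model. The sketch below is illustrative only: `clip_context` is a hypothetical helper (the package may already truncate long inputs itself), the nine trained levels are assumed to be evenly spaced from 0.1 to 0.9, and the size estimate counts weights only.

```python
import numpy as np

MAX_CONTEXT = 3463  # max context length from the table
HORIZON = 64        # forecast horizon from the table


def clip_context(series, max_len=MAX_CONTEXT):
    """Keep only the most recent max_len observations."""
    series = np.asarray(series, dtype=np.float32)
    return series[-max_len:]


# Assumption: nine evenly spaced levels 0.1, 0.2, ..., 0.9
trained_quantiles = np.round(np.arange(1, 10) / 10, 1)

# float32 stores 4 bytes per parameter, so 33.4M parameters is about 133.6 MB
approx_weight_mb = 33.4e6 * 4 / 1e6

long_series = np.arange(5000)
print(len(clip_context(long_series)))  # 3463
print(trained_quantiles)
print(approx_weight_mb)                # 133.6
```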

License

Apache-2.0.

Figures use historical weather data from Open-Meteo (CC BY 4.0) and the classic airline passengers dataset (Box & Jenkins, 1976).
