Time-Anchor ModernBERT 32M

A compact time-series model built on a 32M-parameter ModernBERT backbone (33.4M parameters in total with the time-series input/output heads). One call returns three things:


📈 Quantile forecast	Probabilistic forecasts at any requested quantile levels
🧭 Variable / time impact	Normalized contribution of every input variable at every time step
📌 Anchor forecast	Forecasts conditioned on known future points you specify

Installation

pip install time-anchor

predict_time_anchor accepts a Hugging Face Hub model id or a local checkpoint directory. Source code for the package lives at K-Iwa/time-anchor.

Quantile Forecast

Hourly temperature in Tokyo (Open-Meteo archive), 64-hour holdout: the median (q50) forecast has an MAE of 0.57 °C, and every actual value falls inside the q10–q90 band.

import pandas as pd
from time_anchor import predict_time_anchor

url = (
    "https://archive-api.open-meteo.com/v1/archive"
    "?latitude=35.69&longitude=139.69&start_date=2024-10-01&end_date=2024-12-31"
    "&hourly=temperature_2m,relative_humidity_2m,surface_pressure,wind_speed_10m&format=csv"
)
weather = pd.read_csv(url, skiprows=3)
temperature = weather.iloc[:, 1].astype("float32")

result = predict_time_anchor(
    "K-Iwa/time-anchor-modernbert-32m",
    target_context=temperature[:1440],
    prediction_length=64,
    quantile_levels=(0.1, 0.5, 0.9),
)
print(pd.DataFrame(result.forecast_rows))
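
The two holdout figures quoted for this example (MAE of the median forecast and q10–q90 coverage) can be recomputed with a few lines of NumPy. This is a minimal sketch and not part of the package: `evaluate_band` is a hypothetical helper, and it takes plain arrays because the column names inside `result.forecast_rows` are not listed here. Read the q10, q50 and q90 columns off the printed DataFrame and compare them with the held-out slice `temperature[1440:1504]`. The numbers below are toy values, not model output.

```python
import numpy as np

def evaluate_band(actual, q_low, q_mid, q_high):
    """Return (MAE of the median forecast, share of actuals inside [q_low, q_high])."""
    actual = np.asarray(actual, dtype=float)
    q_low = np.asarray(q_low, dtype=float)
    q_mid = np.asarray(q_mid, dtype=float)
    q_high = np.asarray(q_high, dtype=float)
    mae = float(np.mean(np.abs(actual - q_mid)))
    coverage = float(np.mean((actual >= q_low) & (actual <= q_high)))
    return mae, coverage

# Toy numbers for illustration only (not model output).
actual = [10.0, 11.0, 12.0, 13.0]
q50 = [10.5, 11.0, 11.5, 14.0]
q10 = [9.0, 10.0, 10.5, 13.5]
q90 = [12.0, 12.0, 12.5, 15.0]
mae, coverage = evaluate_band(actual, q10, q50, q90)
print(f"median MAE = {mae:.2f}, q10-q90 coverage = {coverage:.0%}")
```

A coverage of 1.0 corresponds to the "every actual value inside the band" result reported for the Tokyo holdout.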

Variable / Time-Step Impact

Per-hour contribution of each weather variable over a one-week Tokyo context. For each time_index, impact across all variables sums to 1.

result = predict_time_anchor(
    "K-Iwa/time-anchor-modernbert-32m",
    target_context=temperature[:168],
    explanatory_contexts=[weather.iloc[:168, i].astype("float32") for i in (2, 3, 4)],
    gaf={"enabled": True, "topk_time_steps": 0},
)
print(pd.DataFrame(result.variable_impact_rows))
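
The normalization property stated for this output (impact summing to 1 within each time_index) is easy to verify on the returned rows. The sketch below assumes a long-format table with columns named `time_index` and `impact`; those names are taken from the description in this section, while the `variable` column and the helper `check_impact_normalization` are hypothetical. The toy table stands in for `pd.DataFrame(result.variable_impact_rows)`.

```python
import pandas as pd

def check_impact_normalization(impact_df, tol=1e-4):
    """Sum `impact` within each `time_index` and report whether every sum is ~1."""
    sums = impact_df.groupby("time_index")["impact"].sum()
    return sums, bool(((sums - 1.0).abs() <= tol).all())

# Toy table in the assumed long format (not model output).
toy = pd.DataFrame({
    "time_index": [0, 0, 0, 1, 1, 1],
    "variable": ["humidity", "pressure", "wind", "humidity", "pressure", "wind"],
    "impact": [0.5, 0.3, 0.2, 0.1, 0.6, 0.3],
})
sums, ok = check_impact_normalization(toy)
print(sums)
print("normalized:", ok)
```

If the real column names differ, adjust the two strings inside the helper; the check itself stays the same.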

Impact Validation

The impact scores were checked against synthetic data where the correct answer is known in advance.

Test 1 — do the shares match known mixing weights? The target is built as a weighted sum of three sine waves, and the same three waves are passed in as the explanatory series:

target(t) = w1*f1(t) + w2*f2(t) + w3*f3(t)

A wave with twice the weight contributes twice as much to the target, so its measured impact share should be about twice as large. The measured shares track the weights in rank and rough magnitude, though less sharply in the second row:

True weights `w1 / w2 / w3`	Measured impact shares `f1 / f2 / f3`
0.60 / 0.30 / 0.10	0.55 / 0.33 / 0.12
0.10 / 0.30 / 0.60	0.26 / 0.30 / 0.44
0.33 / 0.33 / 0.33	0.33 / 0.29 / 0.39

The first two rows use the same three waves with the weight order reversed, and the ranking of the measured shares reverses with it: the score reflects how much each series contributes, not which series it happens to be.
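
A synthetic input of the kind used in Test 1 can be built as follows. This is a sketch under stated assumptions: the periods (24, 48, 96) and the length of 512 steps are illustrative choices, since the frequencies and context length used in the actual validation are not given here, and `make_mixture` is a hypothetical helper.

```python
import numpy as np

def make_mixture(weights, periods=(24, 48, 96), length=512):
    """Build three sine waves f1, f2, f3 and the target w1*f1 + w2*f2 + w3*f3.

    `periods` and `length` are illustrative assumptions, not the values
    used in the validation reported above.
    """
    t = np.arange(length)
    waves = [np.sin(2 * np.pi * t / p).astype("float32") for p in periods]
    target = sum(w * f for w, f in zip(weights, waves)).astype("float32")
    return target, waves

# First row of the table: weights 0.60 / 0.30 / 0.10.
target, waves = make_mixture((0.60, 0.30, 0.10))
print(target.shape, len(waves))
```

To reproduce the experiment, `target` would go in as `target_context` and `waves` as `explanatory_contexts`, with `gaf` enabled as in the Variable / Time-Step Impact call.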

Test 2 — does the per-time-step impact follow changes over time? Here the target switches drivers mid-context: wave A alone drives the first half, wave B alone drives the second half. Each wave's per-time impact share is higher in the half it drives, and swapping A and B mirrors the result. One reading note: B's overall share is larger, and that is expected — impact explains the forecast, and the forecast continues from the end of the context, where B is the active driver.
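
The Test 2 setup can be sketched in the same way. The periods and length are again assumptions, and both helpers are hypothetical: `make_switching_target` builds a target driven by wave A in the first half and wave B in the second, and `half_shares` averages one wave's per-time-step impact over each half so the two halves can be compared.

```python
import numpy as np

def make_switching_target(length=512, period_a=24, period_b=60):
    """Target equals wave A before the midpoint and wave B from the midpoint on."""
    t = np.arange(length)
    wave_a = np.sin(2 * np.pi * t / period_a).astype("float32")
    wave_b = np.sin(2 * np.pi * t / period_b).astype("float32")
    half = length // 2
    target = np.where(t < half, wave_a, wave_b).astype("float32")
    return target, wave_a, wave_b, half

def half_shares(impact, half):
    """Mean per-step impact share of one wave in each half of the context."""
    impact = np.asarray(impact, dtype=float)
    return float(impact[:half].mean()), float(impact[half:].mean())

target, wave_a, wave_b, half = make_switching_target()
# Toy per-step shares for one wave (not model output): higher in the first half.
first, second = half_shares([0.8, 0.6, 0.2, 0.4], half=2)
print(first > second)
```

With real output, the expectation described above is that wave A's first-half mean exceeds its second-half mean, and the reverse for wave B.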

Anchor Forecast

Monthly airline passengers (Box & Jenkins), 24-month holdout: pinning six known months cuts the median forecast MAE from 46 to 10 thousand passengers.

url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/airline-passengers.csv"
passengers = pd.read_csv(url)["Passengers"].astype("float32")

result = predict_time_anchor(
    "K-Iwa/time-anchor-modernbert-32m",
    target_context=passengers[:-24],
    prediction_length=24,
    anchor={
        "mode": "observed",
        "positions": [4, 8, 12, 16, 20, 24],
        "values": [396, 559, 405, 461, 606, 432],
    },
)
print(pd.DataFrame(result.forecast_rows))

positions are 1-based forecast horizon steps. Anchor forecasts use the target history only; disable anchors when passing explanatory_contexts.
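
Because positions are 1-based, horizon step p corresponds to index p - 1 of the future values. The sketch below makes that mapping explicit; `observed_anchor` is a hypothetical helper (not part of the package) that returns a dict in the shape used in the call above. The 24 holdout values are copied from the public airline passengers dataset (1959 and 1960).

```python
def observed_anchor(future_values, positions):
    """Build an `observed` anchor dict from known future values.

    `positions` are 1-based horizon steps, so step p maps to future_values[p - 1].
    """
    horizon = len(future_values)
    for p in positions:
        if not 1 <= p <= horizon:
            raise ValueError(f"position {p} is outside the 1..{horizon} horizon")
    return {
        "mode": "observed",
        "positions": list(positions),
        "values": [future_values[p - 1] for p in positions],
    }

# Last 24 months of the airline series, i.e. the holdout in this example.
holdout = [360, 342, 406, 396, 420, 472, 548, 559, 463, 407, 362, 405,
           417, 391, 419, 461, 472, 535, 622, 606, 508, 461, 390, 432]
anchor = observed_anchor(holdout, [4, 8, 12, 16, 20, 24])
print(anchor["values"])  # [396, 559, 405, 461, 606, 432]
```

The result matches the `values` list passed in the example, which confirms that position 4 refers to the fourth forecast month and not the fifth.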

CLI

time-anchor-infer --checkpoint K-Iwa/time-anchor-modernbert-32m \
    --input data.csv --target-column Target \
    --exogenous-columns Feature1,Feature2,Feature3

Writes output/forecast.csv, output/variable_impact.csv, and output/result.json. Add --no-impact for forecast-only runs, or --anchor-mode observed --anchor-positions 12,24 --anchor-values 0.2,0.4 for anchor forecasts.
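
For a quick dry run of the command above, a toy `data.csv` with the same column names can be generated as follows. The layout is an assumption (one row per time step, rows in time order, no timestamp column); the expected input format is not spelled out here, so check the package documentation before relying on it. `make_cli_input` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

def make_cli_input(path, length=200, seed=0):
    """Write a toy CSV with the column names used in the CLI example."""
    rng = np.random.default_rng(seed)
    features = rng.normal(size=(length, 3))
    # Toy target: a fixed mix of the three features plus a little noise.
    target = features @ np.array([0.6, 0.3, 0.1]) + 0.05 * rng.normal(size=length)
    frame = pd.DataFrame(features, columns=["Feature1", "Feature2", "Feature3"])
    frame.insert(0, "Target", target)
    frame.to_csv(path, index=False)
    return frame

frame = make_cli_input("data.csv")
print(frame.head())
```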

Model Details


Backbone	ModernBERT encoder (10 layers, hidden 384, 6 heads)
Parameters	33.4M total = 31.9M ModernBERT backbone (the "32M" in the name) + 1.5M time-series heads (float32, safetensors)
Max context length	3,463 steps
Forecast horizon	64 steps
Trained quantiles	0.1 – 0.9 (nine levels)
Attribution	Generalized Attention Flow (GAF) via barrier-method max-flow
Anchor modes	`observed`, `auto`, `forward`, `inverted`, `sparse`, `hierarchical`, `hybrid`

References

Ansari et al. Chronos: Learning the Language of Time Series. TMLR 2024 (OpenReview), arXiv:2403.07815. Informs the forecasting/tokenization design.
Azarkhalili and Libbrecht. Generalized Attention Flow: Feature Attribution for Transformer Models via Maximum Flow. 2025, arXiv:2502.15765. The variable/time impact method (GAF).

License

Apache-2.0.

Figures use historical weather data from Open-Meteo (CC BY 4.0) and the classic airline passengers dataset (Box & Jenkins, 1976).
