TrueSea

WW3 point-forecast corrector (model output statistics).

A LightGBM model that corrects NOAA WAVEWATCH III (WW3) significant wave height (Hs) at point outputs. Input: the model's own state at a station-hour (total Hs, 10 m wind, the top three swell partitions as height/period/direction, station lat/lon, and cyclic month and hour), with directions encoded as sin/cos pairs. Partition heights are rescaled so that the square root of their sum of squares equals total Hs, which keeps the features consistent between hindcast PART files and operational bulletins. Output: corrected significant wave height.
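The two feature transforms described above can be sketched as follows. This is a minimal illustration, not the training pipeline: the helper names `normalize_partitions` and `cyclic` are hypothetical, directions are assumed to be in degrees, hours are assumed to run 0-23, and the handling of the all-zero case is an assumption.

```python
import numpy as np

def normalize_partitions(total_hs, part_hs):
    """Rescale partition heights so sqrt(sum(h_i^2)) equals total_hs."""
    part_hs = np.asarray(part_hs, dtype=float)
    rss = np.sqrt(np.sum(part_hs ** 2))
    if rss == 0.0:
        # No partition energy: nothing to rescale (assumed handling).
        return part_hs
    return part_hs * (total_hs / rss)

def cyclic(value, period):
    """Encode a periodic quantity as a (sin, cos) pair."""
    angle = 2.0 * np.pi * value / period
    return np.sin(angle), np.cos(angle)

# Illustrative values only, not taken from any station record.
scaled = normalize_partitions(2.0, [1.5, 0.9, 0.4])
dir_sin, dir_cos = cyclic(270.0, 360.0)  # direction in degrees (assumed unit)
hour_sin, hour_cos = cyclic(18, 24)      # hour of day (assumed 0-23)
```

After rescaling, the root-sum-of-squares of `scaled` matches the total Hs passed in, whichever source the partitions came from.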

Trained on 3.9M station-hours (181 NDBC stations, 2015-08 to 2019-05) pairing the multi_1 production hindcast point output with NDBC observed wave height. Early stopping uses a validation slice from the training period.

Evaluation

| Protocol | Test set | Raw WW3 RMSE | MOS RMSE | Gain | Raw bias | MOS bias |
|---|---|---|---|---|---|---|
| Time split (train 2015-16, val 2017) | 2018-19, all stations | 0.430 m | 0.399 m | +7% | -0.104 m | +0.002 m |
| Station split | 25 held-out stations, all years | 0.381 m | 0.332 m | +13% | -0.113 m | -0.002 m |
| Strict split | held-out stations x 2018-19 | 0.442 m | 0.424 m | +4% | -0.115 m | -0.013 m |
| Live GFS-wave (June 2026, station 44025, lead < 72 h) | 2,393 forecast hours | 0.242 m | 0.241 m | 0% | -0.075 m | -0.074 m |

Broken down by quantile of observed wave height, gains peak between the 50th and 99th percentiles (+8% to +17% on the station split). The live row applies the hindcast-trained model to operational GFS-wave bulletins with spectra-file winds; GFS-wave at this station and season runs near-unbiased, leaving little correction headroom in that window.
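For readers reproducing the table, the metrics can be sketched as below. The definitions are assumptions rather than the author's stated formulas: bias is taken as mean(prediction - observation) and gain as the relative RMSE reduction, 1 - RMSE_MOS / RMSE_raw. The gain definition is consistent with the RMSE pairs reported in the table.

```python
import numpy as np

def rmse(pred, obs):
    """Root-mean-square error between predictions and observations."""
    diff = np.asarray(pred, dtype=float) - np.asarray(obs, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def bias(pred, obs):
    """Mean error; negative means the prediction runs low (assumed sign convention)."""
    diff = np.asarray(pred, dtype=float) - np.asarray(obs, dtype=float)
    return float(np.mean(diff))

def gain(rmse_raw, rmse_mos):
    """Relative RMSE reduction of the corrected forecast over raw WW3."""
    return 1.0 - rmse_mos / rmse_raw

# RMSE pairs (raw, MOS) copied from the evaluation rows.
reported = [(0.430, 0.399), (0.381, 0.332), (0.442, 0.424), (0.242, 0.241)]
gains_pct = [round(100 * gain(raw, mos)) for raw, mos in reported]
```

With these definitions `gains_pct` rounds to 7, 13, 4 and 0, matching the gain column.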

Files

  • mos_hs.safetensors - the ensemble as flat node arrays (gbdt-flat-v1): feature (int32, -1 at leaves), threshold (f64, x <= t goes left), left/right (int32, tree-local child indices), value (f64 leaf values), tree_offset (int64 root of each tree). Feature names are in the file metadata. No LightGBM dependency is needed:
import numpy as np
from safetensors.numpy import load_file

t = load_file("mos_hs.safetensors")

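def predict(x):  # x: array of the 22 features in metadata order
    s = 0.0
    for off in t["tree_offset"]:
        n = int(off)  # global index of this tree's root
        while t["feature"][n] >= 0:  # feature == -1 marks a leaf
            # x <= threshold goes left; child indices are tree-local
            child = t["left"][n] if x[t["feature"][n]] <= t["threshold"][n] else t["right"][n]
            n = int(off) + int(child)  # convert back to a global index
        s += t["value"][n]  # accumulate this tree's leaf value
    return s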
  • mos_hs.txt - the same model in native LightGBM format: lightgbm.Booster(model_file="mos_hs.txt").

Both files encode identical trees (verified bit-exact at conversion).
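To make the flat layout concrete, here is a hedged sketch of a batched traversal over the same array names, run on a hand-made two-tree toy ensemble rather than the real file. The toy values, the function name `predict_batch`, and the placeholders stored at unused slots (-1 for a leaf's children, 0.0 for an internal node's value) are assumptions for illustration; the real file may store something else there.

```python
import numpy as np

# Toy ensemble: tree 0 occupies nodes 0-2, tree 1 occupies nodes 3-7.
toy = {
    "feature":     np.array([0, -1, -1, 1, -1, 0, -1, -1], dtype=np.int32),
    "threshold":   np.array([1.0, 0.0, 0.0, 5.0, 0.0, 2.0, 0.0, 0.0]),
    "left":        np.array([1, -1, -1, 1, -1, 3, -1, -1], dtype=np.int32),
    "right":       np.array([2, -1, -1, 2, -1, 4, -1, -1], dtype=np.int32),
    "value":       np.array([0.0, 0.1, 0.3, 0.0, -0.05, 0.0, 0.2, 0.4]),
    "tree_offset": np.array([0, 3], dtype=np.int64),
}

def predict_batch(t, X):
    """Sum leaf values over all trees for every row of X."""
    X = np.asarray(X, dtype=np.float64)
    rows = np.arange(X.shape[0])
    out = np.zeros(X.shape[0])
    for off in t["tree_offset"]:
        # Every row starts at this tree's root (a global index).
        node = np.full(X.shape[0], int(off), dtype=np.int64)
        active = t["feature"][node] >= 0
        while active.any():
            idx = node[active]
            go_left = X[rows[active], t["feature"][idx]] <= t["threshold"][idx]
            child = np.where(go_left, t["left"][idx], t["right"][idx])
            node[active] = int(off) + child  # tree-local -> global index
            active = t["feature"][node] >= 0
        out += t["value"][node]
    return out

preds = predict_batch(toy, [[0.5, 7.0], [3.0, 4.0]])
```

On the toy arrays the two rows come out at approximately 0.3 and 0.25. Processing all rows of a tree together avoids a Python-level loop per sample, which matters when scoring many station-hours.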

Feature order: m_hs, m_wspd, wdir_sin, wdir_cos, p1_hs, p1_tp, p1_dir_sin, p1_dir_cos, p2_hs, p2_tp, p2_dir_sin, p2_dir_cos, p3_hs, p3_tp, p3_dir_sin, p3_dir_cos, lat, lon, month_sin, month_cos, hour_sin, hour_cos (partition heights normalized as above).
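Because the model expects features by position, it can help to build the input vector from named values. The sketch below copies the feature order from the line above; the helper `to_vector` is hypothetical, and in practice the names stored in the file metadata should be treated as authoritative.

```python
import numpy as np

# Feature order as listed in this card.
FEATURES = [
    "m_hs", "m_wspd", "wdir_sin", "wdir_cos",
    "p1_hs", "p1_tp", "p1_dir_sin", "p1_dir_cos",
    "p2_hs", "p2_tp", "p2_dir_sin", "p2_dir_cos",
    "p3_hs", "p3_tp", "p3_dir_sin", "p3_dir_cos",
    "lat", "lon", "month_sin", "month_cos", "hour_sin", "hour_cos",
]

def to_vector(row, names=FEATURES):
    """Order a {name: value} mapping into the positional feature vector."""
    missing = [n for n in names if n not in row]
    if missing:
        raise KeyError(f"missing features: {missing}")
    return np.array([row[n] for n in names], dtype=np.float64)

# Placeholder values only; real inputs come from WW3 point output.
example_row = {name: 0.0 for name in FEATURES}
x = to_vector(example_row)
```

Failing loudly on a missing name is a deliberate choice here, since a silently shifted column would still produce a plausible-looking wave height.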

Training data: https://huggingface.co/datasets/phanerozoic/ww3-ndbc-pairs
Raw archive: https://huggingface.co/datasets/phanerozoic/noaa-ww3-multi1-points
