FuXi-2.1
FuXi-2.1 is a global, deterministic machine-learning weather forecasting model developed by Fudan University & SAIS. It produces global forecasts at 0.25° resolution in 6-hourly steps, out to 10 days.
FuXi-2.1 targets the defining failure mode of data-driven weather prediction: forecasts that blur into a smooth spatial average as lead time grows, erasing the small-scale structure that matters most for extremes. FuXi-2.1 produces markedly sharper fields whose spatial power spectra track observations across the full wavenumber range, while keeping deterministic skill (RMSE) comparable to FuXi-1.0 — and substantially improving extreme-event detection for heavy precipitation and strong wind.
This model is released as part of the FuXi Single collection.
What's new in 2.1
Relative to FuXi-1.0, FuXi-2.1 introduces:
- A flat Transformer backbone replacing FuXi-1.0's U-Transformer (ResNet downsample → Swin Transformer → upsample). FuXi-2.1 drops the U-shaped down/up-sampling in favour of a single Transformer trunk with no intermediate down- or up-sampling.
- Rotary position embeddings (RoPE) inside the Swin windowed attention, replacing the learned relative-position bias used in FuXi-1.0.
- Adaptive layer normalization (adaLN) time conditioning, which injects timing information (forecast lead step, time-of-day phase and day-of-year phase) into every block, inspired by diffusion transformers in image generation.
- A variable-aware multi-head decoder that gives pressure-level, surface and derived variables their own specialised output heads.
The combined effect is sharper, spectrally faithful forecasts with RMSE comparable to FuXi-1.0.
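Rotary position embeddings encode position by rotating pairs of query/key features through an angle proportional to the token index. The sketch below is a minimal pure-Python illustration, not FuXi-2.1's code: the frequency base of 10000 and the use of a flat 1-D token index inside a window are assumptions borrowed from the original RoPE formulation, and `rope_rotate` is a hypothetical helper. The check at the end shows the property that lets RoPE stand in for a learned relative-position bias: the attention logit depends only on the offset between two positions.

```python
import math


def rope_rotate(vec, pos, base=10000.0):
    """Apply 1-D rotary position embedding to an even-length feature vector.

    Each pair (vec[i], vec[i+1]) is rotated by pos * base**(-i/d) radians,
    so lower pairs spin fast and higher pairs spin slowly.
    """
    d = len(vec)
    if d % 2:
        raise ValueError("RoPE needs an even head dimension")
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x0, x1 = vec[i], vec[i + 1]
        out.extend([x0 * c - x1 * s, x0 * s + x1 * c])  # 2-D rotation
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


q = [0.3, -1.2, 0.8, 0.5]
k = [1.0, 0.4, -0.7, 0.2]

# Same offset (3) at different absolute positions gives the same logit.
s1 = dot(rope_rotate(q, 5), rope_rotate(k, 2))
s2 = dot(rope_rotate(q, 13), rope_rotate(k, 10))
print(abs(s1 - s2) < 1e-9)  # True
```

Because the rotation is norm-preserving, RoPE changes only the relative phase between queries and keys, not their magnitudes.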
Quickstart
This repository ships the exported model (fuxi-2.1.pt2), normalization statistics (mean.nc, std.nc), a sample pre-normalized input (input.nc), and minimal inference code.
```bash
# 1. Install dependencies
pip install -r requirements.txt

# 2. End-to-end demo (inference + plots)
bash run.sh --model_dir . --input input.nc --steps 5

# Or run inference directly (40 steps = 10-day forecast):
python inference.py \
  --model_dir . \
  --input input.nc \
  --output_dir ./output \
  --steps 40 \
  --forecast_time 2024092900

# 3. Plot selected channels
python plot.py --output_dir ./output --channels t2m z500 tp --discrete
```
Input: a NetCDF file containing a variable named input with shape (time=2, channel=85, lat=721, lon=1440), z-score normalized, with latitude running 90 → −90 and longitude 0 → 359.75. The two time frames are the states at t−6h and t₀. The provided input.nc is a sample initialised at 2024-09-29 00Z.
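The grid and normalization conventions can be checked with a short sketch. It assumes that mean.nc and std.nc hold one statistic per channel, broadcast over latitude and longitude; the actual files may store them differently, and `latlon_grid`, `normalize` and `denormalize` are hypothetical helpers rather than functions from the repository.

```python
def latlon_grid(res=0.25):
    """Regular lat/lon grid: poles included, 360° longitude excluded."""
    n_lat = int(round(180 / res)) + 1   # 721 rows at 0.25°
    n_lon = int(round(360 / res))       # 1440 columns at 0.25°
    lats = [90.0 - i * res for i in range(n_lat)]  # 90 -> -90 (north to south)
    lons = [j * res for j in range(n_lon)]         # 0 -> 359.75
    return lats, lons


def normalize(x, mean, std):
    """Per-channel z-score: (x - mean) / std."""
    return [(v - m) / s for v, m, s in zip(x, mean, std)]


def denormalize(z, mean, std):
    """Inverse transform back to physical units: z * std + mean."""
    return [v * s + m for v, m, s in zip(z, mean, std)]


lats, lons = latlon_grid()

# One value per channel, e.g. a 2 m temperature and a sea-level pressure.
x = [288.0, 101325.0]
mean = [285.0, 101000.0]
std = [10.0, 1000.0]
z = normalize(x, mean, std)
x_back = denormalize(z, mean, std)
```

Outputs are already denormalized, so the inverse transform is only needed when working with normalized tensors directly.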
Output: each step is saved as {output_dir}/{step:03d}.nc with shape (channel=85, lat=721, lon=1440), in physical units (denormalized), with a valid_time attribute. Steps are 1-based: 001.nc = +6 h, …, 040.nc = +240 h (10 days).
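Mapping a step index to its output file and valid time follows directly from the 1-based, 6-hourly convention. The sketch assumes forecast_time uses the YYYYMMDDHH format suggested by the example value 2024092900; `output_files` is a hypothetical helper, and the exact format of the valid_time attribute written by inference.py is not specified here.

```python
from datetime import datetime, timedelta


def output_files(forecast_time, steps, output_dir="./output", hours_per_step=6):
    """Yield (path, valid_time) for 1-based steps from a YYYYMMDDHH init time."""
    init = datetime.strptime(forecast_time, "%Y%m%d%H")
    for step in range(1, steps + 1):
        yield f"{output_dir}/{step:03d}.nc", init + timedelta(hours=hours_per_step * step)


files = list(output_files("2024092900", steps=40))
print(files[0])   # ('./output/001.nc', datetime.datetime(2024, 9, 29, 6, 0))
print(files[-1])  # ('./output/040.nc', datetime.datetime(2024, 10, 9, 0, 0))
```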
GPU: the device is baked into the exported graph, so load the model on CUDA. About 8 GB of GPU memory is enough (model ~4 GB + recurrent state ~1.4 GB + working memory). Tested on A100, V100 and RTX 3090/4090. See variables.py for the full ordered channel list.
Model overview
Model description
FuXi-2.1 is a single Transformer. The global atmospheric state is split into patches and embedded into tokens, processed by a stack of windowed-attention blocks, and read out by a variable-aware multi-head decoder. The model is deterministic — one forward pass per step, with no adversarial or diffusion sampling at inference — and is rolled out autoregressively at 6-hourly steps.
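Because each step consumes the two most recent states and emits the next one, the roll-out is a sliding two-frame window. The sketch below assumes a hypothetical `step_fn` standing in for one forward pass of the exported model (the real inference.py interface may differ) and represents each frame as a flat list of floats. It also ignores the prognostic/diagnostic split, since the data section notes that diagnostic channels are not fed back.

```python
from typing import Callable, List

State = List[float]  # stand-in for one (channel, lat, lon) frame


def rollout(step_fn: Callable[[State, State], State],
            prev: State, curr: State, steps: int) -> List[State]:
    """Autoregressive roll-out: (state at t-6h, state at t) -> state at t+6h."""
    outputs = []
    for _ in range(steps):
        nxt = step_fn(prev, curr)
        outputs.append(nxt)
        prev, curr = curr, nxt  # slide the two-frame window forward by 6 h
    return outputs


def extrapolate(prev: State, curr: State) -> State:
    # Toy stand-in for the model: linear extrapolation of each value.
    return [2 * c - p for p, c in zip(prev, curr)]


frames = rollout(extrapolate, prev=[0.0], curr=[1.0], steps=3)
print(frames)  # [[2.0], [3.0], [4.0]]
```

With steps=40 the same loop produces the 240-hour (10-day) forecast; errors in each prediction are fed into the next step, which is why blurring accumulates with lead time in data-driven models.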
- Developed by: Fudan University & SAIS
- Model type: Transformer (patch-embed → Swin attention with RoPE + adaLN → multi-head decoder)
- Forecast type: Global, deterministic, autoregressive
- License: CC BY 4.0
- Predecessor: FuXi-1.0
Architecture details
| Component | Specification |
|---|---|
| Backbone | Single Transformer trunk (no U-Net up/down-sampling) |
| Attention | Swin windowed attention |
| Position encoding | Rotary (RoPE, 1-D) |
| Normalisation / conditioning | adaLN, conditioned on lead step, time-of-day, day-of-year |
| Feed-forward | SwiGLU |
| Decoder | Variable-aware multi-head (pressure / surface / derived) |
| Input frames | 2 (states at t−6h and t₀) |
| Output | State at t+6h, rolled out autoregressively |
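The adaLN row can be illustrated with a diffusion-transformer-style modulation: a conditioning vector built from the lead step and time phases is mapped by linear layers to a per-feature scale and shift, applied after layer normalization. The sin/cos phase encoding, the raw lead-step feature, the 365.25-day year and the weight shapes below are illustrative assumptions, not FuXi-2.1's documented design; all helper names are hypothetical.

```python
import math
from datetime import datetime, timedelta


def time_features(init, step, hours_per_step=6):
    """Hypothetical conditioning vector: lead step plus sin/cos time phases."""
    valid = init + timedelta(hours=step * hours_per_step)
    tod = (valid.hour + valid.minute / 60.0) / 24.0    # fraction of the day
    doy = (valid.timetuple().tm_yday - 1) / 365.25     # approx. fraction of the year
    two_pi = 2.0 * math.pi
    return [float(step),
            math.sin(two_pi * tod), math.cos(two_pi * tod),
            math.sin(two_pi * doy), math.cos(two_pi * doy)]


def layer_norm(x, eps=1e-6):
    mu = sum(x) / len(x)
    var = sum((v - mu) ** 2 for v in x) / len(x)
    return [(v - mu) / math.sqrt(var + eps) for v in x]


def linear(weight, bias, c):
    """weight: rows of shape (out_dim x cond_dim); bias: out_dim."""
    return [sum(w * ci for w, ci in zip(row, c)) + b for row, b in zip(weight, bias)]


def adaln(x, cond, w_scale, b_scale, w_shift, b_shift):
    """Layer norm whose scale and shift are regressed from the conditioning vector."""
    scale = linear(w_scale, b_scale, cond)
    shift = linear(w_shift, b_shift, cond)
    return [n * (1.0 + s) + t for n, s, t in zip(layer_norm(x), scale, shift)]


cond = time_features(datetime(2024, 9, 29, 0), step=1)  # valid at 06Z
zeros_w = [[0.0] * len(cond) for _ in range(3)]
zeros_b = [0.0] * 3
y = adaln([1.0, 2.0, 4.0], cond, zeros_w, zeros_b, zeros_w, zeros_b)
```

With all modulation weights at zero the layer reduces to a plain layer norm, so the time conditioning only changes the output once those weights are learned.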
Model resolution
| Model | Horizontal resolution | Vertical resolution [pressure levels] (hPa) |
|---|---|---|
| FuXi-2.1 | 0.25° (721×1440) | 13: 50, 100, 150, 200, 250, 300, 400, 500, 600, 700, 850, 925, 1000 |
Data details
Training data
FuXi-2.1 is trained and evaluated on ERA5 reanalysis at 0.25° resolution, 6-hourly.
- Training period: 2002–2023
- Test period: 2024 (held out)
Data parameters
FuXi-2.1 operates on 85 channels per time step: 65 pressure-level channels (5 variables × 13 levels) and 20 surface channels, plus static forcings supplied as constant inputs. Most channels are prognostic — the same channels are input and output and fed back during roll-out. Radiation fluxes and total precipitation are diagnostic outputs produced through a dedicated decoder head (they are predicted but not fed back as inputs).
Channel order (exact): the 65 pressure-level channels come first (z, then t, u, v, q, each over the 13 levels 50→1000 hPa), followed by the 20 surface channels:
msl, t2m, d2m, sst, ws10m, ws100m, u10m, v10m, u100m, v100m, lcc, mcc, hcc, tcc, ssr, ssrd, fdir, ttr, tcw, tp.
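The channel ordering can be reproduced programmatically, for example to look up a channel's index in input.nc or in an output file. The "{variable}{level}" naming (such as z500) is an assumption based on the channel names accepted by plot.py; variables.py remains the authoritative list. The diagnostic set follows the Input/Output column of the parameter tables.

```python
LEVELS = [50, 100, 150, 200, 250, 300, 400, 500, 600, 700, 850, 925, 1000]
PL_VARS = ["z", "t", "u", "v", "q"]
SURFACE = ["msl", "t2m", "d2m", "sst", "ws10m", "ws100m", "u10m", "v10m",
           "u100m", "v100m", "lcc", "mcc", "hcc", "tcc", "ssr", "ssrd",
           "fdir", "ttr", "tcw", "tp"]
DIAGNOSTIC = {"ssr", "ssrd", "fdir", "ttr", "tp"}  # predicted, not fed back

# Variable-major ordering: all 13 levels of z, then t, u, v, q, then surface.
CHANNELS = [f"{var}{lev}" for var in PL_VARS for lev in LEVELS] + SURFACE
CHANNEL_INDEX = {name: i for i, name in enumerate(CHANNELS)}
PROGNOSTIC_IDX = [i for i, name in enumerate(CHANNELS) if name not in DIAGNOSTIC]

print(len(CHANNELS), CHANNEL_INDEX["z500"], CHANNEL_INDEX["t2m"])  # 85 7 66
```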
Pressure-level parameters (13 levels: 50–1000 hPa)
| Short name | Name | Units | Input/Output |
|---|---|---|---|
| z | Geopotential | m²·s⁻² | Both (prognostic) |
| t | Temperature | K | Both (prognostic) |
| u | Eastward wind | m·s⁻¹ | Both (prognostic) |
| v | Northward wind | m·s⁻¹ | Both (prognostic) |
| q | Specific humidity | kg·kg⁻¹ | Both (prognostic) |
Surface parameters (20)
| Short name | Name | Units | Input/Output |
|---|---|---|---|
| msl | Mean sea-level pressure | Pa | Both (prognostic) |
| t2m | 2 m temperature | K | Both (prognostic) |
| d2m | 2 m dewpoint temperature | K | Both (prognostic) |
| sst | Sea-surface temperature | K | Both (prognostic) |
| ws10m | 10 m wind speed | m·s⁻¹ | Both (prognostic) |
| ws100m | 100 m wind speed | m·s⁻¹ | Both (prognostic) |
| u10m | 10 m eastward wind | m·s⁻¹ | Both (prognostic) |
| v10m | 10 m northward wind | m·s⁻¹ | Both (prognostic) |
| u100m | 100 m eastward wind | m·s⁻¹ | Both (prognostic) |
| v100m | 100 m northward wind | m·s⁻¹ | Both (prognostic) |
| lcc | Low cloud cover | 0–1 | Both (prognostic) |
| mcc | Medium cloud cover | 0–1 | Both (prognostic) |
| hcc | High cloud cover | 0–1 | Both (prognostic) |
| tcc | Total cloud cover | 0–1 | Both (prognostic) |
| tcw | Total column water | kg·m⁻² | Both (prognostic) |
| ssr | Surface net solar radiation | J·m⁻² | Output (diagnostic) |
| ssrd | Surface solar radiation downwards | J·m⁻² | Output (diagnostic) |
| fdir | Total-sky direct solar radiation at surface | J·m⁻² | Output (diagnostic) |
| ttr | Top net thermal radiation | J·m⁻² | Output (diagnostic) |
| tp | Total precipitation | mm | Output (diagnostic) |
Forcings (input only)
| Field | Level type | Input/Output |
|---|---|---|
| Land-sea mask, orography/geopotential, latitude/longitude encodings, time-of-day / day-of-year | Surface / static | Input (forcings) |
Evaluation
We compare FuXi-2.1 against FuXi-1.0 under an identical protocol: forecasts are initialised from ERA5 and rolled out to 240 h in 6-hour steps. CSI is computed over land grid points only, across the globe. These numbers come from a limited set of sample cases rather than a full-year evaluation, so they are indicative; broader scorecards will follow.
Headline: RMSE stays comparable to FuXi-1.0 across variables, while structural and extreme-event scores improve substantially.
Precipitation — Critical Success Index (CSI)
| Threshold | FuXi-1.0 | FuXi-2.1 | Δ |
|---|---|---|---|
| ≥ 5 mm | 0.265 | 0.284 | +7.3% |
| ≥ 20 mm | 0.131 | 0.146 | +11.4% |
| ≥ 50 mm | 0.074 | 0.084 | +13.4% |
| ≥ 100 mm | 0.014 | 0.024 | +68.3% |
10 m wind speed — Critical Success Index (CSI)
| Threshold | FuXi-1.0 | FuXi-2.1 | Δ |
|---|---|---|---|
| ≥ 10.8 m·s⁻¹ | 0.544 | 0.571 | +4.8% |
| ≥ 24.5 m·s⁻¹ | 0.165 | 0.198 | +20.3% |
| ≥ 28.5 m·s⁻¹ | 0.000 | 0.044 | newly resolved |
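At a given threshold, CSI counts hits (event forecast and observed), misses (observed but not forecast) and false alarms (forecast but not observed), and scores hits / (hits + misses + false alarms). The minimal sketch below applies a land mask as described in the protocol. It uses unweighted grid-point counts; whether the reported scores apply latitude (area) weighting or aggregate over lead times is not stated, so treat those details as assumptions. Both function names are hypothetical.

```python
def csi(forecast, observed, land_mask, threshold):
    """Critical Success Index over land points for events >= threshold."""
    hits = misses = false_alarms = 0
    for f, o, land in zip(forecast, observed, land_mask):
        if not land:
            continue  # ocean points are excluded from the score
        f_event, o_event = f >= threshold, o >= threshold
        if f_event and o_event:
            hits += 1
        elif o_event:
            misses += 1
        elif f_event:
            false_alarms += 1
    denom = hits + misses + false_alarms
    return hits / denom if denom else float("nan")  # undefined with no events


def relative_change(old, new):
    """Percentage change from old to new, as in the Δ column."""
    return (new - old) / old * 100.0


# Four land points, 20 mm threshold: one hit, one miss, one false alarm.
score = csi([25, 3, 30, 0], [22, 21, 5, 0], [1, 1, 1, 1], threshold=20)
```

Recomputing Δ from the three-decimal scores, e.g. relative_change(0.265, 0.284), gives about +7.2% rather than the tabulated +7.3%; the small differences throughout the tables are consistent with Δ having been computed from unrounded scores.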
The relative gain grows with event intensity and peaks in the extreme tail. At the 28.5 m·s⁻¹ wind threshold FuXi-1.0 scores zero (it never predicts winds that strong), whereas FuXi-2.1 attains a non-zero CSI of 0.044. Spatial power spectra of FuXi-2.1 track the observed spectra across the full wavenumber range, unlike FuXi-1.0, whose spectra show an energy deficit at high wavenumbers.
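The spectral argument can be illustrated in one dimension: blurring a field damps its high-wavenumber power while leaving large scales almost untouched, which is the signature of an over-smoothed forecast. This sketch uses a direct DFT along a single periodic latitude circle and a 3-point running mean as a stand-in for blurring; the evaluation's actual spectral method (for example zonal FFTs or spherical harmonics) is not specified here, and both helpers are hypothetical.

```python
import cmath
import math


def power_spectrum(x):
    """Power at each zonal wavenumber k = 0..N/2 via a direct DFT."""
    n = len(x)
    spec = []
    for k in range(n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(x))
        spec.append(abs(coeff) ** 2 / n)
    return spec


def smooth(x):
    """3-point periodic running mean: a crude stand-in for a blurry forecast."""
    n = len(x)
    return [(x[i - 1] + x[i] + x[(i + 1) % n]) / 3 for i in range(n)]


# A large-scale wave (k=2) plus a small-scale wave (k=12) on 32 points.
n = 32
signal = [math.sin(2 * math.pi * 2 * i / n) + 0.5 * math.sin(2 * math.pi * 12 * i / n)
          for i in range(n)]
ps = power_spectrum(signal)
ps_smooth = power_spectrum(smooth(signal))
```

The running mean multiplies each wavenumber's amplitude by (1 + 2cos(2πk/N))/3, so roughly 90% of the k=2 power survives while the k=12 power drops by about a factor of 50.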
Known limitations
- FuXi-2.1 is a deterministic model; it does not provide a calibrated ensemble spread.
- The CSI numbers reported here are computed on land only, over a limited set of sample cases rather than a full-year evaluation; treat them as indicative. Comprehensive global scorecards will be added.
- As with all ERA5-trained models, skill depends on the quality and resolution of the initial conditions.
Citation
If you use FuXi-2.1, please cite the FuXi series:
```bibtex
@article{chen2023fuxi,
  title   = {FuXi: a cascade machine learning forecasting system for 15-day global weather forecast},
  author  = {Chen, Lei and Zhong, Xiaohui and Zhang, Feng and Cheng, Yuan and Xu, Yimin and Qi, Yan and Li, Hao},
  journal = {npj Climate and Atmospheric Science},
  year    = {2023},
  volume  = {6},
  number  = {1},
  pages   = {190}
}
```
Code: FuXi-1.0 — https://github.com/tpys/FuXi
© 2026 Fudan University & SAIS · FuXi Weather.