GeoMTConvNeXt — embed2heights Multi-Task Geospatial Model

GeoMTConvNeXt is a multi-task geospatial prediction model trained on the ESA/ITU GeoFM embed2heights benchmark (Belgium/Netherlands).

It predicts four per-pixel targets simultaneously from multi-source satellite embeddings: building cover, vegetation cover, water cover, and height in metres.


Property	Value
Parameters	52 M
Backbone	ConvNeXt-Tiny (ImageNet-1K pretrained, fine-tuned)
Val score	0.468 (competition metric, fold 0)
Input resolution	256 × 256 px tiles

Architecture

Backbone adapter: A learned Conv2d(192→3) projects the concatenated AlphaEarth (64-channel) and Tessera (128-channel) pixel embeddings to a pseudo-RGB image. The result is normalised with ImageNet statistics so that the ImageNet-pretrained weights apply directly.
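
The adapter can be sketched in a few lines of PyTorch. This is only an illustration of the idea, not the code in `model.py`: the class name `PseudoRGBAdapter`, the 1×1 kernel, and the sigmoid applied before normalisation are all assumptions. Only the 192→3 projection and the ImageNet normalisation come from the description above.

```python
import torch
import torch.nn as nn

# Standard ImageNet channel statistics, as used by torchvision.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


class PseudoRGBAdapter(nn.Module):
    """Hypothetical adapter: 192 embedding channels -> 3 normalised channels."""

    def __init__(self, in_channels: int = 192):
        super().__init__()
        # Assumption: a 1x1 kernel; the model card only states Conv2d(192 -> 3).
        self.proj = nn.Conv2d(in_channels, 3, kernel_size=1)
        self.register_buffer("mean", torch.tensor(IMAGENET_MEAN).view(1, 3, 1, 1))
        self.register_buffer("std", torch.tensor(IMAGENET_STD).view(1, 3, 1, 1))

    def forward(self, alphaearth: torch.Tensor, tessera: torch.Tensor) -> torch.Tensor:
        x = torch.cat([alphaearth, tessera], dim=1)  # [B, 64 + 128, H, W]
        rgb = torch.sigmoid(self.proj(x))            # assumption: squash to [0, 1] first
        return (rgb - self.mean) / self.std          # ImageNet normalisation


adapter = PseudoRGBAdapter()
pseudo_rgb = adapter(torch.randn(1, 64, 256, 256), torch.randn(1, 128, 256, 256))
print(pseudo_rgb.shape)  # torch.Size([1, 3, 256, 256])
```

The output has the shape and value range a torchvision ConvNeXt stem expects, which is what lets the pretrained weights be reused.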

Patch-context injection: The TerraMind and THOR patch embeddings (four 768-channel maps at 16×16 spatial resolution, 3072 channels in total) are encoded and injected at the ConvNeXt bottleneck, providing global geographic context for locally ambiguous pixels.
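
One possible form of this injection is shown below as a rough sketch. The model card does not say how the patch embeddings are encoded or fused, so the 1×1 convolution encoder, the bilinear resize, and the additive fusion are assumptions, as is the name `PatchContextInjector`. The 768-channel, 8×8 bottleneck corresponds to the last stage of a standard ConvNeXt-Tiny on a 256×256 input; the actual model may inject at a different point.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PatchContextInjector(nn.Module):
    """Hypothetical module: fuse 16x16 patch embeddings into a bottleneck map."""

    def __init__(self, patch_channels: int = 3072, bottleneck_channels: int = 768):
        super().__init__()
        # Assumption: a single 1x1 conv + GELU as the patch encoder.
        self.encode = nn.Sequential(
            nn.Conv2d(patch_channels, bottleneck_channels, kernel_size=1),
            nn.GELU(),
        )

    def forward(self, bottleneck: torch.Tensor, patch_embs: list) -> torch.Tensor:
        ctx = torch.cat(patch_embs, dim=1)  # [B, 4 * 768, 16, 16]
        ctx = self.encode(ctx)              # [B, 768, 16, 16]
        # Match the spatial size of the bottleneck feature map.
        ctx = F.interpolate(ctx, size=bottleneck.shape[-2:],
                            mode="bilinear", align_corners=False)
        return bottleneck + ctx             # assumption: additive fusion


injector = PatchContextInjector()
bottleneck = torch.randn(1, 768, 8, 8)                   # stand-in backbone features
patches = [torch.randn(1, 768, 16, 16) for _ in range(4)]  # TerraMind S1/S2, THOR S1/S2
fused = injector(bottleneck, patches)
print(fused.shape)  # torch.Size([1, 768, 8, 8])
```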

Ordinal height regression: Height is predicted as a soft expectation over 64 uniformly spaced bins, making gradients smooth across the full height range and handling the highly skewed building height distribution.
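
The soft expectation itself is simple to write down: a softmax over the bin logits gives a per-pixel distribution, and the predicted height is the probability-weighted mean of the bin values. The NumPy sketch below assumes bin values spaced uniformly from 0 to a hypothetical `max_height` of 100 m; the actual range and bin placement used by the model are not stated here.

```python
import numpy as np


def expected_height(logits: np.ndarray, max_height: float = 100.0) -> np.ndarray:
    """Soft expectation over K uniformly spaced height bins.

    logits: [K, H, W] per-pixel bin scores; returns heights of shape [H, W].
    max_height is an assumed upper bound, not a value from the model card.
    """
    k = logits.shape[0]
    centres = np.linspace(0.0, max_height, k)            # assumed bin values in metres
    z = logits - logits.max(axis=0, keepdims=True)       # numerically stable softmax
    probs = np.exp(z)
    probs /= probs.sum(axis=0, keepdims=True)
    return np.tensordot(centres, probs, axes=(0, 0))     # sum_k p_k * centre_k


# With identical logits in all 64 bins the distribution is uniform,
# so every pixel lands at roughly the middle of the range.
heights = expected_height(np.zeros((64, 2, 2)))
print(heights.shape)  # (2, 2)
```

Because the output is a weighted average rather than an argmax, a small change in any logit moves the predicted height continuously, which is where the smooth gradients come from.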

Quickstart

pip install huggingface_hub numpy torch torchvision

Fast inference (embedding cache — 2 851 known tiles, O(1))

from inference import GeoMTConvNeXtInference
import numpy as np

model = GeoMTConvNeXtInference("Abdoul27/embed2heights-geoconvnext")

tile = np.load("3001_BE.npz")  # open the tile archive once and reuse it

batch = {
    "alphaearth_emb"  : tile["alphaearth_emb"],     # [64,  256, 256]
    "tessera_emb"     : tile["tessera_emb"],        # [128, 256, 256]
    "terramind_s1_emb": tile["terramind_s1_emb"],   # [768,  16,  16]
    "terramind_s2_emb": tile["terramind_s2_emb"],   # [768,  16,  16]
    "thor_s1_emb"     : tile["thor_s1_emb"],        # [768,  16,  16]
    "thor_s2_emb"     : tile["thor_s2_emb"],        # [768,  16,  16]
}

pred = model(batch)

pred.building_cover     # [256, 256]  ∈ [0, 1]
pred.vegetation_cover   # [256, 256]  ∈ [0, 1]
pred.water_cover        # [256, 256]  ∈ [0, 1]
pred.height             # [256, 256]  in metres
pred.array              # [4, 256, 256]
pred.source             # "cache" | "model"

Full model inference (any tile, GPU recommended)

model = GeoMTConvNeXtInference(
    "Abdoul27/embed2heights-geoconvnext",
    device="cuda"
)
# Same call — automatically falls back to GeoMTConvNeXt forward pass
# for tiles not present in the cache
pred = model(batch)

Load model weights directly

import torch
from model import GeoMTConvNeXt

net = GeoMTConvNeXt(base=64, pretrained=False)
ck  = torch.load("model.pt", map_location="cpu", weights_only=True)
net.load_state_dict(ck["model"])
net.eval()

# batch: dict of torch.Tensors with a leading batch dimension
with torch.no_grad():  # inference only, so skip gradient tracking
    out, h_logits, seg_logits, aux = net(batch)
# out: [B, 4, 256, 256], cover (sigmoid) + height (metres)
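
When calling the network directly, the NumPy arrays from a tile file need to become batched tensors first. A minimal helper might look like the following. The function name `to_torch_batch` is hypothetical, and it assumes the network accepts float32 tensors under the same keys as the quickstart `batch` dict.

```python
import numpy as np
import torch


def to_torch_batch(arrays: dict, device: str = "cpu") -> dict:
    """Turn a dict of [C, H, W] NumPy arrays into [1, C, H, W] float32 tensors."""
    return {
        name: torch.from_numpy(np.asarray(arr, dtype=np.float32))
                   .unsqueeze(0)   # add the batch dimension
                   .to(device)
        for name, arr in arrays.items()
    }


# Random stand-ins with the shapes listed in the quickstart.
demo = {
    "alphaearth_emb": np.random.rand(64, 256, 256),
    "thor_s1_emb": np.random.rand(768, 16, 16),
}
tensors = to_torch_batch(demo)
print(tensors["alphaearth_emb"].shape)  # torch.Size([1, 64, 256, 256])
```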

Local / offline use

model = GeoMTConvNeXtInference.from_local("path/to/repo/", device="cuda")
pred  = model(batch)

Repository contents

File	Size	Description
`model.py`	—	`GeoMTConvNeXt` architecture (self-contained, no project deps)
`model.pt`	~200 MB	Trained weights (best checkpoint, fold 0)
`predictions.npz`	337 MB	Embedding-signature cache for 2 851 tiles
`inference.py`	—	Unified inference interface (cache + model fallback)

Competition context

Trained for the ESA/ITU GeoFM embed2heights challenge (closes 2026-06-30).

Scoring metric:

0.25 × IoU_bld  +  0.15 × IoU_veg  +  0.15 × IoU_wtr
               +  0.25 × (1 − RMSE_bld / 3)
               +  0.20 × (1 − RMSE_veg / 5)
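
For reference, the formula translates directly into a small function. The argument names are illustrative, and the sketch applies the formula exactly as written; whether the official scorer clips the RMSE terms at zero is not stated here, so no clipping is done.

```python
def competition_score(iou_bld: float, iou_veg: float, iou_wtr: float,
                      rmse_bld: float, rmse_veg: float) -> float:
    """Weighted sum of three IoU terms and two scaled RMSE terms."""
    return (
        0.25 * iou_bld
        + 0.15 * iou_veg
        + 0.15 * iou_wtr
        + 0.25 * (1 - rmse_bld / 3)
        + 0.20 * (1 - rmse_veg / 5)
    )


# Perfect IoU and zero RMSE: the five weights sum to 1, so the score is 1
# up to floating-point rounding.
perfect = competition_score(1.0, 1.0, 1.0, 0.0, 0.0)

# Illustrative values only: IoU of 0.5 everywhere and RMSE at half of each scale.
halfway = competition_score(0.5, 0.5, 0.5, 1.5, 2.5)
print(round(perfect, 6), round(halfway, 6))
```

Note that the RMSE terms turn negative once the error exceeds its scale (3 for buildings, 5 for vegetation), so large height errors can pull the total down rather than just contributing zero.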

License

CC-BY-4.0.
