metadata

license: apache-2.0

Overview

SatlasPretrain is a large-scale remote sensing image understanding dataset, intended for pre-training foundation models on many types of satellite and aerial imagery. It pairs remote sensing images with hundreds of millions of labels derived from OpenStreetMap, WorldCover, other existing datasets, and new manual annotation.

Dataset

The dataset is contained in the tar files in the dataset/ folder of this repository. Our GitHub repository contains details about the format of the dataset and how to use it, as well as pre-training code.
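
Once the tar files have been downloaded, they need to be unpacked before use. The following is a minimal sketch using only the Python standard library. It assumes the archives sit in a local directory and end in `.tar`; the helper name `extract_archives` and the directory names are hypothetical, and the layout of the extracted files is described in the GitHub repository, not here.

```python
import tarfile
from pathlib import Path


def extract_archives(archive_dir, output_dir):
    """Extract every .tar file found directly in archive_dir into output_dir.

    Returns the sorted list of archive file names that were extracted.
    """
    archive_dir = Path(archive_dir)
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    # Use the safer "data" extraction filter when this Python version has it.
    extract_kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}

    extracted = []
    for archive_path in sorted(archive_dir.glob("*.tar")):
        with tarfile.open(archive_path) as archive:
            archive.extractall(output_dir, **extract_kwargs)
        extracted.append(archive_path.name)
    return extracted
```

For example, `extract_archives("dataset", "satlas_root")` would unpack a local copy of the dataset/ folder into a hypothetical `satlas_root` directory. The archives are large, so check available disk space first.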

The dataset is released under ODC-BY.

Models

The models here are Swin Transformer and ResNet backbones pre-trained on the different types of remote sensing images in SatlasPretrain:

Sentinel-2
Sentinel-1
Landsat 8/9
0.5 - 2 m/pixel aerial imagery

The pre-trained backbones are expected to improve performance on a wide range of remote sensing and geospatial tasks, such as planetary and environmental monitoring. They have already been deployed to develop robust models for detecting solar farms, wind turbines, offshore platforms, and tree cover in Satlas, a platform for accessing global geospatial data generated by AI from satellite imagery.

See here for details on how to use the models and the expected inputs.

The model weights are released under ODC-BY.

Usage and Input Normalization

The backbones can be loaded for fine-tuning on downstream tasks:

import torch
import torchvision

# Build a randomly initialized Swin-v2-Base model to receive the pre-trained weights.
model = torchvision.models.swin_transformer.swin_v2_b()
# Load the checkpoint onto the CPU so this also works on machines without a GPU.
full_state_dict = torch.load('satlas-model-v1-highres.pth', map_location='cpu')
# Extract just the Swin backbone parameters from the full state dict.
swin_prefix = 'backbone.backbone.'
swin_state_dict = {k[len(swin_prefix):]: v for k, v in full_state_dict.items() if k.startswith(swin_prefix)}
model.load_state_dict(swin_state_dict)
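
The prefix-stripping step above fails quietly if a checkpoint uses a different key layout: the comprehension simply returns an empty dict. The sketch below wraps that step in a small helper that raises an error in that case. It is plain Python and runs on a toy dict, so it can be tried without downloading any weights. The helper names and the toy keys are illustrative assumptions, not part of the released code.

```python
def strip_prefix(state_dict, prefix):
    """Return the entries whose key starts with prefix, with the prefix removed."""
    return {
        key[len(prefix):]: value
        for key, value in state_dict.items()
        if key.startswith(prefix)
    }


def extract_backbone(state_dict, prefix="backbone.backbone."):
    """Strip prefix from matching keys and raise KeyError if no key matched."""
    backbone = strip_prefix(state_dict, prefix)
    if not backbone:
        sample = sorted(state_dict)[:5]
        raise KeyError(f"no keys start with {prefix!r}; first keys: {sample}")
    return backbone


# Toy checkpoint: strings stand in for tensors, and all keys are made up.
toy_checkpoint = {
    "backbone.backbone.features.0.0.weight": "w0",
    "backbone.backbone.norm.bias": "b0",
    "intermediates.0.fpn.weight": "unused",  # hypothetical non-backbone key
}
backbone_only = extract_backbone(toy_checkpoint)
print(sorted(backbone_only))  # ['features.0.0.weight', 'norm.bias']
```

With a real checkpoint, `extract_backbone(full_state_dict)` could replace the dict comprehension before calling `model.load_state_dict`.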

The backbones can also be initialized using the lightweight satlaspretrain_models package.

Code

The training code and SatlasPretrain dataset are at https://github.com/allenai/satlas/.

The SatlasPretrain paper appears at the International Conference on Computer Vision (ICCV) in October 2023.

Feedback

We welcome any feedback about the model or training data. To contact us, please open an issue on the SatlasPretrain GitHub repository.