Mask R-CNN for Multi-Class Object Detection on NWPU-VHR-10

This model is a Mask R-CNN with a ResNet-50 FPN backbone, trained for multi-class object detection on very high resolution (VHR) remote sensing imagery using the NWPU-VHR-10 benchmark dataset.

Model Description

  • Architecture: Mask R-CNN with ResNet-50 FPN backbone (torchvision maskrcnn_resnet50_fpn)
  • Task: Multi-class object detection and instance segmentation
  • Domain: Very high resolution (VHR) remote sensing imagery
  • Training Framework: PyTorch + torchvision
  • Package: Trained using geoai-py

Training Data

The NWPU-VHR-10 dataset was constructed by researchers at Northwestern Polytechnical University (NWPU). It contains:

  • 800 VHR remote sensing images (650 positive with annotations, 150 negative)
  • 3,775 annotated object instances across 10 classes
  • Images cropped from Google Earth and the Vaihingen dataset
  • Manually annotated by domain experts

Classes

ID   Class               ID   Class
1    airplane            6    basketball_court
2    ship                7    ground_track_field
3    storage_tank        8    harbor
4    baseball_diamond    9    bridge
5    tennis_court        10   vehicle
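
The IDs above are the integer labels the detector emits. Index 0 is reserved for the background class, which is why the usage examples pass 11 classes rather than 10. A minimal sketch of the lookup, where `CLASS_NAMES` and `label_to_name` are illustrative names and not part of geoai or torchvision:

```python
# Index 0 is background; indices 1-10 follow the class table.
CLASS_NAMES = [
    "background", "airplane", "ship", "storage_tank",
    "baseball_diamond", "tennis_court", "basketball_court",
    "ground_track_field", "harbor", "bridge", "vehicle",
]


def label_to_name(label: int) -> str:
    """Map an integer label to its class name (hypothetical helper)."""
    if not 0 <= label < len(CLASS_NAMES):
        raise ValueError(f"label {label} is outside 0..{len(CLASS_NAMES) - 1}")
    return CLASS_NAMES[label]


print(label_to_name(8))
```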

Data Split

Split   Images      Purpose
Train   509 (80%)   Model training
Val     128 (20%)   Evaluation
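
The counts 509 and 128 are what an 80/20 split of 637 images yields when the training size is truncated to an integer. The sketch below reproduces those counts with a seeded shuffle. It is an illustration only: the helper `split_items`, the placeholder integer IDs, and the use of Python's `random` module are assumptions, and the actual split procedure used for this model may differ.

```python
import random


def split_items(items, train_fraction=0.8, seed=42):
    """Shuffle deterministically, then cut into train and validation parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(train_fraction * len(shuffled))  # truncates 509.6 to 509
    return shuffled[:n_train], shuffled[n_train:]


image_ids = list(range(637))  # placeholder IDs standing in for image files
train_ids, val_ids = split_items(image_ids)
print(len(train_ids), len(val_ids))
```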

Training Details

Parameter        Value
Backbone         ResNet-50 FPN (ImageNet pretrained)
Optimizer        SGD (momentum=0.9, weight_decay=0.0005)
Learning Rate    0.005 with StepLR (step=3, gamma=0.1)
Batch Size       4
Epochs           20
Input Channels   3 (RGB)
Seed             42
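
To make the learning-rate row concrete, the sketch below evaluates the closed form of a step schedule, lr = base_lr * gamma^(epoch // step_size). It assumes the scheduler is advanced once per epoch, which the table does not state. The function name `step_lr` is illustrative.

```python
def step_lr(base_lr, epoch, step_size=3, gamma=0.1):
    """Learning rate at a given epoch under a step decay schedule."""
    return base_lr * gamma ** (epoch // step_size)


# Print the rate at each epoch where it changes during a 20-epoch run.
for epoch in range(0, 20, 3):
    print(f"epoch {epoch:2d}: lr = {step_lr(0.005, epoch):.1e}")
```

Under that assumption the rate drops by a factor of ten every three epochs, so from epoch 12 onward it is 5e-7 or lower and most of the weight updates happen early in training.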

Evaluation Results

Evaluated on the validation set (128 images) using standard COCO detection metrics:

Metric            Value
mAP@0.5           0.709
mAP@0.75          0.518
mAP@[0.5:0.95]    0.459
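
The thresholds in these metric names are intersection-over-union (IoU) cutoffs: a prediction counts as a match only if its IoU with a ground-truth object reaches the threshold, and mAP@[0.5:0.95] averages over thresholds from 0.5 to 0.95 in steps of 0.05. The sketch below shows the box IoU calculation on two made-up boxes. `box_iou` is an illustrative helper, not the evaluation code used for this model.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Made-up boxes: the prediction is shifted 20 px right of the ground truth.
pred = (10, 10, 110, 110)
truth = (30, 10, 130, 110)
iou = box_iou(pred, truth)
print(f"IoU = {iou:.3f}, match at 0.5: {iou >= 0.5}, match at 0.75: {iou >= 0.75}")
```

A detection like this one (IoU of 2/3) is a hit at the 0.5 threshold but a miss at 0.75, which is one reason the stricter metrics in the table are lower.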

Per-Class AP@0.5

Class                AP@0.5   Class          AP@0.5
tennis_court         0.902    harbor         0.695
basketball_court     0.878    bridge         0.678
baseball_diamond     0.821    storage_tank   0.591
ground_track_field   0.807    vehicle        0.532
airplane             0.713    ship           0.470
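
As a consistency check, the unweighted mean of the ten per-class values is 0.7087, which rounds to the reported mAP@0.5 of 0.709. The short sketch below recomputes it from the table and picks out the strongest and weakest classes. The variable names are illustrative.

```python
# Per-class AP@0.5 values copied from the table.
per_class_ap50 = {
    "tennis_court": 0.902, "basketball_court": 0.878,
    "baseball_diamond": 0.821, "ground_track_field": 0.807,
    "airplane": 0.713, "harbor": 0.695, "bridge": 0.678,
    "storage_tank": 0.591, "vehicle": 0.532, "ship": 0.470,
}

# mAP is the unweighted mean over classes.
map50 = sum(per_class_ap50.values()) / len(per_class_ap50)
best = max(per_class_ap50, key=per_class_ap50.get)
worst = min(per_class_ap50, key=per_class_ap50.get)

print(f"mAP@0.5 = {map50:.3f}")
print(f"best: {best}, worst: {worst}")
```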

Usage

With geoai

import geoai

# Index 0 is the background class, followed by the 10 NWPU-VHR-10 classes
class_names = [
    "background", "airplane", "ship", "storage_tank",
    "baseball_diamond", "tennis_court", "basketball_court",
    "ground_track_field", "harbor", "bridge", "vehicle",
]

# Run inference on a remote sensing image
result_path, inference_time, detections = geoai.multiclass_detection(
    input_path="your_image.tif",
    output_path="detections.tif",
    model_path="best_model.pth",
    num_classes=len(class_names),  # 11: 10 object classes + background
    class_names=class_names,
    window_size=512,
    overlap=256,
    confidence_threshold=0.5,
)

# Visualize results
geoai.visualize_multiclass_detections(
    image_path="your_image.tif",
    detections=detections,
    class_names=class_names,
)
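
The `window_size` and `overlap` arguments describe sliding-window inference over a large raster. The sketch below shows one common way such windows are laid out along a single axis. It assumes the stride equals `window_size - overlap` and that a final window is added flush with the image edge. geoai's internal tiling may differ, and `window_starts` is a hypothetical helper, not a geoai function.

```python
def window_starts(length, window_size=512, overlap=256):
    """Start offsets of sliding windows along one image axis."""
    if overlap >= window_size:
        raise ValueError("overlap must be smaller than window_size")
    if length <= window_size:
        return [0]  # the image fits in a single window
    stride = window_size - overlap
    starts = list(range(0, length - window_size + 1, stride))
    # Add a final window flush with the edge if the stride left a gap.
    if starts[-1] + window_size < length:
        starts.append(length - window_size)
    return starts


print(window_starts(1024))
print(window_starts(1000))
```

With an overlap of half the window, every interior pixel is covered by at least two windows along each axis, so an object cut by one window boundary is likely to appear whole in a neighboring window.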

With PyTorch directly

import torch
from PIL import Image
from torchvision import transforms
from torchvision.models.detection import maskrcnn_resnet50_fpn

# Load model: 11 classes = 10 object classes + background.
# weights_backbone=None skips the ImageNet backbone download, since the
# checkpoint loaded below supplies all of the model's weights.
model = maskrcnn_resnet50_fpn(weights=None, weights_backbone=None, num_classes=11)
checkpoint = torch.load("best_model.pth", map_location="cpu")
model.load_state_dict(checkpoint)
model.eval()

# Run inference
img = Image.open("your_image.jpg").convert("RGB")
img_tensor = transforms.ToTensor()(img).unsqueeze(0)  # shape (1, 3, H, W), values in [0, 1]

with torch.no_grad():
    predictions = model(img_tensor)

# predictions[0] is a dict with keys 'boxes', 'labels', 'scores', 'masks'
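
The raw output usually includes many low-confidence detections, so a score filter is a typical next step. The sketch below works on plain Python lists so that it runs without a model. `filter_detections` and the sample values are illustrative, not part of torchvision.

```python
CLASS_NAMES = [
    "background", "airplane", "ship", "storage_tank",
    "baseball_diamond", "tennis_court", "basketball_court",
    "ground_track_field", "harbor", "bridge", "vehicle",
]


def filter_detections(boxes, labels, scores, threshold=0.5):
    """Keep detections scoring at or above threshold and attach class names."""
    kept = []
    for box, label, score in zip(boxes, labels, scores):
        if score >= threshold:
            kept.append({"class": CLASS_NAMES[label], "score": score, "box": box})
    return kept


# Placeholder values standing in for real model output.
example = filter_detections(
    boxes=[[12.0, 30.0, 96.0, 88.0], [200.0, 40.0, 260.0, 75.0]],
    labels=[1, 10],
    scores=[0.93, 0.31],
)
print(example)
```

With real output, the lists can be obtained from the tensors, for example `predictions[0]["boxes"].tolist()`. The `masks` tensor has shape (N, 1, H, W) and holds per-pixel probabilities, which are commonly binarized with a 0.5 threshold.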

Files

File                   Description
best_model.pth         Best model weights (by validation IoU)
class_info.json        Class names and number of classes
training_summary.txt   Training configuration summary

Notebook Example

For a complete end-to-end example including dataset download, training, evaluation, and inference, see the NWPU-VHR-10 Object Detection notebook.

Citation

If you use this model, please cite the NWPU-VHR-10 dataset:

@article{cheng2014multi,
  title={Multi-class geospatial object detection and geographic image classification based on collection of part detectors},
  author={Cheng, Gong and Han, Junwei and Zhou, Peicheng and Guo, Lei},
  journal={ISPRS Journal of Photogrammetry and Remote Sensing},
  volume={98},
  pages={119--132},
  year={2014}
}

@article{cheng2016survey,
  title={A survey on object detection in optical remote sensing images},
  author={Cheng, Gong and Han, Junwei},
  journal={ISPRS Journal of Photogrammetry and Remote Sensing},
  volume={117},
  pages={11--28},
  year={2016}
}

License

This model is released under the MIT License. The NWPU-VHR-10 dataset is for research purposes only.
