# Mask R-CNN for Multi-Class Object Detection on NWPU-VHR-10
This model is a Mask R-CNN with a ResNet-50 FPN backbone, trained for multi-class object detection on very high resolution (VHR) remote sensing imagery using the NWPU-VHR-10 benchmark dataset.
## Model Description

- Architecture: Mask R-CNN with ResNet-50 FPN backbone (torchvision `maskrcnn_resnet50_fpn`)
- Task: Multi-class object detection and instance segmentation
- Domain: Very high resolution (VHR) remote sensing imagery
- Training Framework: PyTorch + torchvision
- Package: Trained using geoai-py
## Training Data
The NWPU-VHR-10 dataset was constructed by researchers at Northwestern Polytechnical University (NWPU). It contains:
- 800 VHR remote sensing images (650 positive with annotations, 150 negative)
- 3,775 annotated object instances across 10 classes
- Images cropped from Google Earth and the Vaihingen dataset
- Manually annotated by domain experts
### Classes
| ID | Class | ID | Class |
|---|---|---|---|
| 1 | airplane | 6 | basketball_court |
| 2 | ship | 7 | ground_track_field |
| 3 | storage_tank | 8 | harbor |
| 4 | baseball_diamond | 9 | bridge |
| 5 | tennis_court | 10 | vehicle |
### Data Split
| Split | Images | Purpose |
|---|---|---|
| Train | 509 (85%) | Model training |
| Val | 128 (15%) | Evaluation |
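The split above can be reproduced as a fixed-size random split using the training seed (42). The exact split mechanics inside geoai-py are not shown in this card, so the sketch below is an assumption using PyTorch's `random_split` with a placeholder dataset:

```python
import torch
from torch.utils.data import random_split

# Hypothetical placeholder standing in for the annotated NWPU-VHR-10 images.
dataset = list(range(509 + 128))

# Seeded generator makes the train/val partition reproducible across runs.
generator = torch.Generator().manual_seed(42)
train_set, val_set = random_split(dataset, [509, 128], generator=generator)

print(len(train_set), len(val_set))  # 509 128
```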
## Training Details
| Parameter | Value |
|---|---|
| Backbone | ResNet-50 FPN (ImageNet pretrained) |
| Optimizer | SGD (momentum=0.9, weight_decay=0.0005) |
| Learning Rate | 0.005 with StepLR (step=3, gamma=0.1) |
| Batch Size | 4 |
| Epochs | 20 |
| Input Channels | 3 (RGB) |
| Seed | 42 |
## Evaluation Results
Evaluated on the validation set (128 images) using standard COCO detection metrics:
### Per-Class AP@0.5
| Class | AP@0.5 | Class | AP@0.5 |
|---|---|---|---|
| tennis_court | 0.902 | harbor | 0.695 |
| basketball_court | 0.878 | bridge | 0.678 |
| baseball_diamond | 0.821 | storage_tank | 0.591 |
| ground_track_field | 0.807 | vehicle | 0.532 |
| airplane | 0.713 | ship | 0.470 |
## Usage

### With geoai
```python
import geoai

# Run inference on a remote sensing image
result_path, inference_time, detections = geoai.multiclass_detection(
    input_path="your_image.tif",
    output_path="detections.tif",
    model_path="best_model.pth",
    num_classes=11,
    class_names=[
        "background", "airplane", "ship", "storage_tank",
        "baseball_diamond", "tennis_court", "basketball_court",
        "ground_track_field", "harbor", "bridge", "vehicle",
    ],
    window_size=512,
    overlap=256,
    confidence_threshold=0.5,
)

# Visualize results
geoai.visualize_multiclass_detections(
    image_path="your_image.tif",
    detections=detections,
    class_names=[
        "background", "airplane", "ship", "storage_tank",
        "baseball_diamond", "tennis_court", "basketball_court",
        "ground_track_field", "harbor", "bridge", "vehicle",
    ],
)
```
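The `window_size` and `overlap` parameters imply sliding-window inference over large rasters. geoai's internal tiling is not shown here; the sketch below assumes a stride of `window_size - overlap` and shows how tile origins could be computed:

```python
def tile_origins(width, height, window_size=512, overlap=256):
    """Top-left coordinates of sliding windows covering an image.

    Consecutive windows advance by (window_size - overlap) pixels; extra
    windows are added so the right and bottom edges are always covered.
    """
    stride = window_size - overlap
    xs = list(range(0, max(width - window_size, 0) + 1, stride))
    ys = list(range(0, max(height - window_size, 0) + 1, stride))
    if xs[-1] + window_size < width:
        xs.append(width - window_size)
    if ys[-1] + window_size < height:
        ys.append(height - window_size)
    return [(x, y) for y in ys for x in xs]

print(tile_origins(1024, 768))
# [(0, 0), (256, 0), (512, 0), (0, 256), (256, 256), (512, 256)]
```

With an overlap of half the window size, each detection falls well inside at least one window, which helps avoid objects being cut at tile borders.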
### With PyTorch directly
```python
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn
from torchvision import transforms
from PIL import Image

# Load model (11 classes = 10 object classes + background)
model = maskrcnn_resnet50_fpn(weights=None, num_classes=11)
checkpoint = torch.load("best_model.pth", map_location="cpu")
model.load_state_dict(checkpoint)
model.eval()

# Run inference
img = Image.open("your_image.jpg").convert("RGB")
img_tensor = transforms.ToTensor()(img).unsqueeze(0)

with torch.no_grad():
    predictions = model(img_tensor)
# predictions[0] contains 'boxes', 'labels', 'scores', 'masks'
```
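The raw predictions include low-confidence detections. A short sketch of post-filtering by score, using dummy tensors in the same per-image dict format torchvision's Mask R-CNN returns:

```python
import torch

# Hypothetical raw output in torchvision's Mask R-CNN format.
pred = {
    "boxes": torch.tensor([[10.0, 10.0, 50.0, 50.0], [0.0, 0.0, 20.0, 20.0]]),
    "labels": torch.tensor([1, 10]),       # e.g. airplane, vehicle
    "scores": torch.tensor([0.92, 0.31]),
    "masks": torch.rand(2, 1, 64, 64),
}

# Keep only detections at or above a 0.5 confidence threshold.
keep = pred["scores"] >= 0.5
boxes = pred["boxes"][keep]                      # (N, 4) xyxy pixel boxes
labels = pred["labels"][keep]                    # class IDs 1..10 (0 = background)
masks = pred["masks"][keep].squeeze(1) > 0.5     # binary instance masks

print(labels.tolist())  # [1]
```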
## Files

| File | Description |
|---|---|
| `best_model.pth` | Best model weights (by validation IoU) |
| `class_info.json` | Class names and number of classes |
| `training_summary.txt` | Training configuration summary |
## Notebook Example
For a complete end-to-end example including dataset download, training, evaluation, and inference, see the NWPU-VHR-10 Object Detection notebook.
## Citation
If you use this model, please cite the NWPU-VHR-10 dataset:
```bibtex
@article{cheng2014multi,
  title={Multi-class geospatial object detection and geographic image classification based on collection of part detectors},
  author={Cheng, Gong and Han, Junwei and Zhou, Peicheng and Guo, Lei},
  journal={ISPRS Journal of Photogrammetry and Remote Sensing},
  volume={98},
  pages={119--132},
  year={2014}
}

@article{cheng2016survey,
  title={A survey on object detection in optical remote sensing images},
  author={Cheng, Gong and Han, Junwei},
  journal={ISPRS Journal of Photogrammetry and Remote Sensing},
  volume={117},
  pages={11--28},
  year={2016}
}
```
## License
This model is released under the MIT License. The NWPU-VHR-10 dataset is for research purposes only.