Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Paper: arXiv:2103.14030
A PyTorch implementation of Swin Transformer (Liu et al. 2021), trained on the NWPU-RESISC45 satellite imagery dataset.
| Property | Value |
|---|---|
| Architecture | Swin Transformer (4 stages) |
| Dataset | NWPU-RESISC45 |
| Classes | 45 land use categories |
| Test Accuracy | 82% |
| Input Size | 224×224 |
| Embed Dim | 96 |
| Training Hardware | RTX 4050 6GB |
| Framework | PyTorch (from scratch) |
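The table's input size, embedding dimension, and stage count together determine the feature-map shape at each stage. The sketch below works through that arithmetic under assumptions taken from the Swin paper's default design, not from this card: a patch size of 4, and patch merging between stages that halves the token grid side and doubles the channel count. The function name `stage_shapes` is hypothetical and is not part of this repository.

```python
def stage_shapes(input_size=224, embed_dim=96, patch_size=4, num_stages=4):
    """Return (grid_side, channels) per stage for a Swin-style hierarchy.

    Assumes patch_size=4 and 2x patch merging between stages, as in the
    Swin paper's default configuration (not confirmed for this checkpoint).
    """
    side = input_size // patch_size  # token grid side after patch embedding
    dim = embed_dim
    shapes = []
    for _ in range(num_stages):
        shapes.append((side, dim))
        side //= 2  # patch merging halves the spatial resolution
        dim *= 2    # and doubles the channel dimension
    return shapes


for i, (side, dim) in enumerate(stage_shapes(), start=1):
    print(f"stage {i}: {side}x{side} tokens, {dim} channels")
```

Under these assumptions, a 224×224 input yields a 56×56 token grid with 96 channels at stage 1 and a 7×7 grid with 768 channels at stage 4.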
airplane, airport, baseball_diamond, basketball_court, beach, bridge, chaparral, church, circular_farmland, cloud, commercial_area, dense_residential, desert, forest, freeway, golf_course, ground_track_field, harbor, industrial_area, intersection, island, lake, meadow, medium_residential, mobile_home_park, mountain, overpass, palace, parking_lot, railway, railway_station, rectangular_farmland, river, roundabout, runway, sea_ice, ship, snowberg, sparse_residential, stadium, storage_tank, tennis_court, terrace, thermal_power_station, wetland
from huggingface_hub import hf_hub_download
import torch
from torchvision import transforms
from PIL import Image
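checkpoint = torch.load(
    hf_hub_download("Sathya77/swin-transformer-satellite", "swin_resisc45.pth"),
    map_location='cpu',  # map all tensors to CPU when loading
)
model = SwinTransformer(embed_dim=96, num_classes=45)  # SwinTransformer is this repository's model class; import or define it first
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()  # switch to inference mode (disables dropout)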
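Once the model is loaded, its output logits still have to be mapped back to class names. The following is a minimal sketch of that step in plain Python. It assumes the output index order matches the alphabetical class list above, which is how torchvision's `ImageFolder` assigns indices but is not confirmed for this checkpoint. The helper `top_k` and the constant `CLASS_NAMES` are hypothetical names introduced for illustration.

```python
import math

# Assumption: index i of the model output corresponds to CLASS_NAMES[i].
CLASS_NAMES = [
    "airplane", "airport", "baseball_diamond", "basketball_court", "beach",
    "bridge", "chaparral", "church", "circular_farmland", "cloud",
    "commercial_area", "dense_residential", "desert", "forest", "freeway",
    "golf_course", "ground_track_field", "harbor", "industrial_area",
    "intersection", "island", "lake", "meadow", "medium_residential",
    "mobile_home_park", "mountain", "overpass", "palace", "parking_lot",
    "railway", "railway_station", "rectangular_farmland", "river",
    "roundabout", "runway", "sea_ice", "ship", "snowberg",
    "sparse_residential", "stadium", "storage_tank", "tennis_court",
    "terrace", "thermal_power_station", "wetland",
]


def top_k(logits, class_names=CLASS_NAMES, k=3):
    """Return the k most likely (class_name, probability) pairs."""
    if len(logits) != len(class_names):
        raise ValueError("expected one logit per class")
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
    return [(class_names[i], probs[i]) for i in ranked[:k]]
```

With a preprocessed image tensor `x` of shape 1×3×224×224, a call such as `top_k(model(x)[0].tolist())` would return the three highest-scoring labels. The preprocessing itself (resize and normalization values) is not specified in this card, so it should match whatever was used during training.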
Try it here: Sathya77/swin-transformer-satellite