Restore HF model card with YAML frontmatter and full model index


README.md +188 -294

@@ -1,358 +1,252 @@
-# 🚀 Mamba-Segmentation
-**Controlled Visual State-Space Backbone Benchmark with Domain-Shift & Boundary Analysis for Remote-Sensing Segmentation**
-### 🏆 The First Fair-Fight Benchmark for SSM vs. CNN vs. Transformer Backbones in Remote Sensing 🏆
-[![🏆 Venue](https://img.shields.io/badge/🏆_IGRAAS_2026-Accepted-brightgreen)](https://2026.ieeeigarss.org/)
-[![🐍 Python](https://img.shields.io/badge/🐍_Python-3.9-3776AB)](https://www.python.org/)
-[![🔥 PyTorch](https://img.shields.io/badge/🔥_PyTorch-2.0+-EE4C2C)](https://pytorch.org/)
-[![License](https://img.shields.io/badge/License-MIT-yellow)](LICENSE)
-[![🤗 Weights](https://img.shields.io/badge/🤗_Weights-Hugging_Face-yellow)](https://huggingface.co/dineth18/Mamba-Segmentation)
-One pipeline. One decoder. One loss. One schedule. **Five backbone families.** The only variable is the encoder — so the results finally mean something. SSMs dominate, scaling plateaus early, domain transfer is asymmetric, and boundaries are where every model breaks.
-Ready to see which backbone actually wins a fair fight? Let's go.
 ---
-[🔭 Overview](#-overview) • [✨ Why Controlled?](#-why-controlled-benchmarking-matters) • [🧠 Pipeline](#-the-controlled-pipeline) • [⚡ Quick Start](#-quick-start) • [🗂 Data](#-data-preparation) • [🚀 Train & Eval](#-train--evaluation) • [🔬 Analysis](#-analysis-scripts) • [📊 Results](#-results) • [🙏 Acknowledgements](#-acknowledgements) • [📜 Cite](#-citation)
 ---
-## 🔭 Overview
-Remote-sensing segmentation benchmarks have a fatal flaw: they change the backbone **and** the decoder **and** the loss **and** the schedule **and** the augmentations — all at once. The resulting numbers tell you who tuned harder, not which backbone is better.
-**Mamba-Segmentation fixes this:**
-- **Fixed lightweight U-Net decoder** → identical decoder across all experiments
-- **Fixed TriBraid loss** (Lovász + Focal + Boundary) → same optimization objective for every backbone
-- **Fixed training protocol** → 50k iterations, AdamW, poly LR, 512×512 crops, same augmentations
-- **Standardized feature interface** → {F1, F2, F3, F4} at strides {4, 8, 16, 32}
-- **Five backbone families** → VMamba, MambaVision, Spatial-Mamba, CNN (DeepLabv3), Transformer (UNetFormer)
-**Outcome:** differences in results reflect backbone behavior. Nothing else.
-<p align="center">
-  <img src="IGARSS%202026/Architecture.png" alt="Controlled Pipeline Architecture" width="100%">
-</p>
-<p align="center"><i>Lock the pipeline. Swap the backbone. Read the truth. Three SSM families (Spatial-Mamba, MambaVision, VMamba) share a single U-Net decoder and standardized feature interface {F1–F4}.</i></p>
 ---
-## ✨ Why Controlled Benchmarking Matters
-Every backbone paper ships its own decoder, its own training recipe, its own augmentation policy. You compare "Method A" to "Method B" — but you're really comparing two *entire pipelines*.
-Mamba-Segmentation isolates the **one variable that matters:**
-| What | Status |
 |---|---|
 | Encoder backbone | 🔀 **Swapped** per experiment — the ONLY variable |
-| Decoder architecture | 🔒 Fixed (lightweight U-Net, 256ch, MambaBlock2d) |
-| Loss function | 🔒 Fixed (Lovász-Softmax + Focal + Boundary) |
-| Training schedule | 🔒 Fixed (50k iters, AdamW, poly decay) |
-| Augmentations | 🔒 Fixed (random crop, flip, color jitter) |
 | Input resolution | 🔒 Fixed (512×512) |
 | Feature interface | 🔒 Fixed ({F1–F4} at strides {4, 8, 16, 32}) |
-When the results differ, you know *exactly* why.
 ---
-## 🧠 The Controlled Pipeline
-```
-Encoder:     swapped per experiment — the ONLY variable
-Decoder:     fixed lightweight U-Net (256ch, MambaBlock2d, addition skips)
-Interface:   {F1, F2, F3, F4} at strides {4, 8, 16, 32}
-Training:    50k iters · AdamW · poly LR decay · 512×512 crops · fixed augmentations
-Loss:        L = L_lovász + L_focal + 0.5 × L_boundary
-               ├─ Lovász-Softmax   → direct IoU optimization
-               ├─ Focal (γ=2.0)    → class imbalance handling
-               └─ Boundary (2px)   → edge penalty with warmup
-```
-**Backbone families tested:**
-| Family | Backbones | Type |
-|---|---|---|
-| **VMamba** | Tiny, Small, Base | SSM — cross-scan 2D selective state-space |
-| **MambaVision** | Tiny, Small, Base, Large, Large2 | SSM/Hybrid — Mamba + self-attention |
-| **Spatial-Mamba** | Tiny, Small, Base | SSM — spatially-aware scanning |
-| **DeepLabv3+** | ResNet-50 | CNN baseline |
-| **UNetFormer** | ResNet-18 | Transformer baseline |
-**Datasets:**
-- **LoveDA** → All→All, Urban→Rural, Rural→Urban (source-only, zero adaptation)
-- **ISPRS Potsdam** → high-resolution urban parsing (6-class)
 ---
-## ⚡ Quick Start
-### 1. Clone & Install
-```bash
-git clone https://github.com/YOUR_USERNAME/Mamba-Segmentation
-cd Mamba-Segmentation
-conda create -n mamba-seg python=3.9 -y
-conda activate mamba-seg
-cd MambaVision && pip install -r requirements.txt
-```
-### 2. Grab Pre-trained Backbone Weights
-> 🤗 **All trained segmentation checkpoints are available on [Hugging Face](https://huggingface.co/dineth18/Mamba-Segmentation).** Download `best.pth` for any model directly from there.
-| Backbone | Source | Location |
-|---|---|---|
-| VMamba (Tiny/Small/Base) | [VMamba repo](https://github.com/MzeroMiko/VMamba) | `VMamba/Vmamba_weights/ImageNet-1K/` |
-| MambaVision (Tiny→Large2) | [NVIDIA MambaVision](https://github.com/NVlabs/MambaVision) | `MambaVision/weights/1k/` |
-| Spatial-Mamba (Tiny/Small/Base) | [Spatial-Mamba repo](https://github.com/EdwardChaworworrachat/SpatialMamba) | `spatial-mamba/weights/imageNet1K/` |
-| ResNet-50 / ResNet-18 | [torchvision](https://pytorch.org/vision/stable/models.html) | `weights/imagenet/` |
-Set the weights path in each backbone's `config.py` — that's it.
-### 3. Configure Your Experiment
-Each backbone family has its own directory with a standardized interface:
-```
-<ModelFamily>/
-├── config.py          # ← edit DATA_ROOT / OUTPUT_DIR, or set env vars
-├── config_icprs.py    # ← for ISPRS Potsdam experiments
-├── train.py           # ← same training loop across all families
-├── model.py
-├── encoders.py
-├── light_decoder.py   # ← THE fixed decoder (identical everywhere)
-├── losses.py          # ← THE fixed loss (identical everywhere)
-└── utils.py
-```
-**Path configuration** — two approaches:
-**Option A — environment variables (recommended):**
-```bash
-export LOVEDA_ROOT=/path/to/LoveDA          # for LoveDA experiments
-export POTSDAM_ROOT=/path/to/ISPRS_Potsdam  # for Potsdam experiments
-export OUTPUT_DIR=/path/to/output           # optional — defaults to Comparison_Experiments/
-python train.py
-```
-**Option B — edit the config directly:**
-Open `config.py` and change `DATA_ROOT` and `OUTPUT_DIR` near the top of the file.
 ---
-## 🗂 Data Preparation
-Plug-and-play support for **LoveDA** and **ISPRS Potsdam**.
-<details>
-<summary>📁 <b>LoveDA Layout</b></summary>
-```
-DATA_ROOT/
-├── Train/
-│   ├── Urban/
-│   │   ├── images_png/
-│   │   └── masks_png/
-│   └── Rural/
-│       ├── images_png/
-│       └── masks_png/
-├── Val/
-│   ├── Urban/
-│   │   ├── images_png/
-│   │   └── masks_png/
-│   └── Rural/
-│       ├── images_png/
-│       └── masks_png/
-└── Test/
-```
-- **7 classes:** Background, Building, Road, Water, Barren, Forest, Agricultural
-- **Resolution:** 1024×1024 (cropped to 512×512 during training)
-- **Domains:** Urban and Rural — used for cross-domain evaluation
-</details>
-<details>
-<summary>📁 <b>ISPRS Potsdam Layout</b></summary>
-```
-DATA_ROOT/
-├── Images/
-├── Labels/
-└── splits/
-    ├── train.txt
-    ├── val.txt
-    └── test.txt
-```
-- **6 classes:** Impervious, Building, Low Vegetation, Tree, Car, Clutter
-- **Resolution:** 6000×6000 tiles (cropped to 512×512)
-</details>
-**Must-do:** Set `DATA_ROOT` in `config.py` (LoveDA) or `config_icprs.py` (Potsdam) to your local dataset path.
 ---
-## 🚀 Train & Evaluation
-YAML-free, config-driven — clean and reproducible.
-### Train
-```bash
-# LoveDA — pick any backbone family
-cd MambaVision                          # or VMamba/, spatial-mamba/, CNN_DeepLabv3p/, etc.
-# → edit config.py: set DATA_ROOT, OUTPUT_DIR, and backbone variant
-python train.py
-# ISPRS Potsdam
-cd VMamba
-# → edit config_icprs.py: set DATA_ROOT and OUTPUT_DIR
-python train.py
 ```
-Checkpoints + TensorBoard logs land in `Comparison_Experiments/<experiment_name>/`.
-### Efficiency Profiling
 ```bash
-# Single model benchmark (FPS + peak VRAM)
-python tools/benchmark_fps_mem.py \
-  --model mambavision --variant base --device cuda:0
-# Full sweep across all families
-python tools/benchmark_fps_mem_total.py \
-  --device cuda:0 --batch_size 1
-```
----
-## 🔬 Analysis Scripts
-Three diagnostic scripts that reproduce every analytical claim in the paper:
-| Script | What It Measures | What It Tells You |
-|---|---|---|
-| `analysis/boundary_analysis.py` | Boundary vs. interior mIoU under domain shift | Boundary degradation is the dominant failure mode — not interior misclassification |
-| `analysis/cross_domain_analysis.py` | U→R and R→U metrics for all families | Domain transfer asymmetry is backbone-agnostic — it's a data property |
-| `analysis/rotation_analysis.py` | Prediction stability under 90°/180°/270° rotations | Tests whether SSM scan-order introduces orientation artifacts |
-```bash
-python analysis/boundary_analysis.py \
-  --device cuda:0 --use_pretrained 1
-python analysis/cross_domain_analysis.py \
-  --device cuda:0 --use_pretrained 1
-python analysis/rotation_analysis.py \
-  --device cuda:0 --use_pretrained 1 \
-  --pack_rotations 1 \
-  --families mambavision,vmamba,spatialmamba
 ```
-Results land in `analysis_outputs/` as CSV files ready for plotting.
 ---
-## 📊 Results
-Straight from the paper — reproducible out of the box.
-Every row shares the same decoder, loss, optimizer, schedule, augmentations, and data splits. **The only variable is the encoder backbone.**
-| Type | Backbone | LoveDA mIoU | U→R | R→U | Potsdam mIoU |
-|---|---|---:|---:|---:|---:|
-| CNN | DeepLabv3 (controlled) | 43.01 | 30.36 | 39.98 | 75.09 |
-| Transformer | UNetFormer (controlled) | 48.61 | 34.56 | 44.84 | 74.99 |
-| **SSM** 🔥 | **VMamba-Small** | **55.66** | **40.62** | 53.52 | **77.59** |
-| **SSM** 🔥 | **MambaVision-L** | 55.25 | 38.53 | **54.01** | 77.07 |
-| SSM | Spatial-Mamba-B | 48.03 | 35.23 | 46.55 | 70.00 |
-> 🏆 **VMamba-Small. 55.66 mIoU. +7.05 over the best Transformer. +12.65 over the best CNN. Same decoder. Same training. No tricks.**
-### Accuracy vs. Throughput
-<p align="center">
-  <img src="IGARSS%202026/fps_vs_miou.png" alt="mIoU vs Inference Throughput" width="60%">
-</p>
-<p align="center"><i>mIoU (%) vs. inference throughput (FPS) for all SSM variants. VMamba holds near-peak accuracy across all sizes. MambaVision trades speed for capacity with diminishing returns. Spatial-Mamba sits in the lower tier.</i></p>
-### Key Takeaways
-🔥 **SSMs dominate the fair fight.** VMamba-Small beats UNetFormer by +7.05 and DeepLabv3 by +12.65 on LoveDA — under identical conditions. This is the backbone, not the pipeline.
-📏 **Bigger ≠ better under a fixed decoder.** MambaVision-L carries far more parameters than VMamba-Small yet scores 55.25 vs. 55.66. Scaling the encoder past a threshold buys nothing when the decoder stays constant.
-🔄 **Domain transfer is asymmetric — and backbone-agnostic.** Rural→Urban outperforms Urban→Rural by 10–15 points across every family. VMamba-Small: 53.52 R→U vs. 40.62 U→R. This is a data distribution property, not a model property.
-🧱 **Boundaries are the unsolved failure mode.** Under domain shift, interior accuracy holds. Boundary accuracy collapses. Every backbone, every family, same story. Whoever cracks boundary sensitivity under distribution shift wins the next round.
-### Qualitative Results — LoveDA
-<p align="center">
-  <img src="IGARSS%202026/loveda_qualitative_detailed_enhanced.png" alt="LoveDA Qualitative Results" width="85%">
-</p>
-<p align="center"><i>Predictions + error maps (magenta = false positive, dark green = false negative) on LoveDA Urban and Rural scenes. VMamba-S and VMamba-B produce the cleanest boundaries; Spatial-Mamba-B shows the most false positives at class transitions.</i></p>
-### Qualitative Results — ISPRS Potsdam
-<p align="center">
-  <img src="IGARSS%202026/potsdam_qualitative_detailed_enhanced.png" alt="ISPRS Potsdam Qualitative Results" width="85%">
-</p>
-<p align="center"><i>Predictions + error maps on ISPRS Potsdam. All SSM variants handle large homogeneous regions well; errors concentrate at fine-grained boundaries (cars, narrow roads) — consistent with the boundary analysis findings.</i></p>
----
-## 🧬 Backbone Overview
-| Backbone | Architecture | Key Idea | RS Segmentation Impact |
-|---|---|---|---|
-| **VMamba** | Cross-scan 2D selective SSM | Global spatial context with linear complexity via multi-directional scanning | 🥇 Top performer: 55.66 LoveDA mIoU, strongest domain transfer |
-| **MambaVision** | Hybrid Mamba + self-attention | Interleaves Mamba blocks (early stages) with attention (late stages) | Matches VMamba on Potsdam, but extra capacity doesn't help on LoveDA |
-| **Spatial-Mamba** | Spatially-aware SSM | Explicit positional inductive biases in the state-space pathway | Beats CNN baseline, but scan-order alone insufficient without global modeling |
-| **DeepLabv3+** | CNN (ResNet-50) | Atrous convolutions + ASPP for multi-scale context | Controlled CNN reference — 43.01 mIoU baseline |
-| **UNetFormer** | Transformer (ResNet-18) | Efficient self-attention decoder for dense prediction | Controlled Transformer reference — 48.61 mIoU baseline |
----
-## 🙏 Acknowledgements
-This work builds on prior advances in visual state-space models and remote-sensing segmentation. We gratefully acknowledge:
-- **[VMamba](https://github.com/MzeroMiko/VMamba)** — Visual State Space Model backbone
-- **[MambaVision](https://github.com/NVlabs/MambaVision)** — NVIDIA's hybrid Mamba-Transformer architecture
-- **[Spatial-Mamba](https://github.com/EdwardChaworworrachat/SpatialMamba)** — Spatially-aware Mamba variant
-- **[LoveDA](https://github.com/Junjue-Wang/LoveDA)** and **[ISPRS Potsdam](https://www.isprs.org/education/benchmarks/UrbanSemLab/)** dataset creators
----
-## 📜 Citation
-If Mamba-Segmentation fuels your research, please cite:
 ```bibtex
 @article{wasalathilaka2026controlledbenchmark,
   title={A Controlled Benchmark of Visual State-Space Backbones with
-         Domain-Shift and Boundary Analysis for Remote-Sensing
-         Segmentation},
-  author={Wasalathilaka, Nichula and Perea, Dineth and Samarakoon,
-          Oshadha and Wijenayake, Buddhi and Godaliyadda, Roshan and
-          Herath, Vijitha and Ekanayake, Parakrama},
-  journal={IGRAAS 2026},
   year={2026}
 }
 ```
 ---
-🌍🛰️ Built at the **University of Peradeniya**. Got inspired? Give us a ⭐

 ---
+license: mit
+language:
+  - en
+tags:
+  - remote-sensing
+  - semantic-segmentation
+  - mamba
+  - state-space-model
+  - vmamba
+  - mambavision
+  - spatial-mamba
+  - pytorch
+  - benchmark
+  - loveda
+  - isprs-potsdam
+  - domain-adaptation
+datasets:
+  - LoveDA
+  - ISPRS-Potsdam
+pipeline_tag: image-segmentation
 ---
+# Mamba-Segmentation
+**Controlled Visual State-Space Backbone Benchmark with Domain-Shift & Boundary Analysis for Remote-Sensing Segmentation**
+> *Accepted at IGARSS 2026*
+One pipeline. One decoder. One loss. One schedule. **Five backbone families.** The only variable is the encoder — so the results finally mean something.
 ---
+## What Is This?
+Remote-sensing segmentation papers routinely change the backbone *and* the decoder *and* the loss *and* the training schedule all at once. The numbers tell you who tuned harder, not which backbone is better.
+This repo fixes that. **One shared pipeline — swap the backbone — read the truth.**
+| Component | Status |
 |---|---|
 | Encoder backbone | 🔀 **Swapped** per experiment — the ONLY variable |
+| Decoder | 🔒 Fixed (lightweight U-Net, 256ch, MambaBlock2d) |
+| Loss | 🔒 Fixed (Lovász-Softmax + Focal + Boundary) |
+| Training schedule | 🔒 Fixed (50k iters, AdamW, poly LR decay) |
+| Augmentations | 🔒 Fixed (random crop, flip, colour jitter) |
 | Input resolution | 🔒 Fixed (512×512) |
 | Feature interface | 🔒 Fixed ({F1–F4} at strides {4, 8, 16, 32}) |
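To make the fixed feature interface concrete, the sketch below works out the spatial size of each feature map for the 512×512 training crop. The helper name `feature_map_sizes` is hypothetical and not part of the repository; the only assumption is the usual meaning of stride (output side = input side / stride).

```python
# Hypothetical helper, not taken from the repository code.
INPUT_SIZE = 512                                   # fixed crop size from the table
STRIDES = {"F1": 4, "F2": 8, "F3": 16, "F4": 32}   # fixed feature interface


def feature_map_sizes(input_size=INPUT_SIZE, strides=STRIDES):
    """Return the side length of each feature map for a square input."""
    return {name: input_size // stride for name, stride in strides.items()}


sizes = feature_map_sizes()
print(sizes)  # {'F1': 128, 'F2': 64, 'F3': 32, 'F4': 16}
```

Any encoder plugged into the shared decoder would need to emit maps of these sizes; channel counts are backbone-specific and are not listed here.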
 ---
+## Checkpoints in This Repository
+All checkpoints are `best.pth` files (highest validation mIoU during training) stored with their original directory structure.
+### LoveDA Experiments — `Comparison_Experiments/`
+#### MambaVision (NVIDIA hybrid Mamba-Transformer)
+| Checkpoint path | Train→eval domains |
+|---|---|
+| `Comparison_Experiments/mambavision_tiny_512/checkpoints/best.pth` | All→All |
+| `Comparison_Experiments/mambavision_tiny_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
+| `Comparison_Experiments/mambavision_tiny_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
+| `Comparison_Experiments/mambavision_tiny2_512/checkpoints/best.pth` | All→All (Tiny2 variant) |
+| `Comparison_Experiments/mambavision_tiny2_ruraltrain_512/checkpoints/best.pth` | Rural→Urban (Tiny2 variant) |
+| `Comparison_Experiments/mambavision_tiny2_urbantrain_512/checkpoints/best.pth` | Urban→Rural (Tiny2 variant) |
+| `Comparison_Experiments/mambavision_small_512/checkpoints/best.pth` | All→All |
+| `Comparison_Experiments/mambavision_small_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
+| `Comparison_Experiments/mambavision_small_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
+| `Comparison_Experiments/mambavision_base_512/checkpoints/best.pth` | All→All |
+| `Comparison_Experiments/mambavision_base_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
+| `Comparison_Experiments/mambavision_base_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
+| `Comparison_Experiments/mambavision_large_512/checkpoints/best.pth` | All→All |
+| `Comparison_Experiments/mambavision_large_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
+| `Comparison_Experiments/mambavision_large_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
+| `Comparison_Experiments/mambavision_large2_512/checkpoints/best.pth` | All→All |
+| `Comparison_Experiments/mambavision_large2_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
+| `Comparison_Experiments/mambavision_large2_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
+#### VMamba (cross-scan 2D selective SSM)
+| Checkpoint path | Train→eval domains |
+|---|---|
+| `Comparison_Experiments/Vmamb_tiny_512/checkpoints/best.pth` | All→All |
+| `Comparison_Experiments/vmamba_tiny_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
+| `Comparison_Experiments/vmamba_tiny_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
+| `Comparison_Experiments/Vmamb_small_512/checkpoints/best.pth` | All→All |
+| `Comparison_Experiments/Vmamb_small_512_2/checkpoints/best.pth` | All→All (run 2) |
+| `Comparison_Experiments/Vmamb_small_512_3/checkpoints/best.pth` | All→All (run 3) |
+| `Comparison_Experiments/vmamba_small_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
+| `Comparison_Experiments/vmamba_small_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
+| `Comparison_Experiments/Vmamb_base_512/checkpoints/best.pth` | All→All |
+| `Comparison_Experiments/vmamba_base_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
+| `Comparison_Experiments/vmamba_base_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
+#### VisionMamba / Vim (bidirectional Mamba)
+| Checkpoint path | Train→eval domains |
+|---|---|
+| `Comparison_Experiments/VisionMamba_tiny_512/checkpoints/best.pth` | All→All |
+| `Comparison_Experiments/visionmamba_tiny_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
+| `Comparison_Experiments/visionmamba_tiny_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
+| `Comparison_Experiments/VisionMamba_small_512/checkpoints/best.pth` | All→All |
+| `Comparison_Experiments/visionmamba_small_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
+| `Comparison_Experiments/visionmamba_small_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
+| `Comparison_Experiments/VisionMamba_base_512/checkpoints/best.pth` | All→All |
+| `Comparison_Experiments/visionmamba_base_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
+| `Comparison_Experiments/visionmamba_base_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
+#### Spatial-Mamba (spatially-aware SSM)
+| Checkpoint path | Train→eval domains |
+|---|---|
+| `Comparison_Experiments/spatialmamba_tiny_512/checkpoints/best.pth` | All→All |
+| `Comparison_Experiments/spatialmamba_tiny_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
+| `Comparison_Experiments/spatialmamba_tiny_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
+| `Comparison_Experiments/spatialmamba_small_512/checkpoints/best.pth` | All→All |
+| `Comparison_Experiments/spatialmamba_small_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
+| `Comparison_Experiments/spatialmamba_small_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
+| `Comparison_Experiments/spatialmamba_base_512/checkpoints/best.pth` | All→All |
+| `Comparison_Experiments/spatialmamba_base_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
+| `Comparison_Experiments/spatialmamba_base_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
+#### CNN & Transformer Baselines
+| Checkpoint path | Model |
+|---|---|
+| `Comparison_Experiments/cnn_deeplabv3p_r50_512/checkpoints/best.pth` | DeepLabv3+ ResNet-50, All→All |
+| `Comparison_Experiments/cnn_deeplabv3p_resnet50_ruraltrain_512/checkpoints/best.pth` | DeepLabv3+ ResNet-50, Rural→Urban |
+| `Comparison_Experiments/cnn_deeplabv3p_resnet50_urbantrain_512/checkpoints/best.pth` | DeepLabv3+ ResNet-50, Urban→Rural |
+| `Comparison_Experiments/cnn_unet_r50_512/checkpoints/best.pth` | U-Net ResNet-50, All→All |
+| `Comparison_Experiments/transformer_unetformer_r18_512/checkpoints/best.pth` | UNetFormer ResNet-18, All→All |
+| `Comparison_Experiments/transformerunetformer_resnet18_ruraltrain_512/checkpoints/best.pth` | UNetFormer ResNet-18, Rural→Urban |
+| `Comparison_Experiments/transformerunetformer_resnet18_urbantrain_512/checkpoints/best.pth` | UNetFormer ResNet-18, Urban→Rural |
 ---
+### ISPRS Potsdam Experiments — `Comparison_Experiments_ICPRS_potsdam/`
+| Checkpoint path | Model |
+|---|---|
+| `Comparison_Experiments_ICPRS_potsdam/mambavision_tiny_512/checkpoints/best.pth` | MambaVision-Tiny |
+| `Comparison_Experiments_ICPRS_potsdam/mambavision_tiny2_512/checkpoints/best.pth` | MambaVision-Tiny2 |
+| `Comparison_Experiments_ICPRS_potsdam/mambavision_small_512/checkpoints/best.pth` | MambaVision-Small |
+| `Comparison_Experiments_ICPRS_potsdam/mambavision_base_512/checkpoints/best.pth` | MambaVision-Base |
+| `Comparison_Experiments_ICPRS_potsdam/mambavision_large_512/checkpoints/best.pth` | MambaVision-Large |
+| `Comparison_Experiments_ICPRS_potsdam/mambavision_large2_512/checkpoints/best.pth` | MambaVision-Large2 |
+| `Comparison_Experiments_ICPRS_potsdam/vmamba_tiny_512/checkpoints/best.pth` | VMamba-Tiny |
+| `Comparison_Experiments_ICPRS_potsdam/vmamba_small_512/checkpoints/best.pth` | VMamba-Small |
+| `Comparison_Experiments_ICPRS_potsdam/vmamba_base_512/checkpoints/best.pth` | VMamba-Base |
+| `Comparison_Experiments_ICPRS_potsdam/spatialmamba_tiny_512/checkpoints/best.pth` | Spatial-Mamba-Tiny |
+| `Comparison_Experiments_ICPRS_potsdam/spatialmamba_small_512/checkpoints/best.pth` | Spatial-Mamba-Small |
+| `Comparison_Experiments_ICPRS_potsdam/spatialmamba_base_512/checkpoints/best.pth` | Spatial-Mamba-Base |
+| `Comparison_Experiments_ICPRS_potsdam/cnn_deeplabv3p_r50_512/checkpoints/best.pth` | DeepLabv3+ ResNet-50 |
+| `Comparison_Experiments_ICPRS_potsdam/transformer_unetformer_r18_512/checkpoints/best.pth` | UNetFormer ResNet-18 |
 ---
+### ImageNet Backbone Weights — `weights/imagenet/`
+| File | Description |
+|---|---|
+| `weights/imagenet/resnet50-11ad3fa6.pth` | ResNet-50 ImageNet-1K pretrained |
+| `weights/imagenet/resnet18-f37072fd.pth` | ResNet-18 ImageNet-1K pretrained |
+---
+## Results Summary
+Every row shares the same decoder, loss, optimizer, schedule, and data splits. **The only variable is the encoder.**
+### LoveDA
+| Backbone | mIoU (All→All) | mIoU (U→R) | mIoU (R→U) |
+|---|---:|---:|---:|
+| DeepLabv3+ ResNet-50 (CNN) | 43.01 | 30.36 | 39.98 |
+| UNetFormer ResNet-18 (Transformer) | 48.61 | 34.56 | 44.84 |
+| VMamba-Small **🥇** | **55.66** | **40.62** | 53.52 |
+| MambaVision-Large | 55.25 | 38.53 | **54.01** |
+| Spatial-Mamba-Base | 48.03 | 35.23 | 46.55 |
+### ISPRS Potsdam
+| Backbone | mIoU |
+|---|---:|
+| DeepLabv3+ ResNet-50 | 75.09 |
+| UNetFormer ResNet-18 | 74.99 |
+| VMamba-Small **🥇** | **77.59** |
+| MambaVision-Large | 77.07 |
+| Spatial-Mamba-Base | 70.00 |
+**Key findings:**
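+- The strongest SSMs outperform the CNN and Transformer baselines under identical conditions: on LoveDA All→All, VMamba-Small (55.66) leads UNetFormer by 7.05 mIoU and DeepLabv3+ by 12.65, while Spatial-Mamba-Base (48.03) lands slightly below UNetFormer (48.61).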
+- Scaling the encoder past VMamba-Small yields diminishing returns under a fixed decoder.
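+- Domain transfer is asymmetric across all backbone families: Rural→Urban beats Urban→Rural by roughly 10–15 mIoU points for every backbone in the LoveDA table (from 9.62 for DeepLabv3+ to 15.48 for MambaVision-Large), which points to a data-distribution property rather than a model property.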
+- Boundary accuracy collapses under domain shift while interior accuracy holds — every backbone, every family.
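The margins quoted in the findings can be re-derived from the LoveDA table. The sketch below only redoes that arithmetic; the numbers are copied from the table, and the variable names are illustrative.

```python
# LoveDA mIoU values copied from the results table: (All->All, U->R, R->U).
loveda = {
    "DeepLabv3+ ResNet-50": (43.01, 30.36, 39.98),
    "UNetFormer ResNet-18": (48.61, 34.56, 44.84),
    "VMamba-Small": (55.66, 40.62, 53.52),
    "MambaVision-Large": (55.25, 38.53, 54.01),
    "Spatial-Mamba-Base": (48.03, 35.23, 46.55),
}

# Transfer asymmetry: how much R->U exceeds U->R for each backbone.
gaps = {name: round(r2u - u2r, 2) for name, (_, u2r, r2u) in loveda.items()}

# Lead of the best All->All score over the two baselines.
best = loveda["VMamba-Small"][0]
lead_over_transformer = round(best - loveda["UNetFormer ResNet-18"][0], 2)
lead_over_cnn = round(best - loveda["DeepLabv3+ ResNet-50"][0], 2)

for name, gap in gaps.items():
    print(f"{name}: R->U minus U->R = {gap:.2f}")
print(lead_over_transformer, lead_over_cnn)  # 7.05 12.65
```

The gaps come out between 9.62 and 15.48 points, which is what the "10–15 points" summary rounds to.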
 ---
+## How to Load a Checkpoint
+```python
+import torch
+# Example: load MambaVision-Base best checkpoint for LoveDA All→All
+ckpt = torch.load(
+    "Comparison_Experiments/mambavision_base_512/checkpoints/best.pth",
+    map_location="cpu"
+)
+# keys: 'model', 'optimizer', 'scheduler', 'iter', 'best_score'
+model_state = ckpt["model"]
 ```
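If `load_state_dict` complains about unexpected keys, one common cause is a wrapper prefix such as `module.` left behind by `torch.nn.DataParallel`. Whether these checkpoints carry such a prefix is not stated in the model card, so the helper below is a hedged sketch: `strip_prefix` is a hypothetical name, and it works on any plain key-to-value mapping.

```python
def strip_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with `prefix` removed from keys that start with it."""
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }


# Stand-in values for illustration; a real checkpoint would hold tensors here.
demo = {"module.encoder.weight": 1, "decoder.bias": 2}
cleaned = strip_prefix(demo)
print(sorted(cleaned))  # ['decoder.bias', 'encoder.weight']
```

With a real checkpoint this would be applied as `strip_prefix(ckpt["model"])` before passing the result to the model's `load_state_dict`.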
+To build the full model and run inference, clone the code repository and follow the setup instructions:
 ```bash
+git clone https://github.com/dineth18/Mamba-Segmentation
+cd Mamba-Segmentation/MambaVision   # or VMamba/, spatial-mamba/, etc.
+pip install -r requirements.txt
+# Set your dataset path (no need to edit config files)
+export LOVEDA_ROOT=/path/to/LoveDA
+export POTSDAM_ROOT=/path/to/ISPRS_Potsdam
+python eval.py --checkpoint path/to/best.pth
 ```
 ---
+## Citation
+If this benchmark is useful for your research, please cite:
 ```bibtex
 @article{wasalathilaka2026controlledbenchmark,
   title={A Controlled Benchmark of Visual State-Space Backbones with
+         Domain-Shift and Boundary Analysis for Remote-Sensing Segmentation},
+  author={Wasalathilaka, Nichula and Perea, Dineth and Samarakoon, Oshadha
+          and Wijenayake, Buddhi and Godaliyadda, Roshan and Herath, Vijitha
+          and Ekanayake, Parakrama},
+  journal={IGARSS 2026},
   year={2026}
 }
 ```
 ---
+## Acknowledgements
+- [VMamba](https://github.com/MzeroMiko/VMamba) — Visual State Space Model
+- [MambaVision](https://github.com/NVlabs/MambaVision) — NVIDIA hybrid Mamba-Transformer
+- [Spatial-Mamba](https://github.com/EdwardChaworworrachat/SpatialMamba) — Spatially-aware Mamba
+- [LoveDA](https://github.com/Junjue-Wang/LoveDA) — Land-cover domain adaptation dataset
+- [ISPRS Potsdam](https://www.isprs.org/education/benchmarks/UrbanSemLab/) — Urban semantic labeling benchmark
+Built at the **University of Peradeniya**.