dineth18 committed on
Commit fe4b2ae · verified · 1 Parent(s): 4e30fe8

Restore HF model card with YAML frontmatter and full model index

Files changed (1)
  1. README.md +188 -294
README.md CHANGED
@@ -1,358 +1,252 @@
- # 🚀 Mamba-Segmentation
-
- **Controlled Visual State-Space Backbone Benchmark with Domain-Shift & Boundary Analysis for Remote-Sensing Segmentation**
-
- ### 🏆 The First Fair-Fight Benchmark for SSM vs. CNN vs. Transformer Backbones in Remote Sensing 🏆
-
- [![🏆 Venue](https://img.shields.io/badge/🏆_IGARSS_2026-Accepted-brightgreen)](https://2026.ieeeigarss.org/)
- [![🐍 Python](https://img.shields.io/badge/🐍_Python-3.9-3776AB)](https://www.python.org/)
- [![🔥 PyTorch](https://img.shields.io/badge/🔥_PyTorch-2.0+-EE4C2C)](https://pytorch.org/)
- [![License](https://img.shields.io/badge/License-MIT-yellow)](LICENSE)
- [![🤗 Weights](https://img.shields.io/badge/🤗_Weights-Hugging_Face-yellow)](https://huggingface.co/dineth18/Mamba-Segmentation)
-
- One pipeline. One decoder. One loss. One schedule. **Five backbone families.** The only variable is the encoder — so the results finally mean something. SSMs dominate, scaling plateaus early, domain transfer is asymmetric, and boundaries are where every model breaks.
-
- Ready to see which backbone actually wins a fair fight? Let's go.
-
  ---
-
- [🔭 Overview](#-overview) • [✨ Why Controlled?](#-why-controlled-benchmarking-matters) • [🧠 Pipeline](#-the-controlled-pipeline) • [⚡ Quick Start](#-quick-start) • [🗂 Data](#-data-preparation) • [🚀 Train & Eval](#-train--evaluation) • [🔬 Analysis](#-analysis-scripts) • [📊 Results](#-results) • [🙏 Acknowledgements](#-acknowledgements) • [📜 Cite](#-citation)
-

  ---

- ## 🔭 Overview
-
- Remote-sensing segmentation benchmarks have a fatal flaw: they change the backbone **and** the decoder **and** the loss **and** the schedule **and** the augmentations — all at once. The resulting numbers tell you who tuned harder, not which backbone is better.
-
- **Mamba-Segmentation fixes this:**
-
- - **Fixed lightweight U-Net decoder** → identical decoder across all experiments
- - **Fixed TriBraid loss** (Lovász + Focal + Boundary) → same optimization objective for every backbone
- - **Fixed training protocol** → 50k iterations, AdamW, poly LR, 512×512 crops, same augmentations
- - **Standardized feature interface** → {F1, F2, F3, F4} at strides {4, 8, 16, 32}
- - **Five backbone families** → VMamba, MambaVision, Spatial-Mamba, CNN (DeepLabv3), Transformer (UNetFormer)

- **Outcome:** differences in results reflect backbone behavior. Nothing else.

- <p align="center">
-   <img src="IGARSS%202026/Architecture.png" alt="Controlled Pipeline Architecture" width="100%">
- </p>
- <p align="center"><i>Lock the pipeline. Swap the backbone. Read the truth. Three SSM families (Spatial-Mamba, MambaVision, VMamba) share a single U-Net decoder and standardized feature interface {F1–F4}.</i></p>

  ---

- ## ✨ Why Controlled Benchmarking Matters

- Every backbone paper ships its own decoder, its own training recipe, its own augmentation policy. You compare "Method A" to "Method B" — but you're really comparing two *entire pipelines*.

- Mamba-Segmentation isolates the **one variable that matters:**

- | What | Status |
  |---|---|
  | Encoder backbone | 🔀 **Swapped** per experiment — the ONLY variable |
- | Decoder architecture | 🔒 Fixed (lightweight U-Net, 256ch, MambaBlock2d) |
- | Loss function | 🔒 Fixed (Lovász-Softmax + Focal + Boundary) |
- | Training schedule | 🔒 Fixed (50k iters, AdamW, poly decay) |
- | Augmentations | 🔒 Fixed (random crop, flip, color jitter) |
  | Input resolution | 🔒 Fixed (512×512) |
  | Feature interface | 🔒 Fixed ({F1–F4} at strides {4, 8, 16, 32}) |

- When the results differ, you know *exactly* why.
-
  ---

- ## 🧠 The Controlled Pipeline
-
- ```
- Encoder:   swapped per experiment — the ONLY variable
- Decoder:   fixed lightweight U-Net (256ch, MambaBlock2d, addition skips)
- Interface: {F1, F2, F3, F4} at strides {4, 8, 16, 32}
- Training:  50k iters · AdamW · poly LR decay · 512×512 crops · fixed augmentations
- Loss:      L = L_lovász + L_focal + 0.5 × L_boundary
-   ├─ Lovász-Softmax → direct IoU optimization
-   ├─ Focal (γ=2.0)  → class imbalance handling
-   └─ Boundary (2px) → edge penalty with warmup
- ```

- **Backbone families tested:**

- | Family | Backbones | Type |
- |---|---|---|
- | **VMamba** | Tiny, Small, Base | SSM — cross-scan 2D selective state-space |
- | **MambaVision** | Tiny, Small, Base, Large, Large2 | SSM/Hybrid — Mamba + self-attention |
- | **Spatial-Mamba** | Tiny, Small, Base | SSM — spatially-aware scanning |
- | **DeepLabv3+** | ResNet-50 | CNN baseline |
- | **UNetFormer** | ResNet-18 | Transformer baseline |

- **Datasets:**
- - **LoveDA** → All→All, Urban→Rural, Rural→Urban (source-only, zero adaptation)
- - **ISPRS Potsdam** → high-resolution urban parsing (6-class)

  ---

- ## ⚡ Quick Start
-
- ### 1. Clone & Install

- ```bash
- git clone https://github.com/YOUR_USERNAME/Mamba-Segmentation
- cd Mamba-Segmentation
-
- conda create -n mamba-seg python=3.9 -y
- conda activate mamba-seg
-
- cd MambaVision && pip install -r requirements.txt
- ```
-
- ### 2. Grab Pre-trained Backbone Weights
-
- > 🤗 **All trained segmentation checkpoints are available on [Hugging Face](https://huggingface.co/dineth18/Mamba-Segmentation).** Download `best.pth` for any model directly from there.
-
- | Backbone | Source | Location |
- |---|---|---|
- | VMamba (Tiny/Small/Base) | [VMamba repo](https://github.com/MzeroMiko/VMamba) | `VMamba/Vmamba_weights/ImageNet-1K/` |
- | MambaVision (Tiny→Large2) | [NVIDIA MambaVision](https://github.com/NVlabs/MambaVision) | `MambaVision/weights/1k/` |
- | Spatial-Mamba (Tiny/Small/Base) | [Spatial-Mamba repo](https://github.com/EdwardChaworworrachat/SpatialMamba) | `spatial-mamba/weights/imageNet1K/` |
- | ResNet-50 / ResNet-18 | [torchvision](https://pytorch.org/vision/stable/models.html) | `weights/imagenet/` |
-
- Set the weights path in each backbone's `config.py` — that's it.
-
- ### 3. Configure Your Experiment
-
- Each backbone family has its own directory with a standardized interface:
-
- ```
- <ModelFamily>/
- ├── config.py          # ← edit DATA_ROOT / OUTPUT_DIR, or set env vars
- ├── config_icprs.py    # ← for ISPRS Potsdam experiments
- ├── train.py           # ← same training loop across all families
- ├── model.py
- ├── encoders.py
- ├── light_decoder.py   # ← THE fixed decoder (identical everywhere)
- ├── losses.py          # ← THE fixed loss (identical everywhere)
- └── utils.py
- ```
-
- **Path configuration** — two approaches:
-
- **Option A — environment variables (recommended):**
- ```bash
- export LOVEDA_ROOT=/path/to/LoveDA           # for LoveDA experiments
- export POTSDAM_ROOT=/path/to/ISPRS_Potsdam   # for Potsdam experiments
- export OUTPUT_DIR=/path/to/output            # optional — defaults to Comparison_Experiments/
- python train.py
- ```
-
- **Option B — edit the config directly:**
- Open `config.py` and change `DATA_ROOT` and `OUTPUT_DIR` near the top of the file.

  ---

- ## 🗂 Data Preparation
-
- Plug-and-play support for **LoveDA** and **ISPRS Potsdam**.

- <details>
- <summary>📁 <b>LoveDA Layout</b></summary>

- ```
- DATA_ROOT/
- ├── Train/
- │   ├── Urban/
- │   │   ├── images_png/
- │   │   └── masks_png/
- │   └── Rural/
- │       ├── images_png/
- │       └── masks_png/
- ├── Val/
- │   ├── Urban/
- │   │   ├── images_png/
- │   │   └── masks_png/
- │   └── Rural/
- │       ├── images_png/
- │       └── masks_png/
- └── Test/
- ```

- - **7 classes:** Background, Building, Road, Water, Barren, Forest, Agricultural
- - **Resolution:** 1024×1024 (cropped to 512×512 during training)
- - **Domains:** Urban and Rural — used for cross-domain evaluation

- </details>

- <details>
- <summary>📁 <b>ISPRS Potsdam Layout</b></summary>

- ```
- DATA_ROOT/
- ├── Images/
- ├── Labels/
- └── splits/
-     ├── train.txt
-     ├── val.txt
-     └── test.txt
- ```

- - **6 classes:** Impervious, Building, Low Vegetation, Tree, Car, Clutter
- - **Resolution:** 6000×6000 tiles (cropped to 512×512)

- </details>

- **Must-do:** Set `DATA_ROOT` in `config.py` (LoveDA) or `config_icprs.py` (Potsdam) to your local dataset path.

  ---

- ## 🚀 Train & Evaluation

- YAML-free, config-driven — clean and reproducible.

- ### Train
-
- ```bash
- # LoveDA — pick any backbone family
- cd MambaVision   # or VMamba/, spatial-mamba/, CNN_DeepLabv3p/, etc.
- # → edit config.py: set DATA_ROOT, OUTPUT_DIR, and backbone variant
- python train.py
-
- # ISPRS Potsdam
- cd VMamba
- # → edit config_icprs.py: set DATA_ROOT and OUTPUT_DIR
- python train.py
  ```

- Checkpoints + TensorBoard logs land in `Comparison_Experiments/<experiment_name>/`.
-
- ### Efficiency Profiling

  ```bash
- # Single model benchmark (FPS + peak VRAM)
- python tools/benchmark_fps_mem.py \
-     --model mambavision --variant base --device cuda:0
-
- # Full sweep across all families
- python tools/benchmark_fps_mem_total.py \
-     --device cuda:0 --batch_size 1
- ```
-
- ---
-
- ## 🔬 Analysis Scripts
-
- Three diagnostic scripts that reproduce every analytical claim in the paper:

- | Script | What It Measures | What It Tells You |
- |---|---|---|
- | `analysis/boundary_analysis.py` | Boundary vs. interior mIoU under domain shift | Boundary degradation is the dominant failure mode — not interior misclassification |
- | `analysis/cross_domain_analysis.py` | U→R and R→U metrics for all families | Domain transfer asymmetry is backbone-agnostic — it's a data property |
- | `analysis/rotation_analysis.py` | Prediction stability under 90°/180°/270° rotations | Tests whether SSM scan-order introduces orientation artifacts |

- ```bash
- python analysis/boundary_analysis.py \
-     --device cuda:0 --use_pretrained 1
-
- python analysis/cross_domain_analysis.py \
-     --device cuda:0 --use_pretrained 1
-
- python analysis/rotation_analysis.py \
-     --device cuda:0 --use_pretrained 1 \
-     --pack_rotations 1 \
-     --families mambavision,vmamba,spatialmamba
  ```

- Results land in `analysis_outputs/` as CSV files ready for plotting.
-
  ---

- ## 📊 Results
-
- Straight from the paper — reproducible out of the box.
-
- Every row shares the same decoder, loss, optimizer, schedule, augmentations, and data splits. **The only variable is the encoder backbone.**
-
- | Type | Backbone | LoveDA mIoU | U→R | R→U | Potsdam mIoU |
- |---|---|---:|---:|---:|---:|
- | CNN | DeepLabv3 (controlled) | 43.01 | 30.36 | 39.98 | 75.09 |
- | Transformer | UNetFormer (controlled) | 48.61 | 34.56 | 44.84 | 74.99 |
- | **SSM** 🔥 | **VMamba-Small** | **55.66** | **40.62** | 53.52 | **77.59** |
- | **SSM** 🔥 | **MambaVision-L** | 55.25 | 38.53 | **54.01** | 77.07 |
- | SSM | Spatial-Mamba-B | 48.03 | 35.23 | 46.55 | 70.00 |
-
- > 🏆 **VMamba-Small. 55.66 mIoU. +7.05 over the best Transformer. +12.65 over the best CNN. Same decoder. Same training. No tricks.**
-
- ### Accuracy vs. Throughput
-
- <p align="center">
-   <img src="IGARSS%202026/fps_vs_miou.png" alt="mIoU vs Inference Throughput" width="60%">
- </p>
- <p align="center"><i>mIoU (%) vs. inference throughput (FPS) for all SSM variants. VMamba holds near-peak accuracy across all sizes. MambaVision trades speed for capacity with diminishing returns. Spatial-Mamba sits in the lower tier.</i></p>
-
- ### Key Takeaways

- 🔥 **SSMs dominate the fair fight.** VMamba-Small beats UNetFormer by +7.05 and DeepLabv3 by +12.65 on LoveDA — under identical conditions. This is the backbone, not the pipeline.
-
- 📏 **Bigger ≠ better under a fixed decoder.** MambaVision-L carries far more parameters than VMamba-Small yet scores 55.25 vs. 55.66. Scaling the encoder past a threshold buys nothing when the decoder stays constant.
-
- 🔄 **Domain transfer is asymmetric — and backbone-agnostic.** Rural→Urban outperforms Urban→Rural by 10–15 points across every family. VMamba-Small: 53.52 R→U vs. 40.62 U→R. This is a data distribution property, not a model property.
-
- 🧱 **Boundaries are the unsolved failure mode.** Under domain shift, interior accuracy holds. Boundary accuracy collapses. Every backbone, every family, same story. Whoever cracks boundary sensitivity under distribution shift wins the next round.
-
- ### Qualitative Results — LoveDA
-
- <p align="center">
-   <img src="IGARSS%202026/loveda_qualitative_detailed_enhanced.png" alt="LoveDA Qualitative Results" width="85%">
- </p>
- <p align="center"><i>Predictions + error maps (magenta = false positive, dark green = false negative) on LoveDA Urban and Rural scenes. VMamba-S and VMamba-B produce the cleanest boundaries; Spatial-Mamba-B shows the most false positives at class transitions.</i></p>
-
- ### Qualitative Results — ISPRS Potsdam
-
- <p align="center">
-   <img src="IGARSS%202026/potsdam_qualitative_detailed_enhanced.png" alt="ISPRS Potsdam Qualitative Results" width="85%">
- </p>
- <p align="center"><i>Predictions + error maps on ISPRS Potsdam. All SSM variants handle large homogeneous regions well; errors concentrate at fine-grained boundaries (cars, narrow roads) — consistent with the boundary analysis findings.</i></p>
-
- ---
-
- ## 🧬 Backbone Overview
-
- | Backbone | Architecture | Key Idea | RS Segmentation Impact |
- |---|---|---|---|
- | **VMamba** | Cross-scan 2D selective SSM | Global spatial context with linear complexity via multi-directional scanning | 🥇 Top performer: 55.66 LoveDA mIoU, strongest domain transfer |
- | **MambaVision** | Hybrid Mamba + self-attention | Interleaves Mamba blocks (early stages) with attention (late stages) | Matches VMamba on Potsdam, but extra capacity doesn't help on LoveDA |
- | **Spatial-Mamba** | Spatially-aware SSM | Explicit positional inductive biases in the state-space pathway | Beats CNN baseline, but scan-order alone insufficient without global modeling |
- | **DeepLabv3+** | CNN (ResNet-50) | Atrous convolutions + ASPP for multi-scale context | Controlled CNN reference — 43.01 mIoU baseline |
- | **UNetFormer** | Transformer (ResNet-18) | Efficient self-attention decoder for dense prediction | Controlled Transformer reference — 48.61 mIoU baseline |
-
- ---
-
- ## 🙏 Acknowledgements
-
- This work builds on prior advances in visual state-space models and remote-sensing segmentation. We gratefully acknowledge:
-
- - **[VMamba](https://github.com/MzeroMiko/VMamba)** — Visual State Space Model backbone
- - **[MambaVision](https://github.com/NVlabs/MambaVision)** — NVIDIA's hybrid Mamba-Transformer architecture
- - **[Spatial-Mamba](https://github.com/EdwardChaworworrachat/SpatialMamba)** — Spatially-aware Mamba variant
- - **[LoveDA](https://github.com/Junjue-Wang/LoveDA)** and **[ISPRS Potsdam](https://www.isprs.org/education/benchmarks/UrbanSemLab/)** dataset creators
-
- ---
-
- ## 📜 Citation
-
- If Mamba-Segmentation fuels your research, please cite:

  ```bibtex
  @article{wasalathilaka2026controlledbenchmark,
  title={A Controlled Benchmark of Visual State-Space Backbones with
- Domain-Shift and Boundary Analysis for Remote-Sensing
- Segmentation},
- author={Wasalathilaka, Nichula and Perea, Dineth and Samarakoon,
- Oshadha and Wijenayake, Buddhi and Godaliyadda, Roshan and
- Herath, Vijitha and Ekanayake, Parakrama},
- journal={IGARSS 2026},
  year={2026}
  }
  ```

  ---

- 🌍🛰️ Built at the **University of Peradeniya**. Got inspired? Give us a ⭐

  ---
+ license: mit
+ language:
+ - en
+ tags:
+ - remote-sensing
+ - semantic-segmentation
+ - mamba
+ - state-space-model
+ - vmamba
+ - mambavision
+ - spatial-mamba
+ - pytorch
+ - benchmark
+ - loveda
+ - isprs-potsdam
+ - domain-adaptation
+ datasets:
+ - LoveDA
+ - ISPRS-Potsdam
+ pipeline_tag: image-segmentation
  ---

+ # Mamba-Segmentation

+ **Controlled Visual State-Space Backbone Benchmark with Domain-Shift & Boundary Analysis for Remote-Sensing Segmentation**

+ > *Accepted at IGARSS 2026*

+ One pipeline. One decoder. One loss. One schedule. **Five backbone families.** The only variable is the encoder — so the results finally mean something.

  ---

+ ## What Is This?

+ Remote-sensing segmentation papers routinely change the backbone *and* the decoder *and* the loss *and* the training schedule all at once. The numbers tell you who tuned harder, not which backbone is better.

+ This repo fixes that. **One shared pipeline — swap the backbone — read the truth.**

+ | Component | Status |
  |---|---|
  | Encoder backbone | 🔀 **Swapped** per experiment — the ONLY variable |
+ | Decoder | 🔒 Fixed (lightweight U-Net, 256ch, MambaBlock2d) |
+ | Loss | 🔒 Fixed (Lovász-Softmax + Focal + Boundary) |
+ | Training schedule | 🔒 Fixed (50k iters, AdamW, poly LR decay) |
+ | Augmentations | 🔒 Fixed (random crop, flip, colour jitter) |
  | Input resolution | 🔒 Fixed (512×512) |
  | Feature interface | 🔒 Fixed ({F1–F4} at strides {4, 8, 16, 32}) |
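+
+ The feature-interface row is the contract that makes encoders swappable. Below is a minimal sketch of that contract (illustrative only; the real classes live in each family's `encoders.py` and `light_decoder.py`, and the channel widths here are invented):
+
+ ```python
+ import torch
+
+ def check_feature_interface(features, input_hw=(512, 512)):
+     """Every backbone must emit four pyramid maps {F1..F4} at strides {4, 8, 16, 32}."""
+     strides = (4, 8, 16, 32)
+     assert len(features) == 4, "expected exactly four pyramid levels"
+     for f, s in zip(features, strides):
+         expected = (input_hw[0] // s, input_hw[1] // s)
+         assert f.shape[-2:] == expected, f"stride-{s} map should be {expected}, got {tuple(f.shape[-2:])}"
+
+ # Toy stand-in for a real backbone's outputs (batch 1, invented channel widths).
+ feats = [torch.randn(1, c, 512 // s, 512 // s) for c, s in zip((96, 192, 384, 768), (4, 8, 16, 32))]
+ check_feature_interface(feats)  # the fixed decoder consumes exactly this structure
+ ```
+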
  ---

+ ## Checkpoints in This Repository

+ All checkpoints are `best.pth` files (highest validation mIoU during training) stored with their original directory structure.
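+
+ Any single file can be pulled without cloning the whole repo, for example with `huggingface_hub` (a sketch; substitute any checkpoint path from the tables below):
+
+ ```python
+ from huggingface_hub import hf_hub_download
+
+ # Downloads to the local Hugging Face cache and returns the resolved path.
+ ckpt_path = hf_hub_download(
+     repo_id="dineth18/Mamba-Segmentation",
+     filename="Comparison_Experiments/mambavision_base_512/checkpoints/best.pth",
+ )
+ print(ckpt_path)
+ ```
+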
+ ### LoveDA Experiments — `Comparison_Experiments/`

+ #### MambaVision (NVIDIA hybrid Mamba-Transformer)
+ | Checkpoint path | Training split |
+ |---|---|
+ | `Comparison_Experiments/mambavision_tiny_512/checkpoints/best.pth` | All→All |
+ | `Comparison_Experiments/mambavision_tiny_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
+ | `Comparison_Experiments/mambavision_tiny_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
+ | `Comparison_Experiments/mambavision_tiny2_512/checkpoints/best.pth` | All→All (v2) |
+ | `Comparison_Experiments/mambavision_tiny2_ruraltrain_512/checkpoints/best.pth` | Rural→Urban (v2) |
+ | `Comparison_Experiments/mambavision_tiny2_urbantrain_512/checkpoints/best.pth` | Urban→Rural (v2) |
+ | `Comparison_Experiments/mambavision_small_512/checkpoints/best.pth` | All→All |
+ | `Comparison_Experiments/mambavision_small_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
+ | `Comparison_Experiments/mambavision_small_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
+ | `Comparison_Experiments/mambavision_base_512/checkpoints/best.pth` | All→All |
+ | `Comparison_Experiments/mambavision_base_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
+ | `Comparison_Experiments/mambavision_base_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
+ | `Comparison_Experiments/mambavision_large_512/checkpoints/best.pth` | All→All |
+ | `Comparison_Experiments/mambavision_large_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
+ | `Comparison_Experiments/mambavision_large_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
+ | `Comparison_Experiments/mambavision_large2_512/checkpoints/best.pth` | All→All |
+ | `Comparison_Experiments/mambavision_large2_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
+ | `Comparison_Experiments/mambavision_large2_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
+
+ #### VMamba (cross-scan 2D selective SSM)
+ | Checkpoint path | Training split |
+ |---|---|
+ | `Comparison_Experiments/Vmamb_tiny_512/checkpoints/best.pth` | All→All |
+ | `Comparison_Experiments/vmamba_tiny_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
+ | `Comparison_Experiments/vmamba_tiny_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
+ | `Comparison_Experiments/Vmamb_small_512/checkpoints/best.pth` | All→All |
+ | `Comparison_Experiments/Vmamb_small_512_2/checkpoints/best.pth` | All→All (run 2) |
+ | `Comparison_Experiments/Vmamb_small_512_3/checkpoints/best.pth` | All→All (run 3) |
+ | `Comparison_Experiments/vmamba_small_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
+ | `Comparison_Experiments/vmamba_small_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
+ | `Comparison_Experiments/Vmamb_base_512/checkpoints/best.pth` | All→All |
+ | `Comparison_Experiments/vmamba_base_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
+ | `Comparison_Experiments/vmamba_base_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
+
+ #### VisionMamba / Vim (bidirectional Mamba)
+ | Checkpoint path | Training split |
+ |---|---|
+ | `Comparison_Experiments/VisionMamba_tiny_512/checkpoints/best.pth` | All→All |
+ | `Comparison_Experiments/visionmamba_tiny_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
+ | `Comparison_Experiments/visionmamba_tiny_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
+ | `Comparison_Experiments/VisionMamba_small_512/checkpoints/best.pth` | All→All |
+ | `Comparison_Experiments/visionmamba_small_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
+ | `Comparison_Experiments/visionmamba_small_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
+ | `Comparison_Experiments/VisionMamba_base_512/checkpoints/best.pth` | All→All |
+ | `Comparison_Experiments/visionmamba_base_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
+ | `Comparison_Experiments/visionmamba_base_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
+
+ #### Spatial-Mamba (spatially-aware SSM)
+ | Checkpoint path | Training split |
+ |---|---|
+ | `Comparison_Experiments/spatialmamba_tiny_512/checkpoints/best.pth` | All→All |
+ | `Comparison_Experiments/spatialmamba_tiny_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
+ | `Comparison_Experiments/spatialmamba_tiny_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
+ | `Comparison_Experiments/spatialmamba_small_512/checkpoints/best.pth` | All→All |
+ | `Comparison_Experiments/spatialmamba_small_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
+ | `Comparison_Experiments/spatialmamba_small_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
+ | `Comparison_Experiments/spatialmamba_base_512/checkpoints/best.pth` | All→All |
+ | `Comparison_Experiments/spatialmamba_base_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
+ | `Comparison_Experiments/spatialmamba_base_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
+
+ #### CNN & Transformer Baselines
+ | Checkpoint path | Model |
+ |---|---|
+ | `Comparison_Experiments/cnn_deeplabv3p_r50_512/checkpoints/best.pth` | DeepLabv3+ ResNet-50, All→All |
+ | `Comparison_Experiments/cnn_deeplabv3p_resnet50_ruraltrain_512/checkpoints/best.pth` | DeepLabv3+ ResNet-50, Rural→Urban |
+ | `Comparison_Experiments/cnn_deeplabv3p_resnet50_urbantrain_512/checkpoints/best.pth` | DeepLabv3+ ResNet-50, Urban→Rural |
+ | `Comparison_Experiments/cnn_unet_r50_512/checkpoints/best.pth` | U-Net ResNet-50, All→All |
+ | `Comparison_Experiments/transformer_unetformer_r18_512/checkpoints/best.pth` | UNetFormer ResNet-18, All→All |
+ | `Comparison_Experiments/transformerunetformer_resnet18_ruraltrain_512/checkpoints/best.pth` | UNetFormer ResNet-18, Rural→Urban |
+ | `Comparison_Experiments/transformerunetformer_resnet18_urbantrain_512/checkpoints/best.pth` | UNetFormer ResNet-18, Urban→Rural |

  ---

+ ### ISPRS Potsdam Experiments — `Comparison_Experiments_ICPRS_potsdam/`

+ | Checkpoint path | Model |
+ |---|---|
+ | `Comparison_Experiments_ICPRS_potsdam/mambavision_tiny_512/checkpoints/best.pth` | MambaVision-Tiny |
+ | `Comparison_Experiments_ICPRS_potsdam/mambavision_tiny2_512/checkpoints/best.pth` | MambaVision-Tiny2 |
+ | `Comparison_Experiments_ICPRS_potsdam/mambavision_small_512/checkpoints/best.pth` | MambaVision-Small |
+ | `Comparison_Experiments_ICPRS_potsdam/mambavision_base_512/checkpoints/best.pth` | MambaVision-Base |
+ | `Comparison_Experiments_ICPRS_potsdam/mambavision_large_512/checkpoints/best.pth` | MambaVision-Large |
+ | `Comparison_Experiments_ICPRS_potsdam/mambavision_large2_512/checkpoints/best.pth` | MambaVision-Large2 |
+ | `Comparison_Experiments_ICPRS_potsdam/vmamba_tiny_512/checkpoints/best.pth` | VMamba-Tiny |
+ | `Comparison_Experiments_ICPRS_potsdam/vmamba_small_512/checkpoints/best.pth` | VMamba-Small |
+ | `Comparison_Experiments_ICPRS_potsdam/vmamba_base_512/checkpoints/best.pth` | VMamba-Base |
+ | `Comparison_Experiments_ICPRS_potsdam/spatialmamba_tiny_512/checkpoints/best.pth` | Spatial-Mamba-Tiny |
+ | `Comparison_Experiments_ICPRS_potsdam/spatialmamba_small_512/checkpoints/best.pth` | Spatial-Mamba-Small |
+ | `Comparison_Experiments_ICPRS_potsdam/spatialmamba_base_512/checkpoints/best.pth` | Spatial-Mamba-Base |
+ | `Comparison_Experiments_ICPRS_potsdam/cnn_deeplabv3p_r50_512/checkpoints/best.pth` | DeepLabv3+ ResNet-50 |
+ | `Comparison_Experiments_ICPRS_potsdam/transformer_unetformer_r18_512/checkpoints/best.pth` | UNetFormer ResNet-18 |

  ---

+ ### ImageNet Backbone Weights — `weights/imagenet/`

+ | File | Description |
+ |---|---|
+ | `weights/imagenet/resnet50-11ad3fa6.pth` | ResNet-50 ImageNet-1K pretrained |
+ | `weights/imagenet/resnet18-f37072fd.pth` | ResNet-18 ImageNet-1K pretrained |
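+
+ These look like the stock torchvision checkpoints (the hash suffix in each filename follows the torchvision convention), so they should load directly into a torchvision model; a quick sketch, assuming that is what they are:
+
+ ```python
+ import torch
+ from torchvision.models import resnet50
+
+ # Plain state_dict load; no download needed once the file is on disk.
+ backbone = resnet50()
+ state = torch.load("weights/imagenet/resnet50-11ad3fa6.pth", map_location="cpu")
+ backbone.load_state_dict(state)
+ ```
+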
+ ---

+ ## Results Summary

+ Every row shares the same decoder, loss, optimizer, schedule, and data splits. **The only variable is the encoder.**

+ ### LoveDA

+ | Backbone | mIoU (All→All) | mIoU (U→R) | mIoU (R→U) |
+ |---|---:|---:|---:|
+ | DeepLabv3+ ResNet-50 (CNN) | 43.01 | 30.36 | 39.98 |
+ | UNetFormer ResNet-18 (Transformer) | 48.61 | 34.56 | 44.84 |
+ | VMamba-Small **🥇** | **55.66** | **40.62** | 53.52 |
+ | MambaVision-Large | 55.25 | 38.53 | **54.01** |
+ | Spatial-Mamba-Base | 48.03 | 35.23 | 46.55 |

+ ### ISPRS Potsdam

+ | Backbone | mIoU |
+ |---|---:|
+ | DeepLabv3+ ResNet-50 | 75.09 |
+ | UNetFormer ResNet-18 | 74.99 |
+ | VMamba-Small **🥇** | **77.59** |
+ | MambaVision-Large | 77.07 |
+ | Spatial-Mamba-Base | 70.00 |

+ **Key findings:**
+ - SSMs outperform CNNs and Transformers by a significant margin under identical conditions (+7–12 mIoU on LoveDA).
+ - Scaling the encoder past VMamba-Small yields diminishing returns under a fixed decoder.
+ - Domain transfer is asymmetric across all backbone families (Rural→Urban consistently outperforms Urban→Rural by 10–15 points) — a data distribution property, not a model property.
+ - Boundary accuracy collapses under domain shift while interior accuracy holds — every backbone, every family (a rough sketch of the boundary/interior split is shown below).

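+ The boundary finding comes from scoring a thin band around ground-truth class edges separately from the remaining interior. A rough, illustrative version of that split (band width and per-class averaging may differ from the paper's exact protocol):
+
+ ```python
+ import numpy as np
+
+ def boundary_band(gt, width=2):
+     """Boolean mask of pixels within `width` px of a ground-truth class edge."""
+     edge = np.zeros_like(gt, dtype=bool)
+     edge[:-1, :] |= gt[:-1, :] != gt[1:, :]   # vertical neighbours disagree
+     edge[1:, :] |= gt[1:, :] != gt[:-1, :]
+     edge[:, :-1] |= gt[:, :-1] != gt[:, 1:]   # horizontal neighbours disagree
+     edge[:, 1:] |= gt[:, 1:] != gt[:, :-1]
+     band = edge
+     for _ in range(width - 1):                # crude dilation, one pixel per step
+         grown = band.copy()
+         grown[:-1, :] |= band[1:, :]
+         grown[1:, :] |= band[:-1, :]
+         grown[:, :-1] |= band[:, 1:]
+         grown[:, 1:] |= band[:, :-1]
+         band = grown
+     return band
+
+ def region_iou(pred, gt, region, cls):
+     """IoU for one class, restricted to a boolean region (boundary band or interior)."""
+     p = (pred == cls) & region
+     g = (gt == cls) & region
+     union = np.logical_or(p, g).sum()
+     return np.logical_and(p, g).sum() / union if union else float("nan")
+
+ # Usage idea: band = boundary_band(gt); interior = ~band
+ # boundary mIoU = nanmean of region_iou(pred, gt, band, c) over classes c.
+ ```
+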
  ---

+ ## How to Load a Checkpoint

+ ```python
+ import torch
+
+ # Example: load MambaVision-Base best checkpoint for LoveDA All→All
+ ckpt = torch.load(
+     "Comparison_Experiments/mambavision_base_512/checkpoints/best.pth",
+     map_location="cpu"
+ )
+ # keys: 'model', 'optimizer', 'scheduler', 'iter', 'best_score'
+ model_state = ckpt["model"]
  ```

+ To build the full model and run inference, clone the code repository and follow the setup instructions:

  ```bash
+ git clone https://github.com/dineth18/Mamba-Segmentation
+ cd Mamba-Segmentation/MambaVision   # or VMamba/, spatial-mamba/, etc.
+ pip install -r requirements.txt
+
+ # Set your dataset path (no need to edit config files)
+ export LOVEDA_ROOT=/path/to/LoveDA
+ export POTSDAM_ROOT=/path/to/ISPRS_Potsdam
+
+ python eval.py --checkpoint path/to/best.pth
  ```
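+
+ Before wiring a downloaded checkpoint into the code, a quick sanity check on its contents (using the key layout documented above) can catch a truncated file:
+
+ ```python
+ import torch
+
+ ckpt = torch.load("path/to/best.pth", map_location="cpu")
+ print(sorted(ckpt.keys()))               # expect: ['best_score', 'iter', 'model', 'optimizer', 'scheduler']
+ print(ckpt["iter"], ckpt["best_score"])  # training iteration and validation mIoU of this snapshot
+ print(len(ckpt["model"]), "tensors in the encoder+decoder state dict")
+ ```
+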
  ---

+ ## Citation

+ If this benchmark is useful for your research, please cite:

  ```bibtex
  @article{wasalathilaka2026controlledbenchmark,
  title={A Controlled Benchmark of Visual State-Space Backbones with
+ Domain-Shift and Boundary Analysis for Remote-Sensing Segmentation},
+ author={Wasalathilaka, Nichula and Perea, Dineth and Samarakoon, Oshadha
+ and Wijenayake, Buddhi and Godaliyadda, Roshan and Herath, Vijitha
+ and Ekanayake, Parakrama},
+ journal={IGARSS 2026},
  year={2026}
  }
  ```

  ---

+ ## Acknowledgements
+
+ - [VMamba](https://github.com/MzeroMiko/VMamba) — Visual State Space Model
+ - [MambaVision](https://github.com/NVlabs/MambaVision) — NVIDIA hybrid Mamba-Transformer
+ - [Spatial-Mamba](https://github.com/EdwardChaworworrachat/SpatialMamba) — Spatially-aware Mamba
+ - [LoveDA](https://github.com/Junjue-Wang/LoveDA) — Land-cover domain adaptation dataset
+ - [ISPRS Potsdam](https://www.isprs.org/education/benchmarks/UrbanSemLab/) — Urban semantic labeling benchmark
+
+ Built at the **University of Peradeniya**.