dineth18 committed on
Commit 4e30fe8 · verified · 1 Parent(s): d201d19

Add HF badge and update weights section with link

Files changed (1)
  1. README.md +296 -186
README.md CHANGED
@@ -1,248 +1,358 @@
  ---
- license: mit
- language:
- - en
- tags:
- - remote-sensing
- - semantic-segmentation
- - mamba
- - state-space-model
- - vmamba
- - mambavision
- - spatial-mamba
- - pytorch
- - benchmark
- - loveda
- - isprs-potsdam
- - domain-adaptation
- datasets:
- - LoveDA
- - ISPRS-Potsdam
- pipeline_tag: image-segmentation
  ---
 
- # Mamba-Segmentation
 
- **Controlled Visual State-Space Backbone Benchmark with Domain-Shift & Boundary Analysis for Remote-Sensing Segmentation**
 
- > *Accepted at IGARSS 2026*
 
- One pipeline. One decoder. One loss. One schedule. **Five backbone families.** The only variable is the encoder, so the results finally mean something.
 
  ---
 
- ## What Is This?
 
- Remote-sensing segmentation papers routinely change the backbone *and* the decoder *and* the loss *and* the training schedule all at once. The numbers tell you who tuned harder, not which backbone is better.
 
- This repo fixes that. **One shared pipeline: swap the backbone, read the truth.**
 
- | Component | Status |
  |---|---|
  | Encoder backbone | 🔀 **Swapped** per experiment (the ONLY variable) |
- | Decoder | 🔒 Fixed (lightweight U-Net, 256ch, MambaBlock2d) |
- | Loss | 🔒 Fixed (Lovász-Softmax + Focal + Boundary) |
- | Training schedule | 🔒 Fixed (50k iters, AdamW, poly LR decay) |
- | Augmentations | 🔒 Fixed (random crop, flip, colour jitter) |
  | Input resolution | 🔒 Fixed (512×512) |
  | Feature interface | 🔒 Fixed ({F1–F4} at strides {4, 8, 16, 32}) |
 
  ---
 
- ## Checkpoints in This Repository
 
- All checkpoints are `best.pth` files (highest validation mIoU during training) stored with their original directory structure.
 
- ### LoveDA Experiments: `Comparison_Experiments/`
 
- #### MambaVision (NVIDIA hybrid Mamba-Transformer)
- | Checkpoint path | Training split |
- |---|---|
- | `Comparison_Experiments/mambavision_tiny_512/checkpoints/best.pth` | All→All |
- | `Comparison_Experiments/mambavision_tiny_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
- | `Comparison_Experiments/mambavision_tiny_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
- | `Comparison_Experiments/mambavision_tiny2_512/checkpoints/best.pth` | All→All (v2) |
- | `Comparison_Experiments/mambavision_tiny2_ruraltrain_512/checkpoints/best.pth` | Rural→Urban (v2) |
- | `Comparison_Experiments/mambavision_tiny2_urbantrain_512/checkpoints/best.pth` | Urban→Rural (v2) |
- | `Comparison_Experiments/mambavision_small_512/checkpoints/best.pth` | All→All |
- | `Comparison_Experiments/mambavision_small_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
- | `Comparison_Experiments/mambavision_small_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
- | `Comparison_Experiments/mambavision_base_512/checkpoints/best.pth` | All→All |
- | `Comparison_Experiments/mambavision_base_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
- | `Comparison_Experiments/mambavision_base_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
- | `Comparison_Experiments/mambavision_large_512/checkpoints/best.pth` | All→All |
- | `Comparison_Experiments/mambavision_large_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
- | `Comparison_Experiments/mambavision_large_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
- | `Comparison_Experiments/mambavision_large2_512/checkpoints/best.pth` | All→All |
- | `Comparison_Experiments/mambavision_large2_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
- | `Comparison_Experiments/mambavision_large2_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
-
- #### VMamba (cross-scan 2D selective SSM)
- | Checkpoint path | Training split |
- |---|---|
- | `Comparison_Experiments/Vmamb_tiny_512/checkpoints/best.pth` | All→All |
- | `Comparison_Experiments/vmamba_tiny_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
- | `Comparison_Experiments/vmamba_tiny_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
- | `Comparison_Experiments/Vmamb_small_512/checkpoints/best.pth` | All→All |
- | `Comparison_Experiments/Vmamb_small_512_2/checkpoints/best.pth` | All→All (run 2) |
- | `Comparison_Experiments/Vmamb_small_512_3/checkpoints/best.pth` | All→All (run 3) |
- | `Comparison_Experiments/vmamba_small_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
- | `Comparison_Experiments/vmamba_small_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
- | `Comparison_Experiments/Vmamb_base_512/checkpoints/best.pth` | All→All |
- | `Comparison_Experiments/vmamba_base_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
- | `Comparison_Experiments/vmamba_base_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
-
- #### VisionMamba / Vim (bidirectional Mamba)
- | Checkpoint path | Training split |
- |---|---|
- | `Comparison_Experiments/VisionMamba_tiny_512/checkpoints/best.pth` | All→All |
- | `Comparison_Experiments/visionmamba_tiny_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
- | `Comparison_Experiments/visionmamba_tiny_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
- | `Comparison_Experiments/VisionMamba_small_512/checkpoints/best.pth` | All→All |
- | `Comparison_Experiments/visionmamba_small_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
- | `Comparison_Experiments/visionmamba_small_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
- | `Comparison_Experiments/VisionMamba_base_512/checkpoints/best.pth` | All→All |
- | `Comparison_Experiments/visionmamba_base_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
- | `Comparison_Experiments/visionmamba_base_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
-
- #### Spatial-Mamba (spatially-aware SSM)
- | Checkpoint path | Training split |
- |---|---|
- | `Comparison_Experiments/spatialmamba_tiny_512/checkpoints/best.pth` | All→All |
- | `Comparison_Experiments/spatialmamba_tiny_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
- | `Comparison_Experiments/spatialmamba_tiny_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
- | `Comparison_Experiments/spatialmamba_small_512/checkpoints/best.pth` | All→All |
- | `Comparison_Experiments/spatialmamba_small_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
- | `Comparison_Experiments/spatialmamba_small_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
- | `Comparison_Experiments/spatialmamba_base_512/checkpoints/best.pth` | All→All |
- | `Comparison_Experiments/spatialmamba_base_ruraltrain_512/checkpoints/best.pth` | Rural→Urban |
- | `Comparison_Experiments/spatialmamba_base_urbantrain_512/checkpoints/best.pth` | Urban→Rural |
-
- #### CNN & Transformer Baselines
- | Checkpoint path | Model |
- |---|---|
- | `Comparison_Experiments/cnn_deeplabv3p_r50_512/checkpoints/best.pth` | DeepLabv3+ ResNet-50, All→All |
- | `Comparison_Experiments/cnn_deeplabv3p_resnet50_ruraltrain_512/checkpoints/best.pth` | DeepLabv3+ ResNet-50, Rural→Urban |
- | `Comparison_Experiments/cnn_deeplabv3p_resnet50_urbantrain_512/checkpoints/best.pth` | DeepLabv3+ ResNet-50, Urban→Rural |
- | `Comparison_Experiments/cnn_unet_r50_512/checkpoints/best.pth` | U-Net ResNet-50, All→All |
- | `Comparison_Experiments/transformer_unetformer_r18_512/checkpoints/best.pth` | UNetFormer ResNet-18, All→All |
- | `Comparison_Experiments/transformerunetformer_resnet18_ruraltrain_512/checkpoints/best.pth` | UNetFormer ResNet-18, Rural→Urban |
- | `Comparison_Experiments/transformerunetformer_resnet18_urbantrain_512/checkpoints/best.pth` | UNetFormer ResNet-18, Urban→Rural |
 
  ---
 
- ### ISPRS Potsdam Experiments: `Comparison_Experiments_ICPRS_potsdam/`
 
- | Checkpoint path | Model |
- |---|---|
- | `Comparison_Experiments_ICPRS_potsdam/mambavision_tiny_512/checkpoints/best.pth` | MambaVision-Tiny |
- | `Comparison_Experiments_ICPRS_potsdam/mambavision_tiny2_512/checkpoints/best.pth` | MambaVision-Tiny2 |
- | `Comparison_Experiments_ICPRS_potsdam/mambavision_small_512/checkpoints/best.pth` | MambaVision-Small |
- | `Comparison_Experiments_ICPRS_potsdam/mambavision_base_512/checkpoints/best.pth` | MambaVision-Base |
- | `Comparison_Experiments_ICPRS_potsdam/mambavision_large_512/checkpoints/best.pth` | MambaVision-Large |
- | `Comparison_Experiments_ICPRS_potsdam/mambavision_large2_512/checkpoints/best.pth` | MambaVision-Large2 |
- | `Comparison_Experiments_ICPRS_potsdam/vmamba_tiny_512/checkpoints/best.pth` | VMamba-Tiny |
- | `Comparison_Experiments_ICPRS_potsdam/vmamba_small_512/checkpoints/best.pth` | VMamba-Small |
- | `Comparison_Experiments_ICPRS_potsdam/vmamba_base_512/checkpoints/best.pth` | VMamba-Base |
- | `Comparison_Experiments_ICPRS_potsdam/spatialmamba_tiny_512/checkpoints/best.pth` | Spatial-Mamba-Tiny |
- | `Comparison_Experiments_ICPRS_potsdam/spatialmamba_small_512/checkpoints/best.pth` | Spatial-Mamba-Small |
- | `Comparison_Experiments_ICPRS_potsdam/spatialmamba_base_512/checkpoints/best.pth` | Spatial-Mamba-Base |
- | `Comparison_Experiments_ICPRS_potsdam/cnn_deeplabv3p_r50_512/checkpoints/best.pth` | DeepLabv3+ ResNet-50 |
- | `Comparison_Experiments_ICPRS_potsdam/transformer_unetformer_r18_512/checkpoints/best.pth` | UNetFormer ResNet-18 |
 
- ---
 
- ### ImageNet Backbone Weights: `weights/imagenet/`
 
- | File | Description |
- |---|---|
- | `weights/imagenet/resnet50-11ad3fa6.pth` | ResNet-50 ImageNet-1K pretrained |
- | `weights/imagenet/resnet18-f37072fd.pth` | ResNet-18 ImageNet-1K pretrained |
 
  ---
 
- ## Results Summary
 
- Every row shares the same decoder, loss, optimizer, schedule, and data splits. **The only variable is the encoder.**
 
- ### LoveDA
 
- | Backbone | mIoU (All→All) | mIoU (U→R) | mIoU (R→U) |
- |---|---:|---:|---:|
- | DeepLabv3+ ResNet-50 (CNN) | 43.01 | 30.36 | 39.98 |
- | UNetFormer ResNet-18 (Transformer) | 48.61 | 34.56 | 44.84 |
- | VMamba-Small **🥇** | **55.66** | **40.62** | 53.52 |
- | MambaVision-Large | 55.25 | 38.53 | **54.01** |
- | Spatial-Mamba-Base | 48.03 | 35.23 | 46.55 |
 
- ### ISPRS Potsdam
 
- | Backbone | mIoU |
- |---|---:|
- | DeepLabv3+ ResNet-50 | 75.09 |
- | UNetFormer ResNet-18 | 74.99 |
- | VMamba-Small **🥇** | **77.59** |
- | MambaVision-Large | 77.07 |
- | Spatial-Mamba-Base | 70.00 |
 
- **Key findings:**
- - The strongest SSMs outperform CNNs and Transformers by a significant margin under identical conditions (on LoveDA, VMamba-Small leads UNetFormer by +7.05 mIoU and DeepLabv3+ by +12.65).
- - Scaling the encoder past VMamba-Small yields diminishing returns under a fixed decoder.
- - Domain transfer is asymmetric across all backbone families: Rural→Urban outperforms Urban→Rural by roughly 10–15 mIoU points (from 9.62 for DeepLabv3+ up to 15.48 for MambaVision-Large), which makes it a data distribution property, not a model property.
- - Boundary accuracy collapses under domain shift while interior accuracy holds, for every backbone in every family.
 
  ---
 
- ## How to Load a Checkpoint
 
- ```python
- import torch
 
- # Example: load MambaVision-Base best checkpoint for LoveDA All→All
- ckpt = torch.load(
- "Comparison_Experiments/mambavision_base_512/checkpoints/best.pth",
- map_location="cpu"
- )
- # keys: 'model', 'optimizer', 'scheduler', 'iter', 'best_score'
- model_state = ckpt["model"]
  ```
 
- To build the full model and run inference, clone the code repository and follow the setup instructions there:
 
  ```bash
- git clone https://github.com/dineth18/Mamba-Segmentation
- cd Mamba-Segmentation/MambaVision # or VMamba/, spatial-mamba/, etc.
- pip install -r requirements.txt
- # edit config.py → set DATA_ROOT and backbone variant
- python eval.py --checkpoint path/to/best.pth
  ```
 
  ---
 
- ## Citation
 
- If this benchmark is useful for your research, please cite:
 
  ```bibtex
  @article{wasalathilaka2026controlledbenchmark,
  title={A Controlled Benchmark of Visual State-Space Backbones with
- Domain-Shift and Boundary Analysis for Remote-Sensing Segmentation},
- author={Wasalathilaka, Nichula and Perea, Dineth and Samarakoon, Oshadha
- and Wijenayake, Buddhi and Godaliyadda, Roshan and Herath, Vijitha
- and Ekanayake, Parakrama},
- journal={IGARSS 2026},
  year={2026}
  }
  ```
 
  ---
 
- ## Acknowledgements
-
- - [VMamba](https://github.com/MzeroMiko/VMamba): Visual State Space Model
- - [MambaVision](https://github.com/NVlabs/MambaVision): NVIDIA hybrid Mamba-Transformer
- - [Spatial-Mamba](https://github.com/EdwardChaworworrachat/SpatialMamba): Spatially-aware Mamba
- - [LoveDA](https://github.com/Junjue-Wang/LoveDA): Land-cover domain adaptation dataset
- - [ISPRS Potsdam](https://www.isprs.org/education/benchmarks/UrbanSemLab/): Urban semantic labeling benchmark
-
- Built at the **University of Peradeniya**.
+ # 🚀 Mamba-Segmentation
+
+ **Controlled Visual State-Space Backbone Benchmark with Domain-Shift & Boundary Analysis for Remote-Sensing Segmentation**
+
+ ### 🏆 The First Fair-Fight Benchmark for SSM vs. CNN vs. Transformer Backbones in Remote Sensing 🏆
+
+ [![🏆 Venue](https://img.shields.io/badge/🏆_IGARSS_2026-Accepted-brightgreen)](https://2026.ieeeigarss.org/)
+ [![🐍 Python](https://img.shields.io/badge/🐍_Python-3.9-3776AB)](https://www.python.org/)
+ [![🔥 PyTorch](https://img.shields.io/badge/🔥_PyTorch-2.0+-EE4C2C)](https://pytorch.org/)
+ [![License](https://img.shields.io/badge/License-MIT-yellow)](LICENSE)
+ [![🤗 Weights](https://img.shields.io/badge/🤗_Weights-Hugging_Face-yellow)](https://huggingface.co/dineth18/Mamba-Segmentation)
+
+ One pipeline. One decoder. One loss. One schedule. **Five backbone families.** The only variable is the encoder, so the results finally mean something. SSMs dominate, scaling plateaus early, domain transfer is asymmetric, and boundaries are where every model breaks.
+
+ Ready to see which backbone actually wins a fair fight? Let's go.
+
  ---
+
+ [🔭 Overview](#-overview) • [✨ Why Controlled?](#-why-controlled-benchmarking-matters) • [🧠 Pipeline](#-the-controlled-pipeline) • [⚑ Quick Start](#-quick-start) • [🗂 Data](#-data-preparation) • [🚀 Train & Eval](#-train--evaluation) • [🔬 Analysis](#-analysis-scripts) • [📊 Results](#-results) • [🙏 Acknowledgements](#-acknowledgements) • [📜 Cite](#-citation)
+
  ---
 
 
+ ## 🔭 Overview
+
+ Remote-sensing segmentation benchmarks have a fatal flaw: they change the backbone **and** the decoder **and** the loss **and** the schedule **and** the augmentations, all at once. The resulting numbers tell you who tuned harder, not which backbone is better.
+
+ **Mamba-Segmentation fixes this:**
 
+ - **Fixed lightweight U-Net decoder** → identical decoder across all experiments
+ - **Fixed TriBraid loss** (Lovász + Focal + Boundary) → same optimization objective for every backbone
+ - **Fixed training protocol** → 50k iterations, AdamW, poly LR, 512×512 crops, same augmentations
+ - **Standardized feature interface** → {F1, F2, F3, F4} at strides {4, 8, 16, 32}
+ - **Five backbone families** → VMamba, MambaVision, Spatial-Mamba, CNN (DeepLabv3+), Transformer (UNetFormer)
 
+ **Outcome:** differences in results reflect backbone behavior. Nothing else.
+
+ <p align="center">
+ <img src="IGARSS%202026/Architecture.png" alt="Controlled Pipeline Architecture" width="100%">
+ </p>
+ <p align="center"><i>Lock the pipeline. Swap the backbone. Read the truth. Three SSM families (Spatial-Mamba, MambaVision, VMamba) share a single U-Net decoder and standardized feature interface {F1–F4}.</i></p>
 
  ---
 
+ ## ✨ Why Controlled Benchmarking Matters
 
+ Every backbone paper ships its own decoder, its own training recipe, its own augmentation policy. You compare "Method A" to "Method B", but you're really comparing two *entire pipelines*.
 
+ Mamba-Segmentation isolates the **one variable that matters:**
 
+ | What | Status |
  |---|---|
  | Encoder backbone | 🔀 **Swapped** per experiment (the ONLY variable) |
+ | Decoder architecture | 🔒 Fixed (lightweight U-Net, 256ch, MambaBlock2d) |
+ | Loss function | 🔒 Fixed (Lovász-Softmax + Focal + Boundary) |
+ | Training schedule | 🔒 Fixed (50k iters, AdamW, poly decay) |
+ | Augmentations | 🔒 Fixed (random crop, flip, color jitter) |
  | Input resolution | 🔒 Fixed (512×512) |
  | Feature interface | 🔒 Fixed ({F1–F4} at strides {4, 8, 16, 32}) |
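
Because every encoder has to hand the fixed decoder the same four feature maps, their spatial sizes follow directly from the 512×512 input and the strides {4, 8, 16, 32}. The short sketch below only does that arithmetic; `pyramid_shapes` is a hypothetical helper, and channel widths (which depend on the backbone and on any projection layers) are deliberately not modeled.

```python
# Hypothetical helper: spatial sizes of the {F1..F4} pyramid for a square input.
STRIDES = (4, 8, 16, 32)  # strides of the standardized feature interface


def pyramid_shapes(input_size: int = 512, strides=STRIDES) -> dict:
    """Return {"F1": (H, W), ...}; requires input_size to be divisible by every stride."""
    shapes = {}
    for level, stride in enumerate(strides, start=1):
        if input_size % stride != 0:
            raise ValueError(f"input size {input_size} is not divisible by stride {stride}")
        side = input_size // stride
        shapes[f"F{level}"] = (side, side)
    return shapes


print(pyramid_shapes(512))
```

For a 512×512 crop this yields 128×128, 64×64, 32×32 and 16×16 maps for F1 through F4, which is what makes the decoder swappable-free across backbones.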
 
+ When the results differ, you know *exactly* why.
+
  ---
 
+ ## 🧠 The Controlled Pipeline
 
+ ```
+ Encoder:   swapped per experiment (the ONLY variable)
+ Decoder:   fixed lightweight U-Net (256ch, MambaBlock2d, addition skips)
+ Interface: {F1, F2, F3, F4} at strides {4, 8, 16, 32}
+ Training:  50k iters · AdamW · poly LR decay · 512×512 crops · fixed augmentations
+ Loss:      L = L_lovász + L_focal + 0.5 × L_boundary
+            ├─ Lovász-Softmax  → direct IoU optimization
+            ├─ Focal (γ=2.0)   → class imbalance handling
+            └─ Boundary (2px)  → edge penalty with warmup
+ ```
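
A minimal, pure-Python sketch of the combined objective L = L_lovász + L_focal + 0.5 × L_boundary, written over plain lists for readability rather than tensors. The Lovász-Softmax part follows the published formulation (Berman et al., 2018) and the focal term uses γ = 2.0 as listed in the pipeline. How the 2px boundary term is computed and how its warmup is shaped are not specified here, so the sketch takes the boundary term as a precomputed number and assumes a linear ramp over a hypothetical `warmup_steps`; it is not the repository's `losses.py`.

```python
import math


def lovasz_grad(gt_sorted):
    """Gradient of the Lovász extension of the Jaccard loss w.r.t. errors sorted descending."""
    gts = sum(gt_sorted)
    grad, prev, cum_fg = [], 0.0, 0
    for i, g in enumerate(gt_sorted):
        cum_fg += g
        intersection = gts - cum_fg
        union = gts + (i + 1 - cum_fg)  # never zero: either gts > 0 or i + 1 > 0
        jaccard = 1.0 - intersection / union
        grad.append(jaccard - prev)
        prev = jaccard
    return grad


def lovasz_softmax(probs, labels):
    """Lovász-Softmax over flattened pixels, averaged over classes present in `labels`.

    probs: per-pixel lists of softmax probabilities; labels: per-pixel class ids.
    """
    losses = []
    for c in range(len(probs[0])):
        fg = [1 if y == c else 0 for y in labels]
        if sum(fg) == 0:
            continue  # the 'present' variant skips classes absent from the batch
        errors = [abs(f - p[c]) for f, p in zip(fg, probs)]
        order = sorted(range(len(errors)), key=lambda i: errors[i], reverse=True)
        grad = lovasz_grad([fg[i] for i in order])
        losses.append(sum(errors[i] * g for i, g in zip(order, grad)))
    return sum(losses) / len(losses) if losses else 0.0


def focal_loss(probs, labels, gamma=2.0):
    """Mean multi-class focal loss: -(1 - p_t)^gamma * log(p_t)."""
    total = 0.0
    for p, y in zip(probs, labels):
        p_t = max(p[y], 1e-12)  # clamp to avoid log(0)
        total += -((1.0 - p_t) ** gamma) * math.log(p_t)
    return total / len(labels)


def total_loss(probs, labels, boundary_term, step, warmup_steps=1000, boundary_weight=0.5):
    """L = L_lovasz + L_focal + 0.5 * ramp(step) * L_boundary (linear ramp is an assumption)."""
    ramp = min(1.0, step / warmup_steps) if warmup_steps > 0 else 1.0
    return (lovasz_softmax(probs, labels)
            + focal_loss(probs, labels)
            + boundary_weight * ramp * boundary_term)
```

In an actual training loop these terms would be computed on tensors so that autograd can backpropagate through them; the list version only shows the arithmetic and how the 0.5 weight enters.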
 
+ **Backbone families tested:**
 
+ | Family | Backbones | Type |
+ |---|---|---|
+ | **VMamba** | Tiny, Small, Base | SSM: cross-scan 2D selective state-space |
+ | **MambaVision** | Tiny, Small, Base, Large, Large2 | SSM/Hybrid: Mamba + self-attention |
+ | **Spatial-Mamba** | Tiny, Small, Base | SSM: spatially-aware scanning |
+ | **DeepLabv3+** | ResNet-50 | CNN baseline |
+ | **UNetFormer** | ResNet-18 | Transformer baseline |
+
+ **Datasets:**
+ - **LoveDA** → All→All, Urban→Rural, Rural→Urban (source-only, zero adaptation)
+ - **ISPRS Potsdam** → high-resolution urban parsing (6-class)
 
  ---
 
+ ## ⚑ Quick Start
 
+ ### 1. Clone & Install
 
+ ```bash
+ git clone https://github.com/dineth18/Mamba-Segmentation
+ cd Mamba-Segmentation
 
+ conda create -n mamba-seg python=3.9 -y
+ conda activate mamba-seg
 
+ cd MambaVision && pip install -r requirements.txt
+ ```
+
+ ### 2. Grab Pre-trained Backbone Weights
+
+ > 🤗 **All trained segmentation checkpoints are available on [Hugging Face](https://huggingface.co/dineth18/Mamba-Segmentation).** Download `best.pth` for any model directly from there.
+
+ | Backbone | Source | Location |
+ |---|---|---|
+ | VMamba (Tiny/Small/Base) | [VMamba repo](https://github.com/MzeroMiko/VMamba) | `VMamba/Vmamba_weights/ImageNet-1K/` |
+ | MambaVision (Tiny→Large2) | [NVIDIA MambaVision](https://github.com/NVlabs/MambaVision) | `MambaVision/weights/1k/` |
+ | Spatial-Mamba (Tiny/Small/Base) | [Spatial-Mamba repo](https://github.com/EdwardChaworworrachat/SpatialMamba) | `spatial-mamba/weights/imageNet1K/` |
+ | ResNet-50 / ResNet-18 | [torchvision](https://pytorch.org/vision/stable/models.html) | `weights/imagenet/` |
+
+ Set the weights path in each backbone's `config.py`; that's it.
+
+ ### 3. Configure Your Experiment
+
+ Each backbone family has its own directory with a standardized interface:
+
+ ```
+ <ModelFamily>/
+ ├── config.py          # ← edit DATA_ROOT / OUTPUT_DIR, or set env vars
+ ├── config_icprs.py    # ← for ISPRS Potsdam experiments
+ ├── train.py           # ← same training loop across all families
+ ├── model.py
+ ├── encoders.py
+ ├── light_decoder.py   # ← THE fixed decoder (identical everywhere)
+ ├── losses.py          # ← THE fixed loss (identical everywhere)
+ └── utils.py
+ ```
+
+ **Path configuration** (two approaches):
+
+ **Option A: environment variables (recommended)**
+ ```bash
+ export LOVEDA_ROOT=/path/to/LoveDA           # for LoveDA experiments
+ export POTSDAM_ROOT=/path/to/ISPRS_Potsdam   # for Potsdam experiments
+ export OUTPUT_DIR=/path/to/output            # optional; defaults to Comparison_Experiments/
+ python train.py
+ ```
+
+ **Option B: edit the config directly**
+ Open `config.py` (or `config_icprs.py` for Potsdam) and change `DATA_ROOT` and `OUTPUT_DIR` near the top of the file.
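
As a hedged illustration of Option A, the sketch below shows one way a config could read the documented variables (`LOVEDA_ROOT`, `POTSDAM_ROOT`, `OUTPUT_DIR`) and fall back to `Comparison_Experiments/` for outputs. The `resolve_paths` helper and its fail-fast behavior are assumptions made for illustration; the repository's `config.py` may resolve paths differently.

```python
import os
from pathlib import Path

# Hypothetical sketch: only the variable names and the output default come from the README.
DEFAULT_OUTPUT_DIR = "Comparison_Experiments"
DATASET_ENV_VARS = {"loveda": "LOVEDA_ROOT", "potsdam": "POTSDAM_ROOT"}


def resolve_paths(dataset: str = "loveda", env=None) -> dict:
    """Resolve DATA_ROOT / OUTPUT_DIR from environment variables (env injectable for testing)."""
    env = os.environ if env is None else env
    key = DATASET_ENV_VARS[dataset]
    data_root = env.get(key)
    if not data_root:
        # Fail early instead of silently training on a wrong or empty path.
        raise RuntimeError(f"Set {key} (or edit DATA_ROOT in the config) before training")
    output_dir = env.get("OUTPUT_DIR", DEFAULT_OUTPUT_DIR)
    return {"DATA_ROOT": Path(data_root), "OUTPUT_DIR": Path(output_dir)}
```

Passing `env` explicitly keeps the function testable without touching the real process environment.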
 
  ---
 
+ ## 🗂 Data Preparation
 
+ Plug-and-play support for **LoveDA** and **ISPRS Potsdam**.
 
+ <details>
+ <summary>📁 <b>LoveDA Layout</b></summary>
 
+ ```
+ DATA_ROOT/
+ ├── Train/
+ │   ├── Urban/
+ │   │   ├── images_png/
+ │   │   └── masks_png/
+ │   └── Rural/
+ │       ├── images_png/
+ │       └── masks_png/
+ ├── Val/
+ │   ├── Urban/
+ │   │   ├── images_png/
+ │   │   └── masks_png/
+ │   └── Rural/
+ │       ├── images_png/
+ │       └── masks_png/
+ └── Test/
+ ```
+
+ - **7 classes:** Background, Building, Road, Water, Barren, Forest, Agricultural
+ - **Resolution:** 1024×1024 (cropped to 512×512 during training)
+ - **Domains:** Urban and Rural, used for cross-domain evaluation
+
+ </details>
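
The LoveDA layout maps directly onto the cross-domain splits. The sketch below collects (image, mask) pairs for a given split and set of domains; it assumes each file in `masks_png/` shares its name with the matching file in `images_png/`, and `list_loveda_pairs` is a hypothetical helper rather than the repository's dataset class.

```python
from pathlib import Path


def list_loveda_pairs(data_root, split="Train", domains=("Urban", "Rural")):
    """Return (image, mask) path pairs for a LoveDA split, domain by domain, sorted by name.

    Assumes the DATA_ROOT/<split>/<domain>/{images_png,masks_png}/ layout shown above
    and identical file names for an image and its mask.
    """
    pairs = []
    for domain in domains:
        img_dir = Path(data_root) / split / domain / "images_png"
        mask_dir = Path(data_root) / split / domain / "masks_png"
        for img in sorted(img_dir.glob("*.png")):
            mask = mask_dir / img.name
            if not mask.exists():
                raise FileNotFoundError(f"missing mask for {img}")
            pairs.append((img, mask))
    return pairs


# Rural→Urban (source-only): train on Rural, evaluate on Urban.
# Evaluating on the Val split is an assumption of this sketch.
# train_pairs = list_loveda_pairs(root, "Train", domains=("Rural",))
# eval_pairs = list_loveda_pairs(root, "Val", domains=("Urban",))
```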
 
+ <details>
+ <summary>📁 <b>ISPRS Potsdam Layout</b></summary>
 
+ ```
+ DATA_ROOT/
+ ├── Images/
+ ├── Labels/
+ └── splits/
+     ├── train.txt
+     ├── val.txt
+     └── test.txt
+ ```
 
+ - **6 classes:** Impervious, Building, Low Vegetation, Tree, Car, Clutter
+ - **Resolution:** 6000×6000 tiles (cropped to 512×512)
+
+ </details>
+
+ **Must-do:** Set `DATA_ROOT` in `config.py` (LoveDA) or `config_icprs.py` (Potsdam) to your local dataset path.
 
  ---
 
+ ## 🚀 Train & Evaluation
+
+ YAML-free, config-driven: clean and reproducible.
 
+ ### Train
 
+ ```bash
+ # LoveDA: pick any backbone family
+ cd MambaVision   # or VMamba/, spatial-mamba/, CNN_DeepLabv3p/, etc.
+ # → edit config.py: set DATA_ROOT, OUTPUT_DIR, and backbone variant
+ python train.py
+
+ # ISPRS Potsdam
+ cd ../VMamba
+ # → edit config_icprs.py: set DATA_ROOT and OUTPUT_DIR
+ python train.py
  ```
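
The fixed schedule uses poly LR decay over 50k iterations. A minimal sketch of the usual poly rule follows; the exponent `power=0.9` is a common default and an assumption here, as is whatever base learning rate you pass in.

```python
def poly_lr(base_lr: float, step: int, max_steps: int = 50_000, power: float = 0.9) -> float:
    """Polynomial ("poly") decay: lr = base_lr * (1 - step / max_steps) ** power.

    max_steps=50k matches the fixed schedule; power=0.9 is an assumed default.
    """
    step = min(max(step, 0), max_steps)  # clamp so the rate never goes negative
    return base_lr * (1.0 - step / max_steps) ** power
```

In PyTorch the same curve can be obtained with `torch.optim.lr_scheduler.PolynomialLR(optimizer, total_iters=50_000, power=0.9)` stepped once per iteration, or with `LambdaLR` and the factor `(1 - step / 50_000) ** 0.9`.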
 
+ Checkpoints + TensorBoard logs land in `Comparison_Experiments/<experiment_name>/`.
+
+ ### Efficiency Profiling
 
  ```bash
+ # Single model benchmark (FPS + peak VRAM)
+ python tools/benchmark_fps_mem.py \
+   --model mambavision --variant base --device cuda:0
+
+ # Full sweep across all families
+ python tools/benchmark_fps_mem_total.py \
+   --device cuda:0 --batch_size 1
  ```
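
FPS figures depend heavily on how timing is done. The generic sketch below (not `tools/benchmark_fps_mem.py`) shows the usual pattern: warm-up iterations first, then a timed loop bracketed by a synchronization call so that asynchronously queued GPU work is actually counted. `measure_fps` and its parameters are illustrative names.

```python
import time


def measure_fps(run_once, warmup=10, iters=50, sync=None):
    """Return frames per second for a callable that processes one image (batch size 1).

    `sync`, if given, must block until all queued device work has finished.
    """
    for _ in range(warmup):  # let caches, autotuners and allocators settle
        run_once()
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(iters):
        run_once()
    if sync is not None:
        sync()
    elapsed = time.perf_counter() - start
    return iters / elapsed if elapsed > 0 else float("inf")
```

With PyTorch on CUDA you would typically pass `sync=torch.cuda.synchronize`, run the forward pass under `torch.no_grad()`, and read peak VRAM by calling `torch.cuda.reset_peak_memory_stats()` before the loop and `torch.cuda.max_memory_allocated()` after it.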
 
  ---
 
+ ## 🔬 Analysis Scripts
+
+ Three diagnostic scripts that reproduce every analytical claim in the paper:
+
+ | Script | What It Measures | What It Tells You |
+ |---|---|---|
+ | `analysis/boundary_analysis.py` | Boundary vs. interior mIoU under domain shift | Boundary degradation is the dominant failure mode, not interior misclassification |
+ | `analysis/cross_domain_analysis.py` | U→R and R→U metrics for all families | Domain transfer asymmetry is backbone-agnostic: it's a data property |
+ | `analysis/rotation_analysis.py` | Prediction stability under 90°/180°/270° rotations | Tests whether SSM scan-order introduces orientation artifacts |
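
One plausible way to compute a boundary-vs-interior comparison: mark every pixel within a few pixels of a ground-truth label change as "boundary", then compute mIoU separately inside and outside that band. The band definition (a square neighborhood on the label map) and all function names below are assumptions; `analysis/boundary_analysis.py` may define the band differently.

```python
def boundary_mask(label, width=2):
    """True where a pixel has a differently labelled pixel within `width` px (square window)."""
    h, w = len(label), len(label[0])
    mask = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy in range(-width, width + 1):
                for dx in range(-width, width + 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and label[ny][nx] != label[y][x]:
                        mask[y][x] = True
    return mask


def masked_miou(pred, label, region, num_classes):
    """Mean IoU over classes with a non-empty union, counting only pixels where region is True."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for prow, lrow, rrow in zip(pred, label, region):
        for p, t, keep in zip(prow, lrow, rrow):
            if not keep:
                continue
            if p == t:
                inter[p] += 1  # true positive for class p
                union[p] += 1
            else:
                union[p] += 1  # false positive for the predicted class
                union[t] += 1  # false negative for the true class
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


def boundary_vs_interior(pred, label, num_classes, width=2):
    """Return (boundary mIoU, interior mIoU) as fractions in [0, 1]."""
    band = boundary_mask(label, width)
    interior = [[not b for b in row] for row in band]
    return (masked_miou(pred, label, band, num_classes),
            masked_miou(pred, label, interior, num_classes))
```

A prediction whose class edge is shifted by one pixel keeps a perfect interior score while its boundary score drops, which is the pattern the boundary analysis is designed to expose.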
+
+ ```bash
+ python analysis/boundary_analysis.py \
+   --device cuda:0 --use_pretrained 1
+
+ python analysis/cross_domain_analysis.py \
+   --device cuda:0 --use_pretrained 1
+
+ python analysis/rotation_analysis.py \
+   --device cuda:0 --use_pretrained 1 \
+   --pack_rotations 1 \
+   --families mambavision,vmamba,spatialmamba
+ ```
+
+ Results land in `analysis_outputs/` as CSV files ready for plotting.
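
Rotation stability can be scored as the fraction of pixels whose prediction survives "rotate the input, predict, rotate the prediction back". The sketch below does this on plain 2-D lists; `rot90`, `rotation_agreement` and the toy `predict` callables are illustrative and do not reproduce `analysis/rotation_analysis.py` (its `--pack_rotations` option, for example, is not modeled).

```python
def rot90(grid, k=1):
    """Rotate a 2-D list counter-clockwise by k * 90 degrees (negative k rotates clockwise)."""
    for _ in range(k % 4):
        grid = [list(row) for row in zip(*grid)][::-1]
    return grid


def rotation_agreement(predict, image, ks=(1, 2, 3)):
    """Fraction of pixels where predict(rot(image)), rotated back, equals predict(image)."""
    ref = predict(image)
    total = agree = 0
    for k in ks:
        back = rot90(predict(rot90(image, k)), -k)
        for r_row, b_row in zip(ref, back):
            for r, b in zip(r_row, b_row):
                total += 1
                agree += r == b
    return agree / total


# A purely per-pixel model is rotation-equivariant by construction.
threshold = lambda g: [[int(v > 4) for v in row] for row in g]
print(rotation_agreement(threshold, [[0, 5, 9], [7, 1, 3]]))
```

A per-pixel model scores 1.0; orientation artifacts from a scan order in a real SSM could show up as agreement below 1.0.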
+
+ ---
+
+ ## 📊 Results
+
+ Straight from the paper, reproducible out of the box.
+
+ Every row shares the same decoder, loss, optimizer, schedule, augmentations, and data splits. **The only variable is the encoder backbone.**
+
+ | Type | Backbone | LoveDA mIoU (All→All) | U→R | R→U | Potsdam mIoU |
+ |---|---|---:|---:|---:|---:|
+ | CNN | DeepLabv3+ (controlled) | 43.01 | 30.36 | 39.98 | 75.09 |
+ | Transformer | UNetFormer (controlled) | 48.61 | 34.56 | 44.84 | 74.99 |
+ | **SSM** 🔥 | **VMamba-Small** | **55.66** | **40.62** | 53.52 | **77.59** |
+ | **SSM** 🔥 | **MambaVision-L** | 55.25 | 38.53 | **54.01** | 77.07 |
+ | SSM | Spatial-Mamba-B | 48.03 | 35.23 | 46.55 | 70.00 |
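
For reference, mIoU is conventionally computed from a confusion matrix accumulated over the whole evaluation set: per-class IoU = TP / (TP + FP + FN), averaged over classes and reported in percent. The sketch below follows that convention; whether the paper's evaluation averages over every class or only classes that occur, and how it treats ignored pixels, are assumptions here.

```python
def confusion_matrix(preds, labels, num_classes, ignore_index=None):
    """Rows = ground-truth class, columns = predicted class; pixels given as flat lists."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(preds, labels):
        if ignore_index is not None and t == ignore_index:
            continue  # skip unlabeled / no-data pixels
        cm[t][p] += 1
    return cm


def mean_iou(cm):
    """Mean of TP / (TP + FP + FN) over classes with a non-empty union, in percent."""
    ious = []
    for c in range(len(cm)):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                  # true class c, predicted otherwise
        fp = sum(row[c] for row in cm) - tp   # predicted c, true class otherwise
        if tp + fp + fn > 0:
            ious.append(tp / (tp + fp + fn))
    return 100.0 * sum(ious) / len(ious) if ious else float("nan")
```

Accumulating one matrix over all test images (rather than averaging per-image mIoU) keeps small or empty images from dominating the score.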
+
+ > 🏆 **VMamba-Small. 55.66 mIoU. +7.05 over the best Transformer. +12.65 over the best CNN. Same decoder. Same training. No tricks.**
+
+ ### Accuracy vs. Throughput
+
+ <p align="center">
+ <img src="IGARSS%202026/fps_vs_miou.png" alt="mIoU vs Inference Throughput" width="60%">
+ </p>
+ <p align="center"><i>mIoU (%) vs. inference throughput (FPS) for all SSM variants. VMamba holds near-peak accuracy across all sizes. MambaVision trades speed for capacity with diminishing returns. Spatial-Mamba sits in the lower tier.</i></p>
+
+ ### Key Takeaways
+
+ 🔥 **SSMs dominate the fair fight.** VMamba-Small beats UNetFormer by +7.05 and DeepLabv3+ by +12.65 on LoveDA, under identical conditions. This is the backbone, not the pipeline.
+
+ 📏 **Bigger ≠ better under a fixed decoder.** MambaVision-L carries far more parameters than VMamba-Small yet scores 55.25 vs. 55.66. Scaling the encoder past a threshold buys nothing when the decoder stays constant.
+
+ 🔄 **Domain transfer is asymmetric, and backbone-agnostic.** Rural→Urban outperforms Urban→Rural by roughly 10–15 points across every family (from 9.62 for DeepLabv3+ to 15.48 for MambaVision-L). VMamba-Small: 53.52 R→U vs. 40.62 U→R. This is a data distribution property, not a model property.
+
+ 🧱 **Boundaries are the unsolved failure mode.** Under domain shift, interior accuracy holds. Boundary accuracy collapses. Every backbone, every family, same story. Whoever cracks boundary sensitivity under distribution shift wins the next round.
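
The headline margins and the per-backbone transfer gaps can be re-derived from the LoveDA columns of the results table. The values below are copied from that table and the helper names are illustrative. Computed this way, the Rural→Urban minus Urban→Rural gap ranges from 9.62 (DeepLabv3+) to 15.48 (MambaVision-L).

```python
# LoveDA mIoU copied from the results table: (All→All, U→R, R→U).
LOVEDA = {
    "DeepLabv3+": (43.01, 30.36, 39.98),
    "UNetFormer": (48.61, 34.56, 44.84),
    "VMamba-Small": (55.66, 40.62, 53.52),
    "MambaVision-L": (55.25, 38.53, 54.01),
    "Spatial-Mamba-B": (48.03, 35.23, 46.55),
}


def margin(a: str, b: str, col: int = 0) -> float:
    """mIoU difference a - b in one column (0 = All→All), rounded to 2 decimals."""
    return round(LOVEDA[a][col] - LOVEDA[b][col], 2)


def transfer_gap(name: str) -> float:
    """Rural→Urban minus Urban→Rural mIoU for one backbone."""
    _, u2r, r2u = LOVEDA[name]
    return round(r2u - u2r, 2)


for name in LOVEDA:
    print(f"{name:16s} R→U - U→R = {transfer_gap(name):5.2f}")
```

Rounding to two decimals mirrors the precision of the table and hides floating-point noise from the subtractions.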
+
+ ### Qualitative Results: LoveDA
+
+ <p align="center">
+ <img src="IGARSS%202026/loveda_qualitative_detailed_enhanced.png" alt="LoveDA Qualitative Results" width="85%">
+ </p>
+ <p align="center"><i>Predictions + error maps (magenta = false positive, dark green = false negative) on LoveDA Urban and Rural scenes. VMamba-S and VMamba-B produce the cleanest boundaries; Spatial-Mamba-B shows the most false positives at class transitions.</i></p>
+
+ ### Qualitative Results: ISPRS Potsdam
+
+ <p align="center">
+ <img src="IGARSS%202026/potsdam_qualitative_detailed_enhanced.png" alt="ISPRS Potsdam Qualitative Results" width="85%">
+ </p>
+ <p align="center"><i>Predictions + error maps on ISPRS Potsdam. All SSM variants handle large homogeneous regions well; errors concentrate at fine-grained boundaries (cars, narrow roads), consistent with the boundary analysis findings.</i></p>
+
+ ---
+
+ ## 🧬 Backbone Overview
 
+ | Backbone | Architecture | Key Idea | RS Segmentation Impact |
+ |---|---|---|---|
+ | **VMamba** | Cross-scan 2D selective SSM | Global spatial context with linear complexity via multi-directional scanning | 🥇 Top performer: 55.66 LoveDA mIoU and the best U→R transfer (40.62) |
+ | **MambaVision** | Hybrid Mamba + self-attention | Interleaves Mamba blocks (early stages) with attention (late stages) | Nearly matches VMamba on Potsdam (77.07 vs. 77.59), but extra capacity doesn't help on LoveDA |
+ | **Spatial-Mamba** | Spatially-aware SSM | Explicit positional inductive biases in the state-space pathway | Beats the CNN baseline on LoveDA (48.03 vs. 43.01) but trails it on Potsdam (70.00 vs. 75.09); scan order alone is insufficient without global modeling |
+ | **DeepLabv3+** | CNN (ResNet-50) | Atrous convolutions + ASPP for multi-scale context | Controlled CNN reference: 43.01 mIoU baseline |
+ | **UNetFormer** | Transformer (ResNet-18) | Efficient self-attention decoder for dense prediction | Controlled Transformer reference: 48.61 mIoU baseline |
+
+ ---
+
+ ## 🙏 Acknowledgements
+
+ This work builds on prior advances in visual state-space models and remote-sensing segmentation. We gratefully acknowledge:
+
+ - **[VMamba](https://github.com/MzeroMiko/VMamba)**: Visual State Space Model backbone
+ - **[MambaVision](https://github.com/NVlabs/MambaVision)**: NVIDIA's hybrid Mamba-Transformer architecture
+ - **[Spatial-Mamba](https://github.com/EdwardChaworworrachat/SpatialMamba)**: Spatially-aware Mamba variant
+ - **[LoveDA](https://github.com/Junjue-Wang/LoveDA)** and **[ISPRS Potsdam](https://www.isprs.org/education/benchmarks/UrbanSemLab/)** dataset creators
+
+ ---
+
+ ## 📜 Citation
+
+ If Mamba-Segmentation fuels your research, please cite:
 
  ```bibtex
  @article{wasalathilaka2026controlledbenchmark,
  title={A Controlled Benchmark of Visual State-Space Backbones with
+ Domain-Shift and Boundary Analysis for Remote-Sensing
+ Segmentation},
+ author={Wasalathilaka, Nichula and Perea, Dineth and Samarakoon,
+ Oshadha and Wijenayake, Buddhi and Godaliyadda, Roshan and
+ Herath, Vijitha and Ekanayake, Parakrama},
+ journal={IGARSS 2026},
  year={2026}
  }
  ```
 
  ---
 
+ 🌍🛰️ Built at the **University of Peradeniya**. Got inspired? Give us a ⭐