webVishnu committed
Commit 77445cb · 0 Parent(s)
.gitignore ADDED
@@ -0,0 +1,12 @@
+ testing/
+ training/
+ checkpoints/
+ eval_outputs/
+ infer_outputs/
+ logs/
+ __pycache__/
+ *.pyc
+ *.pyo
+ *.pyd
+ *.pyw
+ *.pyz
.gradio/certificate.pem ADDED
@@ -0,0 +1,31 @@
+ -----BEGIN CERTIFICATE-----
+ MIIFazCCA1OgAwIBAgIRAIIQz7DSQONZRGPgu2OCiwAwDQYJKoZIhvcNAQELBQAw
+ TzELMAkGA1UEBhMCVVMxKTAnBgNVBAoTIEludGVybmV0IFNlY3VyaXR5IFJlc2Vh
+ cmNoIEdyb3VwMRUwEwYDVQQDEwxJU1JHIFJvb3QgWDEwHhcNMTUwNjA0MTEwNDM4
+ WhcNMzUwNjA0MTEwNDM4WjBPMQswCQYDVQQGEwJVUzEpMCcGA1UEChMgSW50ZXJu
+ ZXQgU2VjdXJpdHkgUmVzZWFyY2ggR3JvdXAxFTATBgNVBAMTDElTUkcgUm9vdCBY
+ MTCCAiIwDQYJKoZIhvcNAQEBBQADggIPADCCAgoCggIBAK3oJHP0FDfzm54rVygc
+ h77ct984kIxuPOZXoHj3dcKi/vVqbvYATyjb3miGbESTtrFj/RQSa78f0uoxmyF+
+ 0TM8ukj13Xnfs7j/EvEhmkvBioZxaUpmZmyPfjxwv60pIgbz5MDmgK7iS4+3mX6U
+ A5/TR5d8mUgjU+g4rk8Kb4Mu0UlXjIB0ttov0DiNewNwIRt18jA8+o+u3dpjq+sW
+ T8KOEUt+zwvo/7V3LvSye0rgTBIlDHCNAymg4VMk7BPZ7hm/ELNKjD+Jo2FR3qyH
+ B5T0Y3HsLuJvW5iB4YlcNHlsdu87kGJ55tukmi8mxdAQ4Q7e2RCOFvu396j3x+UC
+ B5iPNgiV5+I3lg02dZ77DnKxHZu8A/lJBdiB3QW0KtZB6awBdpUKD9jf1b0SHzUv
+ KBds0pjBqAlkd25HN7rOrFleaJ1/ctaJxQZBKT5ZPt0m9STJEadao0xAH0ahmbWn
+ OlFuhjuefXKnEgV4We0+UXgVCwOPjdAvBbI+e0ocS3MFEvzG6uBQE3xDk3SzynTn
+ jh8BCNAw1FtxNrQHusEwMFxIt4I7mKZ9YIqioymCzLq9gwQbooMDQaHWBfEbwrbw
+ qHyGO0aoSCqI3Haadr8faqU9GY/rOPNk3sgrDQoo//fb4hVC1CLQJ13hef4Y53CI
+ rU7m2Ys6xt0nUW7/vGT1M0NPAgMBAAGjQjBAMA4GA1UdDwEB/wQEAwIBBjAPBgNV
+ HRMBAf8EBTADAQH/MB0GA1UdDgQWBBR5tFnme7bl5AFzgAiIyBpY9umbbjANBgkq
+ hkiG9w0BAQsFAAOCAgEAVR9YqbyyqFDQDLHYGmkgJykIrGF1XIpu+ILlaS/V9lZL
+ ubhzEFnTIZd+50xx+7LSYK05qAvqFyFWhfFQDlnrzuBZ6brJFe+GnY+EgPbk6ZGQ
+ 3BebYhtF8GaV0nxvwuo77x/Py9auJ/GpsMiu/X1+mvoiBOv/2X/qkSsisRcOj/KK
+ NFtY2PwByVS5uCbMiogziUwthDyC3+6WVwW6LLv3xLfHTjuCvjHIInNzktHCgKQ5
+ ORAzI4JMPJ+GslWYHb4phowim57iaztXOoJwTdwJx4nLCgdNbOhdjsnvzqvHu7Ur
+ TkXWStAmzOVyyghqpZXjFaH3pO3JLF+l+/+sKAIuvtd7u+Nxe5AW0wdeRlN8NwdC
+ jNPElpzVmbUq4JUagEiuTDkHzsxHpFKVK7q4+63SM1N95R1NbdWhscdCb+ZAJzVc
+ oyi3B43njTOQ5yOf+1CceWxG1bQVs5ZufpsMljq4Ui0/1lvh+wjChP4kqKOJ2qxq
+ 4RgqsahDYVvTH9w7jXbyLeiNdd8XM2w9U/t7y0Ff/9yi0GE44Za4rF2LN9d11TPA
+ mRGunUHBcnWEvgJBQl9nJEiU0Zsnvgc/ubhPgXRR4Xq37Z0j4r7g1SgEEzwxA57d
+ emyPxgcYxn/eR44/KJ4EBs+lVDR3veyJm+kXQ99b21/+jh5Xos1AnX5iItreGCc=
+ -----END CERTIFICATE-----
README.md ADDED
@@ -0,0 +1,621 @@
+ # Desert Semantic Segmentation
+
+ End-to-end **semantic segmentation** for **off-road / desert** scenes: every pixel is classified into one of several terrain / object categories. The pipeline is built for **synthetic RGB + mask** data, **PyTorch**, **[segmentation_models_pytorch](https://github.com/qubvel/segmentation_models.pytorch)** (SMP), **Albumentations**, and hackathon-style iteration (strong baselines, IoU-driven checkpoints, optional EMA / TTA / ONNX).
+
+ ---
+
+ ## Table of contents
+
+ 1. [What this project does](#1-what-this-project-does)
+ 2. [Problem statement and goals](#2-problem-statement-and-goals)
+ 3. [Dataset layout and assumptions](#3-dataset-layout-and-assumptions)
+ 4. [Label format (critical)](#4-label-format-critical)
+ 5. [Repository structure](#5-repository-structure)
+ 6. [Configuration (`default.yaml`)](#6-configuration-defaultyaml)
+ 7. [High-level architecture](#7-high-level-architecture)
+ 8. [Data pipeline (detailed)](#8-data-pipeline-detailed)
+ 9. [Model](#9-model)
+ 10. [Loss functions](#10-loss-functions)
+ 11. [Metrics](#11-metrics)
+ 12. [Training loop](#12-training-loop)
+ 13. [Validation and evaluation scripts](#13-validation-and-evaluation-scripts)
+ 14. [Inference (testing folder, sliding window, TTA, ONNX)](#14-inference-testing-folder-sliding-window-tta-onnx)
+ 15. [Checkpoints and artifacts](#15-checkpoints-and-artifacts)
+ 16. [How to run (commands)](#16-how-to-run-commands)
+ 17. [Interactive demo (Gradio)](#17-interactive-demo-gradio)
+ 18. [Tests](#18-tests)
+ 19. [Dependencies and environment notes](#19-dependencies-and-environment-notes)
+ 20. [Design decisions and limitations](#20-design-decisions-and-limitations)
+ 21. [Extending the project](#21-extending-the-project)
+ 22. [Flowcharts](#22-flowcharts)
+
+ ---
+
+ ## 1. What this project does
+
+ - **Input:** RGB color images (`Color_Images`).
+ - **Supervision:** Per-pixel class masks (`Segmentation`) aligned by **filename** with the RGB image.
+ - **Output:** A trained neural network that predicts a **class index per pixel** on validation, held-out **testing** images (no labels in repo), or any folder of images you point inference at.
+ - **Primary quality metric:** **mean Intersection-over-Union (mIoU)** on the validation set, plus **per-class IoU**, **frequency-weighted IoU (fwIoU)**, and a **confusion matrix**.
+
+ ---
+
+ ## 2. Problem statement and goals
+
+ | Goal | How we address it |
+ |------|-------------------|
+ | Accurate pixel-wise classification | DeepLabV3+ with ImageNet-pretrained encoder; CE + Dice loss; class-frequency weights |
+ | Robustness (synthetic → harder real domains) | Strong photometric + mild “desert-like” augmentations (sun flare, shadow, blur, noise, JPEG) |
+ | Class imbalance | Inverse log-frequency weights with a **cap**; rare-class-biased random crops |
+ | Stable training | AdamW, cosine decay with **warmup**, gradient clipping, optional **EMA** |
+ | Fast iteration | YAML-driven config; SMP for one-line model construction; scripts for train / eval / infer |
+ | Deployment story | Optional **ONNX** export; inference timing written to `latency.txt` |
+
+ **Note:** The original hackathon plan also mentioned **SegFormer-B2** as a balanced option. This codebase’s **default** is **DeepLabV3+ + ResNet-50**. UNet and FPN are supported in code; SegFormer is **not** implemented as a separate architecture in `models/factory.py` (you can experiment with **MiT** encoders under DeepLabV3+ if SMP supports your chosen encoder name).
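+
+ As a rough illustration only (not code from this repo's `models/factory.py`), swapping in a MiT encoder through SMP would look like the sketch below. Whether `mit_b2` is accepted under `DeepLabV3Plus` depends on your installed SMP version, so treat the encoder name as an assumption to verify:
+
+ ```python
+ import segmentation_models_pytorch as smp
+
+ # Hypothetical experiment: a MiT (SegFormer-family) encoder under DeepLabV3+.
+ # Supported encoder names vary by SMP release; this may raise on older versions.
+ model = smp.DeepLabV3Plus(
+     encoder_name="mit_b2",        # assumption: available in your SMP build
+     encoder_weights="imagenet",
+     in_channels=3,
+     classes=10,                   # number of classes from the codec
+ )
+ ```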
+
+ ---
+
+ ## 3. Dataset layout and assumptions
+
+ All paths in config are **relative to the workspace root** (`--root` on the CLI, or the repo root by default).
+
+ ```text
+ <root>/
+   training/
+     train/
+       Color_Images/   # RGB training inputs
+       Segmentation/   # Training masks (same filenames as Color_Images)
+     val/
+       Color_Images/   # RGB validation inputs
+       Segmentation/   # Validation masks
+   testing/
+     Color_Images/     # Unlabeled images for final inference / demo
+ ```
+
+ **Pairing rule:** For each split, every file in `Color_Images` must have a mask with the **same basename** in `Segmentation`. The dataset constructor raises if a mask is missing.
+
+ **Typical image size in this workspace:** RGB and masks are often **960×540** (masks are single-channel uint16 PNGs). Training uses **512×512** crops; validation pads to a **512×512** canvas for batching.
+
+ ---
+
+ ## 4. Label format (critical)
+
+ ### 4.1 What the masks are
+
+ - Masks are read as **2D arrays** (single channel).
+ - In this dataset they behave as **`I;16` (16-bit unsigned)** semantic IDs: pixel values are **not** 0, 1, 2, …
+   They are **dataset-specific raw IDs**, e.g. `100, 200, 300, 500, 550, 600, 700, 800, 7100, 10000`.
+
+ ### 4.2 Mapping raw IDs → training indices
+
+ The class `RawMaskCodec` in `desert_segmentation/data/mask_encoding.py`:
+
+ 1. Builds a **lookup table (LUT)** spanning `0 … max(raw_ids)`, initialized to the sentinel `255`.
+ 2. Maps each legal raw ID to a contiguous index **`0 … num_classes-1`** (uint8 for Albumentations compatibility).
+ 3. **Raises** if any pixel is not in the configured `raw_ids` list (unknown pixels would map to the sentinel `255` in the LUT and trigger an error).
+
+ **Why this matters:** Using the wrong mapping (or treating masks as 8-bit class indices) silently destroys learning.
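+
+ A minimal usage sketch of the codec (it mirrors `build_codec_from_config` and `encode_mask` in `desert_segmentation/data/mask_encoding.py`):
+
+ ```python
+ import numpy as np
+ from desert_segmentation.data.mask_encoding import build_codec_from_config
+
+ codec = build_codec_from_config([100, 200, 10000], ["id_100", "id_200", "id_10000"])
+ raw = np.array([[100, 200], [10000, 100]], dtype=np.uint16)  # toy 2x2 raw mask
+ encoded, unknown_frac = codec.encode_mask(raw)               # uint8 indices 0..2
+ assert encoded.tolist() == [[0, 1], [2, 0]]
+ # Any pixel value outside raw_ids raises ValueError instead of training on garbage.
+ ```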
+
+ ### 4.3 Ignore index (255)
+
+ - **Training:** `ShiftScaleRotate` can introduce border pixels on the mask; those are filled with **`ignore_index` (255)**. Cross-entropy and Dice **ignore** those pixels.
+ - **Validation:** `PadIfNeeded` pads the mask with **255** so square tensors align; metrics and loss skip those pixels.
+
+ ### 4.4 Class names
+
+ `class_names` in YAML are **display labels** (e.g. `id_100`, …). Replace them with semantic names (e.g. `sky`, `sand`) when you have the official ontology from the dataset provider.
+
+ ---
+
+ ## 5. Repository structure
+
+ ```text
+ codewizard 2.0/
+   README.md                 # This file
+   requirements.txt          # Python dependencies
+   requirements-demo.txt     # Optional: Gradio demo
+   desert_segmentation/      # Importable package
+     __init__.py
+     configs/
+       default.yaml          # Single source of truth for paths & hyperparameters
+     data/
+       dataset.py            # SegmentationDataset (pairing, crop, rare bias)
+       transforms.py         # Albumentations train/val pipelines
+       mask_encoding.py      # RawMaskCodec + build_codec_from_config
+     models/
+       factory.py            # SMP: DeepLabV3+, UNet, FPN
+     losses/
+       combined.py           # CE, weighted CE, focal, CE+Dice + weight helper
+     metrics/
+       iou.py                # Confusion matrix, IoU, mIoU, fwIoU
+     train/
+       trainer.py            # Main training loop (AMP, EMA, scheduler, checkpoints)
+       evaluate.py           # Batched validation metric pass
+     infer/
+       predict.py            # Sliding window, TTA, folder inference, ONNX export
+     utils/
+       config.py             # YAML load + path resolution
+       seed.py               # Reproducibility
+       logging_utils.py      # Logging setup
+       freq.py               # Scan mask folders for class frequencies
+       viz.py                # Colorization + overlay + triplet PNG export
+     demo/
+       inference_ui.py       # Gradio helpers: legend HTML, validation, composites
+   scripts/
+     train.py                # CLI: train from config
+     eval.py                 # CLI: val metrics + confusion + visualization PNGs
+     eval_summary.py         # CLI: mIoU (all + valid-GT), fwIoU, accuracies, GT counts, per-class table (+ JSON)
+     infer.py                # CLI: run on testing/ or export ONNX
+     demo_gradio.py          # CLI: browser upload demo (Gradio)
+   tests/
+     test_mask_encoding.py   # Unit tests for codec / unknown pixels
+ ```
+
+ **Scripts** add the repo root to `sys.path` so you can run them without installing the package as a wheel.
+
+ ---
+
+ ## 6. Configuration (`default.yaml`)
+
+ Key sections (see `desert_segmentation/configs/default.yaml` for the full file):
+
+ | Section | Purpose |
+ |---------|---------|
+ | `root` | Base path for resolving relative data paths (overridden by `--root` in scripts) |
+ | `data.*` | Relative dirs for train/val images and masks, test images, `raw_ids`, `class_names`, `crop_size`, `rare_class_crop_prob`, `weighted_sampler`, `weighted_sampler_eps`, `ignore_index` |
+ | `model.*` | `architecture` (`deeplabv3plus` \| `unet` \| `fpn`), `encoder_name`, `encoder_weights` |
+ | `train.*` | `batch_size`, `epochs`, `lr`, `weight_decay`, `warmup_ratio`, `amp`, `gradient_clip`, `seed`, `checkpoint_dir`, `log_interval`, `early_stop_patience` |
+ | `loss.*` | `name` (`ce` \| `weighted_ce` \| `ce_dice` \| `focal_ce` \| `focal_ce_dice`), `dice_weight`, `label_smoothing` (CE modes only), `class_weight_cap`, `focal_gamma` |
+ | `augmentation.strong` | Enables extra sun flare + shadow blocks in training |
+ | `ema.*` | Optional exponential moving average of weights for evaluation |
+ | `inference.*` | `tile_size`, `overlap` (for sliding window), `tta_flip`, `batch_size` (reserved for future batching) |
+
+ ---
+
+ ## 7. High-level architecture
+
+ ```mermaid
+ flowchart TB
+   subgraph inputs [Inputs]
+     RGB[RGB images]
+     GT[Ground truth masks]
+   end
+   subgraph prep [Preprocessing]
+     Codec[RawMaskCodec LUT]
+     Crop[Train: random 512 crop with rare bias]
+     ValPad[Val: resize longest side then pad to 512]
+     Aug[Albumentations geom plus color]
+   end
+   subgraph model [Model SMP]
+     DL[DeepLabV3Plus default]
+   end
+   subgraph train [Training]
+     Loss[CE plus Dice with class weights]
+     Opt[AdamW plus cosine warmup LR]
+     AMP[AMP if CUDA]
+     EMA[EMA optional]
+     CKPT[Best mIoU checkpoint]
+   end
+   subgraph out [Outputs]
+     Metrics[mIoU per class IoU fwIoU confusion]
+     Viz[Overlays triplets]
+     ONNX[Optional ONNX]
+   end
+   RGB --> Codec
+   GT --> Codec
+   Codec --> Crop
+   Codec --> ValPad
+   Crop --> Aug
+   Aug --> DL
+   ValPad --> DL
+   DL --> Loss
+   Loss --> Opt
+   Opt --> AMP
+   Opt --> EMA
+   DL --> Metrics
+   Metrics --> CKPT
+   Metrics --> Viz
+   DL --> ONNX
+ ```
+
+ ---
+
+ ## 8. Data pipeline (detailed)
+
+ ### 8.1 `SegmentationDataset` (`data/dataset.py`)
+
+ 1. **List images** in `images_dir` with extensions: `.png`, `.jpg`, `.jpeg`, `.bmp`, `.tif`, `.tiff`.
+ 2. **Verify** each image has a mask with the same filename in `masks_dir`.
+ 3. **Load RGB** with Pillow → `HxWx3` uint8.
+ 4. **Load mask** as numpy 2D → cast to `uint16` → **`codec.encode_mask`** → `HxW` uint8 with values `0 … C-1` (padding may add 255 later in transforms).
+
+ **Train mode (`mode="train"`):**
+
+ - **`_random_crop_bias_rare`:** Extract a **`crop_size × crop_size`** patch.
+   - With probability `rare_class_crop_prob` (default **0.35**), pick the **rarest class** in that image (by histogram) and center the crop on a random pixel of that class (if any exist).
+   - Otherwise pick a uniformly random center.
+   - If the image is smaller than the crop, **zero-pad** the image and **255-pad** the mask (ignore regions).
+
+ **Val mode (`mode="val"`):**
+
+ - No random crop in the dataset; the **full** image goes to Albumentations.
+
+ ### 8.2 Transforms (`data/transforms.py`)
+
+ **Train (`build_train_transforms`):**
+
+ - **Geometric:** `HorizontalFlip`, `ShiftScaleRotate` (shift, scale, ±10° rotation) with `mask_value=ignore_index` on borders.
+ - **Photometric:** brightness/contrast, hue/sat/value, Gaussian blur, Gaussian noise, JPEG compression simulation, RGB shift.
+ - **If `augmentation.strong`:** `RandomSunFlare`, `RandomShadow` (desert-relevant appearance stress).
+ - **Normalize:** ImageNet mean/std.
+ - **`ToTensorV2`:** Image → `float` tensor `CHW`; the mask passes through and is converted to `long` downstream in `__getitem__`.
+
+ **Val (`build_val_transforms`):**
+
+ - `LongestMaxSize(crop_size)` then `PadIfNeeded(crop_size, crop_size)` with **mask pad = 255** (ignored in loss/metrics).
+
+ ### 8.3 Class frequency estimation (`utils/freq.py`)
+
+ Before training, `scripts/train.py` calls **`estimate_pixel_frequencies`** over **all** training mask files (`max_files` is configurable in code; the train script uses the full corpus). This yields a normalized per-class frequency vector, which is used to build **class weights**.
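+
+ The exact helper lives in `utils/freq.py`; below is a rough equivalent of what the scan computes (a sketch under an assumed signature, not the repo function verbatim):
+
+ ```python
+ import numpy as np
+ from pathlib import Path
+ from typing import Optional
+ from PIL import Image
+
+ def pixel_frequencies(mask_dir: Path, codec, max_files: Optional[int] = None) -> np.ndarray:
+     """Normalized per-class pixel frequency over encoded training masks (sketch)."""
+     counts = np.zeros(codec.num_classes, dtype=np.int64)
+     for fp in sorted(mask_dir.glob("*.png"))[:max_files]:
+         enc, _ = codec.encode_mask(np.array(Image.open(fp)).astype(np.uint16))
+         counts += np.bincount(enc.reshape(-1), minlength=codec.num_classes)
+     return counts / max(int(counts.sum()), 1)
+ ```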
+
+ ---
+
+ ## 9. Model
+
+ **Factory:** `desert_segmentation/models/factory.py`
+
+ | `architecture` | SMP class | Notes |
+ |----------------|-----------|--------|
+ | `deeplabv3plus` (default) | `smp.DeepLabV3Plus` | Mainline; strong decoder + atrous spatial pyramid |
+ | `unet` | `smp.Unet` | Classic encoder–decoder with skips |
+ | `fpn` | `smp.FPN` | Feature pyramid neck |
+
+ **Default encoder:** `resnet50` with `encoder_weights: imagenet`.
+
+ **Forward:** Input batch `N×3×H×W` → logits `N×C×H×W` where `C = num_classes`.
+
+ ---
+
+ ## 10. Loss functions
+
+ **File:** `desert_segmentation/losses/combined.py`
+
+ **Modes (`loss.name`):**
+
+ | Mode | Description |
+ |------|-------------|
+ | `ce` | Plain cross-entropy, unweighted |
+ | `weighted_ce` | Cross-entropy with a per-class `weight` tensor |
+ | `ce_dice` | `CE(weighted) + dice_weight * multiclass_Dice_loss` |
+ | `focal_ce` | Focal-modulated CE; optional class weights on pixels |
+ | `focal_ce_dice` (default in `default.yaml`) | `focal_ce` + `dice_weight * multiclass_Dice_loss` (same class weights in the focal term) |
+
+ **Shared options:**
+
+ - **`ignore_index`:** Pixels with label 255 are masked out of CE / focal / Dice.
+ - **`label_smoothing`:** Applied to **CE-based** modes (`ce`, `weighted_ce`, `ce_dice`) only; not used in `focal_ce` / `focal_ce_dice`.
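+
+ For reference, a standard focal-CE formulation over valid pixels (a hedged sketch; `losses/combined.py` is the source of truth for the exact reduction and weighting):
+
+ ```python
+ import torch
+ import torch.nn.functional as F
+
+ def focal_ce_sketch(logits, target, gamma=2.0, weight=None, ignore_index=255):
+     """Per-pixel CE scaled by (1 - p_t)**gamma, averaged over non-ignored pixels."""
+     ce = F.cross_entropy(logits, target, weight=weight,
+                          ignore_index=ignore_index, reduction="none")  # (N, H, W)
+     pt = torch.exp(-ce)                  # ~ probability assigned to the true class
+     focal = ((1.0 - pt) ** gamma) * ce
+     valid = target != ignore_index
+     return focal[valid].mean()
+ ```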
+
+ **Class weights (`compute_class_weights_from_freq`):**
+
+ 1. Start from the per-class pixel frequency `freq` on the training masks.
+ 2. `w ∝ 1 / log(freq + ε)`, normalized by the mean.
+ 3. Clamp the ratio `w / median(w)` to **`class_weight_cap`** (default **15**) so rare classes do not explode the loss. A sketch follows this list.
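+
+ In sketch form (assuming an ENet-style constant inside the log so the weights stay positive; the repo's `compute_class_weights_from_freq` may differ in these details):
+
+ ```python
+ import numpy as np
+
+ def class_weights_sketch(freq: np.ndarray, cap: float = 15.0, c: float = 1.02) -> np.ndarray:
+     """Inverse log-frequency weights, mean-normalized, ratio-to-median clamped to [1/cap, cap]."""
+     w = 1.0 / np.log(c + freq)                     # rarer class -> larger weight
+     w = w / w.mean()                               # normalize by the mean
+     med = float(np.median(w))
+     return np.clip(w / med, 1.0 / cap, cap) * med  # cap the spread around the median
+ ```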
+
+ ---
+
+ ## 11. Metrics
+
+ **File:** `desert_segmentation/metrics/iou.py`
+
+ 1. **Confusion matrix** `C×C` (the implementation uses `idx = tgt * C + pred` then `bincount`; rows correspond to the **ground-truth class**, columns to the **predicted class**).
+ 2. **Per-class IoU:**
+    \(\text{IoU}_k = \frac{TP_k}{TP_k + FP_k + FN_k}\)
+    with `TP_k = CM[k,k]` and row/column sums giving FN/FP.
+ 3. **mIoU:** Mean of per-class IoU over finite entries.
+ 4. **fwIoU (frequency-weighted IoU):** \(\sum_k \text{IoU}_k \cdot p_k\), where \(p_k\) is the empirical frequency of class \(k\) in the ground-truth pixels (row marginals).
+
+ **Note:** The docstring in `compute_confusion` says “pred rows, target columns”; the actual indexing is `tgt * C + pred` after flattening, i.e. target = row, prediction = column. A sketch of the accumulation follows.
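+
+ (A compact sketch mirroring the indexing described above; `metrics/iou.py` is authoritative.)
+
+ ```python
+ import numpy as np
+
+ def confusion(pred: np.ndarray, tgt: np.ndarray, C: int, ignore_index: int = 255) -> np.ndarray:
+     valid = tgt != ignore_index
+     idx = tgt[valid].astype(np.int64) * C + pred[valid].astype(np.int64)
+     return np.bincount(idx, minlength=C * C).reshape(C, C)  # rows = GT, cols = prediction
+
+ def per_class_iou(cm: np.ndarray) -> np.ndarray:
+     tp = np.diag(cm).astype(np.float64)
+     fn = cm.sum(axis=1) - tp   # GT is k, predicted as something else
+     fp = cm.sum(axis=0) - tp   # predicted k, GT is something else
+     return tp / np.maximum(tp + fp + fn, 1e-9)
+ ```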
+
+ ---
+
+ ## 12. Training loop
+
+ **File:** `desert_segmentation/train/trainer.py`
+
+ **Optimizer:** AdamW on all parameters.
+
+ **Learning rate:** `LambdaLR` with:
+
+ - **Linear warmup** for `warmup_ratio` of total optimizer steps (default **8%**).
+ - **Cosine** decay from 1.0 down to `min_ratio` = **0.01** (implemented in `_warmup_cosine_lambda`; sketched below).
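+
+ A sketch of the multiplier (constants match the text above; the repo's `_warmup_cosine_lambda` may differ slightly in edge handling):
+
+ ```python
+ import math
+
+ def warmup_cosine_lambda(step: int, total_steps: int,
+                          warmup_ratio: float = 0.08, min_ratio: float = 0.01) -> float:
+     """LR multiplier for torch.optim.lr_scheduler.LambdaLR."""
+     warmup = max(1, int(total_steps * warmup_ratio))
+     if step < warmup:
+         return (step + 1) / warmup                      # linear warmup to 1.0
+     t = (step - warmup) / max(1, total_steps - warmup)  # 0 -> 1 over the decay phase
+     return min_ratio + (1.0 - min_ratio) * 0.5 * (1.0 + math.cos(math.pi * t))
+ ```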
+
+ **AMP (mixed precision):**
+
+ - Enabled only if `train.amp` is true **and** `torch.cuda.is_available()`.
+ - Uses `torch.cuda.amp.autocast` + `GradScaler` when on CUDA.
+ - On **CPU**, AMP is off; training uses a standard FP32 backward pass (no scaler).
+
+ **Gradient clipping:** Global norm clip when `gradient_clip > 0` (default **1.0**). A sketch of the AMP step with clipping follows.
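+
+ In outline, the CUDA branch follows the standard AMP recipe (a sketch, not the trainer verbatim; `model`, `criterion`, `optimizer`, and `scaler` are assumed to exist):
+
+ ```python
+ import torch
+
+ def amp_train_step(model, batch, criterion, optimizer, scaler, device,
+                    clip: float = 1.0, use_amp: bool = True) -> float:
+     """One training step: autocast forward, scaled backward, clip, step (sketch)."""
+     optimizer.zero_grad(set_to_none=True)
+     with torch.cuda.amp.autocast(enabled=use_amp):
+         logits = model(batch["image"].to(device))
+         loss = criterion(logits, batch["mask"].to(device))
+     scaler.scale(loss).backward()
+     scaler.unscale_(optimizer)                   # clip the true (unscaled) gradients
+     torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=clip)
+     scaler.step(optimizer)
+     scaler.update()
+     return float(loss.detach())
+ ```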
+
+ **EMA (optional):**
+
+ - If `ema.enabled`, after each optimizer step the code maintains a **shadow weight** copy per trainable parameter, with exponential decay **0.999** by default.
+ - **Each epoch:** Training weights are **deep-copied**; **EMA weights are copied into the model** for validation only; then the training snapshot is **restored** so optimization continues from the non-EMA weights. In sketch form:
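+
+ (A minimal sketch; the trainer's exact bookkeeping may differ.)
+
+ ```python
+ import copy
+ import torch
+
+ @torch.no_grad()
+ def ema_update(shadow: dict, model: torch.nn.Module, decay: float = 0.999) -> None:
+     """shadow <- decay * shadow + (1 - decay) * param, per trainable parameter.
+     Assumes shadow was initialized as {name: p.detach().clone()}."""
+     for name, p in model.named_parameters():
+         if p.requires_grad:
+             shadow[name].mul_(decay).add_(p.detach(), alpha=1.0 - decay)
+
+ # Around validation (sketch of the swap described above):
+ #   snapshot = copy.deepcopy(model.state_dict())              # keep training weights
+ #   model.load_state_dict({**model.state_dict(), **shadow}, strict=False)
+ #   ... run validation ...
+ #   model.load_state_dict(snapshot)                           # resume from non-EMA weights
+ ```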
+
+ **Checkpointing:**
+
+ - Every epoch: `checkpoints/last.pt` (model, optional EMA dict, optimizer, full config, class names).
+ - **Best validation mIoU:** `checkpoints/best.pt` (adds `miou`, `per_class_iou`).
+
+ **Early stopping:** If validation mIoU does not improve for `early_stop_patience` epochs (default **12**), training stops.
+
+ **Optional smoke flags (`scripts/train.py`):**
+
+ - `--epochs N` — override the epoch count.
+ - `--max_train_batches K` — stop each training epoch after `K` batches (debug only; the scheduler still advances per batch).
+
+ **Logging:** `checkpoints/history.json` lists per-epoch `miou` and `fw_iou`.
+
+ ---
+
+ ## 13. Validation and evaluation scripts
+
+ **Core loop:** `desert_segmentation/train/evaluate.py` runs the model in `eval()` mode, accumulates the confusion matrix via `IoUMetrics`, and returns a metrics dict.
+
+ **CLI:** `scripts/eval.py`
+
+ 1. Loads the config and builds the validation dataset (same codec and val transforms as training).
+ 2. Loads the checkpoint from `--checkpoint`.
+ 3. **Weight loading priority:** If an `ema` dict exists in the checkpoint, **EMA tensors are copied into the parameters** for evaluation; otherwise the `state_dict` from the `model` key is used.
+ 4. Runs the full val loader → logs **mIoU**, **fwIoU**, per-class IoU.
+ 5. Writes:
+    - `eval_outputs/metrics.json` (or `--out_dir`)
+    - `confusion.npy`
+    - Up to `--max_viz` side-by-side **RGB | GT | Pred** PNGs (`save_triplet` in `utils/viz.py`), with ImageNet denormalization for the RGB panels.
+
+ ---
+
+ ## 14. Inference (testing folder, sliding window, TTA, ONNX)
+
+ **CLI:** `scripts/infer.py`
+
+ ### 14.1 Folder inference
+
+ - Reads `testing/Color_Images` (or whatever `data.test_images` points to).
+ - Loads the checkpoint with the same **EMA-first** rule as eval.
+ - For each image:
+   - If **both** height and width ≤ `tile_size` (512): single forward pass.
+   - Else: **sliding window** with stride `tile_size * (1 - overlap)` (default overlap **0.25** → stride **384**).
+     - Pads the image with **reflect** padding so the tile grid covers the corners; crops back to the original size.
+     - Accumulates **per-class scores** weighted by a **2D Gaussian** (`sigma ∝ tile/3`) so tile borders blend smoothly; the final prediction is the **`argmax` over classes** per pixel.
+
+ ### 14.2 Test-time augmentation (TTA)
+
+ If `inference.tta_flip` is true: logits = **0.5 × (logits(x) + unflip(logits(flip(x))))**, flipping horizontally.
+
+ ### 14.3 Outputs
+
+ Under `--out_dir` (default `infer_outputs/`):
+
+ - `pred_<filename>` — color overlay (prediction tinted on the RGB).
+ - `triplet_<filename>` — **RGB | blank or GT | Pred** strip (the test set has no GT, so the middle panel is zeros in the current `save_triplet` usage).
+ - `latency.txt` — mean milliseconds per image and the device string.
+
+ ### 14.4 ONNX
+
+ `python scripts/infer.py --checkpoint ... --onnx model.onnx` calls `export_onnx`: builds the model on **CPU**, uses a dummy input `1×3×512×512`, and runs `torch.onnx.export` with dynamic axes for batch and spatial size.
+
+ ---
+
+ ## 15. Checkpoints and artifacts
+
+ | Artifact | Contents |
+ |----------|----------|
+ | `checkpoints/best.pt` | `model`, `ema` (optional), `miou`, `per_class_iou`, `config`, `class_names` |
+ | `checkpoints/last.pt` | Latest epoch snapshot + optimizer |
+ | `checkpoints/history.json` | List of `{epoch, miou, fw_iou}` |
+ | `eval_outputs/*` | `metrics.json`, `confusion.npy`, visualization PNGs |
+ | `infer_outputs/*` | Overlays, triplets, `latency.txt` |
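+
+ The keys above can be inspected directly (`weights_only=False` is needed on newer PyTorch because the checkpoint stores plain Python objects such as the config dict):
+
+ ```python
+ import torch
+
+ ckpt = torch.load("checkpoints/best.pt", map_location="cpu", weights_only=False)
+ print(sorted(ckpt.keys()))         # e.g. ['class_names', 'config', 'ema', 'miou', ...]
+ print("best mIoU:", ckpt["miou"])
+ for name, iou in zip(ckpt["class_names"], ckpt["per_class_iou"]):
+     print(f"{name}: {iou:.4f}")
+ ```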
+
+ ---
+
+ ## 16. How to run (commands)
+
+ From the repository root (adjust paths if yours differ).
+
+ ### 16.1 Install
+
+ ```powershell
+ python -m pip install -r requirements.txt
+ ```
+
+ ### 16.2 Train
+
+ ```powershell
+ $env:PYTHONPATH="."
+ python scripts\train.py --root "d:\codewizard 2.0"
+ ```
+
+ Optional:
+
+ ```powershell
+ python scripts\train.py --root "d:\codewizard 2.0" --config desert_segmentation\configs\default.yaml --epochs 5 --max_train_batches 50
+ ```
+
+ **Imbalanced classes (optional YAML):** set `loss.name` to `focal_ce_dice` for focal + Dice; tune `class_weight_cap`, `rare_class_crop_prob`, and/or `data.weighted_sampler: true` to oversample train images that contain rare classes (this scans all train masks once at startup and can take a minute on large sets).
+
+ ### 16.3 Evaluate (validation)
+
+ ```powershell
+ python scripts\eval.py --root "d:\codewizard 2.0" --checkpoint checkpoints\best.pt --out_dir eval_outputs
+ ```
+
+ **Metric summary (no PNGs):** `scripts\eval_summary.py` prints **mIoU (all classes)** and **mIoU (classes with val GT)** (the latter ignores absent classes, so it is easier to interpret on sparse val labels), **fwIoU**, **global / mean class accuracy**, **val GT pixel counts per class**, and a **per-class IoU / recall** table. It runs the same validation forward pass as `eval.py`. Optional: `--json-out eval_summary.json` (includes `miou_valid_gt_classes`, `val_gt_pixel_counts`).
+
+ ```powershell
+ python scripts\eval_summary.py --root "d:\codewizard 2.0" --checkpoint checkpoints\best.pt --json-out eval_summary.json
+ ```
+
+ To print only the **mIoU** and **per-class IoU** stored inside the checkpoint (no GPU eval): `python scripts\eval_summary.py --from-checkpoint-only --checkpoint checkpoints\best.pt`
+
+ ### 16.4 Infer on `testing/Color_Images`
+
+ ```powershell
+ python scripts\infer.py --root "d:\codewizard 2.0" --checkpoint checkpoints\best.pt --out_dir infer_outputs --limit 20
+ ```
+
+ ### 16.5 Export ONNX
+
+ ```powershell
+ python scripts\infer.py --root "d:\codewizard 2.0" --checkpoint checkpoints\best.pt --onnx model.onnx
+ ```
+
+ ---
+
+ ## 17. Interactive demo (Gradio)
+
+ Upload an RGB image in the browser and get a **colored class mask**, an **overlay**, a **side-by-side strip** (RGB | mask | overlay), a **fixed legend** (colors match `palette()` in training), **inference time**, and **dominant classes** (pixel histogram). Uses the same path as CLI inference: `_load_model_for_inference` and `predict_image` in `desert_segmentation/infer/predict.py` (EMA weights are preferred when present in the checkpoint).
+
+ **Install** (base + demo extras):
+
+ ```powershell
+ python -m pip install -r requirements.txt -r requirements-demo.txt
+ ```
+
+ **Run** (from the repo root; the model loads **once** at startup — look for a log line `Model ready`):
+
+ ```powershell
+ $env:PYTHONPATH="."
+ python scripts\demo_gradio.py --root "d:\codewizard 2.0" --checkpoint checkpoints\best.pt
+ ```
+
+ **CLI flags:** `--host` (default `127.0.0.1`), `--port` (default `7860`), `--share` (temporary public Gradio link), `--max-side`, `--max-megapixels` (reject huge uploads before inference).
+
+ **Environment variables** (optional defaults when the flags are omitted):
+
+ | Variable | Purpose |
+ |----------|---------|
+ | `ROOT` | Workspace root (same as `--root`) |
+ | `CHECKPOINT_PATH` | Path to `best.pt` (relative paths resolve under `ROOT`) |
+
+ **Advanced panel:** TTA on/off, tile overlap slider, tile size slider (256–2048, step 64). Overrides are passed into `predict_image` only; the checkpoint file is not modified.
+
+ **v1 limitations:** No per-pixel **confidence heatmap** for full sliding-window runs (only the `argmax` is returned from `predict_image`). See the plan follow-up to add logits fusion if needed.
+
+ **Windows:** Use backslashes or quoted paths as above; the first launch may be slow while dependencies initialize.
+
+ **Follow-ups (not in v1):** full-resolution **confidence** heatmap (needs a logits path in `predict.py`); **ZIP** batch upload; **two-checkpoint** comparison UI; client-side **ONNX** inference.
+
+ ---
+
+ ## 18. Tests
+
+ ```powershell
+ python -m pytest tests\test_mask_encoding.py -q
+ ```
+
+ Covers:
+
+ - Round-trip **raw mask ↔ class indices** for known IDs.
+ - **Unknown raw pixel** raises `ValueError`.
+ - LUT correctness for each configured raw ID.
+
+ ---
+
+ ## 19. Dependencies and environment notes
+
+ **`requirements.txt`:**
+
+ - `torch`, `torchvision`, `numpy`, `Pillow`, `PyYAML`
+ - `albumentations` pinned to `<1.5` to reduce optional native build issues on some Windows setups
+ - `segmentation-models-pytorch` (SMP)
+ - `tqdm`, `pytest`
+ - Optional demo: `requirements-demo.txt` adds **Gradio**
+
+ **Windows:** `scripts/train.py` and `scripts/eval.py` set `num_workers=0` for the `DataLoader` on NT to avoid multiprocessing friction.
+
+ **SMP pretrained weights:** The first run may download encoder weights (e.g. ResNet-50 ImageNet) via SMP / Hugging Face hubs, depending on the SMP version.
+
+ ---
+
+ ## 20. Design decisions and limitations
+
+ | Topic | Decision / limitation |
+ |-------|------------------------|
+ | Mask modes | **16-bit raw IDs** supported via LUT; **P-mode palette** and **RGB color masks** are *not* auto-detected in this codebase—extend `mask_encoding.py` if your dataset uses them |
+ | SegFormer | **Not** a separate `architecture` enum; the plan mentioned SegFormer-B2 as an alternative—it would require additional factory code or a supported SMP encoder |
+ | Val resolution | Images are **letterboxed** to 512×512 for batching; mIoU treats padded regions as ignore—fine for a hackathon; for publication-grade eval consider sliding-window validation too |
+ | Inference fusion | Overlapping tiles add **Gaussian-weighted class scores** (softmax probabilities in code) into an accumulator; the final label is the **`argmax` over the accumulated scores** (feathered overlap fusion). A per-pixel `weight` tensor is also accumulated in code for possible future normalization extensions |
+ | Poly LR / sync BN | **Not** implemented (cosine + warmup only) |
+ | Ensemble | **Not** implemented (single model + optional EMA) |
+
+ ---
+
+ ## 21. Extending the project
+
+ 1. **New classes / raw IDs:** Edit `data.raw_ids` and `data.class_names` in YAML; the frequency rescan happens automatically in `train.py`.
+ 2. **UNet / FPN:** Set `model.architecture` to `unet` or `fpn`; pick a valid `encoder_name` for SMP.
+ 3. **Larger encoder:** e.g. `encoder_name: resnet101` for DeepLabV3+.
+ 4. **Loss ablation:** Set `loss.name` to `ce`, `weighted_ce`, `focal_ce`, or `focal_ce_dice`; tune `dice_weight`, `label_smoothing`, `class_weight_cap`.
+ 5. **Stronger aug:** Add Albumentations ops in `transforms.py` (keep `additional_targets={"mask": "mask"}` for paired geometry).
+
+ ---
+
+ ## 22. Flowcharts
+
+ ### 22.1 Training epoch (simplified)
+
+ ```mermaid
+ flowchart TD
+   start[Start epoch]
+   trainLoop[For each batch]
+   fwd[Forward logits]
+   lossStep[Compute loss CE plus Dice]
+   backward[Backward plus clip]
+   stepOpt[Optimizer step plus scheduler step]
+   emaUp[Update EMA if enabled]
+   endTrain[End train batches]
+   snap[Snapshot model weights]
+   applyEMA[Copy EMA into model if enabled]
+   valRun[Run validation mIoU]
+   restore[Restore snapshot weights]
+   better{New best mIoU?}
+   saveBest[Save best.pt]
+   early{Patience exceeded?}
+   stop[Stop training]
+   start --> trainLoop
+   trainLoop --> fwd --> lossStep --> backward --> stepOpt --> emaUp
+   emaUp --> trainLoop
+   trainLoop --> endTrain
+   endTrain --> snap --> applyEMA --> valRun --> restore --> better
+   better -->|yes| saveBest --> early
+   better -->|no| early
+   early -->|yes| stop
+   early -->|no| start
+ ```
+
+ ### 22.2 Inference on large images
+
+ ```mermaid
+ flowchart LR
+   img[Input RGB HxW]
+   pad[Reflect pad to tile grid]
+   tiles[For each tile]
+   fwdT[Forward logits optional TTA]
+   g[Multiply by Gaussian feather]
+   acc[Accumulate class score maps]
+   argmax[Argmax over classes]
+   cropBack[Crop to original HxW]
+   img --> pad --> tiles --> fwdT --> g --> acc --> argmax --> cropBack
+ ```
+
+ ---
+
+ ## Acknowledgments
+
+ - **segmentation_models_pytorch** (Pavel Iakubovskii and contributors) for modular segmentation architectures.
+ - **Albumentations** for fast, paired image–mask augmentations.
+
+ ---
+
+ *Generated to document the implementation in this repository as of the README authoring date. For the original hackathon planning narrative, see the separate plan document (not stored in this repo’s `README`).*
desert_segmentation/__init__.py ADDED
@@ -0,0 +1,3 @@
+ """Desert semantic segmentation training and inference package."""
+
+ __version__ = "0.1.0"
desert_segmentation/configs/default.yaml ADDED
@@ -0,0 +1,77 @@
+ # Paths are relative to `root` unless absolute.
+ root: "."
+
+ data:
+   train_images: "training/train/Color_Images"
+   train_masks: "training/train/Segmentation"
+   val_images: "training/val/Color_Images"
+   val_masks: "training/val/Segmentation"
+   test_images: "testing/Color_Images"
+
+   # Raw uint16 label IDs (must match PNG values) and display names
+   raw_ids: [100, 200, 300, 500, 550, 600, 700, 800, 7100, 10000]
+   class_names:
+     - "id_100"
+     - "id_200"
+     - "id_300"
+     - "id_500"
+     - "id_550"
+     - "id_600"
+     - "id_700"
+     - "id_800"
+     - "id_7100"
+     - "id_10000"
+
+   crop_size: 512
+   num_workers: 4
+   # Prefer crops containing underrepresented classes (probability 0–1). Higher = more
+   # training crops centered on rare-class pixels (see SegmentationDataset).
+   rare_class_crop_prob: 0.35
+   # Oversample images that contain rare classes (scans train masks at startup).
+   weighted_sampler: false
+   weighted_sampler_eps: 1.0e-6
+   ignore_index: 255
+
+ model:
+   architecture: "deeplabv3plus"
+   encoder_name: "resnet50"
+   encoder_weights: "imagenet"
+   # Alternative: mit_b2 with deeplabv3plus if supported by SMP
+   # encoder_name: "mit_b2"
+
+ train:
+   batch_size: 4
+   epochs: 40
+   lr: 0.0003
+   weight_decay: 0.0005
+   warmup_ratio: 0.08
+   amp: true
+   gradient_clip: 1.0
+   seed: 42
+   checkpoint_dir: "checkpoints"
+   log_interval: 20
+   early_stop_patience: 12
+
+ loss:
+   # ce | weighted_ce | ce_dice | focal_ce | focal_ce_dice
+   name: "focal_ce_dice"
+   dice_weight: 0.5
+   # Used only for CE-based modes (ce, weighted_ce, ce_dice). Ignored for focal_ce / focal_ce_dice.
+   label_smoothing: 0.05
+   # Inverse log-frequency class weights; ratio clamped to [1/cap, cap] vs median. Typical cap 5–25;
+   # higher = stronger upweight for rare classes (watch for instability).
+   class_weight_cap: 15.0
+   focal_gamma: 2.0
+
+ augmentation:
+   strong: true
+
+ ema:
+   enabled: true
+   decay: 0.999
+
+ inference:
+   tile_size: 512
+   overlap: 0.25
+   tta_flip: true
+   batch_size: 1
desert_segmentation/data/__init__.py ADDED
@@ -0,0 +1,4 @@
+ from desert_segmentation.data.dataset import SegmentationDataset
+ from desert_segmentation.data.mask_encoding import RawMaskCodec
+
+ __all__ = ["SegmentationDataset", "RawMaskCodec"]
desert_segmentation/data/dataset.py ADDED
@@ -0,0 +1,129 @@
+ """Image / mask dataset with optional rare-class biased cropping."""
+
+ from __future__ import annotations
+
+ import os
+ import random
+ from pathlib import Path
+ from typing import Callable, List, Optional, Sequence, Tuple
+
+ import numpy as np
+ import torch
+ from PIL import Image
+ from torch.utils.data import Dataset
+
+ from desert_segmentation.data.mask_encoding import RawMaskCodec
+
+
+ def _list_images(dir_path: Path) -> List[str]:
+     exts = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}
+     return sorted(
+         f for f in os.listdir(dir_path) if Path(f).suffix.lower() in exts
+     )
+
+
+ class SegmentationDataset(Dataset):
+     def __init__(
+         self,
+         images_dir: Path,
+         masks_dir: Path,
+         codec: RawMaskCodec,
+         transform: Optional[Callable] = None,
+         mode: str = "train",
+         crop_size: int = 512,
+         rare_class_crop_prob: float = 0.35,
+         ignore_index: int = 255,
+         seed: int = 42,
+     ) -> None:
+         self.images_dir = Path(images_dir)
+         self.masks_dir = Path(masks_dir)
+         self.codec = codec
+         self.transform = transform
+         self.mode = mode
+         self.crop_size = crop_size
+         self.rare_class_crop_prob = rare_class_crop_prob if mode == "train" else 0.0
+         self.ignore_index = ignore_index
+         self._rng = random.Random(seed)
+
+         names = _list_images(self.images_dir)
+         self._pairs: List[Tuple[str, str]] = []
+         for n in names:
+             mp = self.masks_dir / n
+             if not mp.is_file():
+                 raise FileNotFoundError(f"Missing mask for {n}: {mp}")
+             self._pairs.append((str(self.images_dir / n), str(mp)))
+
+         if not self._pairs:
+             raise RuntimeError(f"No images in {self.images_dir}")
+
+     def __len__(self) -> int:
+         return len(self._pairs)
+
+     @property
+     def image_names(self) -> List[str]:
+         """Basenames aligned with dataset indices (for weighted sampling)."""
+         return [Path(p[0]).name for p in self._pairs]
+
+     def _load_pair(self, ip: str, mp: str) -> Tuple[np.ndarray, np.ndarray]:
+         image = np.array(Image.open(ip).convert("RGB"))
+         raw_mask = np.array(Image.open(mp))
+         if raw_mask.ndim == 2:
+             enc, _ = self.codec.encode_mask(raw_mask.astype(np.uint16))
+         else:
+             raise ValueError(f"Expected single-channel mask, got shape {raw_mask.shape}")
+         return image, enc
+
+     def _random_crop_bias_rare(
+         self, image: np.ndarray, mask: np.ndarray
+     ) -> Tuple[np.ndarray, np.ndarray]:
+         h, w = image.shape[:2]
+         ch, cw = self.crop_size, self.crop_size
+         if h < ch or w < cw:
+             pad_h = max(0, ch - h)
+             pad_w = max(0, cw - w)
+             image = np.pad(image, ((0, pad_h), (0, pad_w), (0, 0)), mode="constant")
+             mask = np.pad(mask, ((0, pad_h), (0, pad_w)), mode="constant", constant_values=self.ignore_index)
+             h, w = image.shape[:2]
+
+         if self.mode == "train" and self._rng.random() < self.rare_class_crop_prob:
+             hist, _ = np.histogram(mask.flatten(), bins=self.codec.num_classes, range=(0, self.codec.num_classes))
+             rare = int(np.argmin(hist))
+             ys, xs = np.where(mask == rare)
+             if len(xs) > 0:
+                 idx = self._rng.randrange(len(xs))
+                 cx, cy = int(xs[idx]), int(ys[idx])
+             else:
+                 cx, cy = w // 2, h // 2
+         else:
+             cx, cy = self._rng.randrange(w), self._rng.randrange(h)
+
+         x0 = np.clip(cx - cw // 2, 0, w - cw)
+         y0 = np.clip(cy - ch // 2, 0, h - ch)
+         return image[y0 : y0 + ch, x0 : x0 + cw], mask[y0 : y0 + ch, x0 : x0 + cw]
+
+     def __getitem__(self, idx: int) -> dict:
+         ip, mp = self._pairs[idx]
+         image, mask = self._load_pair(ip, mp)
+
+         if self.mode == "train":
+             image, mask = self._random_crop_bias_rare(image, mask)
+
+         if self.transform is not None:
+             t = self.transform(image=image, mask=mask)
+             image = t["image"]
+             mask = t["mask"]
+
+         if isinstance(mask, torch.Tensor):
+             mask_t = mask
+         else:
+             mask_t = torch.from_numpy(np.asarray(mask))
+         if mask_t.dtype in (torch.float32, torch.float16):
+             mask_t = (mask_t * 255.0).round().clamp(0, 255).long()
+         else:
+             mask_t = mask_t.long()
+
+         return {
+             "image": image,
+             "mask": mask_t,
+             "path": ip,
+         }
desert_segmentation/data/mask_encoding.py ADDED
@@ -0,0 +1,90 @@
+ """Decode 16-bit raw mask values to contiguous class indices and back."""
+
+ from __future__ import annotations
+
+ from dataclasses import dataclass
+ from typing import Dict, List, Sequence, Tuple
+
+ import numpy as np
+
+
+ @dataclass(frozen=True)
+ class RawMaskCodec:
+     """Maps dataset-specific raw label IDs (e.g. uint16 PNG values) to 0..num_classes-1."""
+
+     raw_ids: Tuple[int, ...]
+     class_names: Tuple[str, ...]
+
+     def __post_init__(self) -> None:
+         if len(self.raw_ids) != len(self.class_names):
+             raise ValueError("raw_ids and class_names must have the same length")
+         if len(set(self.raw_ids)) != len(self.raw_ids):
+             raise ValueError("raw_ids must be unique")
+
+     @property
+     def num_classes(self) -> int:
+         return len(self.raw_ids)
+
+     @property
+     def raw_to_index(self) -> Dict[int, int]:
+         return {r: i for i, r in enumerate(self.raw_ids)}
+
+     @property
+     def index_to_raw(self) -> Dict[int, int]:
+         return {i: r for i, r in enumerate(self.raw_ids)}
+
+     def _build_lut(self) -> np.ndarray:
+         max_id = max(self.raw_ids)
+         lut = np.full(max_id + 1, 255, dtype=np.uint8)
+         for i, rid in enumerate(self.raw_ids):
+             lut[rid] = i
+         return lut
+
+     def encode_mask(self, raw: np.ndarray) -> Tuple[np.ndarray, float]:
+         """Map raw uint16 labels to uint8 class indices 0..C-1. Returns (encoded, unknown_fraction)."""
+         if raw.ndim != 2:
+             raise ValueError(f"Expected HxW mask, got shape {raw.shape}")
+         lut = self._build_lut()
+         if int(raw.max()) >= lut.size:
+             raise ValueError(f"Mask value {int(raw.max())} exceeds LUT; extend raw_ids in config.")
+         out = lut[raw.astype(np.int64, copy=False)]
+         unknown_frac = float((out == 255).mean())
+         if unknown_frac > 0:
+             bad = out == 255
+             raise ValueError(
+                 f"Unknown mask pixels: {unknown_frac:.6f} of image. "
+                 f"Unique unknown raw values: {np.unique(raw[bad])[:16]}"
+             )
+         return out.astype(np.uint8), unknown_frac
+
+     def decode_to_raw(self, class_indices: np.ndarray) -> np.ndarray:
+         """Map class indices back to raw dataset IDs (for visualization/export)."""
+         arr = np.asarray(class_indices)
+         raw = np.zeros_like(arr, dtype=np.uint16)
+         for i, rid in enumerate(self.raw_ids):
+             raw[arr == i] = rid
+         return raw
+
+
+ def build_codec_from_config(raw_id_list: Sequence[int], names: Sequence[str]) -> RawMaskCodec:
+     pairs = sorted(zip(raw_id_list, names), key=lambda x: x[0])
+     r, n = zip(*pairs)
+     return RawMaskCodec(raw_ids=tuple(int(x) for x in r), class_names=tuple(str(x) for x in n))
+
+
+ def default_desert_codec() -> RawMaskCodec:
+     """Default codec for this workspace: 10 classes with fixed raw IDs (see dataset scan)."""
+     raw_ids = (100, 200, 300, 500, 550, 600, 700, 800, 7100, 10000)
+     names = (
+         "id_100",
+         "id_200",
+         "id_300",
+         "id_500",
+         "id_550",
+         "id_600",
+         "id_700",
+         "id_800",
+         "id_7100",
+         "id_10000",
+     )
+     return RawMaskCodec(raw_ids=raw_ids, class_names=names)
desert_segmentation/data/transforms.py ADDED
@@ -0,0 +1,90 @@
+ """Albumentations pipelines for images and class masks."""
+
+ from __future__ import annotations
+
+ from typing import Any, Tuple
+
+ import albumentations as A
+ from albumentations.pytorch import ToTensorV2
+
+
+ def _base_normalize() -> A.Normalize:
+     return A.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225))
+
+
+ def build_train_transforms(
+     crop_size: int,
+     strong: bool = True,
+     ignore_index: int = 255,
+ ) -> A.Compose:
+     """Spatial crops are applied in `SegmentationDataset` (with rare-class bias)."""
+     del crop_size
+     geometric: list[Any] = [
+         A.HorizontalFlip(p=0.5),
+         A.ShiftScaleRotate(
+             shift_limit=0.02,
+             scale_limit=0.12,
+             rotate_limit=10,
+             border_mode=0,
+             mask_value=ignore_index,
+             p=0.55,
+         ),
+     ]
+     color: list[Any] = [
+         A.RandomBrightnessContrast(brightness_limit=0.25, contrast_limit=0.25, p=0.55),
+         A.HueSaturationValue(hue_shift_limit=14, sat_shift_limit=22, val_shift_limit=14, p=0.4),
+         A.GaussianBlur(blur_limit=(3, 5), p=0.22),
+         A.GaussNoise(var_limit=(8.0, 48.0), p=0.25),
+         A.ImageCompression(quality_lower=70, quality_upper=100, p=0.25),
+         A.RGBShift(r_shift_limit=18, g_shift_limit=18, b_shift_limit=18, p=0.28),
+     ]
+     if strong:
+         color.extend(
+             [
+                 A.RandomSunFlare(
+                     flare_roi=(0.45, 0.0, 1.0, 0.42),
+                     angle_lower=0.4,
+                     p=0.12,
+                 ),
+                 A.RandomShadow(
+                     shadow_roi=(0, 0.42, 1, 1),
+                     num_shadows_lower=1,
+                     num_shadows_upper=2,
+                     p=0.16,
+                 ),
+             ]
+         )
+     return A.Compose(
+         geometric + color + [_base_normalize(), ToTensorV2()],
+         additional_targets={"mask": "mask"},
+     )
+
+
+ def build_val_transforms(
+     crop_size: int,
+     ignore_index: int = 255,
+ ) -> A.Compose:
+     return A.Compose(
+         [
+             A.LongestMaxSize(max_size=crop_size),
+             A.PadIfNeeded(
+                 min_height=crop_size,
+                 min_width=crop_size,
+                 border_mode=0,
+                 value=0,
+                 mask_value=ignore_index,
+             ),
+             _base_normalize(),
+             ToTensorV2(),
+         ],
+         additional_targets={"mask": "mask"},
+     )
+
+
+ def apply_transform(
+     transform: A.Compose,
+     image,
+     mask,
+ ) -> Tuple[Any, Any]:
+     out = transform(image=image, mask=mask)
+     return out["image"], out["mask"]
desert_segmentation/demo/__init__.py ADDED
@@ -0,0 +1,15 @@
+ from desert_segmentation.demo.inference_ui import (
+     build_legend_rows,
+     dominant_classes_markdown,
+     legend_table_html,
+     side_by_side_strip,
+     validate_rgb_array,
+ )
+
+ __all__ = [
+     "build_legend_rows",
+     "dominant_classes_markdown",
+     "legend_table_html",
+     "side_by_side_strip",
+     "validate_rgb_array",
+ ]
desert_segmentation/demo/inference_ui.py ADDED
@@ -0,0 +1,95 @@
+ """Helpers for Gradio / web demo: legend, validation, composites."""
+
+ from __future__ import annotations
+
+ import html
+ from typing import Any, Dict, List, Sequence, Tuple
+
+ import numpy as np
+
+
+ from desert_segmentation.utils.viz import palette
+
+
+ def validate_rgb_array(
+     arr: np.ndarray,
+     max_side: int = 4096,
+     max_megapixels: float = 16.0,
+ ) -> None:
+     """Raises ValueError with a user-facing message if invalid or too large."""
+     if arr is None:
+         raise ValueError("No image provided.")
+     if not isinstance(arr, np.ndarray):
+         arr = np.asarray(arr)
+     if arr.ndim != 3 or arr.shape[2] != 3:
+         raise ValueError(f"Expected RGB image HxWx3, got shape {getattr(arr, 'shape', None)}")
+     h, w = arr.shape[0], arr.shape[1]
+     if h < 1 or w < 1:
+         raise ValueError("Image is empty.")
+     if max(h, w) > max_side:
+         raise ValueError(f"Image too large: max side is {max_side}px (got {h}x{w}).")
+     mp = (h * w) / 1_000_000.0
+     if mp > max_megapixels:
+         raise ValueError(f"Image too large: max {max_megapixels} megapixels (got {mp:.1f} MP).")
+
+
+ def build_legend_rows(class_names: Sequence[str], num_classes: int, seed: int = 42) -> Tuple[List[Dict[str, Any]], np.ndarray]:
+     """Returns list of {index, name, hex, r, g, b} and color table (same seed as viz.palette)."""
+     colors = palette(num_classes, seed=seed)
+     rows: List[Dict[str, Any]] = []
+     for i, name in enumerate(class_names):
+         r, g, b = (int(colors[i, 0]), int(colors[i, 1]), int(colors[i, 2]))
+         rows.append(
+             {
+                 "index": i,
+                 "name": str(name),
+                 "hex": f"#{r:02x}{g:02x}{b:02x}",
+                 "r": r,
+                 "g": g,
+                 "b": b,
+             }
+         )
+     return rows, colors
+
+
+ def legend_table_html(rows: Sequence[Dict[str, Any]]) -> str:
+     """Small HTML table with color swatches for Gradio gr.HTML."""
+     parts = [
+         "<table style='border-collapse:collapse;font-size:14px'>",
+         "<thead><tr><th>Swatch</th><th>#</th><th>Name</th><th>Hex</th></tr></thead><tbody>",
+     ]
+     for row in rows:
+         sw = f"background-color:{row['hex']};width:32px;height:22px;border:1px solid #888"
+         safe_name = html.escape(str(row["name"]))
+         parts.append(
+             f"<tr><td><div style='{sw}'></div></td>"
+             f"<td>{row['index']}</td><td>{safe_name}</td><td><code>{row['hex']}</code></td></tr>"
+         )
+     parts.append("</tbody></table>")
+     return "".join(parts)
+
+
+ def dominant_classes_markdown(pred: np.ndarray, class_names: Sequence[str], top_k: int = 3) -> str:
+     flat = pred.reshape(-1).astype(np.int64, copy=False)
+     n = len(class_names)
+     counts = np.bincount(flat, minlength=n)
+     total = int(counts.sum())
+     if total == 0:
+         return "_No pixels._"
+     order = np.argsort(-counts)
+     lines: List[str] = []
+     for i in order[:top_k]:
+         c = int(counts[i])
+         if c == 0:
+             continue
+         pct = 100.0 * c / total
+         name = class_names[i] if i < len(class_names) else str(i)
+         lines.append(f"- **{name}** (class {i}): **{pct:.1f}%**")
+     return "\n".join(lines) if lines else "_No dominant classes._"
+
+
+ def side_by_side_strip(rgb: np.ndarray, mask_rgb: np.ndarray, overlay_rgb: np.ndarray, gap: int = 8) -> np.ndarray:
+     """Horizontal strip: RGB | colored mask | overlay."""
+     h, w = rgb.shape[:2]
+     gap_arr = np.zeros((h, gap, 3), dtype=np.uint8)
+     return np.concatenate([rgb, gap_arr, mask_rgb, gap_arr, overlay_rgb], axis=1)
desert_segmentation/infer/__init__.py ADDED
@@ -0,0 +1,3 @@
+ from desert_segmentation.infer.predict import predict_image, predict_folder
+
+ __all__ = ["predict_image", "predict_folder"]
desert_segmentation/infer/predict.py ADDED
@@ -0,0 +1,210 @@
1
+ """Sliding-window inference with optional horizontal-flip TTA and ONNX export helper."""
2
+
3
+ from __future__ import annotations
4
+
5
+ import logging
6
+ import os
7
+ import time
8
+ from pathlib import Path
9
+ from typing import List, Optional, Tuple
10
+
11
+ import numpy as np
12
+ import torch
13
+ import torch.nn as nn
14
+ from PIL import Image
15
+ from tqdm import tqdm
16
+
17
+ from desert_segmentation.data.mask_encoding import RawMaskCodec, build_codec_from_config
18
+ from desert_segmentation.models.factory import create_model
19
+ from desert_segmentation.utils.viz import blend_overlay, colorize_mask, palette, save_triplet
20
+
21
+ logger = logging.getLogger(__name__)
22
+
23
+
24
+ def _gaussian_2d(h: int, w: int) -> np.ndarray:
25
+ yy, xx = np.ogrid[:h, :w]
26
+ cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
27
+ sig = min(h, w) / 3.0
28
+ g = np.exp(-(((yy - cy) ** 2 + (xx - cx) ** 2) / (2.0 * sig**2)))
29
+ return g.astype(np.float32)
30
+
31
+
32
+ def _preprocess(
33
+ rgb: np.ndarray,
34
+ mean: Tuple[float, float, float],
35
+ std: Tuple[float, float, float],
36
+ ) -> torch.Tensor:
37
+ # Keep float32 end-to-end: np.array(mean) defaults to float64 and would upcast x → conv2d dtype mismatch.
38
+ x = rgb.astype(np.float32, copy=False) / 255.0
39
+ m = np.asarray(mean, dtype=np.float32).reshape(1, 1, 3)
40
+ s = np.asarray(std, dtype=np.float32).reshape(1, 1, 3)
41
+ x = (x - m) / s
42
+ t = torch.from_numpy(np.ascontiguousarray(x)).permute(2, 0, 1).unsqueeze(0)
43
+ return t.float()
44
+
45
+
46
+ @torch.no_grad()
47
+ def _forward_logits(
48
+ model: nn.Module,
49
+ x: torch.Tensor,
50
+ device: torch.device,
51
+ tta_flip: bool,
52
+ ) -> torch.Tensor:
53
+ logits = model(x)
54
+ if not tta_flip:
55
+ return logits
56
+ xf = torch.flip(x, dims=[3])
57
+ lf = model(xf)
58
+ lf = torch.flip(lf, dims=[3])
59
+ return (logits + lf) * 0.5
60
+
61
+
62
+ def _tile_starts(length: int, tile: int, stride: int) -> List[int]:
63
+ if length <= tile:
64
+ return [0]
65
+ last_pos = length - tile
66
+ starts = list(range(0, last_pos + 1, stride))
67
+ if not starts:
68
+ return [0]
69
+ if starts[-1] != last_pos:
70
+ starts.append(last_pos)
71
+ return sorted(set(starts))
72
+
73
+
74
+ @torch.no_grad()
75
+ def predict_image(
76
+ model: nn.Module,
77
+ image_np: np.ndarray,
78
+ device: torch.device,
79
+ tile_size: int,
80
+ overlap: float,
81
+ tta_flip: bool,
82
+ mean: Tuple[float, float, float] = (0.485, 0.456, 0.406),
83
+ std: Tuple[float, float, float] = (0.229, 0.224, 0.225),
84
+ ) -> np.ndarray:
85
+ """Returns HxW int class map."""
86
+ h, w = image_np.shape[:2]
87
+ if h <= tile_size and w <= tile_size:
88
+ t = _preprocess(image_np, mean, std).to(device)
89
+ logits = _forward_logits(model, t, device, tta_flip)
90
+ return logits.argmax(dim=1).squeeze(0).cpu().numpy().astype(np.int64)
91
+
92
+ stride = max(1, int(tile_size * (1.0 - overlap)))
93
+ g = _gaussian_2d(tile_size, tile_size)
94
+
95
+ n_ty = len(_tile_starts(h, tile_size, stride))
96
+ n_tx = len(_tile_starts(w, tile_size, stride))
97
+ H_pad = (n_ty - 1) * stride + tile_size
98
+ W_pad = (n_tx - 1) * stride + tile_size
99
+ pad_h = max(0, H_pad - h)
100
+ pad_w = max(0, W_pad - w)
101
+ img_p = np.pad(image_np, ((0, pad_h), (0, pad_w), (0, 0)), mode="reflect")
102
+ H, W = img_p.shape[:2]
103
+
104
+ t0 = _preprocess(img_p[0:tile_size, 0:tile_size], mean, std).to(device)
105
+ logits0 = _forward_logits(model, t0, device, tta_flip)
106
+ num_classes = int(logits0.shape[1])
107
+ acc = np.zeros((num_classes, H, W), dtype=np.float32)
108
+ weight = np.zeros((H, W), dtype=np.float32)
109
+
110
+ for y in _tile_starts(H, tile_size, stride):
111
+ for x in _tile_starts(W, tile_size, stride):
112
+ tile = img_p[y : y + tile_size, x : x + tile_size]
113
+ t = _preprocess(tile, mean, std).to(device)
114
+ logits = _forward_logits(model, t, device, tta_flip)
115
+ probs = torch.softmax(logits, dim=1)
116
+ ls = probs.squeeze(0).cpu().numpy()
117
+ acc[:, y : y + tile_size, x : x + tile_size] += ls * g
118
+ weight[y : y + tile_size, x : x + tile_size] += g
119
+
120
+ pred = np.argmax(acc, axis=0).astype(np.int64)
121
+ return pred[:h, :w]
122
+
123
+
124
+ def _load_model_for_inference(
125
+ checkpoint_path: Path,
126
+ device: torch.device,
127
+ ) -> Tuple[nn.Module, dict, RawMaskCodec]:
128
+ try:
129
+ ckpt = torch.load(checkpoint_path, map_location=device, weights_only=False)
130
+ except TypeError:
131
+ ckpt = torch.load(checkpoint_path, map_location=device)
132
+ cfg = ckpt["config"]
133
+ raw_ids = cfg["data"]["raw_ids"]
134
+ names = ckpt.get("class_names") or tuple(cfg["data"].get("class_names") or ())
135
+ if not names:
136
+ names = tuple(str(x) for x in raw_ids)
137
+ codec = build_codec_from_config(raw_ids, names)
138
+ model = create_model(cfg["model"], num_classes=codec.num_classes).to(device)
139
+ if ckpt.get("model") is not None:
140
+ model.load_state_dict(ckpt["model"])
141
+ if ckpt.get("ema") is not None:
142
+ for n, p in model.named_parameters():
143
+ if n in ckpt["ema"]:
144
+ p.data.copy_(ckpt["ema"][n].to(device))
145
+ model.eval()
146
+ return model, cfg, codec
147
+
148
+
149
+ @torch.no_grad()
150
+ def predict_folder(
151
+ checkpoint_path: Path,
152
+ image_dir: Path,
153
+ out_dir: Path,
154
+ device: Optional[torch.device] = None,
155
+ limit: Optional[int] = None,
156
+ ) -> None:
157
+ device = device or torch.device("cuda" if torch.cuda.is_available() else "cpu")
158
+ model, cfg, codec = _load_model_for_inference(checkpoint_path, device)
159
+ icfg = cfg.get("inference") or {}
160
+ tile_size = int(icfg.get("tile_size", 512))
161
+ overlap = float(icfg.get("overlap", 0.25))
162
+ tta = bool(icfg.get("tta_flip", True))
163
+
164
+ out_dir.mkdir(parents=True, exist_ok=True)
165
+ colors = palette(codec.num_classes)
166
+ names = sorted(f for f in os.listdir(image_dir) if f.lower().endswith((".png", ".jpg", ".jpeg")))
167
+ if limit is not None:
168
+ names = names[:limit]
169
+ times: List[float] = []
170
+ for name in tqdm(names, desc="infer"):
171
+ ip = image_dir / name
172
+ rgb = np.array(Image.open(ip).convert("RGB"))
173
+ t0 = time.perf_counter()
174
+ pred = predict_image(model, rgb, device, tile_size, overlap, tta)
175
+ times.append(time.perf_counter() - t0)
176
+ overlay = blend_overlay(rgb, colorize_mask(pred, colors))
177
+ Image.fromarray(overlay).save(out_dir / f"pred_{name}")
178
+ save_triplet(out_dir / f"triplet_{name}", rgb, None, pred, colors)
179
+
180
+ if times:
181
+ mean_ms = float(np.mean(times) * 1000.0)
182
+ logger.info("mean inference time: %.2f ms (device=%s)", mean_ms, device)
183
+ with (out_dir / "latency.txt").open("w", encoding="utf-8") as f:
184
+ f.write(f"mean_ms_per_image={mean_ms:.4f}\n")
185
+ f.write(f"device={device}\n")
186
+
187
+
188
+ def export_onnx(
189
+ checkpoint_path: Path,
190
+ out_onnx: Path,
191
+ height: int = 512,
192
+ width: int = 512,
193
+ opset: int = 17,
194
+ ) -> None:
195
+ device = torch.device("cpu")
196
+ model, _, _ = _load_model_for_inference(checkpoint_path, device)
197
+ model.eval()
198
+ dummy = torch.randn(1, 3, height, width, device=device)
199
+ torch.onnx.export(
200
+ model,
201
+ dummy,
202
+ str(out_onnx),
203
+ input_names=["input"],
204
+ output_names=["logits"],
205
+ opset_version=opset,
206
+ dynamic_axes={
207
+ "input": {0: "batch", 2: "height", 3: "width"},
208
+ "logits": {0: "batch", 2: "h", 3: "w"},
209
+ },
210
+ )
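
A standalone sketch of the tile-grid arithmetic used by `_tile_starts` above may help when reading `predict_image`; the sizes below are illustrative only:

```python
from typing import List


def tile_starts(length: int, tile: int, stride: int) -> List[int]:
    # Same logic as _tile_starts: regular stride, plus a final tile flush with the edge.
    if length <= tile:
        return [0]
    last = length - tile
    starts = list(range(0, last + 1, stride))
    if starts[-1] != last:
        starts.append(last)
    return sorted(set(starts))


# A 1200 px side with 512 px tiles at 25% overlap (stride = int(512 * 0.75) = 384):
print(tile_starts(1200, 512, 384))  # [0, 384, 688] -- overlapping windows cover every pixel
```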
desert_segmentation/losses/__init__.py ADDED
@@ -0,0 +1,3 @@
1
+ from desert_segmentation.losses.combined import build_loss
2
+
3
+ __all__ = ["build_loss"]
desert_segmentation/losses/combined.py ADDED
@@ -0,0 +1,143 @@
1
+ """Segmentation losses: CE, weighted CE, focal, Dice, and combinations."""
2
+
3
+ from __future__ import annotations
4
+
5
+ from typing import Optional, Tuple
6
+
7
+ import torch
8
+ import torch.nn as nn
9
+ import torch.nn.functional as F
10
+
11
+
12
+ def _ce(
13
+ logits: torch.Tensor,
14
+ target: torch.Tensor,
15
+ weight: Optional[torch.Tensor],
16
+ ignore_index: int,
17
+ label_smoothing: float,
18
+ ) -> torch.Tensor:
19
+ return F.cross_entropy(
20
+ logits,
21
+ target,
22
+ weight=weight,
23
+ ignore_index=ignore_index,
24
+ label_smoothing=label_smoothing,
25
+ )
26
+
27
+
28
+ def _focal_ce(
29
+ logits: torch.Tensor,
30
+ target: torch.Tensor,
31
+ gamma: float,
32
+ weight: Optional[torch.Tensor],
33
+ ignore_index: int,
34
+ ) -> torch.Tensor:
35
+ log_probs = F.log_softmax(logits, dim=1)
36
+ probs = log_probs.exp()
37
+ valid = target != ignore_index
38
+ tgt_clamped = target.clone()
40
+ tgt_clamped[~valid] = 0
41
+ log_pt = log_probs.gather(1, tgt_clamped.unsqueeze(1)).squeeze(1)
42
+ pt = probs.gather(1, tgt_clamped.unsqueeze(1)).squeeze(1)
43
+ focal = (1 - pt) ** gamma * (-log_pt)
44
+ if weight is not None:
45
+ focal = focal * weight[tgt_clamped]
46
+ focal = focal * valid.float()
47
+ return focal.sum() / (valid.float().sum().clamp_min(1.0))
48
+
49
+
50
+ def _multiclass_dice(
51
+ logits: torch.Tensor,
52
+ target: torch.Tensor,
53
+ ignore_index: int,
54
+ eps: float = 1e-6,
55
+ ) -> torch.Tensor:
56
+ probs = F.softmax(logits, dim=1)
57
+ _, c, _, _ = probs.shape  # batch dimension unused
58
+ tgt = target
59
+ valid = tgt != ignore_index
60
+ dice_losses = []
61
+ for k in range(c):
62
+ pk = probs[:, k]
63
+ tk = (tgt == k).float()
64
+ m = valid.float()
65
+ pk, tk = pk * m, tk * m
66
+ inter = (pk * tk).sum(dim=(1, 2))
67
+ denom = pk.sum(dim=(1, 2)) + tk.sum(dim=(1, 2)) + eps
68
+ dice = 1.0 - (2.0 * inter + eps) / denom
69
+ dice_losses.append(dice.mean())
70
+ return torch.stack(dice_losses).mean()
71
+
72
+
73
+ class CombinedSegLoss(nn.Module):
74
+ def __init__(
75
+ self,
76
+ mode: str,
77
+ num_classes: int,
78
+ ignore_index: int = 255,
79
+ class_weights: Optional[torch.Tensor] = None,
80
+ dice_weight: float = 0.5,
81
+ label_smoothing: float = 0.05,
82
+ focal_gamma: float = 2.0,
83
+ ) -> None:
84
+ super().__init__()
85
+ self.mode = mode
86
+ self.num_classes = num_classes
87
+ self.ignore_index = ignore_index
88
+ self.register_buffer("class_weights", class_weights if class_weights is not None else torch.ones(num_classes))
89
+ self.dice_weight = dice_weight
90
+ self.label_smoothing = label_smoothing
91
+ self.focal_gamma = focal_gamma
92
+
93
+ def forward(self, logits: torch.Tensor, target: torch.Tensor) -> Tuple[torch.Tensor, dict]:
94
+ w = self.class_weights
95
+ if self.mode == "ce":
96
+ loss = _ce(logits, target, None, self.ignore_index, self.label_smoothing)
97
+ elif self.mode == "weighted_ce":
98
+ loss = _ce(logits, target, w, self.ignore_index, self.label_smoothing)
99
+ elif self.mode == "focal_ce":
100
+ loss = _focal_ce(logits, target, self.focal_gamma, w, self.ignore_index)
101
+ elif self.mode == "ce_dice":
102
+ ce = _ce(logits, target, w, self.ignore_index, self.label_smoothing)
103
+ dice = _multiclass_dice(logits, target, self.ignore_index)
104
+ loss = ce + self.dice_weight * dice
105
+ elif self.mode == "focal_ce_dice":
106
+ focal = _focal_ce(logits, target, self.focal_gamma, w, self.ignore_index)
107
+ dice = _multiclass_dice(logits, target, self.ignore_index)
108
+ loss = focal + self.dice_weight * dice
109
+ else:
110
+ raise ValueError(f"Unknown loss mode {self.mode}")
111
+ return loss, {"loss": float(loss.detach().cpu())}
112
+
113
+
114
+ def build_loss(
115
+ loss_cfg: dict,
116
+ num_classes: int,
117
+ class_weights: Optional[torch.Tensor],
118
+ ignore_index: int,
119
+ ) -> CombinedSegLoss:
120
+ mode = loss_cfg.get("name", "ce_dice")
121
+ return CombinedSegLoss(
122
+ mode=mode,
123
+ num_classes=num_classes,
124
+ ignore_index=ignore_index,
125
+ class_weights=class_weights,
126
+ dice_weight=float(loss_cfg.get("dice_weight", 0.5)),
127
+ label_smoothing=float(loss_cfg.get("label_smoothing", 0.0)),
128
+ focal_gamma=float(loss_cfg.get("focal_gamma", 2.0)),
129
+ )
130
+
131
+
132
+ def compute_class_weights_from_freq(
133
+ freq: torch.Tensor,
134
+ cap: float = 15.0,
135
+ eps: float = 1e-6,
136
+ ) -> torch.Tensor:
137
+ """Inverse log frequency with mean normalization and per-class cap on max/min ratio."""
138
+ w = 1.0 / torch.log(freq + eps)
139
+ w = w / w.mean()
140
+ ratio = w / w.median()
141
+ ratio = torch.clamp(ratio, max=cap)
142
+ w = ratio * w.median()
143
+ return w
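
To make the weighting rule concrete, here is a hedged toy run of the same arithmetic as `compute_class_weights_from_freq`; the class fractions are invented for illustration:

```python
import torch

freq = torch.tensor([0.70, 0.25, 0.04, 0.01])   # hypothetical per-class pixel fractions
w = 1.0 / torch.log(1.02 + freq + 1e-6)         # damped inverse log frequency (ENet-style)
w = w / w.mean()
ratio = torch.clamp(w / w.median(), max=15.0)   # cap weights at 15x the median
w = ratio * w.median()
print(w)  # approx tensor([0.13, 0.29, 1.20, 2.37]): rare classes up-weighted, but bounded
```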
desert_segmentation/metrics/__init__.py ADDED
@@ -0,0 +1,15 @@
1
+ from desert_segmentation.metrics.iou import (
2
+ IoUMetrics,
3
+ compute_confusion,
4
+ confusion_to_accuracy_metrics,
5
+ gt_pixel_counts,
6
+ valid_class_miou_from_confusion,
7
+ )
8
+
9
+ __all__ = [
10
+ "IoUMetrics",
11
+ "compute_confusion",
12
+ "confusion_to_accuracy_metrics",
13
+ "gt_pixel_counts",
14
+ "valid_class_miou_from_confusion",
15
+ ]
desert_segmentation/metrics/iou.py ADDED
@@ -0,0 +1,143 @@
1
+ """Per-class IoU, mIoU, frequency-weighted IoU, confusion matrix."""
2
+
3
+ from __future__ import annotations
4
+
5
+ from typing import Dict, Optional, Tuple, Union
6
+
7
+ import numpy as np
8
+ import torch
9
+
10
+
11
+ def compute_confusion(
12
+ logits: torch.Tensor,
13
+ target: torch.Tensor,
14
+ num_classes: int,
15
+ ignore_index: int = 255,
16
+ ) -> torch.Tensor:
17
+ """Accumulate confusion matrix (pred rows, target columns) — shape CxC."""
18
+ pred = logits.argmax(dim=1).view(-1)
19
+ tgt = target.view(-1)
20
+ valid = tgt != ignore_index
21
+ pred = pred[valid]
22
+ tgt = tgt[valid]
23
+ if pred.numel() == 0:
24
+ return torch.zeros(num_classes, num_classes, dtype=torch.int64, device=logits.device)
25
+ idx = tgt * num_classes + pred
26
+ cm = torch.bincount(idx, minlength=num_classes * num_classes).reshape(num_classes, num_classes)
27
+ return cm
28
+
29
+
30
+ def confusion_to_accuracy_metrics(
31
+ cm: Union[np.ndarray, torch.Tensor],
32
+ eps: float = 1e-12,
33
+ ) -> Dict[str, float | np.ndarray]:
34
+ """Pixel accuracies from confusion ``cm[gt_i, pred_j]`` (same layout as ``IoUMetrics``).
35
+
36
+ - **global_pixel_accuracy:** ``trace(cm) / sum(cm)`` — fraction of pixels correct.
37
+ - **mean_class_accuracy:** mean of per-class **recall** ``cm[k,k] / sum_j cm[k,j]`` over
38
+ classes with at least one ground-truth pixel (ignores empty rows).
39
+
40
+ Returns ``per_class_recall`` aligned with class index for optional reporting.
41
+ """
42
+ if isinstance(cm, torch.Tensor):
43
+ cm = cm.detach().cpu().numpy()
44
+ cm = np.asarray(cm, dtype=np.float64)
45
+ total = cm.sum()
46
+ if total <= eps:
47
+ z = np.zeros(cm.shape[0], dtype=np.float64)
48
+ return {
49
+ "global_pixel_accuracy": 0.0,
50
+ "mean_class_accuracy": 0.0,
51
+ "per_class_recall": z,
52
+ }
53
+ trace = np.trace(cm)
54
+ global_acc = float(trace / total)
55
+ row_sums = cm.sum(axis=1)
56
+ diag = np.diag(cm)
57
+ with np.errstate(divide="ignore", invalid="ignore"):
58
+ per_class_recall = np.where(row_sums > eps, diag / np.maximum(row_sums, eps), np.nan)
59
+ present = row_sums > eps
60
+ mean_class_acc = (
61
+ float(np.nanmean(per_class_recall[present])) if np.any(present) else 0.0
62
+ )
63
+ return {
64
+ "global_pixel_accuracy": global_acc,
65
+ "mean_class_accuracy": mean_class_acc,
66
+ "per_class_recall": per_class_recall,
67
+ }
68
+
69
+
70
+ def gt_pixel_counts(cm: Union[np.ndarray, torch.Tensor]) -> np.ndarray:
71
+ """Ground-truth pixel counts per class: ``sum_j cm[gt_k, pred_j]`` (row sums)."""
72
+ if isinstance(cm, torch.Tensor):
73
+ cm = cm.detach().cpu().numpy()
74
+ cm = np.asarray(cm, dtype=np.float64)
75
+ return np.sum(cm, axis=1).astype(np.int64)
76
+
77
+
78
+ def valid_class_miou_from_confusion(
79
+ cm: Union[np.ndarray, torch.Tensor],
80
+ eps: float = 1e-6,
81
+ ) -> float:
82
+ """Mean IoU over classes that have at least one ground-truth pixel on the val set.
83
+
84
+ Unlike full mIoU (mean over all classes, often many zeros when a class is absent from
85
+ val GT), this only averages **finite** per-class IoU values for rows with ``GT > 0``.
86
+ Returns ``0.0`` if no class has any GT pixels.
87
+ """
88
+ if isinstance(cm, torch.Tensor):
89
+ cm = cm.detach().cpu().numpy()
90
+ cm = np.asarray(cm, dtype=np.float64)
91
+ diag = np.diag(cm)
92
+ rows = cm.sum(axis=1)
93
+ cols = cm.sum(axis=0)
94
+ union = rows + cols - diag + eps
95
+ with np.errstate(divide="ignore", invalid="ignore"):
96
+ iou = diag / union
97
+ present = rows > 0
98
+ if not np.any(present):
99
+ return 0.0
100
+ finite = present & np.isfinite(iou)
101
+ if not np.any(finite):
102
+ return 0.0
103
+ return float(np.mean(iou[finite]))
104
+
105
+
106
+ def confusion_to_iou(cm: torch.Tensor) -> Tuple[torch.Tensor, float, float]:
107
+ """Returns per-class IoU, mean IoU, frequency-weighted IoU."""
108
+ diag = torch.diag(cm).float()
109
+ rows = cm.sum(dim=1).float()
110
+ cols = cm.sum(dim=0).float()
111
+ union = rows + cols - diag + 1e-6
112
+ iou = diag / union
113
+ miou = iou[torch.isfinite(iou)].mean().item()
114
+ freq = rows / (rows.sum() + 1e-6)  # weight by ground-truth class frequency (row sums), per the standard fwIoU definition
115
+ fw_iou = (iou * freq).sum().item()
116
+ return iou, miou, fw_iou
117
+
118
+
119
+ class IoUMetrics:
120
+ def __init__(self, num_classes: int, ignore_index: int = 255, device: Optional[torch.device] = None):
121
+ self.num_classes = num_classes
122
+ self.ignore_index = ignore_index
123
+ self.device = device or torch.device("cpu")
124
+ self.reset()
125
+
126
+ def reset(self) -> None:
127
+ self._cm = torch.zeros(self.num_classes, self.num_classes, dtype=torch.int64, device=self.device)
128
+
129
+ @torch.no_grad()
130
+ def update(self, logits: torch.Tensor, target: torch.Tensor) -> None:
131
+ logits = logits.to(self.device)
132
+ target = target.to(self.device)
133
+ self._cm += compute_confusion(logits, target, self.num_classes, self.ignore_index).to(self.device)
134
+
135
+ def compute(self) -> Dict[str, float | np.ndarray]:
136
+ cm = self._cm.cpu()
137
+ iou, miou, fw_iou = confusion_to_iou(cm)
138
+ return {
139
+ "per_class_iou": iou.numpy(),
140
+ "miou": miou,
141
+ "fw_iou": fw_iou,
142
+ "confusion": cm.numpy(),
143
+ }
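
A tiny self-check of the confusion-matrix layout (ground-truth rows, prediction columns) that every helper in this module assumes:

```python
import torch

num_classes = 3
logits = torch.tensor([[[[5.0]], [[0.0]], [[0.0]]]])  # shape 1x3x1x1, argmax predicts class 0
target = torch.tensor([[[1]]])                        # the single ground-truth pixel is class 1

pred = logits.argmax(dim=1).view(-1)
idx = target.view(-1) * num_classes + pred            # same indexing as compute_confusion
cm = torch.bincount(idx, minlength=num_classes**2).reshape(num_classes, num_classes)
print(cm[1, 0].item())  # 1: one class-1 pixel counted as predicted class 0
```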
desert_segmentation/models/__init__.py ADDED
@@ -0,0 +1,3 @@
1
+ from desert_segmentation.models.factory import create_model
2
+
3
+ __all__ = ["create_model"]
desert_segmentation/models/factory.py ADDED
@@ -0,0 +1,37 @@
1
+ """Build segmentation models via segmentation_models_pytorch."""
2
+
3
+ from __future__ import annotations
4
+
5
+ from typing import Any, Dict
6
+
7
+ import segmentation_models_pytorch as smp
8
+ import torch.nn as nn
9
+
10
+
11
+ def create_model(model_cfg: Dict[str, Any], num_classes: int) -> nn.Module:
12
+ arch = (model_cfg.get("architecture") or "deeplabv3plus").lower()
13
+ encoder_name = model_cfg.get("encoder_name", "resnet50")
14
+ encoder_weights = model_cfg.get("encoder_weights", "imagenet")
15
+
16
+ if arch == "deeplabv3plus":
17
+ return smp.DeepLabV3Plus(
18
+ encoder_name=encoder_name,
19
+ encoder_weights=encoder_weights,
20
+ in_channels=3,
21
+ classes=num_classes,
22
+ )
23
+ if arch == "unet":
24
+ return smp.Unet(
25
+ encoder_name=encoder_name,
26
+ encoder_weights=encoder_weights,
27
+ in_channels=3,
28
+ classes=num_classes,
29
+ )
30
+ if arch == "fpn":
31
+ return smp.FPN(
32
+ encoder_name=encoder_name,
33
+ encoder_weights=encoder_weights,
34
+ in_channels=3,
35
+ classes=num_classes,
36
+ )
37
+ raise ValueError(f"Unknown architecture: {arch}")
desert_segmentation/train/__init__.py ADDED
@@ -0,0 +1,4 @@
1
+ from desert_segmentation.train.evaluate import evaluate
2
+ from desert_segmentation.train.trainer import train
3
+
4
+ __all__ = ["evaluate", "train"]
desert_segmentation/train/evaluate.py ADDED
@@ -0,0 +1,30 @@
1
+ """Validation loop and metric aggregation."""
2
+
3
+ from __future__ import annotations
4
+
5
+ from typing import Optional
6
+
7
+ import torch
8
+ from torch.utils.data import DataLoader
9
+ from tqdm import tqdm
10
+
11
+ from desert_segmentation.metrics.iou import IoUMetrics
12
+
13
+
14
+ @torch.no_grad()
15
+ def evaluate(
16
+ model: torch.nn.Module,
17
+ loader: DataLoader,
18
+ device: torch.device,
19
+ num_classes: int,
20
+ ignore_index: int = 255,
21
+ desc: str = "val",
22
+ ) -> dict:
23
+ model.eval()
24
+ metrics = IoUMetrics(num_classes=num_classes, ignore_index=ignore_index, device=device)
25
+ for batch in tqdm(loader, desc=desc, leave=False):
26
+ images = batch["image"].to(device, non_blocking=True)
27
+ masks = batch["mask"].to(device, non_blocking=True)
28
+ logits = model(images)
29
+ metrics.update(logits, masks)
30
+ return metrics.compute()
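
A minimal smoke test of `evaluate` with a stand-in model and a synthetic two-sample dataset; everything here is made up for illustration:

```python
import torch
from torch.utils.data import DataLoader, Dataset

from desert_segmentation.train.evaluate import evaluate


class ToySet(Dataset):
    def __len__(self) -> int:
        return 2

    def __getitem__(self, i: int) -> dict:
        return {"image": torch.randn(3, 32, 32), "mask": torch.randint(0, 4, (32, 32))}


model = torch.nn.Conv2d(3, 4, kernel_size=1)  # stand-in for a real segmentation network
metrics = evaluate(model, DataLoader(ToySet(), batch_size=2), torch.device("cpu"), num_classes=4)
print(metrics["miou"], metrics["confusion"].shape)  # a float and a (4, 4) matrix
```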
desert_segmentation/train/trainer.py ADDED
@@ -0,0 +1,205 @@
1
+ """Training loop with AMP, cosine+warmup, EMA, best-mIoU checkpointing."""
2
+
3
+ from __future__ import annotations
4
+
5
+ import copy
6
+ import json
7
+ import logging
8
+ import math
9
+ from pathlib import Path
10
+ from typing import Any, Dict, Optional
11
+
12
+ import torch
13
+ import torch.nn as nn
14
+ from torch.cuda.amp import GradScaler, autocast  # deprecated alias of torch.amp in newer PyTorch; kept for torch>=2.0 compatibility
15
+ from torch.optim import AdamW
16
+ from torch.optim.lr_scheduler import LambdaLR
17
+ from torch.utils.data import DataLoader
18
+ from tqdm import tqdm
19
+
20
+ from desert_segmentation.train.evaluate import evaluate
21
+
22
+ logger = logging.getLogger(__name__)
23
+
24
+
25
+ class ModelEMA:
26
+ """Exponential moving average of model parameters."""
27
+
28
+ def __init__(self, model: nn.Module, decay: float = 0.999) -> None:
29
+ self.decay = decay
30
+ self.shadow: Dict[str, torch.Tensor] = {}
31
+ self._collect(model)
32
+
33
+ @torch.no_grad()
34
+ def _collect(self, model: nn.Module) -> None:
35
+ for n, p in model.named_parameters():
36
+ if p.requires_grad:
37
+ self.shadow[n] = p.detach().clone()
38
+
39
+ @torch.no_grad()
40
+ def update(self, model: nn.Module) -> None:
41
+ for n, p in model.named_parameters():
42
+ if not p.requires_grad:
43
+ continue
44
+ self.shadow[n].mul_(self.decay).add_(p.detach(), alpha=1.0 - self.decay)
45
+
46
+ @torch.no_grad()
47
+ def copy_to(self, model: nn.Module) -> None:
48
+ for n, p in model.named_parameters():
49
+ if n in self.shadow:
50
+ p.data.copy_(self.shadow[n])
51
+
52
+ def _warmup_cosine_lambda(
53
+ total_steps: int,
54
+ warmup_steps: int,
55
+ min_ratio: float = 0.01,
56
+ ) -> Any:
57
+ def lr_lambda(step: int) -> float:
58
+ if step < warmup_steps:
59
+ return float(step + 1) / float(max(1, warmup_steps))
60
+ progress = (step - warmup_steps) / float(max(1, total_steps - warmup_steps))
61
+ return min_ratio + 0.5 * (1.0 - min_ratio) * (1.0 + math.cos(math.pi * progress))
62
+
63
+ return lr_lambda
64
+
65
+
66
+ def train(
67
+ model: nn.Module,
68
+ train_loader: DataLoader,
69
+ val_loader: DataLoader,
70
+ criterion: nn.Module,
71
+ device: torch.device,
72
+ cfg: Dict[str, Any],
73
+ num_classes: int,
74
+ ignore_index: int,
75
+ checkpoint_dir: Path,
76
+ class_names: tuple[str, ...],
77
+ max_train_batches: Optional[int] = None,
78
+ ) -> Dict[str, Any]:
79
+ tcfg = cfg["train"]
80
+ epochs = int(tcfg["epochs"])
81
+ lr = float(tcfg["lr"])
82
+ wd = float(tcfg["weight_decay"])
83
+ amp_enabled = bool(tcfg.get("amp", True)) and torch.cuda.is_available()
84
+ clip = float(tcfg.get("gradient_clip", 0.0))
85
+ warmup_ratio = float(tcfg.get("warmup_ratio", 0.08))
86
+ patience = int(tcfg.get("early_stop_patience", 20))
87
+ log_interval = int(tcfg.get("log_interval", 20))
88
+
89
+ ema_cfg = cfg.get("ema") or {}
90
+ use_ema = bool(ema_cfg.get("enabled", False))
91
+ ema_decay = float(ema_cfg.get("decay", 0.999))
92
+ ema: Optional[ModelEMA] = ModelEMA(model, decay=ema_decay) if use_ema else None
93
+
94
+ opt = AdamW(model.parameters(), lr=lr, weight_decay=wd)
95
+ steps_per_epoch = max(1, len(train_loader))
96
+ total_steps = steps_per_epoch * epochs
97
+ warmup_steps = max(1, int(total_steps * warmup_ratio))
98
+ sched = LambdaLR(opt, _warmup_cosine_lambda(total_steps, warmup_steps))
99
+ scaler: Optional[GradScaler] = GradScaler() if amp_enabled else None
100
+
101
+ best_miou = -1.0
102
+ bad_epochs = 0
103
+ history: list = []
104
+
105
+ checkpoint_dir.mkdir(parents=True, exist_ok=True)
106
+ best_path = checkpoint_dir / "best.pt"
107
+ last_path = checkpoint_dir / "last.pt"
108
+
109
+ global_step = 0
110
+ for epoch in range(1, epochs + 1):
111
+ model.train()
112
+ running = 0.0
113
+ n_log = 0
114
+ pbar = tqdm(train_loader, desc=f"train {epoch}/{epochs}")
115
+ for batch_idx, batch in enumerate(pbar):
116
+ images = batch["image"].to(device, non_blocking=True)
117
+ masks = batch["mask"].to(device, non_blocking=True)
118
+ opt.zero_grad(set_to_none=True)
119
+ with autocast(enabled=amp_enabled):
120
+ logits = model(images)
121
+ loss, _ = criterion(logits, masks)
122
+ if scaler is not None:
123
+ scaler.scale(loss).backward()
124
+ if clip > 0:
125
+ scaler.unscale_(opt)
126
+ torch.nn.utils.clip_grad_norm_(model.parameters(), clip)
127
+ scaler.step(opt)
128
+ scaler.update()
129
+ else:
130
+ loss.backward()
131
+ if clip > 0:
132
+ torch.nn.utils.clip_grad_norm_(model.parameters(), clip)
133
+ opt.step()
134
+ sched.step()
135
+ global_step += 1
136
+ if ema is not None:
137
+ ema.update(model)
138
+ running += float(loss.detach().cpu())
139
+ n_log += 1
140
+ if global_step % log_interval == 0:
141
+ pbar.set_postfix(loss=f"{running / max(n_log, 1):.4f}")
142
+ running = 0.0
143
+ n_log = 0
144
+ if max_train_batches is not None and (batch_idx + 1) >= max_train_batches:
145
+ break
146
+
147
+ backup = copy.deepcopy(model.state_dict())
148
+ if ema is not None:
149
+ ema.copy_to(model)
150
+ val_metrics = evaluate(
151
+ model,
152
+ val_loader,
153
+ device,
154
+ num_classes=num_classes,
155
+ ignore_index=ignore_index,
156
+ desc=f"val {epoch}",
157
+ )
158
+ model.load_state_dict(backup)
159
+
160
+ miou = float(val_metrics["miou"])
161
+ row = {"epoch": epoch, "miou": miou, "fw_iou": float(val_metrics["fw_iou"])}
162
+ history.append(row)
163
+ logger.info(
164
+ "epoch %s | val mIoU=%.4f fwIoU=%.4f",
165
+ epoch,
166
+ miou,
167
+ row["fw_iou"],
168
+ )
169
+
170
+ torch.save(
171
+ {
172
+ "epoch": epoch,
173
+ "model": model.state_dict(),
174
+ "ema": ema.shadow if ema is not None else None,
175
+ "optimizer": opt.state_dict(),
176
+ "config": cfg,
177
+ "class_names": class_names,
178
+ },
179
+ last_path,
180
+ )
181
+
182
+ if miou > best_miou:
183
+ best_miou = miou
184
+ bad_epochs = 0
185
+ save_payload = {
186
+ "epoch": epoch,
187
+ "model": model.state_dict(),
188
+ "ema": ema.shadow if ema is not None else None,
189
+ "miou": miou,
190
+ "per_class_iou": val_metrics["per_class_iou"].tolist(),
191
+ "config": cfg,
192
+ "class_names": class_names,
193
+ }
194
+ torch.save(save_payload, best_path)
195
+ logger.info("saved new best checkpoint mIoU=%.4f -> %s", miou, best_path)
196
+ else:
197
+ bad_epochs += 1
198
+ if bad_epochs >= patience:
199
+ logger.info("early stopping at epoch %s (no improvement %s epochs)", epoch, patience)
200
+ break
201
+
202
+ with (checkpoint_dir / "history.json").open("w", encoding="utf-8") as f:
203
+ json.dump(history, f, indent=2)
204
+
205
+ return {"best_miou": best_miou, "best_path": str(best_path), "history": history}
desert_segmentation/utils/__init__.py ADDED
@@ -0,0 +1,3 @@
1
+ from desert_segmentation.utils.seed import set_seed
2
+
3
+ __all__ = ["set_seed"]
desert_segmentation/utils/config.py ADDED
@@ -0,0 +1,37 @@
1
+ """Load YAML config and resolve paths relative to workspace root."""
2
+
3
+ from __future__ import annotations
4
+
6
+ from pathlib import Path
7
+ from typing import Any, Dict
8
+
9
+ import yaml
10
+
11
+
12
+ def load_config(path: Path | str, root: Path | None = None) -> Dict[str, Any]:
13
+ path = Path(path)
14
+ with path.open("r", encoding="utf-8") as f:
15
+ cfg = yaml.safe_load(f)
16
+ if root is None:
17
+ root = Path(cfg.get("root", ".")).resolve()
18
+ else:
19
+ root = Path(root).resolve()
20
+ cfg["root"] = str(root)
21
+ return cfg
22
+
23
+
24
+ def resolve_path(root: Path, *parts: str) -> Path:
25
+ return (root / Path(*parts)).resolve()
26
+
27
+
28
+ def get_paths(cfg: Dict[str, Any]) -> Dict[str, Path]:
29
+ root = Path(cfg["root"])
30
+ d = cfg["data"]
31
+ return {
32
+ "train_images": resolve_path(root, d["train_images"]),
33
+ "train_masks": resolve_path(root, d["train_masks"]),
34
+ "val_images": resolve_path(root, d["val_images"]),
35
+ "val_masks": resolve_path(root, d["val_masks"]),
36
+ "test_images": resolve_path(root, d["test_images"]),
37
+ }
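
Usage sketch for `load_config` with a throwaway YAML file; the path value is hypothetical:

```python
import tempfile
from pathlib import Path

from desert_segmentation.utils.config import load_config

with tempfile.NamedTemporaryFile("w", suffix=".yaml", delete=False) as f:
    f.write("data:\n  train_images: training/train/Color_Images\n")

cfg = load_config(f.name, root=Path("."))
print(cfg["root"])                   # absolute workspace root
print(cfg["data"]["train_images"])   # still relative; get_paths resolves it against root
```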
desert_segmentation/utils/freq.py ADDED
@@ -0,0 +1,66 @@
1
+ """Estimate class pixel frequencies from mask files (fast path for loss weighting)."""
2
+
3
+ from __future__ import annotations
4
+
5
+ import os
6
+ from pathlib import Path
7
+ from typing import List, Sequence
8
+
9
+ import numpy as np
10
+ import torch
11
+ from PIL import Image
12
+
13
+ from desert_segmentation.data.mask_encoding import RawMaskCodec
14
+
15
+
16
+ def list_masks(dir_path: Path) -> List[str]:
17
+ return sorted(f for f in os.listdir(dir_path) if f.lower().endswith(".png"))
18
+
19
+
20
+ @torch.no_grad()
21
+ def estimate_pixel_frequencies(
22
+ masks_dir: Path,
23
+ codec: RawMaskCodec,
24
+ max_files: int | None = 800,
25
+ ) -> torch.Tensor:
26
+ paths = list_masks(masks_dir)
27
+ if max_files is not None:
28
+ paths = paths[:max_files]
29
+ counts = np.zeros(codec.num_classes, dtype=np.int64)
30
+ for name in paths:
31
+ raw = np.array(Image.open(masks_dir / name))
32
+ enc, _ = codec.encode_mask(raw.astype(np.uint16))
33
+ for c in range(codec.num_classes):
34
+ counts[c] += int((enc == c).sum())
35
+ freq = counts.astype(np.float64) / max(counts.sum(), 1)
36
+ return torch.tensor(freq, dtype=torch.float32)
37
+
38
+
39
+ def per_image_sampling_weights(
40
+ masks_dir: Path,
41
+ image_basenames: Sequence[str],
42
+ codec: RawMaskCodec,
43
+ freq: torch.Tensor,
44
+ eps: float = 1e-6,
45
+ ) -> torch.DoubleTensor:
46
+ """Weights for ``WeightedRandomSampler``: upweight images containing rare classes.
47
+
48
+ For each mask, ``w_i = sum_{c : n_{ic}>0} 1 / (freq[c] + eps)``, then weights are
49
+ scaled to mean 1.0. ``image_basenames`` must match the order of
50
+ ``SegmentationDataset`` indices (same filenames as train pairs).
51
+ """
52
+ masks_dir = Path(masks_dir)
53
+ f = freq.detach().cpu().numpy().astype(np.float64)
54
+ raw_weights = np.zeros(len(image_basenames), dtype=np.float64)
55
+ for i, name in enumerate(image_basenames):
56
+ raw = np.array(Image.open(masks_dir / name))
57
+ enc, _ = codec.encode_mask(raw.astype(np.uint16))
58
+ present = np.zeros(codec.num_classes, dtype=bool)
59
+ for c in range(codec.num_classes):
60
+ present[c] = bool((enc == c).any())
61
+ raw_weights[i] = sum(1.0 / (f[c] + eps) for c in range(codec.num_classes) if present[c])
62
+ m = raw_weights.mean()
63
+ if m <= 0:
64
+ return torch.ones(len(image_basenames), dtype=torch.double)
65
+ scaled = raw_weights / m
66
+ return torch.tensor(scaled, dtype=torch.double)
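
A hedged toy example of the rare-class up-weighting rule from the `per_image_sampling_weights` docstring; the class fractions are invented:

```python
import numpy as np

freq = np.array([0.80, 0.19, 0.01])  # hypothetical dataset-level class fractions
# Image A contains only class 0; image B contains classes 0 and 2 (class 2 is rare).
w_a = 1.0 / (freq[0] + 1e-6)
w_b = 1.0 / (freq[0] + 1e-6) + 1.0 / (freq[2] + 1e-6)
raw = np.array([w_a, w_b])
print(raw / raw.mean())  # approx [0.02, 1.98]: image B is drawn about 80x more often than A
```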
desert_segmentation/utils/logging_utils.py ADDED
@@ -0,0 +1,11 @@
1
+ import logging
2
+ import sys
3
+
4
+
5
+ def setup_logging(level: int = logging.INFO) -> None:
6
+ logging.basicConfig(
7
+ level=level,
8
+ format="%(asctime)s [%(levelname)s] %(message)s",
9
+ datefmt="%H:%M:%S",
10
+ stream=sys.stdout,
11
+ )
desert_segmentation/utils/seed.py ADDED
@@ -0,0 +1,13 @@
1
+ import os
2
+ import random
3
+
4
+ import numpy as np
5
+ import torch
6
+
7
+
8
+ def set_seed(seed: int) -> None:
9
+ random.seed(seed)
10
+ np.random.seed(seed)
11
+ torch.manual_seed(seed)
12
+ torch.cuda.manual_seed_all(seed)
13
+ os.environ["PYTHONHASHSEED"] = str(seed)
desert_segmentation/utils/viz.py ADDED
@@ -0,0 +1,60 @@
1
+ """Color overlays and side-by-side panels for segmentation."""
2
+
3
+ from __future__ import annotations
4
+
5
+ from pathlib import Path
6
+ from typing import List, Tuple
7
+
8
+ import numpy as np
9
+ from PIL import Image  # ImageDraw / ImageFont unused: titles are not drawn in this version
10
+
11
+
12
+ def palette(num_classes: int, seed: int = 42) -> np.ndarray:
13
+ rng = np.random.default_rng(seed)
14
+ colors = rng.integers(32, 256, size=(num_classes, 3), dtype=np.uint8)
15
+ colors[0] = np.array([128, 128, 128], dtype=np.uint8)
16
+ return colors
17
+
18
+
19
+ def colorize_mask(mask: np.ndarray, colors: np.ndarray) -> np.ndarray:
20
+ """mask HxW int 0..C-1 -> RGB uint8"""
21
+ m = mask.clip(0, len(colors) - 1)
22
+ return colors[m]
23
+
24
+
25
+ def blend_overlay(
26
+ image_rgb: np.ndarray,
27
+ colored_mask: np.ndarray,
28
+ alpha: float = 0.55,
29
+ ) -> np.ndarray:
30
+ return (image_rgb.astype(np.float32) * (1 - alpha) + colored_mask.astype(np.float32) * alpha).clip(
31
+ 0, 255
32
+ ).astype(np.uint8)
33
+
34
+
35
+ def save_triplet(
36
+ out_path: Path,
37
+ rgb: np.ndarray,
38
+ gt: np.ndarray | None,
39
+ pred: np.ndarray,
40
+ class_colors: np.ndarray,
41
+ titles: Tuple[str, str, str] = ("RGB", "GT", "Pred"),
42
+ ) -> None:
43
+ h, w = rgb.shape[:2]
44
+ panels: List[np.ndarray] = [rgb]
45
+ if gt is not None:
46
+ panels.append(blend_overlay(rgb, colorize_mask(gt, class_colors)))
47
+ else:
48
+ panels.append(np.zeros_like(rgb))
49
+ panels.append(blend_overlay(rgb, colorize_mask(pred, class_colors)))
50
+
51
+ # Optional text strip (simple border)
52
+ gap = 8
53
+ total_w = w * len(panels) + gap * (len(panels) - 1)
54
+ canvas = np.zeros((h, total_w, 3), dtype=np.uint8)
55
+ x = 0
56
+ for p in panels:
57
+ canvas[:, x : x + w] = p
58
+ x += w + gap
59
+ Image.fromarray(canvas).save(out_path)
60
+
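
A quick sketch tying the three viz helpers together on a 2x2 dummy mask:

```python
import numpy as np

from desert_segmentation.utils.viz import blend_overlay, colorize_mask, palette

colors = palette(num_classes=4)
mask = np.array([[0, 1], [2, 3]], dtype=np.int64)
rgb = np.full((2, 2, 3), 200, dtype=np.uint8)
overlay = blend_overlay(rgb, colorize_mask(mask, colors))
print(overlay.dtype, overlay.shape)  # uint8 (2, 2, 3)
```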
eval_summary.json ADDED
@@ -0,0 +1,47 @@
1
+ {
2
+ "checkpoint": "D:\\codewizard 2.0\\checkpoints\\best.pt",
3
+ "val_dir": "D:\\codewizard 2.0\\training\\val\\Color_Images",
4
+ "num_val_samples": 317,
5
+ "miou": 0.07851162552833557,
6
+ "miou_all_classes": 0.07851162552833557,
7
+ "miou_valid_gt_classes": 0.07851162270064849,
8
+ "fw_iou": 0.3974744379520416,
9
+ "global_pixel_accuracy": 0.448105526939844,
10
+ "mean_class_accuracy": 0.15349954460756882,
11
+ "per_class_iou": {
12
+ "id_100": 0.0,
13
+ "id_200": 0.0,
14
+ "id_300": 0.25856709480285645,
15
+ "id_500": 0.0,
16
+ "id_550": 0.0,
17
+ "id_600": 0.0,
18
+ "id_700": 0.0,
19
+ "id_800": 0.0,
20
+ "id_7100": 0.0,
21
+ "id_10000": 0.5265491604804993
22
+ },
23
+ "per_class_recall": {
24
+ "id_100": 0.0,
25
+ "id_200": 0.0,
26
+ "id_300": 0.7182908230723474,
27
+ "id_500": 0.0,
28
+ "id_550": 0.0,
29
+ "id_600": 0.0,
30
+ "id_700": 0.0,
31
+ "id_800": 0.0,
32
+ "id_7100": 0.0,
33
+ "id_10000": 0.8167046230033408
34
+ },
35
+ "val_gt_pixel_counts": {
36
+ "id_100": 1902003,
37
+ "id_200": 2808908,
38
+ "id_300": 9019195,
39
+ "id_500": 512976,
40
+ "id_550": 1976309,
41
+ "id_600": 1138100,
42
+ "id_700": 30968,
43
+ "id_800": 566002,
44
+ "id_7100": 11074438,
45
+ "id_10000": 17714653
46
+ }
47
+ }
requirements-demo.txt ADDED
@@ -0,0 +1,3 @@
1
+ # Optional: interactive Gradio demo (install alongside requirements.txt)
2
+ # pip install -r requirements.txt -r requirements-demo.txt
3
+ gradio>=4.44.0,<6
requirements.txt ADDED
@@ -0,0 +1,10 @@
1
+ torch>=2.0.0
2
+ torchvision>=0.15.0
3
+ numpy>=1.24.0
4
+ Pillow>=10.0.0
5
+ PyYAML>=6.0
6
+ # Pin avoids optional native build deps (e.g. stringzilla) on some Windows/Python setups
7
+ albumentations>=1.3.1,<1.5
8
+ segmentation-models-pytorch>=0.3.3
9
+ tqdm>=4.66.0
10
+ pytest>=7.4.0
scripts/demo_gradio.py ADDED
@@ -0,0 +1,219 @@
1
+ #!/usr/bin/env python3
2
+ """Gradio demo: upload RGB image, get colored mask, overlay, legend, and timing."""
3
+
4
+ from __future__ import annotations
5
+
6
+ import argparse
7
+ import logging
8
+ import os
9
+ import sys
10
+ import time
11
+ from pathlib import Path
12
+ from typing import Any, Dict, Tuple
13
+
14
+ ROOT = Path(__file__).resolve().parents[1]
15
+ if str(ROOT) not in sys.path:
16
+ sys.path.insert(0, str(ROOT))
17
+
18
+ import gradio as gr
19
+ import numpy as np
20
+ import torch
21
+ from PIL import Image
22
+
23
+ from desert_segmentation.demo.inference_ui import (
24
+ build_legend_rows,
25
+ dominant_classes_markdown,
26
+ legend_table_html,
27
+ side_by_side_strip,
28
+ validate_rgb_array,
29
+ )
30
+ from desert_segmentation.infer.predict import _load_model_for_inference, predict_image
31
+ from desert_segmentation.utils.viz import blend_overlay, colorize_mask
32
+
33
+ logger = logging.getLogger(__name__)
34
+
35
+ _STATE: Dict[str, Any] = {}
36
+
37
+
38
+ def _to_uint8_rgb(arr: Any) -> np.ndarray:
39
+ if arr is None:
40
+ raise gr.Error("Please upload an image.")
41
+ if isinstance(arr, Image.Image):
42
+ arr = np.array(arr.convert("RGB"))
43
+ a = np.asarray(arr)
44
+ if a.ndim == 2:
45
+ raise gr.Error("Expected a color RGB image, got grayscale.")
46
+ if a.ndim == 3 and a.shape[2] == 4:
47
+ a = a[:, :, :3]
48
+ if a.ndim != 3 or a.shape[2] != 3:
49
+ raise gr.Error(f"Expected HxWx3 RGB image, got shape {a.shape}.")
50
+ if np.issubdtype(a.dtype, np.floating) and float(a.max()) <= 1.0 + 1e-6:
51
+ a = (np.clip(a, 0.0, 1.0) * 255.0).round().astype(np.uint8)
52
+ elif a.dtype != np.uint8:
53
+ a = np.clip(a, 0, 255).astype(np.uint8)
54
+ return np.ascontiguousarray(a)
55
+
56
+
57
+ def _init_state(checkpoint: Path, device: torch.device) -> None:
58
+ global _STATE
59
+ if _STATE:
60
+ return
61
+ logger.info("Loading checkpoint: %s", checkpoint)
62
+ model, cfg, codec = _load_model_for_inference(checkpoint, device)
63
+ icfg = cfg.get("inference") or {}
64
+ legend_rows, colors = build_legend_rows(codec.class_names, codec.num_classes, seed=42)
65
+ _STATE.update(
66
+ {
67
+ "model": model,
68
+ "cfg": cfg,
69
+ "codec": codec,
70
+ "device": device,
71
+ "icfg": icfg,
72
+ "legend_rows": legend_rows,
73
+ "colors": colors,
74
+ "legend_html_static": legend_table_html(legend_rows),
75
+ },
76
+ )
77
+ logger.info(
78
+ "Model ready | classes=%s | device=%s | default tile=%s overlap=%s tta=%s",
79
+ codec.num_classes,
80
+ device,
81
+ icfg.get("tile_size", 512),
82
+ icfg.get("overlap", 0.25),
83
+ icfg.get("tta_flip", True),
84
+ )
85
+
86
+
87
+ def _run(
88
+ image_input: Any,
89
+ use_tta: bool,
90
+ overlap: float,
91
+ tile_size: float,
92
+ max_side: int,
93
+ max_megapixels: float,
94
+ ) -> Tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray, str, str]:
95
+ rgb = _to_uint8_rgb(image_input)
96
+ try:
97
+ validate_rgb_array(rgb, max_side=max_side, max_megapixels=max_megapixels)
98
+ except ValueError as e:
99
+ raise gr.Error(str(e)) from e
100
+
101
+ st = _STATE
102
+ model = st["model"]
103
+ device = st["device"]
104
+ icfg = st["icfg"]
105
+ codec = st["codec"]
106
+ colors = st["colors"]
107
+
108
+ tile = int(round(float(tile_size))) if tile_size is not None else int(icfg.get("tile_size", 512))
109
+ tile = max(256, min(tile, 2048))
110
+ ov = float(overlap)
111
+ ov = max(0.0, min(ov, 0.5))
112
+
113
+ t0 = time.perf_counter()
114
+ pred = predict_image(model, rgb, device, tile, ov, bool(use_tta))
115
+ ms = (time.perf_counter() - t0) * 1000.0
116
+
117
+ colored = colorize_mask(pred, colors)
118
+ overlay = blend_overlay(rgb, colored)
119
+ strip = side_by_side_strip(rgb, colored, overlay)
120
+
121
+ dev_str = str(device)
122
+ if device.type == "cpu":
123
+ dev_str += " (CPU mode — slower than GPU)"
124
+
125
+ stats = (
126
+ f"**Inference:** {ms:.1f} ms \n"
127
+ f"**Device:** {dev_str} \n"
128
+ f"**Tile size:** {tile} | **Overlap:** {ov:.2f} | **TTA:** {use_tta}"
129
+ )
130
+ dominant = "### Dominant classes in this image\n" + dominant_classes_markdown(pred, codec.class_names, top_k=3)
131
+
132
+ return rgb, colored, overlay, strip, stats, dominant
133
+
134
+
135
+ def main() -> None:
136
+ logging.basicConfig(level=logging.INFO, format="%(asctime)s [%(levelname)s] %(message)s")
137
+
138
+ parser = argparse.ArgumentParser(description="Gradio demo for desert semantic segmentation")
139
+ parser.add_argument("--root", type=str, default=os.environ.get("ROOT"), help="Workspace root (default: repo root or env ROOT)")
140
+ parser.add_argument(
141
+ "--checkpoint",
142
+ type=str,
143
+ default=os.environ.get("CHECKPOINT_PATH"),
144
+ help="Path to best.pt (default: env CHECKPOINT_PATH or <root>/checkpoints/best.pt)",
145
+ )
146
+ parser.add_argument("--host", type=str, default="127.0.0.1")
147
+ parser.add_argument("--port", type=int, default=7860)
148
+ parser.add_argument("--share", action="store_true", help="Create a temporary public Gradio link")
149
+ parser.add_argument("--max-side", type=int, default=4096)
150
+ parser.add_argument("--max-megapixels", type=float, default=16.0)
151
+ args = parser.parse_args()
152
+
153
+ root = Path(args.root or ROOT).resolve()
154
+ ckpt_arg = args.checkpoint or str(root / "checkpoints" / "best.pt")
155
+ ckpt = Path(ckpt_arg)
156
+ if not ckpt.is_absolute():
157
+ ckpt = (root / ckpt).resolve()
158
+ if not ckpt.is_file():
159
+ raise SystemExit(f"Checkpoint not found: {ckpt}")
160
+
161
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
162
+ _init_state(ckpt, device)
163
+
164
+ icfg = _STATE["icfg"]
165
+ def_tta = bool(icfg.get("tta_flip", True))
166
+ def_ov = float(icfg.get("overlap", 0.25))
167
+ def_tile = float(icfg.get("tile_size", 512))
168
+
169
+ intro = """## Desert semantic segmentation demo
170
+
171
+ This is **semantic segmentation**: each pixel is assigned one of several **classes** (terrain, vegetation, sky, etc.).
172
+ It is **not** bounding-box object detection.
173
+
174
+ **How to read the outputs:**
175
+ - **Colored mask:** each color is one class (see legend).
176
+ - **Overlay:** prediction blended on your photo.
177
+ - **Strip:** original | mask | side-by-side for screenshots.
178
+
179
+ _Confidence heatmaps for full-resolution sliding windows are not in this demo (v1); see README._
180
+ """
181
+
182
+ cpu_note = ""
183
+ if device.type == "cpu":
184
+ cpu_note = "\n\n> Running on **CPU** — expect slower inference. Use a CUDA GPU for best speed.\n"
185
+
186
+ with gr.Blocks(title="Desert segmentation", theme=gr.themes.Soft()) as demo:
187
+ gr.Markdown(intro + cpu_note)
188
+ inp = gr.Image(type="numpy", label="Upload RGB image", sources=["upload"])
189
+ with gr.Accordion("Advanced", open=False):
190
+ use_tta = gr.Checkbox(label="TTA (horizontal flip average)", value=def_tta)
191
+ overlap = gr.Slider(0.0, 0.5, value=def_ov, step=0.05, label="Tile overlap")
192
+ tile_sz = gr.Slider(256, 2048, value=int(def_tile), step=64, label="Tile size (pixels)")
193
+ run_btn = gr.Button("Run segmentation", variant="primary")
194
+
195
+ with gr.Row():
196
+ out_orig = gr.Image(label="Input", type="numpy")
197
+ out_mask = gr.Image(label="Colored class mask", type="numpy")
198
+ out_overlay = gr.Image(label="Overlay", type="numpy")
199
+ out_strip = gr.Image(label="RGB | mask | overlay", type="numpy")
200
+ stats_md = gr.Markdown("")
201
+ dominant_md = gr.Markdown("")
202
+ gr.Markdown("### Class legend (fixed palette)")
203
+ gr.HTML(_STATE["legend_html_static"])
204
+
205
+ def _fn(img, tta, ov, ts):
206
+ return _run(img, tta, ov, ts, args.max_side, args.max_megapixels)
207
+
208
+ run_btn.click(
209
+ fn=_fn,
210
+ inputs=[inp, use_tta, overlap, tile_sz],
211
+ outputs=[out_orig, out_mask, out_overlay, out_strip, stats_md, dominant_md],
212
+ )
213
+
214
+ logger.info("Launching Gradio on http://%s:%s", args.host, args.port)
215
+ demo.launch(server_name=args.host, server_port=args.port, share=args.share)
216
+
217
+
218
+ if __name__ == "__main__":
219
+ main()
scripts/eval.py ADDED
@@ -0,0 +1,146 @@
1
+ #!/usr/bin/env python3
2
+ """Run validation, print metrics, save confusion matrix and overlays."""
3
+
4
+ from __future__ import annotations
5
+
6
+ import argparse
7
+ import json
8
+ import logging
9
+ import os
10
+ import sys
11
+ from pathlib import Path
12
+
13
+ ROOT = Path(__file__).resolve().parents[1]
14
+ if str(ROOT) not in sys.path:
15
+ sys.path.insert(0, str(ROOT))
16
+
17
+ import numpy as np
18
+ import torch
20
+ from torch.utils.data import DataLoader
21
+ from tqdm import tqdm
22
+
23
+ from desert_segmentation.data.dataset import SegmentationDataset
24
+ from desert_segmentation.data.mask_encoding import build_codec_from_config
25
+ from desert_segmentation.data.transforms import build_val_transforms
26
+ from desert_segmentation.models.factory import create_model
27
+ from desert_segmentation.train.evaluate import evaluate
28
+ from desert_segmentation.utils.config import get_paths, load_config
29
+ from desert_segmentation.utils.logging_utils import setup_logging
30
+ from desert_segmentation.utils.seed import set_seed
31
+ from desert_segmentation.utils.viz import palette, save_triplet
32
+
33
+ logger = logging.getLogger(__name__)
34
+
35
+
36
+ def main() -> None:
37
+ parser = argparse.ArgumentParser()
38
+ parser.add_argument("--config", type=str, default=str(ROOT / "desert_segmentation" / "configs" / "default.yaml"))
39
+ parser.add_argument("--checkpoint", type=str, required=True)
40
+ parser.add_argument("--root", type=str, default=None)
41
+ parser.add_argument("--out_dir", type=str, default="eval_outputs")
42
+ parser.add_argument("--max_viz", type=int, default=24)
43
+ args = parser.parse_args()
44
+
45
+ root = Path(args.root or ROOT).resolve()
46
+ cfg = load_config(args.config, root=root)
47
+ setup_logging()
48
+ set_seed(int(cfg["train"]["seed"]))
49
+
50
+ paths = get_paths(cfg)
51
+ raw_ids = cfg["data"]["raw_ids"]
52
+ names = tuple(cfg["data"]["class_names"])
53
+ codec = build_codec_from_config(raw_ids, names)
54
+ ignore_index = int(cfg["data"].get("ignore_index", 255))
55
+ crop_size = int(cfg["data"]["crop_size"])
56
+
57
+ val_tf = build_val_transforms(crop_size=crop_size, ignore_index=ignore_index)
58
+ val_ds = SegmentationDataset(
59
+ paths["val_images"],
60
+ paths["val_masks"],
61
+ codec=codec,
62
+ transform=val_tf,
63
+ mode="val",
64
+ crop_size=crop_size,
65
+ rare_class_crop_prob=0.0,
66
+ ignore_index=ignore_index,
67
+ seed=int(cfg["train"]["seed"]),
68
+ )
69
+
70
+ nw = 0 if os.name == "nt" else int(cfg["data"].get("num_workers", 4))
71
+ val_loader = DataLoader(
72
+ val_ds,
73
+ batch_size=int(cfg["train"].get("val_batch_size", cfg["train"]["batch_size"])),
74
+ shuffle=False,
75
+ num_workers=nw,
76
+ pin_memory=torch.cuda.is_available(),
77
+ )
78
+
79
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
80
+ try:
81
+ ckpt = torch.load(Path(args.checkpoint), map_location=device, weights_only=False)
82
+ except TypeError:
83
+ ckpt = torch.load(Path(args.checkpoint), map_location=device)
84
+ cfg_ck = ckpt["config"]
85
+ model = create_model(cfg_ck["model"], num_classes=codec.num_classes).to(device)
86
+ if ckpt.get("ema") is not None:
87
+ for n, p in model.named_parameters():
88
+ if n in ckpt["ema"]:
89
+ p.data.copy_(ckpt["ema"][n].to(device))
90
+ else:
91
+ model.load_state_dict(ckpt["model"])
92
+ model.eval()
93
+
94
+ metrics = evaluate(model, val_loader, device, num_classes=codec.num_classes, ignore_index=ignore_index)
95
+ logger.info("mIoU=%.4f fwIoU=%.4f", metrics["miou"], metrics["fw_iou"])
96
+ per = metrics["per_class_iou"]
97
+ for i, name in enumerate(codec.class_names):
98
+ logger.info(" %s IoU=%.4f", name, float(per[i]))
99
+
100
+ out_dir = Path(args.out_dir)
101
+ if not out_dir.is_absolute():
102
+ out_dir = root / out_dir
103
+ out_dir.mkdir(parents=True, exist_ok=True)
104
+ with (out_dir / "metrics.json").open("w", encoding="utf-8") as f:
105
+ json.dump(
106
+ {
107
+ "miou": float(metrics["miou"]),
108
+ "fw_iou": float(metrics["fw_iou"]),
109
+ "per_class_iou": {codec.class_names[i]: float(per[i]) for i in range(len(codec.class_names))},
110
+ },
111
+ f,
112
+ indent=2,
113
+ )
114
+ np.save(out_dir / "confusion.npy", metrics["confusion"])
115
+
116
+ mean = np.array([0.485, 0.456, 0.406], dtype=np.float32).reshape(1, 1, 3)
117
+ std = np.array([0.229, 0.224, 0.225], dtype=np.float32).reshape(1, 1, 3)
118
+ colors = palette(codec.num_classes)
119
+ n = 0
120
+ with torch.no_grad():
121
+ for batch in tqdm(val_loader, desc="viz"):
122
+ images = batch["image"].to(device)
123
+ masks = batch["mask"].to(device)
124
+ logits = model(images)
125
+ pred = logits.argmax(dim=1).cpu().numpy()
126
+ gt = masks.cpu().numpy()
127
+ for b in range(images.shape[0]):
128
+ if n >= args.max_viz:
129
+ break
130
+ t = images[b].cpu().permute(1, 2, 0).numpy()
131
+ rgb = (t * std + mean) * 255.0
132
+ rgb = np.clip(rgb, 0, 255).astype(np.uint8)
133
+ save_triplet(
134
+ out_dir / f"val_{n:04d}.png",
135
+ rgb,
136
+ gt[b],
137
+ pred[b],
138
+ colors,
139
+ )
140
+ n += 1
141
+ if n >= args.max_viz:
142
+ break
143
+
144
+
145
+ if __name__ == "__main__":
146
+ main()
scripts/eval_summary.py ADDED
@@ -0,0 +1,259 @@
1
+ #!/usr/bin/env python3
2
+ """Print segmentation metrics: mIoU (all classes + valid-GT-only), fwIoU, accuracies, GT counts.
3
+
4
+ Runs a full validation pass by default (same setup as ``scripts/eval.py``). With
5
+ ``--from-checkpoint-only``, only prints metrics stored inside the checkpoint file
6
+ (mIoU and per-class IoU when present); full metrics require a validation forward pass."""
7
+
8
+ from __future__ import annotations
9
+
10
+ import argparse
11
+ import json
12
+ import math
13
+ import os
14
+ import sys
15
+ from pathlib import Path
16
+ from typing import Any, Dict, List
17
+
18
+ ROOT = Path(__file__).resolve().parents[1]
19
+ if str(ROOT) not in sys.path:
20
+ sys.path.insert(0, str(ROOT))
21
+
22
+ import torch
23
+ from torch.utils.data import DataLoader
24
+
25
+ from desert_segmentation.data.dataset import SegmentationDataset
26
+ from desert_segmentation.data.mask_encoding import build_codec_from_config
27
+ from desert_segmentation.data.transforms import build_val_transforms
28
+ from desert_segmentation.metrics.iou import (
29
+ confusion_to_accuracy_metrics,
30
+ gt_pixel_counts,
31
+ valid_class_miou_from_confusion,
32
+ )
33
+ from desert_segmentation.models.factory import create_model
34
+ from desert_segmentation.train.evaluate import evaluate
35
+ from desert_segmentation.utils.config import get_paths, load_config
36
+ from desert_segmentation.utils.logging_utils import setup_logging
37
+ from desert_segmentation.utils.seed import set_seed
38
+
39
+
40
+ def _load_checkpoint(path: Path, device: torch.device) -> Dict[str, Any]:
41
+ try:
42
+ return torch.load(path, map_location=device, weights_only=False)
43
+ except TypeError:
44
+ return torch.load(path, map_location=device)
45
+
46
+
47
+ def _print_table(rows: List[List[str]]) -> None:
48
+ widths = [max(len(rows[i][c]) for i in range(len(rows))) for c in range(len(rows[0]))]
49
+ for row in rows:
50
+ line = " ".join(row[c].ljust(widths[c]) for c in range(len(row)))
51
+ print(line)
52
+
53
+
54
+ def run_from_checkpoint_only(ckpt_path: Path) -> int:
55
+ ckpt = _load_checkpoint(ckpt_path, torch.device("cpu"))
56
+ print(f"Checkpoint: {ckpt_path.resolve()}")
57
+ print()
58
+ if "miou" in ckpt:
59
+ print(f" mIoU (stored): {float(ckpt['miou']):.6f}")
60
+ else:
61
+ print(" mIoU: (not stored in this file)")
62
+ names = ckpt.get("class_names")
63
+ per = ckpt.get("per_class_iou")
64
+ if per is not None and names is not None:
65
+ print(" Per-class IoU (stored):")
66
+ for i, name in enumerate(names):
67
+ print(f" [{i}] {name}: {float(per[i]):.6f}")
68
+ elif per is not None:
69
+ print(" Per-class IoU (stored):")
70
+ for i, v in enumerate(per):
71
+ print(f" [{i}]: {float(v):.6f}")
72
+ else:
73
+ print(" Per-class IoU: (not stored in this file)")
74
+ print()
75
+ print(
76
+ "Note: fwIoU, global pixel accuracy, and mean class accuracy are not saved in "
77
+ "checkpoints. Run without --from-checkpoint-only to compute them on the val set."
78
+ )
79
+ return 0
80
+
81
+
82
+ def run_full_eval(args: argparse.Namespace) -> int:
83
+ root = Path(args.root or ROOT).resolve()
84
+ cfg = load_config(args.config, root=root)
85
+ setup_logging()
86
+ set_seed(int(cfg["train"]["seed"]))
87
+
88
+ paths = get_paths(cfg)
89
+ raw_ids = cfg["data"]["raw_ids"]
90
+ names = tuple(cfg["data"]["class_names"])
91
+ codec = build_codec_from_config(raw_ids, names)
92
+ ignore_index = int(cfg["data"].get("ignore_index", 255))
93
+ crop_size = int(cfg["data"]["crop_size"])
94
+
95
+ val_tf = build_val_transforms(crop_size=crop_size, ignore_index=ignore_index)
96
+ val_ds = SegmentationDataset(
97
+ paths["val_images"],
98
+ paths["val_masks"],
99
+ codec=codec,
100
+ transform=val_tf,
101
+ mode="val",
102
+ crop_size=crop_size,
103
+ rare_class_crop_prob=0.0,
104
+ ignore_index=ignore_index,
105
+ seed=int(cfg["train"]["seed"]),
106
+ )
107
+
108
+ nw = 0 if os.name == "nt" else int(cfg["data"].get("num_workers", 4))
109
+ val_loader = DataLoader(
110
+ val_ds,
111
+ batch_size=int(cfg["train"].get("val_batch_size", cfg["train"]["batch_size"])),
112
+ shuffle=False,
113
+ num_workers=nw,
114
+ pin_memory=torch.cuda.is_available(),
115
+ )
116
+
117
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
118
+ ckpt_path = Path(args.checkpoint)
119
+ ckpt = _load_checkpoint(ckpt_path, device)
120
+ cfg_ck = ckpt["config"]
121
+ model = create_model(cfg_ck["model"], num_classes=codec.num_classes).to(device)
122
+ if ckpt.get("ema") is not None:
123
+ for n, p in model.named_parameters():
124
+ if n in ckpt["ema"]:
125
+ p.data.copy_(ckpt["ema"][n].to(device))
126
+ else:
127
+ model.load_state_dict(ckpt["model"])
128
+ model.eval()
129
+
130
+ metrics = evaluate(
131
+ model,
132
+ val_loader,
133
+ device,
134
+ num_classes=codec.num_classes,
135
+ ignore_index=ignore_index,
136
+ desc="eval_summary",
137
+ )
138
+ cm = metrics["confusion"]
139
+ acc = confusion_to_accuracy_metrics(cm)
140
+ miou_valid = float(valid_class_miou_from_confusion(cm))
141
+ gt_counts = gt_pixel_counts(cm)
142
+
143
+ miou = float(metrics["miou"])
144
+ fw_iou = float(metrics["fw_iou"])
145
+ gpa = float(acc["global_pixel_accuracy"])
146
+ mca = float(acc["mean_class_accuracy"])
147
+ per_iou = metrics["per_class_iou"]
148
+ per_rec = acc["per_class_recall"]
149
+
150
+ def _rec_str(i: int) -> str:
151
+ v = float(per_rec[i])
152
+ if math.isnan(v):
153
+ return "n/a"
154
+ return f"{v:.6f}"
155
+
156
+ print()
157
+ print(f"Checkpoint: {ckpt_path.resolve()}")
158
+ print(f"Val images: {paths['val_images']}")
159
+ print(f"Val samples: {len(val_ds)}")
160
+ print()
161
+ print(" mIoU (all classes): {:.6f}".format(miou))
162
+ print(" mIoU (classes w/ GT): {:.6f}".format(miou_valid))
163
+ print(" Frequency-weighted IoU: {:.6f}".format(fw_iou))
164
+ print(" Global pixel accuracy: {:.6f}".format(gpa))
165
+ print(" Mean class accuracy: {:.6f}".format(mca))
166
+ print(" (mean of per-class recall over classes with GT pixels)")
167
+ print()
168
+ table: List[List[str]] = [["cls", "name", "IoU", "recall"]]
169
+ for i, name in enumerate(codec.class_names):
170
+ table.append(
171
+ [
172
+ str(i),
173
+ name,
174
+ f"{float(per_iou[i]):.6f}",
175
+ _rec_str(i),
176
+ ]
177
+ )
178
+ _print_table(table)
179
+ print()
180
+ print(" Val GT pixels per class (full val set):")
181
+ for i, name in enumerate(codec.class_names):
182
+ print(f" [{i}] {name}: {int(gt_counts[i])}")
183
+ print()
184
+
185
+ payload = {
186
+ "checkpoint": str(ckpt_path.resolve()),
187
+ "val_dir": str(paths["val_images"]),
188
+ "num_val_samples": len(val_ds),
189
+ "miou": miou,
190
+ "miou_all_classes": miou,
191
+ "miou_valid_gt_classes": miou_valid,
192
+ "fw_iou": fw_iou,
193
+ "global_pixel_accuracy": gpa,
194
+ "mean_class_accuracy": mca,
195
+ "per_class_iou": {codec.class_names[i]: float(per_iou[i]) for i in range(len(codec.class_names))},
196
+ "per_class_recall": {
197
+ codec.class_names[i]: (None if math.isnan(float(per_rec[i])) else float(per_rec[i]))
198
+ for i in range(len(codec.class_names))
199
+ },
200
+ "val_gt_pixel_counts": {codec.class_names[i]: int(gt_counts[i]) for i in range(len(codec.class_names))},
201
+ }
202
+
203
+ if args.json_out:
204
+ out = Path(args.json_out)
205
+ if not out.is_absolute():
206
+ out = root / out
207
+ out.parent.mkdir(parents=True, exist_ok=True)
208
+ with out.open("w", encoding="utf-8") as f:
209
+ json.dump(payload, f, indent=2)
210
+ print(f"Wrote {out}")
211
+
212
+ return 0
213
+
214
+
215
+ def main() -> None:
216
+ parser = argparse.ArgumentParser(description="Segmentation metric summary (val set).")
217
+ parser.add_argument(
218
+ "--checkpoint",
219
+ type=str,
220
+ default=None,
221
+ help="Path to .pt checkpoint (default: <root>/checkpoints/best.pt)",
222
+ )
223
+ parser.add_argument(
224
+ "--config",
225
+ type=str,
226
+ default=str(ROOT / "desert_segmentation" / "configs" / "default.yaml"),
227
+ )
228
+ parser.add_argument("--root", type=str, default=None, help="Workspace root (defaults to repo root)")
229
+ parser.add_argument(
230
+ "--from-checkpoint-only",
231
+ action="store_true",
232
+ help="Only print mIoU/per-class IoU stored in the file (no forward pass).",
233
+ )
234
+ parser.add_argument(
235
+ "--json-out",
236
+ type=str,
237
+ default=None,
238
+ help="Optional path to write full metrics JSON (relative to --root unless absolute).",
239
+ )
240
+ args = parser.parse_args()
241
+ root = Path(args.root or ROOT).resolve()
242
+ ck_path = Path(args.checkpoint) if args.checkpoint else root / "checkpoints" / "best.pt"
243
+
244
+ if args.from_checkpoint_only:
245
+ if not ck_path.is_file():
246
+ print(f"Error: checkpoint not found: {ck_path}", file=sys.stderr)
247
+ sys.exit(1)
248
+ sys.exit(run_from_checkpoint_only(ck_path))
249
+
250
+ args.checkpoint = str(ck_path)
251
+ args.root = str(root)
252
+ if not ck_path.is_file():
253
+ print(f"Error: checkpoint not found: {ck_path}", file=sys.stderr)
254
+ sys.exit(1)
255
+ sys.exit(run_full_eval(args))
256
+
257
+
258
+ if __name__ == "__main__":
259
+ main()
scripts/infer.py ADDED
@@ -0,0 +1,49 @@
+ #!/usr/bin/env python3
+ """Run inference on testing/Color_Images; optional ONNX export."""
+
+ from __future__ import annotations
+
+ import argparse
+ import logging
+ import sys
+ from pathlib import Path
+
+ ROOT = Path(__file__).resolve().parents[1]
+ if str(ROOT) not in sys.path:
+     sys.path.insert(0, str(ROOT))
+
+ from desert_segmentation.infer.predict import export_onnx, predict_folder
+ from desert_segmentation.utils.config import get_paths, load_config
+ from desert_segmentation.utils.logging_utils import setup_logging
+
+ logger = logging.getLogger(__name__)
+
+
+ def main() -> None:
+     parser = argparse.ArgumentParser()
+     parser.add_argument("--config", type=str, default=str(ROOT / "desert_segmentation" / "configs" / "default.yaml"))
+     parser.add_argument("--checkpoint", type=str, required=True)
+     parser.add_argument("--root", type=str, default=None)
+     parser.add_argument("--out_dir", type=str, default="infer_outputs")
+     parser.add_argument("--limit", type=int, default=None)
+     parser.add_argument("--onnx", type=str, default=None, help="If set, export ONNX to this path and exit")
+     args = parser.parse_args()
+
+     root = Path(args.root or ROOT).resolve()
+     cfg = load_config(args.config, root=root)
+     setup_logging()
+
+     if args.onnx:
+         export_onnx(Path(args.checkpoint), Path(args.onnx))
+         logger.info("exported ONNX to %s", args.onnx)
+         return
+
+     paths = get_paths(cfg)
+     out_dir = Path(args.out_dir)
+     if not out_dir.is_absolute():
+         out_dir = root / out_dir
+     predict_folder(Path(args.checkpoint), paths["test_images"], out_dir, limit=args.limit)
+
+
+ if __name__ == "__main__":
+     main()
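The two entry points above can also be driven programmatically. The signatures (`export_onnx(checkpoint, onnx_path)` and `predict_folder(checkpoint, images_dir, out_dir, limit=...)`) are taken from the calls in the script; the concrete paths below are placeholders:

```python
from pathlib import Path

from desert_segmentation.infer.predict import export_onnx, predict_folder

ckpt = Path("checkpoints/best.pt")  # placeholder; any trained .pt works

# Equivalent of: python scripts/infer.py --checkpoint ... --limit 8
predict_folder(ckpt, Path("testing/Color_Images"), Path("infer_outputs"), limit=8)

# Equivalent of the --onnx branch above.
export_onnx(ckpt, Path("model.onnx"))
```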
scripts/train.py ADDED
@@ -0,0 +1,166 @@
+ #!/usr/bin/env python3
+ """Train semantic segmentation model from YAML config."""
+
+ from __future__ import annotations
+
+ import argparse
+ import logging
+ import os
+ import sys
+ from pathlib import Path
+
+ ROOT = Path(__file__).resolve().parents[1]
+ if str(ROOT) not in sys.path:
+     sys.path.insert(0, str(ROOT))
+
+ import torch
+ from torch.utils.data import DataLoader, WeightedRandomSampler
+
+ from desert_segmentation.data.dataset import SegmentationDataset
+ from desert_segmentation.data.mask_encoding import build_codec_from_config
+ from desert_segmentation.data.transforms import build_train_transforms, build_val_transforms
+ from desert_segmentation.losses.combined import build_loss, compute_class_weights_from_freq
+ from desert_segmentation.models.factory import create_model
+ from desert_segmentation.train.trainer import train
+ from desert_segmentation.utils.config import get_paths, load_config
+ from desert_segmentation.utils.freq import estimate_pixel_frequencies, per_image_sampling_weights
+ from desert_segmentation.utils.logging_utils import setup_logging
+ from desert_segmentation.utils.seed import set_seed
+
+ logger = logging.getLogger(__name__)
+
+
+ def main() -> None:
+     parser = argparse.ArgumentParser()
+     parser.add_argument(
+         "--config",
+         type=str,
+         default=str(ROOT / "desert_segmentation" / "configs" / "default.yaml"),
+     )
+     parser.add_argument("--root", type=str, default=None, help="Workspace root (defaults to repo root)")
+     parser.add_argument("--epochs", type=int, default=None, help="Override epochs (smoke tests)")
+     parser.add_argument("--max_train_batches", type=int, default=None, help="Stop each epoch after N batches (smoke tests)")
+     args = parser.parse_args()
+
+     root = Path(args.root or ROOT).resolve()
+     cfg = load_config(args.config, root=root)
+     if args.epochs is not None:
+         cfg["train"]["epochs"] = int(args.epochs)
+     setup_logging()
+     set_seed(int(cfg["train"]["seed"]))
+
+     paths = get_paths(cfg)
+     raw_ids = cfg["data"]["raw_ids"]
+     names = tuple(cfg["data"]["class_names"])
+     codec = build_codec_from_config(raw_ids, names)
+     ignore_index = int(cfg["data"].get("ignore_index", 255))
+     crop_size = int(cfg["data"]["crop_size"])
+
+     train_tf = build_train_transforms(
+         crop_size=crop_size,
+         strong=bool(cfg.get("augmentation", {}).get("strong", True)),
+         ignore_index=ignore_index,
+     )
+     val_tf = build_val_transforms(crop_size=crop_size, ignore_index=ignore_index)
+
+     train_ds = SegmentationDataset(
+         paths["train_images"],
+         paths["train_masks"],
+         codec=codec,
+         transform=train_tf,
+         mode="train",
+         crop_size=crop_size,
+         rare_class_crop_prob=float(cfg["data"].get("rare_class_crop_prob", 0.35)),
+         ignore_index=ignore_index,
+         seed=int(cfg["train"]["seed"]),
+     )
+     val_ds = SegmentationDataset(
+         paths["val_images"],
+         paths["val_masks"],
+         codec=codec,
+         transform=val_tf,
+         mode="val",
+         crop_size=crop_size,
+         rare_class_crop_prob=0.0,
+         ignore_index=ignore_index,
+         seed=int(cfg["train"]["seed"]),
+     )
+
+     nw = int(cfg["data"].get("num_workers", 4))
+     if os.name == "nt":
+         nw = 0
+
+     val_loader = DataLoader(
+         val_ds,
+         batch_size=int(cfg["train"].get("val_batch_size", cfg["train"]["batch_size"])),
+         shuffle=False,
+         num_workers=nw,
+         pin_memory=torch.cuda.is_available(),
+     )
+
+     device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+     model = create_model(cfg["model"], num_classes=codec.num_classes).to(device)
+
+     freq = estimate_pixel_frequencies(paths["train_masks"], codec, max_files=None)
+     cap = float(cfg.get("loss", {}).get("class_weight_cap", 15.0))
+     class_w = compute_class_weights_from_freq(freq, cap=cap).to(device)
+     logger.info("class pixel frequencies (train masks): %s", freq.tolist())
+
+     use_weighted_sampler = bool(cfg.get("data", {}).get("weighted_sampler", False))
+     sampler: WeightedRandomSampler | None = None
+     if use_weighted_sampler:
+         eps = float(cfg.get("data", {}).get("weighted_sampler_eps", 1e-6))
+         logger.info("computing per-image sampling weights (scanning train masks)...")
+         sample_w = per_image_sampling_weights(
+             paths["train_masks"],
+             train_ds.image_names,
+             codec,
+             freq,
+             eps=eps,
+         )
+         sampler = WeightedRandomSampler(
+             sample_w,
+             num_samples=len(train_ds),
+             replacement=True,
+             generator=torch.Generator().manual_seed(int(cfg["train"]["seed"])),
+         )
+
+     train_loader = DataLoader(
+         train_ds,
+         batch_size=int(cfg["train"]["batch_size"]),
+         shuffle=sampler is None,
+         sampler=sampler,
+         num_workers=nw,
+         pin_memory=torch.cuda.is_available(),
+         drop_last=True,
+     )
+
+     criterion = build_loss(
+         cfg.get("loss", {}),
+         num_classes=codec.num_classes,
+         class_weights=class_w,
+         ignore_index=ignore_index,
+     ).to(device)
+
+     ckpt_dir = Path(cfg["train"]["checkpoint_dir"])
+     if not ckpt_dir.is_absolute():
+         ckpt_dir = root / ckpt_dir
+
+     out = train(
+         model,
+         train_loader,
+         val_loader,
+         criterion,
+         device,
+         cfg,
+         num_classes=codec.num_classes,
+         ignore_index=ignore_index,
+         checkpoint_dir=ckpt_dir,
+         class_names=codec.class_names,
+         max_train_batches=args.max_train_batches,
+     )
+     logger.info("finished best_mIoU=%s path=%s", out["best_miou"], out["best_path"])
+
+
+ if __name__ == "__main__":
+     main()
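For reference, these are the config keys `scripts/train.py` actually reads. The sketch below is assembled from the `cfg[...]` accesses above; every value is an illustrative placeholder rather than the repo's real `default.yaml` (the class names in particular are invented):

```python
cfg = {
    "data": {
        "raw_ids": [100, 7100, 10000],      # raw mask ids, in class order
        "class_names": ["class_a", "class_b", "class_c"],  # placeholder names
        "ignore_index": 255,
        "crop_size": 512,
        "num_workers": 4,
        "rare_class_crop_prob": 0.35,
        "weighted_sampler": False,
        "weighted_sampler_eps": 1e-6,
    },
    "augmentation": {"strong": True},
    "loss": {"class_weight_cap": 15.0},     # plus whatever build_loss reads
    "model": {},                            # forwarded as-is to create_model
    "train": {
        "seed": 42,
        "epochs": 80,                       # placeholder
        "batch_size": 8,                    # placeholder
        "val_batch_size": 8,                # optional; falls back to batch_size
        "checkpoint_dir": "checkpoints",
    },
}
```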
tests/test_confusion_metrics.py ADDED
@@ -0,0 +1,50 @@
+ """Tests for accuracy metrics derived from confusion matrices."""
+
+ import numpy as np
+
+ from desert_segmentation.metrics.iou import (
+     confusion_to_accuracy_metrics,
+     valid_class_miou_from_confusion,
+ )
+
+
+ def test_perfect_confusion():
+     cm = np.eye(3, dtype=np.int64) * 100
+     out = confusion_to_accuracy_metrics(cm)
+     assert out["global_pixel_accuracy"] == 1.0
+     assert out["mean_class_accuracy"] == 1.0
+     assert np.allclose(out["per_class_recall"], [1.0, 1.0, 1.0])
+
+
+ def test_two_class_mixed():
+     # GT: 100 class0, 100 class1; half wrong each
+     cm = np.array([[50, 50], [50, 50]], dtype=np.int64)
+     out = confusion_to_accuracy_metrics(cm)
+     assert out["global_pixel_accuracy"] == 0.5
+     assert abs(out["mean_class_accuracy"] - 0.5) < 1e-6
+
+
+ def test_one_class_absent_in_gt():
+     # Only class 0 appears in val GT; 80 correct, 20 predicted as class 1.
+     cm = np.array([[80, 20], [0, 0]], dtype=np.int64)
+     out = confusion_to_accuracy_metrics(cm)
+     assert abs(out["global_pixel_accuracy"] - 0.8) < 1e-9
+     assert abs(out["mean_class_accuracy"] - 0.8) < 1e-9
+     assert np.isnan(out["per_class_recall"][1])
+
+
+ def test_valid_class_miou_only_classes_with_gt():
+     # Both classes appear in GT, so valid-class mIoU averages both rows
+     # (a full mIoU would average in zeros for empty GT rows; there are none here).
+     cm = np.array([[90, 10], [10, 90]], dtype=np.int64)
+     # IoU class 0: 90/(90+10+10) = 90/110; class 1 is identical by symmetry.
+     v = valid_class_miou_from_confusion(cm)
+     iou0 = 90.0 / (90 + 10 + 10)
+     assert abs(v - iou0) < 1e-6
+
+
+ def test_valid_class_miou_ignores_empty_gt_rows():
+     # Class 1 has no GT pixels; valid-class mIoU averages only class 0.
+     cm = np.array([[80, 20], [0, 0]], dtype=np.int64)
+     v = valid_class_miou_from_confusion(cm)
+     # IoU class 0: TP=80, union = rows[0]+cols[0]-TP = 100+80-80 = 100
+     assert abs(v - 0.8) < 1e-9
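The tests pin down the metric semantics: per-class recall is the confusion-matrix diagonal over row sums (NaN when a class has no GT pixels), and valid-class mIoU averages IoU only over classes that actually appear in the GT. A reference sketch consistent with that asserted behavior, not the actual implementation in `desert_segmentation.metrics.iou`:

```python
import numpy as np

def _accuracy_metrics(cm: np.ndarray) -> dict:
    cm = cm.astype(np.float64)
    tp = np.diag(cm)
    gt = cm.sum(axis=1)                     # row i = GT pixels of class i
    with np.errstate(invalid="ignore", divide="ignore"):
        recall = tp / gt                    # NaN where a class has no GT
    has_gt = gt > 0
    return {
        "global_pixel_accuracy": tp.sum() / cm.sum(),
        "mean_class_accuracy": recall[has_gt].mean(),
        "per_class_recall": recall,
    }

def _valid_class_miou(cm: np.ndarray) -> float:
    cm = cm.astype(np.float64)
    tp = np.diag(cm)
    union = cm.sum(axis=1) + cm.sum(axis=0) - tp
    has_gt = cm.sum(axis=1) > 0             # average only over GT classes
    return float((tp[has_gt] / union[has_gt]).mean())
```

Both helpers satisfy every assertion in the file above (e.g. `_valid_class_miou(np.array([[80, 20], [0, 0]]))` gives 0.8).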
tests/test_mask_encoding.py ADDED
@@ -0,0 +1,33 @@
+ import numpy as np
+ import pytest
+
+ from desert_segmentation.data.mask_encoding import RawMaskCodec, default_desert_codec
+
+
+ def test_roundtrip_known_ids():
+     codec = default_desert_codec()
+     h, w = 32, 48
+     raw = np.full((h, w), 100, dtype=np.uint16)
+     raw[:, :10] = 10000
+     raw[10:20, :] = 7100
+     enc, unk = codec.encode_mask(raw)
+     assert unk == 0.0
+     assert enc.shape == (h, w)
+     back = codec.decode_to_raw(enc)
+     assert np.array_equal(back, raw)
+
+
+ def test_unknown_pixel_raises():
+     codec = RawMaskCodec(raw_ids=(1, 2), class_names=("a", "b"))
+     raw = np.array([[1, 2], [99, 1]], dtype=np.uint16)
+     with pytest.raises(ValueError):
+         codec.encode_mask(raw)
+
+
+ def test_lut_all_ids():
+     codec = default_desert_codec()
+     for rid in codec.raw_ids:
+         raw = np.full((4, 4), rid, dtype=np.uint16)
+         enc, unk = codec.encode_mask(raw)
+         assert unk == 0.0
+         assert np.unique(enc).size == 1
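These tests imply a lookup-table design: raw uint16 ids map to contiguous class indices, unknown ids are rejected, and decoding indexes back into `raw_ids`. A minimal sketch under those assumptions; the real `RawMaskCodec` may differ in details such as how the unknown fraction is reported:

```python
import numpy as np

class TinyCodec:
    def __init__(self, raw_ids: tuple):
        self.raw_ids = raw_ids
        # LUT over the full uint16 range; unknown ids map to sentinel 255.
        self._lut = np.full(65536, 255, dtype=np.uint8)
        for idx, rid in enumerate(raw_ids):
            self._lut[rid] = idx

    def encode_mask(self, raw: np.ndarray):
        enc = self._lut[raw.astype(np.uint16)]
        unk = float((enc == 255).mean())
        if unk > 0.0:
            raise ValueError(f"unknown raw ids present ({unk:.1%} of pixels)")
        return enc, unk

    def decode_to_raw(self, enc: np.ndarray) -> np.ndarray:
        inv = np.asarray(self.raw_ids, dtype=np.uint16)
        return inv[enc]

codec = TinyCodec((100, 7100, 10000))   # ids borrowed from the tests above
enc, unk = codec.encode_mask(np.full((4, 4), 7100, dtype=np.uint16))
assert unk == 0.0 and np.array_equal(codec.decode_to_raw(enc), np.full((4, 4), 7100))
```

Round-tripping works because `decode_to_raw` simply indexes the `raw_ids` tuple with the contiguous labels, which is exactly what `test_roundtrip_known_ids` checks.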