SENTINEL — Multimodal AI for Early Water Pollution Detection
Scalable Environmental Network for Temporal Intelligence and Ecological Learning
Entry to the Stockholm Junior Water Prize (2026) · Bryan Cheng & Austin Jin · New York, USA
SENTINEL is a multimodal deep-learning early-warning system for water contamination. It fuses five independent environmental data streams (water chemistry, satellite imagery, microbial DNA, gene-expression stress signals, and aquatic-organism behavior), each read by a dedicated encoder. When one stream flags an anomaly, the others confirm and classify it. All models were trained entirely on public data on a single NVIDIA RTX 4060 (8 GB).
- Code & architectures: https://github.com/austinjin1/SENTINEL-STOCKHOLM
- These files are PyTorch checkpoints (torch.load). Model class definitions live in the GitHub repo under sentinel/models/.
Models
Every model here corresponds to a model presented in the SENTINEL paper.
| File | Architecture | Role | Key metric (real, held-out) |
|---|---|---|---|
| aquassm.pt | AquaSSM: continuous-time state-space model (8 channels) | Sensor encoder | AUROC 0.939 [0.934–0.943] |
| hydrovit.pt | HydroViT: CNN–ViT hybrid with band attention | Satellite encoder | water-temp R² 0.893 |
| microbiomenet.pt | MicroBiomeNet: CLR + sparse-gate + transformer | Microbial encoder | macro-F1 0.899 |
| toxigene.pt | ToxiGene: hierarchical sparse (Reactome/AOP) | Molecular encoder | F1 0.886 |
| biomotion.pt | BioMotion: denoising diffusion U-Net | Behavioral encoder | AUROC 0.807 |
| sentinel_fusion.pt | SENTINEL-Fusion: Perceiver IO cross-modal attention | 5-modality fusion | AUROC 0.939 [0.922–0.956] |
| sentinel_fusion_heads.pt | Four output heads (anomaly / type / source / cascade) | Fusion heads | n/a |
| stream_gnn.pt | Stream-network GNN (NHDPlus topology, 561 sites) | Downstream propagation | AUROC 1.000 |
| digital_twin.pt | Neural-ODE / physics digital twin (≈342K params, 10 state vars) | Ecosystem simulation | MSE 786 (−45.5% vs physics) |
| hydrodensenet.pt | HydroDenseNet: DenseNet121 + SpectralStem + CBAM (8.4M params) | SENTINEL-Lite drone screening | temp R² 0.78, DO R² 0.46 |
| species_health.pt | Keystone-species health & occupancy model | Ecological linkage | see paper §3.3 |
| waterborne_disease.pt | Waterborne-disease risk model (cyanotoxin, Vibrio, Naegleria, schistosomiasis) | Public-health linkage | AUROC 0.988 |
Reported metrics are computed on real public data with held-out splits and bootstrap 95% CIs (paper Table 2). System-level fusion reaches AUROC = 0.992 in the controlled 31-condition ablation and 0.939 on the full real-data holdout.
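A percentile bootstrap over held-out examples is one common way to obtain intervals like the 95% CIs quoted above. The sketch below assumes that approach and is not the paper's code: the function names, the resample count, and the percentile method are illustrative choices, and it uses only the Python standard library.

```python
import random


def auroc(labels, scores):
    """AUROC via the rank-sum (Mann-Whitney U) identity, averaging ranks over ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)  # labels are assumed to be 0/1 integers
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUROC (illustrative defaults)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that lost one of the classes
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

Calling `bootstrap_ci(y_true, y_score)` on held-out labels and anomaly scores returns a `(lower, upper)` pair that can be reported next to the point estimate from `auroc`.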
Training data (SENTINEL-DB)
390M+ records from 19 public sources, including USGS NWIS, NEON, EPA WQP/ECOTOX, GRQA, ESA Sentinel-2, Earth Microbiome Project, NCBI GEO, USGS BioData, NHDPlusV2, and GBIF. All sources are publicly accessible.
Usage
import torch
ckpt = torch.load("aquassm.pt", map_location="cpu") # state_dict / checkpoint
# Instantiate the matching architecture from the GitHub repo (sentinel/models/...),
# then load_state_dict. See repo README for per-encoder loading examples.
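A PyTorch checkpoint file may hold a bare state_dict or a wrapper dictionary that nests the weights under some key, and this card does not say which layout these files use. The helper below is a hedged sketch for handling both cases: `extract_state_dict` is a hypothetical name, the entries in `CANDIDATE_KEYS` are guesses at common wrapper keys rather than keys confirmed for SENTINEL, and the `module.` prefix handling assumes the weights might have been saved from a `torch.nn.DataParallel` wrapper.

```python
from collections.abc import Mapping

# Hypothetical wrapper keys; the SENTINEL checkpoints may use none of these.
CANDIDATE_KEYS = ("state_dict", "model_state_dict", "model")


def extract_state_dict(ckpt):
    """Return the parameter mapping from a bare state_dict or a wrapper checkpoint."""
    if not isinstance(ckpt, Mapping):
        raise TypeError(f"expected a mapping, got {type(ckpt).__name__}")
    # If the weights are nested under a known wrapper key, unwrap them.
    for key in CANDIDATE_KEYS:
        inner = ckpt.get(key)
        if isinstance(inner, Mapping):
            ckpt = inner
            break
    # Drop the "module." prefix that torch.nn.DataParallel adds to parameter names.
    return {
        (name[len("module."):] if name.startswith("module.") else name): value
        for name, value in ckpt.items()
    }
```

With PyTorch installed, the result would be passed on as `model.load_state_dict(extract_state_dict(ckpt))`, where `ckpt` comes from `torch.load` as shown above and `model` is an instance of the matching class from the repo.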
Limitations
- Geographic bias: roughly 95% of the training data comes from the US and Europe.
- Validation is per-modality rather than fully integrated across all five modalities.
- Acute instantaneous releases cannot be detected.
- Transcriptomic generalization remains hard under strict spatial holdout (ToxiGene real-GEO F1 = 0.49).
- SENTINEL-Lite imagery-only predictions are not yet calibrated for high-stakes use.

See paper §4 for full discussion.
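This card does not define "strict spatial holdout". A common reading is that no monitoring site contributes records to both the training and test sets, so test performance reflects unseen locations. The sketch below illustrates that reading only: the `site_id` field, the function name, and the 20% test fraction are assumptions, not details from the paper.

```python
import random


def spatial_holdout_split(records, site_key="site_id", test_fraction=0.2, seed=0):
    """Split records so each site lands entirely in train or entirely in test."""
    sites = sorted({r[site_key] for r in records})
    rng = random.Random(seed)
    rng.shuffle(sites)
    # Hold out whole sites, never individual records.
    n_test = max(1, round(len(sites) * test_fraction))
    test_sites = set(sites[:n_test])
    train = [r for r in records if r[site_key] not in test_sites]
    test = [r for r in records if r[site_key] in test_sites]
    return train, test
```

Compared with a random per-record split, grouping by site usually gives lower but more realistic scores, which is consistent with the gap between the ToxiGene figures reported on this card.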
Citation
@misc{cheng2026sentinel,
title = {SENTINEL: Multimodal Artificial Intelligence for Early Water Pollution Detection},
author = {Cheng, Bryan and Jin, Austin},
year = {2026},
note = {Stockholm Junior Water Prize 2026},
url = {https://github.com/austinjin1/SENTINEL-STOCKHOLM}
}
License: MIT