Eurus / src /eurus /tools /analysis_guide.py
dmpantiu's picture
Upload folder using huggingface_hub
a62555d verified
"""
Analysis Guide Tool
====================
Provides methodological guidance for climate data analysis using python_repl.
This tool returns TEXT INSTRUCTIONS (not executable code!) for:
- What approach to take
- How to structure the analysis
- Quality checks and pitfalls
- Best practices for visualization
The agent uses python_repl to execute the actual analysis.
"""
from typing import Literal
from pydantic import BaseModel, Field
from langchain_core.tools import StructuredTool
# =============================================================================
# ANALYSIS GUIDES
# =============================================================================
ANALYSIS_GUIDES = {
# -------------------------------------------------------------------------
# DATA OPERATIONS
# -------------------------------------------------------------------------
"load_data": """
## Loading ERA5 Data
### When to use
- Initializing any analysis
- Loading downloaded Zarr data
### Workflow
1. **Load data** — Use `xr.open_dataset('path', engine='zarr')` or `xr.open_zarr('path')`.
2. **Inspect dataset** — Check coordinates and available variables.
3. **Convert units** before any analysis:
- Temp (`t2`, `d2`, `skt`, `sst`, `stl1`): subtract 273.15 → °C
- Precip (`tp`, `cp`, `lsp`): multiply by 1000 → mm
- Pressure (`sp`, `mslp`): divide by 100 → hPa
### Quality Checklist
- [ ] Data loaded lazily (avoid `.load()` on large datasets)
- [ ] Units converted before aggregations
- [ ] Coordinate names verified (latitude vs lat, etc.)
### Common Pitfalls
- ⚠️ Loading multi-year global data into memory causes OOM. Keep operations lazy until subsetted.
- ⚠️ Some Zarr stores have `valid_time` instead of `time` — check with `.coords`.
- ⚠️ CRITICAL — LONGITUDE WRAPPING: ERA5 natively uses 0-360° longitudes. If your region is in the Western Hemisphere (Americas, Atlantic) or crosses the Prime Meridian, you MUST convert longitudes to -180/+180 BEFORE slicing or plotting. Use `ds.assign_coords(longitude=(((ds.longitude + 180) % 360) - 180)).sortby('longitude')`. Failing to do this causes maps to be 95% blank space with data crushed into a tiny sliver.
- ⚠️ UNIT SAFETY: When computing temperature DIFFERENCES or ANOMALIES, do NOT subtract 273.15 from the result. A temperature difference in Kelvin is numerically identical to a difference in °C. Subtracting 273.15 from an anomaly produces absurd ±200°C values.
""",
"spatial_subset": """
## Spatial Subsetting
### When to use
- Focusing on a specific region, country, or routing bounding box
- Reducing data size before heavy analysis
### Workflow
1. **Determine bounds** — Find min/max latitude and longitude.
2. **Check coordinate orientation** — ERA5 latitude is often descending (90 to -90).
3. **Slice data** — `.sel(latitude=slice(north, south), longitude=slice(west, east))`.
### Quality Checklist
- [ ] Latitude sliced from North to South (max to min) for descending coords
- [ ] Longitudes match dataset format (convert -180/180 ↔ 0/360 if needed)
- [ ] Result is not empty — verify with `.shape`
### Common Pitfalls
- ⚠️ Slicing `slice(south, north)` on descending coords → empty result.
- ⚠️ Crossing the prime meridian in 0-360 coords requires concatenating two slices.
- ⚠️ Use `.sel(method='nearest')` for point extraction, not exact matching.
- ⚠️ If requested bounds use negative longitudes (e.g., -120 to -80 for US West Coast), ensure you have applied the longitude wrapping from `load_data` FIRST. Otherwise slicing negative values on a 0-360 dataset returns empty data.
- ⚠️ Always check latitude orientation: use `slice(north, south)` if descending, `slice(south, north)` if ascending. Verify with `ds.latitude[0] > ds.latitude[-1]`.
""",
"temporal_subset": """
## Temporal Subsetting & Aggregation
### When to use
- Isolating specific events, months, or seasons
- Downsampling hourly data to daily/monthly
### Workflow
1. **Time slice** — `.sel(time=slice('2023-01-01', '2023-12-31'))`.
2. **Filter** — Seasons: `.sel(time=ds.time.dt.season == 'DJF')`.
3. **Resample** — `.resample(time='1D').mean()` for daily means.
### Quality Checklist
- [ ] Aggregation matches variable: `.mean()` for T/wind, `.sum()` for precip
- [ ] Leap years handled if using day-of-year grouping
### Common Pitfalls
- ⚠️ DJF wraps across years — verify start/end boundaries.
- ⚠️ `.resample()` (continuous) ≠ `.groupby()` (climatological). Don't mix them up.
- ⚠️ Radiation variables (`ssr`, `ssrd`) are accumulated — need differencing, not averaging.
- ⚠️ Hourly data is massive. Resample to daily ('1D') or monthly ('MS') IMMEDIATELY after spatial subsetting to avoid Memory/Timeout errors.
""",
# -------------------------------------------------------------------------
# STATISTICAL ANALYSIS
# -------------------------------------------------------------------------
"anomalies": """
## Anomaly Analysis
### When to use
- "How unusual was this period?"
- Comparing current conditions to "normal"
- Any "above/below average" question
### Workflow
1. **Define baseline** — ≥10 years (30 ideal). E.g. 1991-2020.
2. **Compute climatology** — `clim = ds.groupby('time.month').mean('time')`.
3. **Subtract** — `anomaly = ds.groupby('time.month') - clim`.
4. **Convert units** — Report in °C, mm, m/s (not K, m, Pa).
5. **Assess magnitude** — Compare to σ of the baseline period.
### Data Strategy for Large-Area Anomaly Maps
⚠️ For spatial areas ≥30°×30° (e.g., tropical Pacific), do NOT download 30 years one-by-one!
1. Download target period with 1 `retrieve_era5_data` call
2. Download 3-5 recent years of the same month as baseline (3-5 calls)
3. Average baseline files in `python_repl` → climatology
4. Subtract climatology from target → anomaly map
A 5-year baseline is sufficient for spatial anomaly maps.
### Quality Checklist
- [ ] Baseline ≥10 years (for temporal/small-area analysis; 3-5 years OK for spatial maps)
- [ ] Same calendar grouping for clim and analysis
- [ ] Units converted for readability
- [ ] Spatial context: is anomaly regional or localized?
### Common Pitfalls
- ⚠️ Short baselines amplify noise.
- ⚠️ Daily climatologies with <30yr baseline are noisy → use monthly grouping.
- ⚠️ Be explicit: spatial anomaly vs temporal anomaly.
- ⚠️ CRITICAL MATH BUG: Anomaly in Kelvin EXACTLY EQUALS Anomaly in Celsius. DO NOT subtract 273.15 from a temperature anomaly! If you compute `(SST_K - CLIM_K) - 273.15`, you get absurd ±200°C anomalies. Either convert both to °C BEFORE subtracting, or leave the difference as-is.
### Interpretation
- Positive = warmer/wetter/windier than normal.
- ±1σ = common, ±2σ = unusual (5%), ±3σ = extreme (0.3%).
- Maps: MUST use `RdBu_r` centered at zero via `TwoSlopeNorm`.
""",
"zscore": """
## Z-Score Analysis (Standardized Anomalies)
### When to use
- Comparing extremity across different variables
- Standardizing for regions with different variability
- Identifying statistically significant departures
### Workflow
1. **Compute baseline mean** — Grouped by month for seasonality.
2. **Compute baseline std** — Same period, same grouping.
3. **Standardize** — `z = (value - mean) / std`.
### Quality Checklist
- [ ] Standard deviation is non-zero everywhere
- [ ] Baseline period matches for mean and std
### Common Pitfalls
- ⚠️ Precipitation is NOT normally distributed — use SPI or percentiles instead of raw Z-scores.
- ⚠️ Z-scores near coastlines can be extreme due to mixed land/ocean std.
### Interpretation
- Z = 0: average. ±1: normal (68%). ±2: unusual (5%). ±3: extreme (0.3%).
""",
"trend_analysis": """
## Linear Trend Analysis
### When to use
- "Is it getting warmer/wetter over time?"
- Detecting long-term climate change signals
### Workflow
1. **Downsample** — Convert to annual/seasonal means first.
2. **Regress** — `scipy.stats.linregress` or `np.polyfit(degree=1)`.
3. **Significance** — Extract p-value for the slope.
4. **Scale** — Multiply annual slope by 10 → "per decade".
### Quality Checklist
- [ ] Period ≥20-30 years for meaningful trends
- [ ] Seasonal cycle removed before fitting
- [ ] Significance tested (p < 0.05)
- [ ] Report trend as units/decade
### Common Pitfalls
- ⚠️ Trend on daily data without removing seasonality → dominated by summer/winter swings.
- ⚠️ Short series have uncertain trends — report confidence intervals.
- ⚠️ Autocorrelation can inflate significance — consider using Mann-Kendall test.
- ⚠️ If p > 0.05, you MUST explicitly state the trend is NOT statistically significant. Do not present insignificant trends as real signals.
### Interpretation
- Report as °C/decade. Use stippling on maps for significant areas.
""",
"eof_analysis": """
## EOF/PCA Analysis
### When to use
- Finding dominant spatial patterns (ENSO, NAO, PDO)
- Dimensionality reduction of spatiotemporal data
### Workflow
1. **Deseasonalize** — Compute anomalies to remove the seasonal cycle.
2. **Latitude weighting** — Multiply by `np.sqrt(np.cos(np.deg2rad(lat)))`.
3. **Decompose** — PCA on flattened space dimensions.
4. **Reconstruct** — Map PCs back to spatial grid (EOFs).
### Quality Checklist
- [ ] Seasonal cycle removed
- [ ] Latitude weighting applied
- [ ] Variance explained (%) calculated per mode
- [ ] Physical interpretation attempted for leading modes
- [ ] Maps of EOF patterns MUST include coastlines (use Cartopy) for geographic context
- [ ] Variance explained (%) MUST be explicitly displayed in each plot title
### Common Pitfalls
- ⚠️ Unweighted EOFs inflate polar regions artificially.
- ⚠️ EOFs are mathematical constructs — not guaranteed to correspond to physical modes.
### Interpretation
- EOF1: dominant spatial pattern. PC1: its temporal evolution.
- If EOF1 explains >20% variance, it's highly dominant.
""",
"correlation_analysis": """
## Correlation Analysis
### When to use
- Spatial/temporal correlation mapping
- Lead-lag analysis (e.g., SST vs downstream precipitation)
- Teleconnection exploration
### Workflow
1. **Deseasonalize both variables** — Remove seasonal cycle from both.
2. **Align time coordinates** — Ensure identical time axes.
3. **Correlate** — `xr.corr(var1, var2, dim='time')`.
4. **Lead-lag** — Use `.shift(time=N)` month offsets to test delayed responses.
5. **Significance** — Compute p-values, mask insignificant areas.
### Quality Checklist
- [ ] Both variables deseasonalized
- [ ] p-values computed (p < 0.05 for significance)
- [ ] Sample size adequate (≥30 time points)
### Common Pitfalls
- ⚠️ Correlating raw data captures the seasonal cycle — everything correlates with summer.
- ⚠️ Spatial autocorrelation inflates field significance — apply Bonferroni or FDR correction.
### Interpretation
- R² gives variance explained. Lead-lag peak indicates response time.
- Plot spatial R maps with `RdBu_r`, stipple significant areas.
""",
"composite_analysis": """
## Composite Analysis
### When to use
- Average conditions during El Niño vs La Niña years
- Spatial fingerprint of specific extreme events
- "What does the atmosphere look like when X happens?"
### Workflow
1. **Define events** — Boolean mask of times exceeding a threshold (e.g., Niño3.4 > 0.5°C).
2. **Subset data** — `.where(mask, drop=True)`.
3. **Average** — Time mean of the subset = composite.
4. **Compare** — Subtract climatological mean → composite anomaly.
### Quality Checklist
- [ ] Sample size ≥10 events for robustness
- [ ] Baseline climatology matches the season of the events
- [ ] Significance tested via bootstrap or t-test
### Common Pitfalls
- ⚠️ Compositing n=2 events → noise, not a physical signal.
- ⚠️ Mixing seasons in composite (El Niño in DJF vs JJA) obscures the signal.
### Interpretation
- Shows the typical anomaly expected when event occurs.
- Plot with `RdBu_r` diverging colormap. Stipple significant areas.
""",
"diurnal_cycle": """
## Diurnal Cycle Analysis
### When to use
- Hourly variability within days (afternoon convection, nighttime cooling)
- Solar radiation patterns
### Workflow
1. **Group by hour** — `ds.groupby('time.hour').mean('time')`.
2. **Convert to local time** — ERA5 is UTC. `Local = UTC + Longitude/15`.
3. **Calculate amplitude** — `diurnal_range = max('hour') - min('hour')`.
### Quality Checklist
- [ ] Input data is hourly (not daily/monthly)
- [ ] UTC → local time conversion applied before labeling "afternoon"/"morning"
### Common Pitfalls
- ⚠️ Averaging global data by UTC hour mixes day and night across longitudes.
- ⚠️ Cloud cover (`tcc`) and radiation (`ssrd`) have strong diurnal signals — always check.
### Interpretation
- `blh` and `t2` peak mid-afternoon. Convective precip (`cp`) peaks late afternoon over land, early morning over oceans.
""",
"seasonal_decomposition": """
## Seasonal Decomposition
### When to use
- Separating the seasonal cycle from interannual variability
- Visualizing how a specific year deviates from the normal curve
### Workflow
1. **Compute climatology** — `.groupby('time.month').mean('time')`.
2. **Extract anomalies** — Subtract climatology from raw data.
3. **Smooth trend** — Apply 12-month rolling mean to extract multi-year trends.
### Quality Checklist
- [ ] Baseline robust (≥10 years)
- [ ] Residual = raw - seasonal - trend (should be ~white noise)
### Common Pitfalls
- ⚠️ Day-of-year climatologies over short baselines are noisy — smooth with 15-day window.
### Interpretation
- Separates variance into: seasonal (predictable), trend (long-term), residual (weather noise).
""",
"spectral_analysis": """
## Spectral Analysis
### When to use
- Periodicity detection (ENSO 3-7yr, MJO 30-60d, annual/semi-annual)
- Confirming suspected oscillatory behavior
### Workflow
1. **Prepare 1D series** — Spatial average or single point.
2. **Detrend** — Remove linear trend AND seasonal cycle.
3. **Compute spectrum** — `scipy.signal.welch` or `periodogram`.
4. **Plot as Period** — X-axis = 1/frequency (years or days), not raw frequency.
### Quality Checklist
- [ ] No NaNs in time series (interpolate or drop)
- [ ] Time coordinate evenly spaced
- [ ] Seasonal cycle removed
### Common Pitfalls
- ⚠️ Seasonal cycle dominates spectrum if not removed — drowns everything else.
- ⚠️ Short records can't resolve low-frequency oscillations (need ≥3× the period).
### Interpretation
- Peaks = dominant cycles. ENSO: 3-7yr. QBO: ~28mo. MJO: 30-60d. Annual: 12mo.
""",
"spatial_statistics": """
## Spatial Statistics & Area Averaging
### When to use
- Computing a single time series for a geographic region
- Area-weighted means for reporting
- Field significance testing
### Workflow
1. **Latitude weights** — `weights = np.cos(np.deg2rad(ds.latitude))`.
2. **Apply** — `ds.weighted(weights).mean(dim=['latitude', 'longitude'])`.
3. **Land/sea mask** — Apply if needed (e.g., ocean-only SST average).
### Quality Checklist
- [ ] Latitude weighting applied BEFORE spatial averaging
- [ ] Land-sea mask applied where relevant
- [ ] Units preserved correctly
### Common Pitfalls
- ⚠️ Unweighted averages bias toward poles (smaller grid cells over-counted).
- ⚠️ Global mean SST must exclude land points.
### Interpretation
- Produces physically accurate area-averaged time series.
""",
"multi_variable": """
## Multi-Variable Derived Quantities
### When to use
- Combining ERA5 variables for derived metrics
### Common Derivations
1. **Wind speed** — `wspd = np.sqrt(u10**2 + v10**2)` (or u100/v100 for hub-height).
2. **Wind direction** — `wdir = (270 - np.degrees(np.arctan2(v10, u10))) % 360`.
3. **Relative humidity** — From `t2` and `d2` using Magnus formula.
4. **Heat index** — Combine `t2` and `d2` (Steadman formula).
5. **Vapour transport** — `IVT ≈ tcwv * wspd` (surface proxy).
6. **Total precip check** — `tp ≈ cp + lsp`.
### Quality Checklist
- [ ] Variables share identical grids (time, lat, lon)
- [ ] Units matched before combining (both in °C, both in m/s, etc.)
### Common Pitfalls
- ⚠️ `mean(speed) ≠ speed_of_means` — always compute speed FIRST, then average.
- ⚠️ Wind direction requires proper 4-quadrant atan2, not naive arctan.
### Interpretation
- Derived metrics often better represent human/environmental impact than raw fields.
""",
"climatology_normals": """
## Climatology Normals (WMO Standard)
### When to use
- Computing 30-year normals
- Calculating "departure from normal"
### Workflow
1. **Select base period** — Standard WMO epoch: 1991-2020 (or 1981-2010).
2. **Compute monthly averages** — `normals = baseline.groupby('time.month').mean('time')`.
3. **Departure** — `departure = current.groupby('time.month') - normals`.
### Quality Checklist
- [ ] Exactly 30 years used
- [ ] Same months compared (don't mix Feb normals with March data)
### Common Pitfalls
- ⚠️ Moving baselines make comparisons with WMO climate reports inconsistent.
### Interpretation
- "Normal" = statistical baseline. Departures express how much current conditions deviate.
""",
# -------------------------------------------------------------------------
# CLIMATE INDICES & EXTREMES
# -------------------------------------------------------------------------
"climate_indices": """
## Climate Indices
### When to use
- Assessing ENSO, NAO, PDO, AMO teleconnections
- Correlating local weather with large-scale modes
### Key Indices
- **ENSO (Niño 3.4)**: `sst` anomaly, 5°S-5°N, 170°W-120°W. El Niño > +0.5°C, La Niña < -0.5°C.
- **NAO**: `mslp` difference, Azores High minus Icelandic Low. Positive → mild European winters.
- **PDO**: Leading EOF of North Pacific `sst` (north of 20°N). 20-30yr phases.
- **AMO**: Detrended North Atlantic `sst` average. ~60-70yr cycle.
### Workflow
1. **Extract region** — Use standard geographic bounds.
2. **Compute anomaly** — Area-averaged, against 30yr baseline.
3. **Smooth** — 3-to-5 month rolling mean.
### Quality Checklist
- [ ] Standard geographic bounds strictly followed
- [ ] Rolling mean applied to filter weather noise
- [ ] Latitude-weighted area average
### Common Pitfalls
- ⚠️ Without rolling mean, the index is too noisy for classification.
- ⚠️ Using incorrect region bounds produces a different (invalid) index.
- ⚠️ **MJO PROXY FAILURE:** Do NOT use Skin Temperature (`skt`) or SST to track the MJO over the ocean. The signal is effectively zero (~0.1°C variance). Always use Precipitation (`tp`), Total Column Water Vapour (`tcwv`), or Total Cloud Cover (`tcc`).
### Additional Indices
- **IOD (Indian Ocean Dipole)**: `sst` anomaly diff between Western (50-70°E, 10°S-10°N) and Eastern (90-110°E, 10°S-0°) poles.
""",
"extremes": """
## Extreme Event Analysis
### When to use
- Heat/cold extremes, heavy precipitation, tail-risk assessment
- Threshold exceedance frequency
### Workflow
1. **Define threshold** — Absolute (e.g., T > 35°C) or percentile-based (> 95th pctl of baseline).
2. **Create mask** — Boolean where condition is met.
3. **Count** — Sum over time for extreme days per year/month.
4. **Trend** — Check if frequency is increasing over time.
### Quality Checklist
- [ ] Percentiles from robust baseline (≥30 years)
- [ ] Use daily data, not monthly averages
- [ ] Units converted before applying thresholds
### Common Pitfalls
- ⚠️ 99th percentile on monthly averages misses true daily extremes entirely.
- ⚠️ Absolute thresholds (e.g., 35°C) are region-dependent — 35°C is normal in Sahara, extreme in London.
### Interpretation
- Increasing frequency of extremes = non-linear climate change impact.
- Report as "N days/year exceeding threshold" or "return period shortened from X to Y years".
""",
"drought_analysis": """
## Drought Analysis
### When to use
- Prolonged precipitation deficits
- Agricultural/hydrological impact assessment
- SPI (Standardized Precipitation Index) proxy
### Workflow
1. **Extract precip** — Use `tp` in mm (×1000 from meters).
2. **Accumulate** — Rolling sums: `tp.rolling(time=3).sum()` for 3-month SPI.
3. **Standardize** — `(accumulated - mean) / std` → SPI proxy.
4. **Cross-check** — Verify with `swvl1` (soil moisture) for ground-truth.
### Quality Checklist
- [ ] Monthly data used (not hourly)
- [ ] Baseline ≥30 years for stable statistics
- [ ] Multiple accumulation periods tested (1, 3, 6, 12 months)
### Common Pitfalls
- ⚠️ Absolute precipitation deficits are meaningless in deserts — always standardize.
- ⚠️ Gamma distribution fit (proper SPI) is better than raw Z-score for precip.
- ⚠️ CRITICAL BASELINE LENGTH: You MUST use a minimum 30-year baseline (e.g., 1991-2020) to compute the mean and std for SPI standardization. Computing z-scores on a 5-year period (e.g., using 2020-2024 as both study and reference period) is statistically invalid and creates artificial extreme spikes.
### Interpretation
- SPI < -1.0: Moderate drought. < -1.5: Severe. < -2.0: Extreme.
""",
"heatwave_detection": """
## Heatwave Detection
### When to use
- Identifying heatwave events using standard definitions
- Assessing heat-related risk periods
### Workflow
1. **Daily data** — Must be daily resolution (resample hourly if needed).
2. **Threshold** — 90th percentile of `t2` per calendar day from baseline.
3. **Exceedance mask** — `is_hot = t2_daily > threshold_90`.
4. **Streak detection** — Find ≥3 consecutive hot days using rolling sum ≥ 3.
### Quality Checklist
- [ ] Daily data (not monthly!)
- [ ] `t2` converted to °C
- [ ] Threshold is per-calendar-day (not a single annual value)
- [ ] Duration criterion applied (≥3 days)
### Common Pitfalls
- ⚠️ Monthly data — physically impossible to detect heatwaves.
- ⚠️ A single hot day is not a heatwave — duration matters.
- ⚠️ Nighttime temperatures (`t2` at 00/06 UTC) also matter for health impact.
- ⚠️ Using a flat seasonal anomaly threshold (e.g., "Summer Mean > +2°C") is NOT a heatwave detection method. This produces unphysical spatial artifacts. Heatwaves are discrete DAILY extreme events requiring per-calendar-day thresholds.
### Marine Heatwave Extension
- For ocean/SST heatwaves, use daily mean SST (not daily max).
- Marine heatwaves require ≥5 consecutive days above the 90th percentile threshold.
- Use a long baseline (e.g., 1991-2020) with a ±5-day smoothed calendar-day threshold.
### Interpretation
- Heatwaves require BOTH intensity (high T) AND duration (consecutive days).
- Report: number of events per year, mean duration, max intensity.
""",
"atmospheric_rivers": """
## Atmospheric Rivers Detection
### When to use
- Detecting AR events from integrated vapour transport proxy
- Extreme precipitation risk at landfall
### Workflow
1. **Extract** — `tcwv` + `u10`, `v10`.
2. **Compute IVT proxy** — `ivt = tcwv * np.sqrt(u10**2 + v10**2)`.
3. **Threshold** — IVT proxy > 250 kg/m/s (approximate).
4. **Shape check** — Feature should be elongated (>2000km long, <1000km wide).
### Quality Checklist
- [ ] Acknowledge this is surface-wind proxy (true IVT needs pressure-level data)
- [ ] Cross-validate with heavy `tp` at landfall
- [ ] Check for persistent (≥24h) plume features
### Common Pitfalls
- ⚠️ Tropical moisture pools are NOT ARs — wind-speed multiplier is essential to distinguish.
- ⚠️ This surface proxy underestimates true IVT — use conservative thresholds.
### Interpretation
- High `tcwv` + strong directed wind at coast = extreme flood risk.
- Map with `YlGnBu` for moisture intensity.
""",
"blocking_events": """
## Atmospheric Blocking Detection
### When to use
- Identifying persistent high-pressure blocks from MSLP
- Explaining prolonged heatwaves, droughts, or cold spells
### Workflow
1. **Extract** — `mslp` in hPa (÷100 from Pa).
2. **Compute anomalies** — Daily anomalies from climatology.
3. **Detect** — Find positive anomalies > 1.5σ persisting ≥5 days.
4. **Location** — Focus on mid-to-high latitudes (40-70°N typically).
### Quality Checklist
- [ ] 3-5 day rolling mean applied to filter transient ridges
- [ ] Persistence criterion enforced (≥5 days)
- [ ] Mid-latitude focus
### Common Pitfalls
- ⚠️ Fast-moving ridges are NOT blocks — persistence is key.
- ⚠️ Blocks in the Southern Hemisphere are rarer and weaker.
### Interpretation
- Blocks force storms to detour, causing prolonged rain on flanks and drought/heat underneath.
""",
"energy_budget": """
## Surface Energy Budget
### When to use
- Analyzing radiation balance and surface heating
- Solar energy potential assessment
### Workflow
1. **Extract radiation** — `ssrd` (incoming solar), `ssr` (net solar after reflection).
2. **Convert units** — J/m² to W/m² by dividing by accumulation period (3600s for hourly).
3. **Compute albedo proxy** — `albedo ≈ 1 - (ssr / ssrd)` where ssrd > 0.
4. **Seasonal patterns** — Group by month to see radiation cycle.
### Quality Checklist
- [ ] Accumulation period properly accounted for (hourly vs daily sums)
- [ ] Division by zero protected (nighttime ssrd = 0)
- [ ] Units clearly stated: W/m² or MJ/m²/day
### Common Pitfalls
- ⚠️ ERA5 radiation is ACCUMULATED over the forecast step — must difference consecutive steps for instantaneous values.
- ⚠️ `ssr` already accounts for clouds and albedo — don't double-correct.
### Interpretation
- Higher `ssrd` - High solar potential. Low `ssr/ssrd` ratio → high cloudiness or reflective surface (snow/ice).
""",
"wind_energy": """
## Wind Energy Assessment
### When to use
- Wind power density analysis
- Turbine hub-height wind resource mapping
### Workflow
1. **Use hub-height winds** — `u100`, `v100` (100m, not 10m surface winds).
2. **Compute speed** — `wspd100 = np.sqrt(u100**2 + v100**2)`.
3. **Power density** — `P = 0.5 * rho * wspd100**3` where rho ≈ 1.225 kg/m³.
4. **Capacity factor** — Fraction of time wind exceeds cut-in speed (~3 m/s) and stays below cut-out (~25 m/s).
5. **Weibull fit** — Fit shape (k) and scale (A) parameters to the wind speed distribution.
### Quality Checklist
- [ ] Using 100m winds, NOT 10m (turbines don't operate at surface)
- [ ] Power density in W/m²
- [ ] Seasonal variation checked (winter vs summer)
### Common Pitfalls
- ⚠️ Using 10m winds severely underestimates wind energy potential.
- ⚠️ Mean wind speed misleads — power depends on speed CUBED, so variability matters enormously.
### Interpretation
- Power density >400 W/m² = excellent wind resource.
- Report Weibull k parameter: k < 2 = gusty/variable, k > 3 = steady flow.
""",
"moisture_budget": """
## Moisture Budget Analysis
### When to use
- Understanding precipitation sources
- Tracking moisture plumes and convergence zones
### Workflow
1. **Extract** — `tcwv` (precipitable water), `tcw` (total column water incl. liquid/ice).
2. **Temporal evolution** — Track `tcwv` changes to infer moisture convergence.
3. **Relate to precip** — Compare `tcwv` peaks with `tp` to see conversion efficiency.
4. **Spatial patterns** — Map `tcwv` to identify moisture corridors.
### Quality Checklist
- [ ] Distinguish `tcwv` (vapour only) from `tcw` (vapour + liquid + ice)
- [ ] Units: kg/m² (equivalent to mm of water)
### Common Pitfalls
- ⚠️ High `tcwv` doesn't guarantee rain — need a lifting mechanism.
- ⚠️ `tcw - tcwv` gives cloud water + ice content (proxy for cloud thickness).
### Interpretation
- `tcwv` > 50 kg/m² in tropics = moisture-laden atmosphere primed for heavy precip.
""",
"convective_potential": """
## Convective Potential (Thunderstorm Risk)
### When to use
- Thunderstorm forecasting and climatology
- Severe weather risk assessment
### Workflow
1. **Extract CAPE** — Already available as `cape` variable (J/kg).
2. **Classify risk** — Low (<300), Moderate (300-1000), High (1000-2500), Extreme (>2500 J/kg).
3. **Combine with moisture** — High CAPE + high `tcwv` → heavy convective storms.
4. **Check trigger** — Fronts, orography, or strong daytime heating (`t2` diurnal cycle).
### Quality Checklist
- [ ] CAPE alone is insufficient — need a trigger mechanism
- [ ] Check `blh` (boundary layer height) — deep BLH aids convective initiation
### Common Pitfalls
- ⚠️ CAPE = potential energy, not a guarantee. High CAPE + strong capping inversion = no storms.
- ⚠️ CAPE is most meaningful in afternoon hours — avoid pre-dawn values.
### Interpretation
- CAPE > 1000 J/kg with deep BLH (>2km) and high `tcwv` = significant thunderstorm risk.
""",
"snow_cover": """
## Snow Cover & Melt Analysis
### When to use
- Tracking snow accumulation and melt timing
- Climate change impacts on snowpack
### Workflow
1. **Extract** — `sd` (Snow Depth in m water equivalent).
2. **Seasonal cycle** — Track start/end of snow season per grid point.
3. **Melt timing** — Find the date when `sd` drops below threshold.
4. **Trend** — Check if snow season is shortening over decades.
5. **Compare with `stl1`/`t2`** — Warming soil accelerates melt.
### Quality Checklist
- [ ] Units: meters of water equivalent
- [ ] Focus on mid/high latitudes and mountain regions
- [ ] Inter-annual variability large — use multi-year analysis
### Common Pitfalls
- ⚠️ ERA5 snow depth is modeled, not observed — cross-reference with station data.
- ⚠️ Rain-on-snow events can cause rapid melt not captured well in reanalysis.
### Interpretation
- Earlier melt = less summer water supply. Map with `Blues`, reversed for snowless areas.
""",
# -------------------------------------------------------------------------
# VISUALIZATION
# -------------------------------------------------------------------------
"visualization_spatial": """
## Spatial Map Visualization
### When to use
- Mapping absolute climate fields (Temp, Wind, Precip, Pressure)
### Workflow
1. **Figure** — `fig, ax = plt.subplots(figsize=(12, 8))`.
2. **Meshgrid** — `lons, lats = np.meshgrid(data.longitude, data.latitude)`.
3. **Plot** — `ax.pcolormesh(lons, lats, data, cmap=..., shading='auto')`.
4. **Colorbar** — ALWAYS: `plt.colorbar(mesh, ax=ax, label='Units', shrink=0.8)`.
5. **Cartopy** — Optional: add coastlines, land fill. Graceful fallback if not installed.
### Quality Checklist
- [ ] Figure 12×8 for maps
- [ ] Colormap matches variable:
- Temp: `RdYlBu_r` | Wind: `YlOrRd` | Precip: `YlGnBu`
- Pressure: `viridis` | Cloud: `Greys` | Anomalies: `RdBu_r`
- [ ] NEVER use `jet`
- [ ] Colorbar has label with units
- [ ] CARTOPY IS MANDATORY: Always use `cartopy.crs` projections with `ax.coastlines()` and `ax.add_feature(cfeature.BORDERS)`. Maps without coastlines appear as meaningless color blobs.
- [ ] Always pass `transform=ccrs.PlateCarree()` to `pcolormesh`/`contourf` when using Cartopy.
- [ ] For Arctic regions (latitude > 60°N), use `ccrs.NorthPolarStereo()` instead of `PlateCarree` to avoid extreme distortion.
- [ ] For US regional maps, add `cfeature.STATES` for state boundaries.
- [ ] NEVER use `Greys` colormap for humidity or precipitation. Use `YlGnBu` or `BrBG`.
- [ ] Categorical/binary maps (like hotspot masks) should use a categorical legend, not a continuous 0-1 colorbar.
### Common Pitfalls
- ⚠️ Diverging cmap on absolute data is misleading — diverging only for anomalies.
- ⚠️ Missing `shading='auto'` triggers deprecation warning.
""",
"visualization_timeseries": """
## Time Series Visualization
### When to use
- Temporal evolution of a variable at a point or region
### Workflow
1. **Area average** — `ts = data.mean(dim=['latitude', 'longitude'])` (with lat weighting!).
2. **Figure** — `fig, ax = plt.subplots(figsize=(10, 6))`.
3. **Raw line** — `ax.plot(ts.time, ts, linewidth=1.5)`.
4. **Smoothing** — Add rolling mean overlay with contrasting color.
5. **Date formatting** — `fig.autofmt_xdate(rotation=30)`.
### Quality Checklist
- [ ] Figure 10×6
- [ ] Y-axis has explicit units
- [ ] Legend included if multiple lines
- [ ] Trend line if requested: dashed with slope annotation
### Enhancements
- **Uncertainty band**: `ax.fill_between(time, mean-std, mean+std, alpha=0.2)`
- **Event markers**: `ax.axvline(date, color='red', ls='--')`
- **Twin axis**: `ax2 = ax.twinx()` for second variable
- **Date formatting**: Always use proper date labels (e.g., `mdates.DateFormatter('%b %d')`), NEVER raw day-of-month integers (1, 2, ... 31).
- **Y-axis range**: Do not set y-limits too narrow to artificially exaggerate peaks. Keep ranges physically reasonable.
- **Dual axes coloring**: If using `ax2 = ax.twinx()`, color the y-tick labels to match the corresponding line colors.
- **Grid lines**: Always add `ax.grid(True, alpha=0.3)` for precise value comparison.
### Common Pitfalls
- ⚠️ Hourly data over 10+ years → unreadable block of ink. Resample to daily first.
""",
"visualization_anomaly_map": """
## Anomaly Map Visualization
### When to use
- Diverging data: departures, trends, z-scores
- Any map that has positive AND negative values
### Workflow
1. **Center at zero** — `from matplotlib.colors import TwoSlopeNorm`.
2. **Norm** — `norm = TwoSlopeNorm(vmin=data.min(), vcenter=0, vmax=data.max())`.
3. **Plot** — `pcolormesh(..., cmap='RdBu_r', norm=norm)`.
4. **Stippling** — Overlay significance: `contourf(..., levels=[0, 0.05], hatches=['...'], colors='none')`.
### Quality Checklist
- [ ] Zero is EXACTLY white/neutral in the colorbar
- [ ] Warm/dry = Red; Cool/wet = Blue
- [ ] Precip anomalies: consider `BrBG` instead of `RdBu_r`
### Common Pitfalls
- ⚠️ Without `TwoSlopeNorm`, skewed data makes 0 appear colored → reader is misled.
- ⚠️ Symmetric vmin/vmax (`vmax = max(abs(data))`) can also work but wastes color range.
- ⚠️ CARTOPY IS MANDATORY for anomaly maps — always add `ax.coastlines()` and `ax.add_feature(cfeature.BORDERS)`.
- ⚠️ ROBUST COLORBAR LIMITS: NEVER use raw `data.min()` and `data.max()` for anomaly map limits. A single outlier cell can result in ±200°C scale making the map unreadable. Always use percentile-based limits: `vmax = np.nanpercentile(np.abs(data), 98)`.
- ⚠️ Always pass `transform=ccrs.PlateCarree()` when plotting with Cartopy.
""",
"visualization_wind": """
## Wind & Vector Visualization
### When to use
- Circulation patterns, wind fields, quiver/streamline plots
### Workflow
1. **Speed background** — `wspd` with `pcolormesh` + `YlOrRd`.
2. **Subsample vectors** — `skip = (slice(None, None, 5), slice(None, None, 5))` to avoid solid black.
3. **Quiver** — `ax.quiver(lons[skip], lats[skip], u[skip], v[skip], color='black')`.
4. **Alternative** — `ax.streamplot()` for flow visualization (less cluttered).
### Quality Checklist
- [ ] Background heatmap shows magnitude
- [ ] Vectors sparse enough to be readable
- [ ] Wind barbs: `ax.barbs()` for meteorological display
### Common Pitfalls
- ⚠️ Full-resolution quiver = completely black, unreadable mess. MUST subsample vectors.
- ⚠️ Check arrow scaling — default autoscale can make light winds invisible.
- ⚠️ REFERENCE ARROW MANDATORY: Always add `ax.quiverkey(q, 0.9, 1.05, 10, '10 m/s', labelpos='E')`. Without this, arrow magnitudes are uninterpretable.
- ⚠️ CARTOPY IS MANDATORY: Add `ax.coastlines()` and `ax.add_feature(cfeature.BORDERS)` to all wind maps.
- ⚠️ Always pass `transform=ccrs.PlateCarree()` to quiver/streamplot when using Cartopy.
### Interpretation
- Arrows = direction, background color = magnitude. Cyclonic rotation = storm.
""",
"visualization_comparison": """
## Multi-Panel Comparison
### When to use
- Before/after, two periods, difference maps
- Multi-variable side-by-side
### Workflow
1. **Grid** — `fig, axes = plt.subplots(1, 3, figsize=(18, 6))`.
2. **Panels 1 & 2** — Absolute values with SHARED `vmin`/`vmax`.
3. **Panel 3** — Difference (A-B) with diverging cmap centered at zero.
### Quality Checklist
- [ ] Panels 1 & 2 share EXACT same vmin/vmax (otherwise visual comparison is invalid)
- [ ] Panel 3 has its own divergent colorbar centered at zero
- [ ] Titles clearly label what each panel shows
### Common Pitfalls
- ⚠️ Auto-scaled panels = impossible to compare visually. Always lock limits.
- ⚠️ Use Cartopy projections for ALL map panels: `subplot_kw={'projection': ccrs.PlateCarree()}`. Add `ax.coastlines()` to each.
- ⚠️ Always pass `transform=ccrs.PlateCarree()` to each panel's plotting call.
""",
"visualization_profile": """
## Hovmöller Diagrams
### When to use
- Lat-time or lon-time cross-sections
- Tracking wave propagation, ITCZ migration, monsoon onset
### Workflow
1. **Average out one dimension** — e.g., average across latitudes to get (lon, time).
2. **Transpose** — X=Time, Y=Lon/Lat.
3. **Plot** — `contourf` or `pcolormesh`, figure 12×6.
### Quality Checklist
- [ ] X-axis uses date formatting
- [ ] Y-axis labels state the averaged geographic slice
- [ ] Colormap matches variable type
### Common Pitfalls
- ⚠️ Swapping axes makes the diagram unintuitive. Time → X-axis convention.
### Proxy Selection for Hovmöller
- ⚠️ For MJO tracking over the ocean, DO NOT use Skin Temperature (`skt`) — the signal is too weak (~0.1°C). Use Convective Precipitation (`cp`), Total Precipitation (`tp`), Total Column Water Vapour (`tcwv`), or Total Cloud Cover (`tcc`).
- ⚠️ Remove the seasonal cycle (subtract 30-day running mean) to isolate intraseasonal signals like MJO (30-60 day periods).
- ⚠️ Longitude axis must use standard geographic convention (-180 to +180), not 0-360.
### Interpretation
- Diagonal banding = propagating waves/systems. Vertical banding = stationary patterns.
""",
"visualization_distribution": """
## Distribution Visualization
### When to use
- Histograms, PDFs, box plots
- Comparing two time periods or regions
### Workflow
1. **Flatten** — `.values.flatten()`, drop NaNs.
2. **Shared bins** — `np.linspace(min, max, 50)`.
3. **Plot** — `ax.hist(data, bins=bins, alpha=0.5, density=True, label='Period')`.
4. **Median/mean markers** — Vertical lines with annotation.
### Quality Checklist
- [ ] `density=True` for comparing different-sized samples
- [ ] `alpha=0.5` for overlapping distributions
- [ ] Legend when comparing multiple distributions
### Common Pitfalls
- ⚠️ Raw counts (not density) skew comparison between periods with different sample sizes.
- ⚠️ Too few bins = lost detail. Too many = noisy. 30-50 bins is usually good.
### Interpretation
- Rightward shift = warming. Flatter + wider = more variability = more extremes.
""",
"visualization_animation": """
## Animated/Sequential Maps
### When to use
- Monthly/seasonal evolution of a field
- Event lifecycle (genesis → peak → decay)
### Workflow
1. **Global limits** — Find absolute vmin/vmax across ALL timesteps.
2. **Multi-panel grid** — `fig, axes = plt.subplots(2, 3, figsize=(18, 12))` for 6 timesteps.
3. **Lock colorbars** — Same vmin/vmax on every panel.
4. **Shared colorbar** — Remove per-panel colorbars, add one at the bottom.
### Quality Checklist
- [ ] Colorbar limits LOCKED across all panels (no jumping colors)
- [ ] Timestamps clearly labeled on each panel
- [ ] Static grid preferred over video (headless environment)
### Common Pitfalls
- ⚠️ Auto-scaled panels flash/jump between frames — always lock limits.
- ⚠️ MP4/GIF generation may fail in headless — use PNG grids instead.
- ⚠️ Use Cartopy projections for ALL map panels: `subplot_kw={'projection': ccrs.PlateCarree()}`. Add `.coastlines()` to each axis.
""",
"visualization_dashboard": """
## Summary Dashboard
### When to use
- Comprehensive overview: map + time series + statistics in one figure
- Publication-ready event summaries
### Workflow
1. **Layout** — `fig = plt.figure(figsize=(16, 10))` + `matplotlib.gridspec`.
2. **Top row** — Large spatial map (anomaly or mean field).
3. **Bottom left** — Time series of regional mean.
4. **Bottom right** — Distribution histogram or box plot.
### Quality Checklist
- [ ] `plt.tight_layout()` or `constrained_layout=True` to prevent overlap
- [ ] Consistent color theme across all panels
- [ ] Clear panel labels (a, b, c)
### Common Pitfalls
- ⚠️ Cramming too much into small figure → illegible text. Scale figure size up.
- ⚠️ Different aspect ratios between map and time series need explicit gridspec ratios.
- ⚠️ MIXED PROJECTION DANGER: Cartopy projections must ONLY be applied to MAP axes. If you add `projection=ccrs.PlateCarree()` to a time series or histogram panel, it will break the plot. Use `fig.add_subplot(gs[...], projection=ccrs.PlateCarree())` ONLY for spatial map panels.
- ⚠️ For dashboards showing Americas/Atlantic regions (e.g., Hurricane Otis), always wrap longitudes to -180/+180 and use `ax.set_extent([west, east, south, north])`.
""",
"visualization_contour": """
## Contour & Isobar Plots
### When to use
- Pressure maps with isobars
- Temperature isotherms
- Any smoothly varying field where specific levels matter
### Workflow
1. **Define levels** — `levels = np.arange(990, 1040, 4)` for MSLP isobars.
2. **Filled contour** — `ax.contourf(lons, lats, data, levels=levels, cmap=...)`.
3. **Contour lines** — `cs = ax.contour(lons, lats, data, levels=levels, colors='black', linewidths=0.5)`.
4. **Labels** — `ax.clabel(cs, inline=True, fontsize=8)`.
### Quality Checklist
- [ ] Level spacing is physically meaningful (e.g., 4 hPa for MSLP)
- [ ] Contour labels don't overlap
- [ ] Filled + line contours combined for best readability
### Common Pitfalls
- ⚠️ Too many levels → cluttered, unreadable. 10-15 levels max.
- ⚠️ Non-uniform level spacing requires manual colorbar ticks.
- ⚠️ CARTOPY IS MANDATORY: Use `subplot_kw={'projection': ccrs.PlateCarree()}` and add `ax.coastlines()`.
- ⚠️ Always pass `transform=ccrs.PlateCarree()` to `contour` and `contourf` calls.
### Interpretation
- Tightly packed isobars = strong pressure gradient = high winds.
""",
"visualization_correlation_map": """
## Spatial Correlation Maps
### When to use
- Showing where a variable correlates with an index (e.g., ENSO vs global precip)
- Teleconnection mapping
### Workflow
1. **Compute index** — 1D time series (e.g., Niño3.4 SST anomaly).
2. **Correlate** — `xr.corr(index, spatial_field, dim='time')` → 2D R-map.
3. **Significance** — Compute p-values from sample size and R.
4. **Plot** — Map R values with `RdBu_r` centered at zero. Stipple p < 0.05.
### Quality Checklist
- [ ] Both index and field deseasonalized
- [ ] R-map centered at zero (TwoSlopeNorm or symmetric limits)
- [ ] Significant areas stippled or hatched
- [ ] Sample size ≥30 stated
### Common Pitfalls
- ⚠️ Raw data correlations dominated by shared seasonal cycle.
- ⚠️ Field significance: many grid points → some will be significant by chance. Apply FDR correction.
### Interpretation
- R > 0: in-phase with index. R < 0: out-of-phase. |R| > 0.5 = strong relationship.
""",
# -------------------------------------------------------------------------
# MARITIME ANALYSIS
# -------------------------------------------------------------------------
"maritime_route": """
## Maritime Route Risk Analysis
### When to use
- Analyzing weather risks along calculated shipping lanes
- Voyage planning and hazard assessment
### Workflow
1. **Route** — Call `calculate_maritime_route` → waypoints + bounding box.
2. **Data** — Download `u10`, `v10` for route bbox, target month, last 3 years.
3. **Wind speed** — `wspd = np.sqrt(u10**2 + v10**2)`.
4. **Extract** — Loop waypoints: `.sel(lat=lat, lon=lon, method='nearest')`.
5. **Risk classify** — Safe (<10), Caution (10-17), Danger (17-24), Extreme (>24 m/s).
6. **Statistics** — P95 wind speed at each waypoint, % time in each risk category.
### Quality Checklist
- [ ] Bounding box from route tool used DIRECTLY (don't convert coords)
- [ ] 3-year period for climatological context, not just one date
- [ ] Risk categories applied at waypoint level
### Common Pitfalls
- ⚠️ Global hourly downloads → timeout. Subset tightly to route bbox.
- ⚠️ Don't use bounding box mean — extract AT waypoints for route-specific risk.
""",
"maritime_visualization": """
## Maritime Route Risk Visualization
### When to use
- Plotting route risk maps with waypoint-level risk coloring
### Workflow
1. **Background** — Map mean `wspd` with `pcolormesh` + `YlOrRd`.
2. **Route line** — Dashed line connecting waypoints.
3. **Waypoint scatter** — Color by risk: Green (<10), Amber (10-17), Coral (17-24), Red (>24 m/s).
4. **Labels** — "ORIGIN" and "DEST" annotations.
5. **Legend** — Custom 4-category legend (mandatory).
### Quality Checklist
- [ ] 4-category risk legend ALWAYS included
- [ ] Origin/Destination labeled
- [ ] Colormap: `YlOrRd` for wind speed
- [ ] Saved to PLOTS_DIR
### Common Pitfalls
- ⚠️ No legend → colored dots are meaningless to the user.
- ⚠️ Route line + waypoints must be on top (high zorder) to not be hidden by background.
- ⚠️ CARTOPY IS MANDATORY: Always add `ax.coastlines()` — without land boundaries it is impossible to see where the route passes relative to coastlines (e.g., Suez Canal, Malacca Strait).
- ⚠️ Always pass `transform=ccrs.PlateCarree()` to route scatter/line plotting calls when using Cartopy.
""",
}
# =============================================================================
# ARGUMENT SCHEMA
# =============================================================================
class AnalysisGuideArgs(BaseModel):
"""Arguments for analysis guide retrieval."""
topic: Literal[
# Data operations
"load_data",
"spatial_subset",
"temporal_subset",
# Statistical analysis
"anomalies",
"zscore",
"trend_analysis",
"eof_analysis",
# Advanced analysis
"correlation_analysis",
"composite_analysis",
"diurnal_cycle",
"seasonal_decomposition",
"spectral_analysis",
"spatial_statistics",
"multi_variable",
"climatology_normals",
# Climate indices & extremes
"climate_indices",
"extremes",
"drought_analysis",
"heatwave_detection",
"atmospheric_rivers",
"blocking_events",
# Domain-specific
"energy_budget",
"wind_energy",
"moisture_budget",
"convective_potential",
"snow_cover",
# Visualization
"visualization_spatial",
"visualization_timeseries",
"visualization_anomaly_map",
"visualization_wind",
"visualization_comparison",
"visualization_profile",
"visualization_distribution",
"visualization_animation",
"visualization_dashboard",
"visualization_contour",
"visualization_correlation_map",
# Maritime
"maritime_route",
"maritime_visualization",
] = Field(
description="Analysis topic to get guidance for"
)
# =============================================================================
# TOOL FUNCTION
# =============================================================================
def get_analysis_guide(topic: str) -> str:
"""
Get methodological guidance for climate data analysis.
Returns text instructions for using python_repl to perform the analysis.
"""
guide = ANALYSIS_GUIDES.get(topic)
if not guide:
available = ", ".join(sorted(ANALYSIS_GUIDES.keys()))
return f"Unknown topic: {topic}. Available: {available}"
return f"""
# Analysis Guide: {topic.replace('_', ' ').title()}
{guide}
---
Use python_repl to implement this analysis with your downloaded ERA5 data.
"""
# =============================================================================
# TOOL DEFINITIONS
# =============================================================================
analysis_guide_tool = StructuredTool.from_function(
func=get_analysis_guide,
name="get_analysis_guide",
description="""
Get methodological guidance for climate data analysis.
Returns workflow steps, quality checklists, and pitfall warnings for:
- Data: load_data, spatial_subset, temporal_subset
- Statistics: anomalies, zscore, trend_analysis, eof_analysis
- Advanced: correlation_analysis, composite_analysis, diurnal_cycle,
seasonal_decomposition, spectral_analysis, spatial_statistics,
multi_variable, climatology_normals
- Climate: climate_indices, extremes, drought_analysis, heatwave_detection,
atmospheric_rivers, blocking_events
- Domain: energy_budget, wind_energy, moisture_budget, convective_potential, snow_cover
- Visualization: visualization_spatial, visualization_timeseries,
visualization_anomaly_map, visualization_wind, visualization_comparison,
visualization_profile, visualization_distribution, visualization_animation,
visualization_dashboard, visualization_contour, visualization_correlation_map
- Maritime: maritime_route, maritime_visualization
Use this BEFORE writing analysis code in python_repl.
""",
args_schema=AnalysisGuideArgs,
)
# Visualization guide - alias for backward compatibility
visualization_guide_tool = StructuredTool.from_function(
func=get_analysis_guide,
name="get_visualization_guide",
description="""
Get publication-grade visualization instructions for ERA5 climate data.
CALL THIS BEFORE creating any plot to get:
- Correct colormap choices
- Standard value ranges
- Required map elements
- Best practices
Available visualization topics:
- visualization_spatial: Maps with proper projections
- visualization_timeseries: Time series plots
- visualization_anomaly_map: Diverging anomaly maps
- visualization_wind: Quiver/streamline plots
- visualization_comparison: Multi-panel comparisons
- visualization_profile: Hovmöller diagrams
- visualization_distribution: Histograms/PDFs
- visualization_animation: Sequential map grids
- visualization_dashboard: Multi-panel summaries
- visualization_contour: Isobar/isotherm plots
- visualization_correlation_map: Spatial correlation maps
- maritime_visualization: Route risk maps
""",
args_schema=AnalysisGuideArgs,
)