Compound Poisson–Lognormal Monte Carlo Risk Model
Author: Prof. Hernan Huwyler, CIAO MBA CPA
Affiliation: IE Law School Center for Risk and Compliance | Capgemini Applied AI Lab
License: CC BY 4.0
Version: 1.0.0
Last Updated: 2025
Model Overview
This dataset and accompanying Python model implement a compound frequency–severity Monte Carlo simulation engine for operational and AI risk quantification.
The model answers the core question facing Chief Risk Officers, CFOs, and AI Governance practitioners:
Given uncertainty in how often loss events occur and how severe each event is, what is the probability distribution of total annual loss — and what capital reserve does that distribution require?
Mathematical Foundation
Frequency model: N ~ Poisson(λ)
Each simulation period draws the number of loss events from a Poisson distribution with rate parameter λ (expected events per year). The Poisson distribution is the standard actuarial and operational risk choice for independent, random event counts.
Severity model: Lᵢ ~ Lognormal(μ, σ)
Each individual loss is drawn independently from a lognormal distribution, where μ and σ are the mean and standard deviation of ln Lᵢ. The lognormal is a common choice for financial loss severity because it enforces non-negativity and is right-skewed with a heavy right tail. It is widely used for severity modeling in operational risk work under the Basel framework and in Solvency II internal models, and it is compatible with the quantitative assessment approach described in NIST SP 800-30.
Calibration method: μ and σ are calibrated analytically from business-facing inputs: a confidence interval [lower, upper] that the practitioner believes contains the central mass of individual loss severity at a specified probability level. This eliminates the need to estimate lognormal parameters directly from sparse loss data.
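One common way to do this calibration can be sketched as follows. It assumes [lower, upper] is a central interval, with the remaining probability split equally between the two tails, and matches the endpoints to quantiles of the normal distribution of ln L. The function name `calibrate_lognormal` is hypothetical and is not taken from the accompanying model, whose exact method may differ.

```python
import math
from statistics import NormalDist


def calibrate_lognormal(lower: float, upper: float, confidence_level: float) -> tuple[float, float]:
    """Return (mu, sigma) of ln(L) so that P(lower <= L <= upper) = confidence_level.

    Assumption (illustrative): the interval is central, so the probability
    outside it is split equally between the lower and upper tails.
    """
    if not 0.0 < lower < upper:
        raise ValueError("require 0 < lower < upper")
    if not 0.0 < confidence_level < 1.0:
        raise ValueError("confidence_level must be in (0, 1)")
    # Standard normal quantile of the upper endpoint (about 1.2816 for 80%)
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2.0)
    # Midpoint of the log-interval gives mu; its half-width divided by z gives sigma
    mu = (math.log(lower) + math.log(upper)) / 2.0
    sigma = (math.log(upper) - math.log(lower)) / (2.0 * z)
    return mu, sigma


mu, sigma = calibrate_lognormal(1_000.0, 2_000.0, 0.80)
print(f"mu={mu:.4f}, sigma={sigma:.4f}")
```

For an 80% interval of [1,000, 2,000], this gives roughly μ ≈ 7.25 and σ ≈ 0.27 on the log scale.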
Aggregate loss: S = Σᵢ₌₁ᴺ Lᵢ
The total annual loss is the sum of all individual losses in the period. When N = 0, S = 0.
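Simulation: 100,000 independent trials by default, vectorized using NumPy for performance. At this sample size the mean, median, and moderate percentiles such as P95 are stable across seeds. Extreme quantiles carry more sampling error: P99.9 rests on only about 100 tail observations, so increase the trial count when those levels drive decisions.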
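A minimal sketch of a vectorized compound Poisson–lognormal simulation is shown below. It is an illustration, not the accompanying model's code: the function name `simulate_aggregate_losses` is hypothetical, and the μ and σ values are example inputs on the log scale. The analytic check uses the standard compound Poisson result E[S] = λ·E[L] with E[L] = exp(μ + σ²/2).

```python
import numpy as np


def simulate_aggregate_losses(lam, mu, sigma, simulations=100_000, seed=123):
    """Simulate S = sum of N lognormal losses per trial, with N ~ Poisson(lam)."""
    rng = np.random.default_rng(seed)
    # Number of loss events in each trial
    counts = rng.poisson(lam, size=simulations)
    # Draw every individual loss across all trials in a single call
    severities = rng.lognormal(mean=mu, sigma=sigma, size=int(counts.sum()))
    # Trial index of each individual loss, then sum the losses per trial;
    # trials with zero events keep a total of 0.0
    trial_ids = np.repeat(np.arange(simulations), counts)
    totals = np.bincount(trial_ids, weights=severities, minlength=simulations)
    return counts, totals


# Example inputs (assumed for illustration)
lam, mu, sigma = 4.0, 7.2543, 0.2704
counts, totals = simulate_aggregate_losses(lam, mu, sigma)

analytic_mean = lam * np.exp(mu + sigma**2 / 2)  # E[S] = lam * E[L]
print(f"simulated mean {totals.mean():.1f} vs analytic {analytic_mean:.1f}")
```

Drawing all severities at once and summing with `np.bincount` avoids a Python loop over trials, which is what keeps 100,000 trials fast.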
Risk Metrics Produced
| Metric | Formula | Typical Use / Reference |
|---|---|---|
| Mean Loss | E[S] | Baseline planning |
| Median Loss | P50(S) | Central tendency |
| Standard Deviation | Std(S) | Volatility measure |
| Value at Risk 95% | P95(S) | VaR-based capital (Basel and Solvency II apply higher confidence levels) |
| Conditional VaR 95% | E[S \| S > VaR95] | Expected Shortfall |
| Reserve Percentile | Pₖ(S) user-defined | ICAAP capital buffer |
| Exceedance Curve | P(S ≥ x) | Catastrophe modeling |
| Return Period | 1 / P(S ≥ x) | Infrastructure risk |
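The metrics in this table can be computed directly from an array of simulated annual losses. The sketch below is illustrative: the function name `risk_metrics`, its parameters, and the synthetic sample are assumptions, not the accompanying model's API. It uses NumPy's default linear-interpolation quantile and the sample standard deviation.

```python
import numpy as np


def risk_metrics(losses, var_level=0.95, reserve=0.75, threshold=None):
    """Summary risk metrics for a 1-D array of simulated aggregate losses."""
    losses = np.asarray(losses, dtype=float)
    var = np.quantile(losses, var_level)
    tail = losses[losses > var]
    # CVaR: mean of losses strictly beyond VaR; fall back to VaR if the tail is empty
    cvar = tail.mean() if tail.size else var
    metrics = {
        "mean": losses.mean(),
        "median": np.median(losses),
        "std": losses.std(ddof=1),
        "var": var,
        "cvar": cvar,
        "reserve": np.quantile(losses, reserve),
    }
    if threshold is not None:
        # Empirical exceedance probability P(S >= x) and its return period
        p_exceed = np.mean(losses >= threshold)
        metrics["exceedance_prob"] = p_exceed
        metrics["return_period"] = 1.0 / p_exceed if p_exceed > 0 else np.inf
    return metrics


# Synthetic sample for demonstration only, not output of the model
rng = np.random.default_rng(0)
demo_losses = rng.lognormal(mean=8.0, sigma=1.0, size=10_000)
for name, value in risk_metrics(demo_losses, threshold=10_000.0).items():
    print(f"{name:>16}: {value:,.2f}")
```

Evaluating the exceedance probability over a grid of thresholds yields the exceedance curve. The return period is expressed in years because each trial represents one year.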
Regulatory and Framework Mapping
ISO/IEC 42001 — AI Management System
| Percentile Range | ISO 42001 Control Theme |
|---|---|
| ≥ 95th | OC-8: Incident Response and Recovery |
| 75th to < 95th | OC-4: Risk Treatment and Controls |
| < 75th | OC-2: Risk Assessment and Identification |
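One way to apply this mapping to simulated trials is to compute each trial's percentile rank and look up the matching tier. The sketch below is an assumption about how such a classification could be done. The helper names `percentile_rank` and `control_theme` are hypothetical, and the labels are copied from the table above.

```python
import numpy as np

# (lower percentile bound, label) pairs, highest tier first
TIERS = [
    (95.0, "OC-8: Incident Response and Recovery"),
    (75.0, "OC-4: Risk Treatment and Controls"),
    (0.0, "OC-2: Risk Assessment and Identification"),
]


def percentile_rank(losses):
    """Percentile rank (0-100) of each loss: share of trials with loss <= it."""
    losses = np.asarray(losses, dtype=float)
    sorted_losses = np.sort(losses)
    return 100.0 * np.searchsorted(sorted_losses, losses, side="right") / losses.size


def control_theme(rank):
    """Return the label of the first tier whose lower bound the rank reaches."""
    for bound, label in TIERS:
        if rank >= bound:
            return label
    raise ValueError("rank must be non-negative")


# Toy example: 100 distinct losses, so the loss k has percentile rank k
ranks = percentile_rank(np.arange(1, 101))
for k in (50, 80, 99):
    print(k, control_theme(ranks[k - 1]))
```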
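NIST AI Risk Management Framework (AI RMF) and Cybersecurity Framework (CSF)
The AI RMF defines four functions: GOVERN, MAP, MEASURE, and MANAGE. DETECT, RESPOND, and RECOVER are functions of the NIST Cybersecurity Framework and are used here for the response-oriented tiers.
| Loss Range | NIST Function |
|---|---|
| Extreme tail (≥ 99th) | RESPOND + RECOVER |
| High risk (75th to < 99th) | DETECT + RESPOND |
| Baseline (< 75th) | GOVERN + MAP |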
EU AI Act Risk Tiers
- Unacceptable risk: Scenarios with CVaR exceeding regulatory capital thresholds
- High risk: Loss scenarios at or above VaR(95%) with systemic or rights-impacting AI
- Limited risk: Base-case scenarios with adequate reserve coverage
- Minimal risk: Best-case scenarios below reserve threshold
Basel III and Solvency II
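- P99: High-confidence benchmark for internal operational risk reporting
- P99.5: Solvency II Solvency Capital Requirement, calibrated to a 99.5% VaR over one year
- P99.9: ICAAP extreme stress scenario buffer; also the confidence level of the Basel II Advanced Measurement Approach (AMA) for operational risk capital, which the finalized Basel III framework replaces with a standardised approach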
Dataset Contents
data/train.csv — 100,000 simulation trials
Primary dataset. One row per Monte Carlo trial. Contains aggregate loss outcome, event frequency, average severity, scenario classification, exceedance probability, VaR and CVaR flags, and all calibration parameters for full reproducibility.
data/test.csv — 288 stress scenarios
Validation dataset. One row per stress scenario combining six lambda values, four severity ranges, four confidence levels, and three random seeds. Contains full percentile distribution per scenario for sensitivity analysis.
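The scenario count follows from the full cross product of the four factors: 6 × 4 × 4 × 3 = 288. The sketch below shows how such a grid could be enumerated. All parameter values are hypothetical placeholders, since the actual values used to build `data/test.csv` are not listed here.

```python
from itertools import product

# Placeholder values for illustration only, not the values behind data/test.csv
lambdas = [1.0, 2.0, 4.0, 8.0, 12.0, 20.0]       # six Poisson rates
severity_ranges = [                              # four (lower, upper) bounds
    (500.0, 1_000.0),
    (1_000.0, 2_000.0),
    (5_000.0, 20_000.0),
    (10_000.0, 100_000.0),
]
confidence_levels = [0.70, 0.80, 0.90, 0.95]     # four confidence levels
seeds = [1, 2, 3]                                # three random seeds

# One dictionary per stress scenario, covering every combination
scenarios = [
    {"events": lam, "lower": lo, "upper": hi, "confidence_level": cl, "seed": seed}
    for lam, (lo, hi), cl, seed in product(lambdas, severity_ranges, confidence_levels, seeds)
]
print(len(scenarios))  # 6 * 4 * 4 * 3 = 288
```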
data/percentile_table.csv — Percentile distribution table
Structured percentile summary with regulatory mapping. One row per percentile point from P1 to P99.9. Directly usable in risk reports, board presentations, and regulatory submissions.
Python Model — Key Features
from compound_risk_model import RiskModel, RiskModelConfig
cfg = RiskModelConfig(
    simulations=100_000,
    lower=1_000.0,          # Lower severity bound (monetary units)
    upper=2_000.0,          # Upper severity bound (monetary units)
    confidence_level=0.80,  # Probability mass in [lower, upper]
    events=4.0,             # Expected loss events per year (Poisson λ)
    reserve=0.75,           # Reserve percentile for capital planning
    seed=123,               # Reproducibility seed
)
model = RiskModel.from_interval(cfg)
model.summary()
model.plot_loss_exceedance_curve()
model.plot_loss_distribution()
model.plot_scatter()
model.plot_heatmap()