This project delivers a transparent, Python-first earthquake risk model for teaching and sandboxing. The pipeline cleans and declusters historical catalogs (Gardner–Knopoff), applies Stepp completeness analysis, and fits Gutenberg–Richter to obtain event rates. We then simulate stochastic events and scenarios, compute shaking with an NGA-West2-style GMPE plus Vs30 site terms, and translate intensity to damage and loss using occupancy-specific vulnerability curves with secondary uncertainty and spatial correlation. Policy terms are applied to highlight the cat signature: a tiny AAL relative to TIV but a fat-tailed loss distribution. We present both AEP (aggregate loss per year) and OEP (largest event loss per year) curves to separate the portfolio and per-event viewpoints. Declustering lowers annual rates (and AAL) while leaving per-event tails similar, an important modeling choice shown in Results. Scenario replays (Northridge, Loma Prieta) and an interactive page let you tweak b, Mmax, and deductibles to see how risk shifts. Assumptions, seeds, and data versions are documented for reproducibility.
Declustering lowers annual rates (λ) and AAL; the per-event tail remains similar.
Data and Pipeline

Sources
Processing Steps
What We Did. We removed non-earthquake entries (e.g., quarry blasts), coerced key fields to valid types, and normalized timestamps to UTC. Invalid coordinates and implausible magnitudes were dropped, duplicate event IDs were de-duplicated, and the catalog was stably sorted in time. Because the raw feed mixes magnitude scales, we standardized all magnitudes to moment magnitude (Mw) for analysis: Mw/Mwr/Mh were kept as-is; ML/MLr, Mb, Ms, and Md/Mc were converted to Mw using conservative linear mappings with explicit validity ranges; unrecognized or low-quality types (Ma/Fa/Mfa/"uk") and out-of-range values were excluded. We then applied a working floor of Mw ≥ 3.0 to align with completeness.

Key findings. Measured seismicity rises sharply in both the 1930s and the 1970s, corresponding to advances in seismic instrumentation. The 1933 Long Beach earthquake sparked the creation of a network of integrated seismometers in the state, and the network was expanded again in the 1970s following the 1971 San Fernando earthquake. Concurrent with these expansions was the development of two seismic measurement scales, both at Caltech: the Richter scale in 1935 and the moment magnitude scale in the late 1970s.

Declustering. We applied Gardner–Knopoff (1974) declustering to separate dependent seismicity (foreshocks, aftershocks, swarms) from independent mainshocks. GK uses magnitude-dependent time and distance windows: larger events cast longer and wider "influence" windows, within which smaller events are treated as dependents of the mainshock. We interpolated a standard GK lookup table to obtain a continuous window function and added conservative minimum floors for time and radius.

Implementation. After time-sorting the catalog, we performed a forward sweep: for each candidate mainshock, we flagged subsequent events that occurred within its GK time window and within the great-circle (haversine) radius, provided they were not larger than the mainshock (with a small magnitude wiggle allowance). Each mainshock spawned a stable cluster_id, and dependents inherited that cluster while also recording a parent_id. We followed with a backward look-back sweep so potential foreshocks just prior to a mainshock, but inside the same GK window, were also assigned, reducing sequence fragmentation.

Sequence consolidation. We built cluster representatives (the maximum-magnitude event per cluster) and used a union–find merge step: representatives were merged only when their time windows mutually overlapped and their epicenters were closer than a strict fraction of the smaller GK radius. This dual gate (time ∩ space) prevents accidental merges while consolidating genuine multi-burst sequences.

Controls & outputs. The procedure is fully parameterized (time/radius scale factors, minimum window floors, merge-distance fraction, and magnitude wiggle), so sensitivity checks are straightforward. We output a clustered catalog (all events with cluster_id, parent_id, is_mainshock) and a declustered catalog (mainshocks only), alongside simple sanity summaries to confirm expected behavior. Overall, the approach is conservative, reproducible, and aligned with standard practice for preparing catalogs for rate and Gutenberg–Richter analyses.

Key findings. Major California sequences appear as spikes: 1933 Long Beach, 1952 Kern County, 1971 San Fernando, 1980 Mammoth Lakes, 1983 Coalinga, 1986 Chalfant Valley, 1989 Loma Prieta, 1992 Landers/Big Bear, 1994 Northridge, 1999 Hector Mine, 2003 San Simeon, and 2019 Ridgecrest.
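The magnitude-dependent windows and the haversine distance test are the core of the forward sweep. The sketch below uses commonly cited approximations to the Gardner–Knopoff window table; the coefficients, floors, field names, and wiggle value are illustrative, not the project's exact parameterization.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0

def gk_windows(mag, min_days=1.0, min_km=5.0):
    """Approximate Gardner–Knopoff time (days) and distance (km) windows.

    Coefficients are commonly cited fits to the GK (1974) table;
    the minimum floors are illustrative placeholders.
    """
    dist_km = 10 ** (0.1238 * mag + 0.983)
    if mag >= 6.5:
        time_days = 10 ** (0.032 * mag + 2.7389)
    else:
        time_days = 10 ** (0.5409 * mag - 0.547)
    return max(time_days, min_days), max(dist_km, min_km)

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance between two points, in kilometres."""
    lon1, lat1, lon2, lat2 = map(np.radians, (lon1, lat1, lon2, lat2))
    dlon, dlat = lon2 - lon1, lat2 - lat1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

def is_dependent(mainshock, candidate, mag_wiggle=0.05):
    """Forward-sweep test: does `candidate` fall inside the mainshock's GK window?

    Events are dicts with 'time' (days), 'lon', 'lat', 'mag';
    the field names are assumptions for this sketch.
    """
    t_win, r_win = gk_windows(mainshock["mag"])
    dt = candidate["time"] - mainshock["time"]
    dr = haversine_km(mainshock["lon"], mainshock["lat"],
                      candidate["lon"], candidate["lat"])
    not_larger = candidate["mag"] <= mainshock["mag"] + mag_wiggle
    return (0 <= dt <= t_win) and (dr <= r_win) and not_larger
```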
We estimate G–R parameters on the declustered, completeness-filtered catalog within a fixed, recent window. The objective is a defensible magnitude–frequency model in which b controls the balance of small to large events and a sets overall productivity.

Preparation. Using the Stepp-derived completeness segments, we keep only events with M ≥ Mc(t), then restrict to a reproducible LOOKBACK_YEARS window (50 years in this run). We build the frequency–magnitude distribution (FMD) at a fixed bin_width (0.1), computing both incremental counts and cumulative counts N(M ≥ m).

Lower bound and model family. The fit is performed on a stable tail above Mmin (respecting the Stepp floor; here Mmin = 3.0). We compare a single-slope G–R model against a two-slope (piecewise) model with a candidate break magnitude Mb.
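A minimal sketch of the FMD bookkeeping described above, assuming a pandas DataFrame with a Mw column; the 0.1 bin width follows the text, everything else is illustrative.

```python
import numpy as np
import pandas as pd

def build_fmd(catalog: pd.DataFrame, mmin=3.0, bin_width=0.1):
    """Incremental and cumulative frequency–magnitude distribution.

    `catalog` is assumed to hold a moment-magnitude column named 'Mw';
    events below `mmin` are ignored.
    """
    mags = catalog.loc[catalog["Mw"] >= mmin, "Mw"].to_numpy()
    edges = np.arange(mmin, mags.max() + bin_width, bin_width)   # bin lower edges
    incremental, _ = np.histogram(mags, bins=np.append(edges, edges[-1] + bin_width))
    cumulative = incremental[::-1].cumsum()[::-1]                # N(M >= m) per bin
    return pd.DataFrame({"m": edges, "n_inc": incremental, "n_cum": cumulative})
```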
Estimation & selection. For any chosen lower bound (and candidate break, if piecewise), we use the Aki–Utsu MLE for b with a binning correction:

b_MLE = log10(e) / (mean(M) − (Mmin − Δ/2)),  a_MLE = log10 N(M ≥ Mmin) + b_MLE · Mmin,  σ_b ≈ b_MLE / √N,

where Δ is the bin width and N the number of events above Mmin. We grid-search the break magnitude Mb and select the model by AICc; OLS on the cumulative FMD is retained as a QC overlay only.

Anchoring for simulation. To carry realism into the Monte Carlo, we "SIM-anchor" the piecewise model so its predicted N(M ≥ 4.0) exactly matches the observed rate in the fitting window. This preserves mid-magnitude productivity while still honoring the selected slopes above and below the break.

Notes & safeguards. We fit only above completeness, enforce continuity at the break, report AICc for model choice, and publish observed-vs-predicted tables for transparency. The resulting parameters feed the stochastic event generator and all downstream hazard calculations.

Key findings. Selecting a broad enough time window and an adequate minimum magnitude while also passing a Kolmogorov–Smirnov test proved challenging. After inspecting the frequency–magnitude fit, we determined that a two-slope model fits the data better. A break magnitude of 4.3 was selected: high enough to capture the change in slope and low enough to retain sufficient events for a credible fit. Some noise remains at the higher magnitudes, which is expected for low-frequency data.
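The estimator above translates directly into a few lines of Python. This is a minimal sketch of the single-slope case under the stated binning correction; the piecewise grid search over Mb and the AICc comparison are omitted for brevity.

```python
import numpy as np

def aki_utsu_fit(mags, mmin=3.0, bin_width=0.1):
    """Aki–Utsu maximum-likelihood b-value with binning correction,
    plus the matching a-value and an approximate standard error."""
    m = np.asarray(mags, dtype=float)
    m = m[m >= mmin]
    n = m.size
    b = np.log10(np.e) / (m.mean() - (mmin - bin_width / 2.0))
    a = np.log10(n) + b * mmin          # log10 N(M >= mmin) + b * mmin
    sigma_b = b / np.sqrt(n)
    return a, b, sigma_b

# Example: expected annual rate of M >= 4.0 events from the fit and the
# length of the fitting window (illustrative names).
# a, b, _ = aki_utsu_fit(catalog["Mw"])
# rate_m4 = 10 ** (a - b * 4.0) / LOOKBACK_YEARS
```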
Magnitude–frequency (log10 N–M) with Gutenberg–Richter fit; the straight line indicates the b-value.

Magnitude–frequency (log10 N–M) with two-slope Gutenberg–Richter fit; the straight lines indicate the b-values.

This module synthesizes a teaching exposure, maps assets to a reusable hazard grid, and attaches site-condition metrics (Vs30) both at asset locations and at grid centroids. The workflow is parameterized, auditable, and saves intermediate artifacts for reuse.

1) Exposure synthesis (locations & attributes)
We load a California state polygon in WGS84 and sample asset coordinates with a metro-weighted strategy plus a uniform rural tail. Metro points are drawn by Gaussian jitter around named metro centers, with safeguards to ensure points fall inside the California polygon; rural points are sampled uniformly within the state outline.

Synthetic exposure: metro-biased sampling with a statewide rural tail.

2) Hazard grid construction & asset mapping
We build a regular lon/lat grid over the California bounding box using a configurable spacing D (degrees). Each grid cell stores its polygon and centroid (lon_c, lat_c). Assets are assigned to the cell that contains them via a spatial join; assets outside the grid's bounding box use a nearest-cell fallback to guarantee a mapping.
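A minimal sketch of the grid build and the contains-based spatial join, assuming GeoPandas and a GeoDataFrame of asset points named `assets`; the column and variable names are illustrative.

```python
import numpy as np
import geopandas as gpd
from shapely.geometry import box

def build_grid(bounds, d_deg=0.1, crs="EPSG:4326"):
    """Regular lon/lat grid over a (minx, miny, maxx, maxy) bounding box."""
    minx, miny, maxx, maxy = bounds
    xs = np.arange(minx, maxx, d_deg)
    ys = np.arange(miny, maxy, d_deg)
    cells = [box(x, y, x + d_deg, y + d_deg) for x in xs for y in ys]
    grid = gpd.GeoDataFrame({"cell_id": range(len(cells))}, geometry=cells, crs=crs)
    grid["lon_c"] = grid.geometry.centroid.x
    grid["lat_c"] = grid.geometry.centroid.y
    return grid

def assign_cells(assets: gpd.GeoDataFrame, grid: gpd.GeoDataFrame):
    """Join assets to the containing cell, with a nearest-cell fallback
    for points that fall outside the grid."""
    joined = gpd.sjoin(assets, grid[["cell_id", "geometry"]],
                       how="left", predicate="within")
    missing = joined["cell_id"].isna()
    if missing.any():  # nearest-cell fallback (requires a recent GeoPandas)
        nearest = gpd.sjoin_nearest(assets.loc[missing],
                                    grid[["cell_id", "geometry"]], how="left")
        joined.loc[missing, "cell_id"] = nearest["cell_id"].to_numpy()
    return joined
```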
3) Vs30 sampling at asset locations
We read a Vs30 raster (vs30_slope.tif) and sample values at each asset location. Coordinates are transformed to the raster CRS as needed. We use a bilinear VRT to obtain smooth values and treat raster nodata as NaN. Any nodata or out-of-bounds points are re-sampled with a nearest-neighbor VRT as a targeted fallback. Final values are clamped to a plausible range, and remaining invalids are set to a neutral default.
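A minimal sketch of the bilinear-then-nearest sampling using rasterio's WarpedVRT; the file name matches the text, while the clamp range and neutral default are illustrative assumptions.

```python
import numpy as np
import rasterio
from rasterio.vrt import WarpedVRT
from rasterio.enums import Resampling

def sample_vs30(lons, lats, raster_path="vs30_slope.tif",
                clamp=(150.0, 1500.0), default=760.0):
    """Sample Vs30 (m/s) at lon/lat points: bilinear first, nearest as fallback."""
    pts = list(zip(lons, lats))
    with rasterio.open(raster_path) as src:
        with WarpedVRT(src, crs="EPSG:4326", resampling=Resampling.bilinear) as vrt:
            vals = np.array([v[0] for v in vrt.sample(pts)], dtype=float)
            nodata = vrt.nodata
    if nodata is not None:
        vals[vals == nodata] = np.nan
    bad = ~np.isfinite(vals)
    if bad.any():  # targeted nearest-neighbour recovery for nodata/out-of-bounds
        retry_pts = [pts[i] for i in np.where(bad)[0]]
        with rasterio.open(raster_path) as src:
            with WarpedVRT(src, crs="EPSG:4326", resampling=Resampling.nearest) as vrt:
                vals[bad] = np.array([v[0] for v in vrt.sample(retry_pts)], dtype=float)
    vals = np.clip(vals, *clamp)          # plausible Vs30 range (assumed)
    vals[~np.isfinite(vals)] = default    # neutral default for remaining invalids
    return vals
```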
4) Vs30 at grid centroids (for hazard caching)
To support hazard precomputation, we also sample Vs30 at each grid cell's centroid using the same bilinear-then-nearest procedure. This creates a reusable site-condition layer for the grid, avoiding per-run raster I/O during hazard evaluation.
Reproducibility & tuning
- Random seed: exposure sampling uses a fixed RNG seed for reproducible layouts.
- Metro mix: adjust weights and jitter to emphasize or relax urban concentration; the rural share keeps statewide coverage.
- Grid resolution: D trades accuracy for performance; finer grids increase file size and join costs.
- Raster sampling: bilinear for smooth Vs30; nearest only for nodata/out-of-bounds recovery.

Together, these steps yield a consistent hand-off from catalog curation to hazard and loss modeling: assets are geolocated, binned into grid cells, and enriched with Vs30 at both the asset and grid levels for fast, repeatable downstream computations.
Hazard
We simulate earthquake events from the fitted Gutenberg–Richter model and compute site-specific shaking using an OpenQuake GMPE. The pipeline supports finite-fault geometry for large events, realistic aleatory variability (between/within), spatial correlation, and produces per-asset intensity measures for downstream loss modeling.
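As a concrete illustration of the event-sampling step, the sketch below draws magnitudes from a single-slope Gutenberg–Richter distribution truncated at [Mmin, Mmax] via the inverse CDF, and event counts from a Poisson process at the fitted annual rate; the parameter values and names are illustrative, and the two-slope, SIM-anchored variant used in the project is omitted.

```python
import numpy as np

def sample_gr_magnitudes(n, b=1.0, m_min=4.0, m_max=8.0, rng=None):
    """Inverse-CDF sampling from a truncated Gutenberg–Richter distribution."""
    rng = rng or np.random.default_rng(42)
    u = rng.random(n)
    span = 1.0 - 10.0 ** (-b * (m_max - m_min))
    return m_min - np.log10(1.0 - u * span) / b

def simulate_years(n_years, annual_rate, b=1.0, m_min=4.0, m_max=8.0, seed=42):
    """One catalogue per simulated year: Poisson event counts, G–R magnitudes."""
    rng = np.random.default_rng(seed)
    counts = rng.poisson(annual_rate, size=n_years)
    return [sample_gr_magnitudes(c, b, m_min, m_max, rng) for c in counts]

# e.g. years = simulate_years(10_000, annual_rate=2.5)   # illustrative rate
```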
Inputs

Event sampling

Finite-fault geometry (large events)
For M ≥ FF_MIN_MAG, we replace point sources with a rectangular planar fault: length and width follow Wells–Coppersmith-type scaling; strike is sampled around a CA-typical NW–SE trend; dip and top-of-rupture depth (ztor) follow shallow strike-slip heuristics and are capped by the seismogenic thickness. We then compute OpenQuake distance metrics (Rjb, Rrup, Rx). Smaller events use a point-source fallback with depth/rake defaults.
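A minimal sketch of Wells–Coppersmith-type rupture scaling for the planar fault, using the commonly tabulated all-fault-type regressions for subsurface rupture length and downdip width; the width cap from seismogenic thickness, the default dip, and the ztor placement are illustrative assumptions, not the project's exact heuristics.

```python
import numpy as np

def wc1994_dimensions(mag):
    """Wells & Coppersmith (1994) all-fault-type scaling:
    subsurface rupture length and downdip width in km."""
    length_km = 10.0 ** (-2.44 + 0.59 * mag)
    width_km = 10.0 ** (-1.01 + 0.32 * mag)
    return length_km, width_km

def planar_fault(mag, hypo_depth_km, dip_deg=90.0, seismo_thickness_km=15.0):
    """Cap the rupture width by the seismogenic layer and place the top of
    rupture (ztor) above the hypocentre; dip and thickness are placeholders."""
    length_km, width_km = wc1994_dimensions(mag)
    max_width = seismo_thickness_km / np.sin(np.radians(dip_deg))
    width_km = min(width_km, max_width)
    down_dip_extent = width_km * np.sin(np.radians(dip_deg))
    ztor = max(hypo_depth_km - 0.5 * down_dip_extent, 0.0)
    return {"length_km": length_km, "width_km": width_km, "ztor_km": ztor}

# e.g. planar_fault(7.0, hypo_depth_km=8.0) -> ~49 km long, width capped at 15 km
```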
Ground-motion model

Deterministic ShakeMap add-on
Four selected historical/what-if scenarios (Northridge, Loma Prieta, Landers, and Ridgecrest) are included by sampling USGS ShakeMap rasters (MMI/PGA) at asset points. We use GDAL's WarpedVRT for safe on-the-fly reprojection (bilinear with a nearest fallback) and convert between MMI and PGA as needed.
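A minimal sketch of the MMI↔PGA conversion, using the bilinear ground-motion/intensity relation of Worden et al. (2012) for PGA in cm/s²; the coefficients are quoted from memory and should be treated as indicative, not as the project's exact conversion.

```python
import numpy as np

# Worden et al. (2012) bilinear GMICE for PGA (cm/s^2), coefficients as
# commonly tabulated -- verify before relying on them.
C1, C2, C3, C4, LOG_PGA_BREAK = 1.78, 1.55, -1.60, 3.70, 1.57
MMI_BREAK = C1 + C2 * LOG_PGA_BREAK   # ~4.21

def pga_to_mmi(pga_cms2):
    """Convert PGA (cm/s^2) to MMI with the bilinear relation."""
    log_pga = np.log10(np.asarray(pga_cms2, dtype=float))
    return np.where(log_pga <= LOG_PGA_BREAK,
                    C1 + C2 * log_pga,
                    C3 + C4 * log_pga)

def mmi_to_pga(mmi):
    """Inverse conversion: MMI back to PGA (cm/s^2)."""
    mmi = np.asarray(mmi, dtype=float)
    log_pga = np.where(mmi <= MMI_BREAK,
                       (mmi - C1) / C2,
                       (mmi - C3) / C4)
    return 10.0 ** log_pga

# e.g. pga_to_mmi(100.0) -> ~5.8;  mmi_to_pga(6.5) -> ~155 cm/s^2
```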
Safeguards & QA
This hazard stage produces the auditable, per-asset shaking dataset that drives the vulnerability and financial layers, while remaining flexible (finite faults, spatial correlation, ShakeMaps) and reproducible via explicit controls and saved artifacts.
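The aleatory-variability structure mentioned above (a between-event term plus within-event terms that are spatially correlated across sites) can be sketched as below; the exponential correlation model with a 10 km range and the σ split are illustrative assumptions, not the project's calibrated values.

```python
import numpy as np

def correlated_ln_residuals(site_xy_km, tau=0.4, phi=0.6, corr_range_km=10.0, rng=None):
    """One event's ln-IM residuals: a shared between-event term (sigma = tau)
    plus within-event terms (sigma = phi) correlated over distance with an
    exponential model exp(-d / range)."""
    rng = rng or np.random.default_rng(0)
    xy = np.asarray(site_xy_km, dtype=float)
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)   # pairwise km
    corr = np.exp(-d / corr_range_km)
    chol = np.linalg.cholesky(corr + 1e-10 * np.eye(len(xy)))      # jitter for PSD
    within = phi * (chol @ rng.standard_normal(len(xy)))
    between = tau * rng.standard_normal()                          # same for all sites
    return between + within

# ln(IM at each site) = ln(median GMPE prediction) + correlated_ln_residuals(...)
```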
Example ShakeMap of the 1994 Northridge earthquake displaying ground shaking intensity at our portfolio asset locations (marker size/color scale with IM).
GMPE residuals split into within-event and between-event components; the QQ panel checks normality.

Vulnerability
This stage converts per-asset ground motions into damage and monetary loss. We use auditable vulnerability curves driven by spectral acceleration at each building's period (SA(T1)), apply secondary uncertainty (with optional spatial correlation), and stream results to disk for large simulations.
Inputs

Damage model
For each occupancy, the mean damage ratio (MDR) is a capped logistic in log-IM; a generic form is sketched below.

Secondary uncertainty
We sample a lognormal factor with mean 1: exp(σ_SU·z − 0.5·σ_SU²), where z ~ N(0,1). σ_SU is curve- and era-dependent.

Loss calculation

Safeguards & QA

Result: a transparent loss engine that honors building-period effects, uncertainty (including spatial correlation), and portfolio heterogeneity, yielding per-event, per-asset losses ready for financial terms and risk aggregation.
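A minimal sketch of the damage-to-loss chain, assuming a generic capped-logistic MDR in ln(SA) and the lognormal secondary-uncertainty factor defined above; the curve parameters, cap, and σ_SU value are illustrative placeholders, not the project's fitted curves.

```python
import numpy as np

def mean_damage_ratio(sa_g, theta=0.6, beta=0.7, cap=0.85):
    """Generic capped logistic in log-IM: MDR rises with ln(SA) toward `cap`.
    theta is the IM at the curve midpoint, beta the slope scale (assumed)."""
    z = (np.log(np.asarray(sa_g, dtype=float)) - np.log(theta)) / beta
    return cap / (1.0 + np.exp(-z))

def secondary_uncertainty_factor(n, sigma_su=0.5, rng=None):
    """Lognormal factor with mean 1: exp(sigma*z - 0.5*sigma^2), z ~ N(0, 1)."""
    rng = rng or np.random.default_rng(7)
    z = rng.standard_normal(n)
    return np.exp(sigma_su * z - 0.5 * sigma_su ** 2)

def asset_losses(sa_g, tiv, sigma_su=0.5, rng=None):
    """Per-asset ground-up loss = MDR(SA) * secondary-uncertainty factor * TIV,
    capped at the asset's total insured value."""
    mdr = mean_damage_ratio(sa_g)
    factor = secondary_uncertainty_factor(len(np.atleast_1d(sa_g)), sigma_su, rng)
    return np.minimum(mdr * factor * np.asarray(tiv, dtype=float), tiv)
```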
Asset Vulnerability Curve
Financial Terms
In our final module, we turn per-asset gross losses from the Vulnerability stage into financial losses by applying per-risk occurrence terms (deductibles, limits, optional per-occurrence caps), then roll them up to per-event totals. As a result, per-risk deductibles and occurrence limits compress small and medium losses while preserving the tail. We compare Gross vs Net PML (OEP) at standard return periods to make the effect concrete.
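A minimal sketch of the per-risk terms and the event roll-up, assuming a simple deductible-then-limit layer per asset and an optional per-occurrence cap on each event total; the term values and data layout (a pandas DataFrame with event_id, gross, deductible, limit columns) are illustrative.

```python
import numpy as np
import pandas as pd

def apply_per_risk_terms(gross, deductible, limit):
    """Net of per-risk terms: subtract the deductible, then cap at the limit."""
    return np.minimum(np.maximum(np.asarray(gross, dtype=float) - deductible, 0.0), limit)

def event_net_losses(losses: pd.DataFrame, occurrence_cap=None) -> pd.Series:
    """Roll per-asset net losses up to per-event totals, with an optional
    per-occurrence cap applied to each event total."""
    losses = losses.copy()
    losses["net"] = apply_per_risk_terms(losses["gross"],
                                         losses["deductible"], losses["limit"])
    per_event = losses.groupby("event_id")["net"].sum()
    if occurrence_cap is not None:
        per_event = per_event.clip(upper=occurrence_cap)
    return per_event

# Example (illustrative numbers):
# df = pd.DataFrame({"event_id": [1, 1, 2], "gross": [50e3, 200e3, 10e3],
#                    "deductible": [25e3] * 3, "limit": [150e3] * 3})
# event_net_losses(df)  # -> event 1: 25e3 + 150e3 = 175e3, event 2: 0
```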
Gross vs Net PML (OEP) at 5/10/50/100/250/500-year return periods.

Results