How It Works 3 STAGES · CONCURRENT · SUB-2 MIN

From continent to pixel, concurrently.

Every storm in the country is watched and scored live. The moment a hail signature emerges, the storm is segmented to the pixel while monitoring continues uninterrupted.

STAGE 01

Monitor everything.

Every storm in the country — tracked. Radar scans land every two minutes across the national network; each storm cell is matched, tagged, and followed from birth to dissipation.

continent-wide · 2-min cadence · always-on

MONITOR continuous coverage

hail 12%

hail 41%

hail 86%

STAGE 02

Score every storm, live.

Anvil assigns a calibrated hail probability to every tracked storm, refreshed every scan. Physics-trained and continuously validated against millions of verified ground-truth reports.

physics-trained · 83 features · calibrated

STAGE 03

Segment. Analyze every pixel.

The moment a storm crosses threshold it forks into deep analysis while monitoring continues uninterrupted. The storm volume is sliced into a pixel grid and every cell scored independently — probability, size, severity, trajectory — resolved to the meter. 47 columns per pixel, refreshed every scan.

per-pixel probability · per-pixel size · per-pixel trajectory

storm object

48 pixels · 47 cols each
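The fork-while-monitoring behaviour described in Stage 03 can be sketched with Python's `asyncio`. This is a minimal illustration, not Anvil's implementation: `segment_storm`, `monitor`, and the scan dictionaries are hypothetical names, and the 0.3 threshold is borrowed from the alerting threshold quoted in the methodology as an assumption.

```python
import asyncio

# ASSUMPTION: reuse the prob > 0.3 alerting threshold quoted in the methodology.
ALERT_THRESHOLD = 0.3


async def segment_storm(storm_id: str, results: dict) -> None:
    """Hypothetical stand-in for the per-pixel deep analysis of one storm."""
    await asyncio.sleep(0)  # yield control, as real I/O or offloaded compute would
    results[storm_id] = "segmented"


async def monitor(scans: list) -> dict:
    """Score every storm on every scan; fork analysis without blocking the loop."""
    results: dict = {}
    tasks: dict = {}
    for scan in scans:  # each scan maps storm_id -> hail probability
        for storm_id, prob in scan.items():
            if prob > ALERT_THRESHOLD and storm_id not in tasks:
                # Fork: the task runs concurrently with the rest of monitoring.
                tasks[storm_id] = asyncio.create_task(
                    segment_storm(storm_id, results)
                )
        await asyncio.sleep(0)  # monitoring moves on to the next scan
    await asyncio.gather(*tasks.values())
    return results


# Two invented scans: storm B crosses the threshold, storm A never does.
scans = [{"A": 0.12, "B": 0.41}, {"A": 0.25, "B": 0.86}]
print(asyncio.run(monitor(scans)))
```

Only storm B is forked into analysis here; storm A keeps being scored on every scan without ever triggering segmentation.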

DELIVERED

Hyperlocal Alerts

Per-address precision. Push, SMS, email, and webhook — dispatched the second a pixel lights up near you.

Permanent Archive

Every pixel saved forever. Queryable history for any location — events, damage paths, climatology.

Structured Data

Real-time feeds, historical exports, and webhook events delivered to the systems your business runs on.

Forward-Looking Risk

Hail outlooks, property-specific threat scores, and long-range risk — informed by pixel-level history.

Observing Network 5 SOURCES · CONTINENTAL · 2-MIN CADENCE

What Anvil sees.

Continental-scale multi-sensor fusion refreshed every two minutes — the raw observational surface the model is trained and scored on.

Radar: NEXRAD · MRMS (national network · 1 km mosaic)

Numerical models: HRRR (80+ variables · 3 km grid)

Satellite · lightning: GOES · GLM (storm-top · electrical)

Surface obs: ASOS · mesonet (ground-level context)

Ground truth: LSR · SPC · CoCoRaHS (continuous training loop)

Benchmark QA VALIDATED · 28,070 TRACKS · 8.8M STORMS

Anvil leads on every test that matters.

Head-to-head against NOAA/CIMSS ProbSevere v3 and NOAA's MESH radar baseline. 24 months of live data, 8.8 million tracked storms, 28,070 verified hail tracks.

1.6×

More hail caught

VS PROBSEVERE V3

87%

Verified when we fire

MULTI-EVIDENCE PRECISION

4.3×

Cleaner in winter

MESH FP-PER-CATCH VS ANVIL

82%

Giant hail detected

REPORTS ≥ 50 MM

Head-to-head ANVIL · PROBSEVERE · MESH

The receipts.

Track-level verification on the same 24-month window. One row per metric. Trophy marks where Anvil wins outright.

Metric	Anvil (Hail Sentinel)	ProbSevere v3 (NOAA / CIMSS)	MESH (NOAA physics baseline)	Verdict
Real hail caught	15,104 of 28,070 (54%)	9,262 (33%)	12,195 (43%)	1.6× more than ProbSevere
Storms the model even scores	99%	34%	100%	ProbSevere refuses to score 66%
Giant hail caught (≥ 2″)	82%	49%	67%	82 of every 100 caught
Hail only Anvil saw	6,082 storms	—	—	47% of ProbSevere's blind spots

Evaluation window: 2024-02-01 → 2026-01-31 · 8,811,943 storm tracks · 28,070 verified hail tracks. Model: anvil_v1_r16-2026q2-v2. Comparison unit: per storm track (aggregated over all scans of a tracked cell). MESH treated as binary detection at ≥ 25.4 mm (NOAA severe threshold). Ground truth sourced from NOAA LSR, SPC reports, and CoCoRaHS, deduped within 5 km · 1 min bins.
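The catch rates and the 1.6× headline follow directly from the counts in the table. A short check, using only the figures quoted above (the variable names are illustrative):

```python
TOTAL_VERIFIED_TRACKS = 28_070  # verified hail tracks in the evaluation window

# Verified hail tracks each model fired on, as quoted in the table.
caught = {"Anvil": 15_104, "ProbSevere v3": 9_262, "MESH": 12_195}

# Track-level recall: share of verified hail tracks each model caught.
recall = {model: hits / TOTAL_VERIFIED_TRACKS for model, hits in caught.items()}
for model, value in recall.items():
    print(f"{model}: {value:.0%}")

# Headline ratio: Anvil catches relative to ProbSevere v3 catches.
ratio = caught["Anvil"] / caught["ProbSevere v3"]
print(f"Anvil vs ProbSevere v3: {ratio:.1f}x")
```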

Physically impossible CLEAN FALSE-POSITIVE TEST · 3 REGIMES

The false-positive test that can’t be gamed.

Fire rate per million tracks, measured only in three regimes where hail is meteorologically impossible: no instability, no updraft, no growth zone, no cold-cloud ice. A model firing here is firing on nothing, so every fire counts as a false positive regardless of whether anyone filed a report, and no underreporting caveat applies.

No instability

MUCAPE < 100 J/kg · weak mid-level lapse

1,083,382 tracks in regime

Anvil 171 /M

ProbSevere v3 193 /M

MESH 3,686 /M

MESH fires 21.6× more often than Anvil here

Tropical warm rain

Freezing level > 4.5 km · rain-dominant

1,192,799 tracks in regime

Anvil 286 /M

ProbSevere v3 1,076 /M

MESH 3,633 /M

MESH fires 12.7× more often than Anvil here

Winter graupel

Surface < 5°C · WBZ < 2 km · no CAPE

71,692 tracks in regime

Anvil 70 /M

ProbSevere v3 0 /M

MESH 6,193 /M

MESH fires 88.8× more often than Anvil here
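The per-million rates in these panels are a simple normalisation of fire counts by the number of tracks in the regime. The sketch below shows the arithmetic for the winter graupel panel. The raw fire counts (5 and 444) are assumptions back-solved to match the published rates; they do not appear in the source.

```python
def fire_rate_per_million(fires: int, tracks: int) -> float:
    """Fires per million tracks in a regime."""
    if tracks <= 0:
        raise ValueError("tracks must be positive")
    return fires / tracks * 1_000_000


TRACKS_WINTER = 71_692  # tracks in the winter graupel regime, from the panel

# ASSUMED raw counts, chosen only because they reproduce the published rates.
anvil_fires, mesh_fires = 5, 444

anvil_rate = fire_rate_per_million(anvil_fires, TRACKS_WINTER)
mesh_rate = fire_rate_per_million(mesh_fires, TRACKS_WINTER)
ratio = mesh_rate / anvil_rate

print(round(anvil_rate), round(mesh_rate), round(ratio, 1))
```

Under these assumed counts the ratio comes out at 88.8×, which would explain why it differs slightly from dividing the rounded rates (6,193 / 70).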

The underreporting gap RADAR SAW IT · NO ONE REPORTED IT

Hail nobody files.

Report databases miss rural, overnight, and mountain hail. Grading recall against the radar's own polarimetric signature instead of filed reports shows how wide the reporting gap really is.

96%

of the 135,826 storm tracks that showed the textbook polarimetric hail signature never had a report filed.

That is the gap every raw “false alarm” number ignores. Rural hail, overnight hail, hail that melted before anyone drove out to see it — none of it makes the verification record. When Anvil fires on a storm and no one filed a report, the odds it was real hail are still enormous.

28,070

Filed reports over 24 months

LSR + SPC + CoCoRaHS, deduped, severe only (≥ 19 mm). For every reported hail track, the radar saw 4.8× more tracks with a textbook hail signature.

Lower bound

20%

strict report match

treats every unreported hailstorm as a false fire

Upper bound

87%

any independent evidence

match, ensemble, sustained MESH, or radar signature

Model	Fires	Strict (report within 15 km · 10 min)	Expanded (report within 50 km · 60 min)	Corroborated (any independent evidence)
Anvil (Hail Sentinel)	77,469	20%	68%	87%
ProbSevere v3 (NOAA / CIMSS)	60,605	15%	43%	78%
MESH (NOAA radar baseline)	97,709	12%	34%	76%

What backs an Anvil fire

Share of Anvil's 77,469 firing tracks supported by each evidence type. Categories overlap — one fire can have multiple sources of backing.

Evidence type	Criterion	Share of Anvil fires
Strict report match	15 km / ±10 min	20%
Wide neighborhood	50 km / ±60 min	68%
Ensemble agreement	≥ 2 of 3 models fired	50%
Sustained MESH	≥ 5 scans ≥ 25.4 mm	13%
Polarimetric hail sig	Z↑ ZDR↓ RhoHV↓	25%
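Because the evidence categories overlap, the corroborated share is the share of fires backed by at least one source, not the sum of the per-category shares. A minimal sketch on invented data follows. The `FireEvidence` record and the rule that any single flag is enough to corroborate are assumptions for illustration.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class FireEvidence:
    """Hypothetical per-fire evidence flags, one per category in the panel."""
    strict_report: bool
    wide_report: bool
    ensemble: bool
    sustained_mesh: bool
    polarimetric: bool

    def corroborated(self) -> bool:
        # ASSUMPTION: any single evidence source corroborates the fire.
        return any(getattr(self, f.name) for f in fields(self))


def evidence_shares(fires: list) -> dict:
    """Share of fires backed by each category, plus the overlapping union."""
    n = len(fires)
    shares = {
        f.name: sum(getattr(fire, f.name) for fire in fires) / n
        for f in fields(FireEvidence)
    }
    shares["corroborated"] = sum(fire.corroborated() for fire in fires) / n
    return shares


# Four invented fires; the first is backed by several sources at once.
fires = [
    FireEvidence(True, True, True, False, True),
    FireEvidence(False, True, False, False, False),
    FireEvidence(False, False, True, True, False),
    FireEvidence(False, False, False, False, False),
]
shares = evidence_shares(fires)
print(shares)
```

In this toy set the per-category shares sum to 1.75 while the corroborated share is 0.75, which is the same overlap effect that lets the panel's percentages add up to more than 87%.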

The scenarios that break legacy systems COLD · MARGINAL · GIANT

Winter, small hail, catastrophic hail.

The three buckets where legacy models fail — MESH’s winter graupel confusion, the marginal bucket where most claims originate, and giant hail where a miss is a loss event. Anvil leads in all three.

Winter false alarms

Cold-season cost

Wrong fires per real catch, Nov–Mar. Where MESH confuses winter graupel for hail.

Anvil 3.8× · Lowest

ProbSevere v3 5.8×

MESH 16.3×

Anvil fires 4.3× cleaner than MESH in winter

Marginal hail

19–25 mm catch rate

Sub-severe hail that still damages vinyl siding, vehicle paint, and roofing granules.

Anvil 30% · Highest

ProbSevere v3 20%

MESH 24%

Anvil catches 6 pts more marginal hail than next best

Giant hail

≥ 50 mm catch rate

Baseball+ hail. Total-loss roof events. Recall is everything — miss one and the claim is a surprise.

Anvil 82% · Highest

ProbSevere v3 49%

MESH 67%

82 of every 100 giant events detected

Marginal reports 19–25 mm “marble to quarter” 2,421 truth events

Model	POD	CSI
Anvil	30.0%	0.300
ProbSevere	19.7%	0.197
MESH	23.8%	0.238

Anvil’s discrimination quality is +26% vs the next-best model on this bucket

Severe reports 25–50 mm “quarter to golf ball” 22,070 truth events

Model	POD	CSI
Anvil	51.8%	0.518
ProbSevere	31.8%	0.318
MESH	41.7%	0.417

Anvil’s discrimination quality is +24% vs the next-best model on this bucket

Giant reports ≥ 50 mm “baseball+” 3,579 truth events

Model	POD	CSI
Anvil	82.2%	0.822
ProbSevere	49.2%	0.492
MESH	67.3%	0.673

Anvil’s discrimination quality is +22% vs the next-best model on this bucket

Each bucket shows two metrics side by side. POD (the bar) rewards catching real hail; CSI (the chip) also penalises false alarms, so a detector that fires on every pixel can't game it. In the figures above, Anvil leads on both POD and CSI in every size bucket.
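For reference, POD and CSI are the standard contingency-table scores: POD = hits / (hits + misses), and CSI = hits / (hits + misses + false alarms). The sketch below uses invented counts to show why firing on everything maximises POD but collapses CSI.

```python
def pod(hits: int, misses: int) -> float:
    """Probability of detection: share of real events that were caught."""
    return hits / (hits + misses)


def csi(hits: int, misses: int, false_alarms: int) -> float:
    """Critical success index: hits over all events that were real or fired on."""
    return hits / (hits + misses + false_alarms)


# Invented counts for two hypothetical detectors scored on the same 100 events.
fire_on_everything = {"hits": 100, "misses": 0, "false_alarms": 900}
selective = {"hits": 80, "misses": 20, "false_alarms": 20}

for name, c in (("fire on everything", fire_on_everything), ("selective", selective)):
    print(
        name,
        f"POD={pod(c['hits'], c['misses']):.2f}",
        f"CSI={csi(c['hits'], c['misses'], c['false_alarms']):.2f}",
    )
```

The indiscriminate detector reaches a POD of 1.00 but a CSI of only 0.10, while the selective one scores 0.80 and 0.67.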

Lead time ANVIL FIRES FIRST · SHARED INTERSECTION

Anvil fires first.

On tracks where both models fire in advance, which one makes the call sooner? Apples-to-apples, with no coverage bias.

Anvil vs ProbSevere v3

83% of tracks, Anvil fires first or at the same scan

Anvil fires first

31%

Tied on same scan

51%

PSv3 fires first

18%

Across 3,634 storm tracks where both warned in advance, Anvil is at least as fast on 83% of them, with a +0.8 min average lead-time advantage.

Anvil vs MESH

83% of tracks, Anvil fires first or at the same scan

Anvil fires first

21%

Tied on same scan

62%

MESH fires first

17%

Across 4,700 storm tracks where both warned in advance, Anvil matches or beats MESH's radar-speed detection on 83% of tracks, while firing ~50% fewer false alarms in the same window (see the false-alarm cost panel above).

Lead time on the shared set — 2,981 tracks where all three models warned in advance

Anvil 15.7 min avg

ProbSevere 15.2 min avg

MESH 14.8 min avg

7,247 tracks Anvil warned in advance

4,818 tracks PSv3 warned in advance

6,451 tracks MESH warned in advance

28,070 total tracks producing hail

For every storm track that produced a verified hail report, we measured when each model first crossed the alerting threshold and compared it to the moment hail reached the ground. Comparing only on tracks where both models warned in advance removes the selection bias from per-model averages — PSv3's raw lead average is inflated by the fact that it mostly fires on long-lived, organised storms.
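The shared-intersection comparison can be sketched as follows. Lead times here are invented, scan-aligned values in minutes, and `None` marks a track where a model gave no advance warning. The function names and data layout are hypothetical.

```python
from statistics import mean

# Lead time (minutes before hail at the ground) per track and model.
# All values are invented for illustration; None means no advance warning.
LEAD = {
    "t1": {"Anvil": 16.0, "PSv3": 14.0, "MESH": 14.0},
    "t2": {"Anvil": 12.0, "PSv3": 12.0, "MESH": None},
    "t3": {"Anvil": 10.0, "PSv3": None, "MESH": 8.0},
    "t4": {"Anvil": 20.0, "PSv3": 22.0, "MESH": 18.0},
}


def shared_tracks(lead: dict, models: tuple) -> list:
    """Tracks on which every listed model warned in advance."""
    return [
        track
        for track, by_model in lead.items()
        if all(by_model[m] is not None for m in models)
    ]


def mean_lead(lead: dict, model: str, tracks: list) -> float:
    """Average lead time for one model over a fixed set of tracks."""
    return mean(lead[t][model] for t in tracks)


def head_to_head(lead: dict, a: str, b: str) -> dict:
    """Share of shared tracks where model a fires first, ties, or fires later."""
    tracks = shared_tracks(lead, (a, b))
    n = len(tracks)
    first = sum(lead[t][a] > lead[t][b] for t in tracks)  # more lead = earlier
    tied = sum(lead[t][a] == lead[t][b] for t in tracks)  # same scan
    return {"first": first / n, "tied": tied / n, "later": (n - first - tied) / n}


all_three = shared_tracks(LEAD, ("Anvil", "PSv3", "MESH"))
print(all_three, mean_lead(LEAD, "Anvil", all_three))
print(head_to_head(LEAD, "Anvil", "PSv3"))
```

Averaging each model only over `all_three` keeps the comparison on identical storms, which is the point of the shared set: a model that warns only on long-lived storms cannot inflate its average.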

Methodology 24-MONTH WINDOW · 3 TRUTH SOURCES

Model under test: Anvil v1 r16 (anvil_v1_r16-2026q2-v2) vs NOAA/CIMSS ProbSevere (v2 through 2025-08-05, v3 thereafter) vs NOAA MESH/POSH.

Evaluation window: 2024-02-01 → 2026-01-31 (24 months). Hail reports sourced from NOAA Storm Prediction Center, NWS Local Storm Reports, and CoCoRaHS, deduped within 5 km · 1 min bins, restricted to ≥ 19.05 mm (0.75″ severe threshold).

Verification unit: per storm track (not per radar scan). A storm track is counted once, verified if any of its scans matched any ground-truth report within 15 km / ±10 min. A model "fired" if its score ever crossed threshold (prob > 0.3 for Anvil/PSv3, MESH ≥ 25.4 mm) on any scan of that track. This removes per-scan timing bias (models shouldn’t be penalised for firing early in a storm’s life before the hail reaches the ground and the observer files a report).
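A minimal sketch of this track-level rule, assuming great-circle distance on a spherical Earth (radius 6,371 km) and timestamps in minutes. The `Point` type and function names are hypothetical. The thresholds are the ones stated above.

```python
import math
from dataclasses import dataclass

MAX_DISTANCE_KM = 15.0   # strict match radius from the methodology
MAX_OFFSET_MIN = 10.0    # strict match time window (plus or minus)
PROB_THRESHOLD = 0.3     # firing threshold quoted for Anvil / PSv3
EARTH_RADIUS_KM = 6371.0  # ASSUMPTION: mean spherical Earth radius


@dataclass(frozen=True)
class Point:
    """A scan centroid or a hail report: position plus time in minutes."""
    lat: float
    lon: float
    t_min: float


def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance between two points in kilometres."""
    lat1, lat2 = math.radians(a.lat), math.radians(b.lat)
    dlat = lat2 - lat1
    dlon = math.radians(b.lon - a.lon)
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def track_verified(scans: list, reports: list) -> bool:
    """True if any scan of the track matches any report in space and time."""
    return any(
        haversine_km(s, r) <= MAX_DISTANCE_KM and abs(s.t_min - r.t_min) <= MAX_OFFSET_MIN
        for s in scans
        for r in reports
    )


def track_fired(probabilities: list, threshold: float = PROB_THRESHOLD) -> bool:
    """True if the score crossed the threshold on any scan of the track."""
    return any(p > threshold for p in probabilities)


track = [Point(35.0, -97.0, 0.0), Point(35.05, -96.95, 2.0)]
nearby_report = Point(35.1, -97.0, 5.0)   # about 11 km north, 5 minutes later
print(track_verified(track, [nearby_report]), track_fired([0.10, 0.25, 0.31]))
```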

Two truth channels. Because ~30–50% of severe hail never reaches the report database (Wendt & Jirak 2021; Blair 2020), we also verify against the polarimetric hail signature on the radar itself: composite reflectivity ≥ 55 dBZ, differential reflectivity ≤ 1 dB, correlation coefficient ≤ 0.95, and a 50 dBZ echo top ≥ 8 km. Recall is reported against each channel independently.
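The four radar criteria can be expressed as a single predicate. This is a sketch under two assumptions not stated in the text: that all four conditions must hold on the same scan, and that a track carries the signature if any one of its scans does. The `ScanObs` field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanObs:
    """Hypothetical per-scan summary of one storm's polarimetric observables."""
    composite_refl_dbz: float  # composite reflectivity
    zdr_db: float              # differential reflectivity
    rhohv: float               # correlation coefficient
    echo_top_50dbz_km: float   # height of the 50 dBZ echo top


def has_hail_signature(obs: ScanObs) -> bool:
    """Apply the four thresholds quoted in the methodology to one scan."""
    return (
        obs.composite_refl_dbz >= 55.0
        and obs.zdr_db <= 1.0
        and obs.rhohv <= 0.95
        and obs.echo_top_50dbz_km >= 8.0
    )


def track_has_signature(scans: list) -> bool:
    # ASSUMPTION: one qualifying scan is enough to flag the whole track.
    return any(has_hail_signature(s) for s in scans)


# Invented observations: a heavy-rain scan followed by a hail-like scan.
rain_scan = ScanObs(composite_refl_dbz=52.0, zdr_db=2.5, rhohv=0.98, echo_top_50dbz_km=5.0)
hail_scan = ScanObs(composite_refl_dbz=62.0, zdr_db=0.3, rhohv=0.91, echo_top_50dbz_km=10.5)
print(has_hail_signature(rain_scan), track_has_signature([rain_scan, hail_scan]))
```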

Physics-anchored false-positive test. Precision measured only against incomplete reports systematically penalises every model for catching real hail. The cleaner test is fire rate in environments where hail is meteorologically impossible — three regimes gated on HRRR (MUCAPE, freezing level, wet-bulb zero, surface temperature). A fire in those regimes is a false positive regardless of reports.
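The regime gating could look like the sketch below, using only the numeric thresholds quoted in the three regime panels. Several details are assumptions: "no CAPE" is read as MUCAPE of zero, the "weak mid-level lapse" and "rain-dominant" conditions are omitted because no thresholds are given for them, and regimes are allowed to overlap. The `Env` record and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Env:
    """Hypothetical HRRR-derived environment for one storm track."""
    mucape_jkg: float
    freezing_level_km: float
    wbz_km: float            # wet-bulb zero height
    surface_temp_c: float


def impossible_regimes(env: Env) -> list:
    """Return every hail-impossible regime this environment falls into."""
    regimes = []
    if env.mucape_jkg < 100.0:
        regimes.append("no_instability")
    if env.freezing_level_km > 4.5:
        regimes.append("tropical_warm_rain")
    # ASSUMPTION: "no CAPE" is interpreted as MUCAPE <= 0.
    if env.surface_temp_c < 5.0 and env.wbz_km < 2.0 and env.mucape_jkg <= 0.0:
        regimes.append("winter_graupel")
    return regimes


# Invented environments for illustration.
winter = Env(mucape_jkg=0.0, freezing_level_km=1.5, wbz_km=1.0, surface_temp_c=2.0)
tropical = Env(mucape_jkg=1500.0, freezing_level_km=5.0, wbz_km=4.0, surface_temp_c=28.0)
plains = Env(mucape_jkg=2500.0, freezing_level_km=3.8, wbz_km=3.0, surface_temp_c=25.0)

for env in (winter, tropical, plains):
    print(impossible_regimes(env))
```

Any model fire on a track with a non-empty regime list would count as a false positive under this test; a track like `plains` falls outside the test entirely.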

Feature Architecture 83 FEATURES · 6 DOMAINS · ~3× PROBSEVERE V3

How Anvil sees.

83 engineered features across six domains — more than three times the 25-feature surface of ProbSevere v3. Each domain captures a different physical signature of a hail-producing storm.

I · Radar foundations: raw observables (polarimetric + reflectivity + motion)

II · Microphysical discrimination: hail vs rain vs graupel (learned signal combinations)

III · 3D convective architecture: updraft structure (vertical extent + organization)

IV · Environmental context: HRRR state (instability · shear · moisture)

V · Spatiotemporal tracking: motion + lifecycle (intensification derivatives)

VI · Ground-truth learning: closed feedback loop (LSR · SPC · CoCoRaHS · damage)

Research paper in preparation — 2026