Why Your Groundwater Model Is Already Out of Date (And What to Do About It)
The Groundwater Modeling Problem Nobody Talks About
There are roughly 60,000 leaking underground storage tank (LUST) releases in the EPA's active remediation backlog, and 1,343 active Superfund sites on the National Priorities List.(1,2) The average remediation timeline for both exceeds 10 years.(3) Most of these sites share one thing: either no numerical groundwater model at all, or a static model built once by a consultant, delivered as a PDF report, and never updated again.
That's not a criticism of the consultants who built them. It's a structural problem. The science of groundwater contamination modeling has advanced considerably over the past two decades - ensemble calibration methods, automated parameter estimation, uncertainty quantification. None of it has meaningfully reached standard practice.
A 2024 study of operational groundwater modeling workflows stated the gap plainly: "Important academic advances in groundwater modelling over the past two decades have not translated into practical application. Uncertainty quantification is rarely undertaken."(4) None of the workflows analyzed included any form of automated data assimilation.(4)
This article explains why that gap exists, what it costs, and how automated groundwater modeling is beginning to close it.
What Groundwater Contamination Modeling Actually Does
Regulators ask where a contamination plume is now, where it will be in ten years, and when concentrations will cross thresholds at neighboring receptors. Measurement alone cannot answer those questions. Tools like GWSDAT provide excellent spatiotemporal analysis of where a plume has been; predicting future behavior requires a numerical model.
A groundwater numerical model subdivides the subsurface into a three-dimensional grid of cells - typically 50,000 to 500,000 for a contaminated site - each representing a discrete volume of earth.(5) Every cell is assigned hydraulic conductivity, porosity, contaminant transport properties, and biodegradation rates. The model solves physics equations across the entire grid, producing simulated groundwater levels and contaminant concentrations at every point in the domain.
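The grid-and-physics idea can be made concrete with a deliberately tiny sketch. This is not MODFLOW; it is a minimal steady-state flow solver on a small two-dimensional grid, with the grid size, conductivity values, and boundary heads all invented for illustration:

```python
import numpy as np

# Minimal sketch of steady-state groundwater flow on a 2-D grid:
# solve div(K grad h) = 0 with fixed heads on the left and right edges.
# All numbers here are illustrative, not from any real site.
ny, nx = 20, 40
K = np.full((ny, nx), 10.0)     # hydraulic conductivity (m/day)
K[8:12, 10:30] = 0.01           # a low-conductivity clay lens

# Start from a linear head gradient between the two fixed-head edges.
h = np.tile(np.linspace(10.0, 8.0, nx), (ny, 1))

def face_k(a, b):
    # Harmonic mean conductivity on the face between adjacent cells.
    return 2.0 * a * b / (a + b)

for _ in range(5000):           # simple relaxation toward steady state
    kw = face_k(K[1:-1, 1:-1], K[1:-1, :-2])
    ke = face_k(K[1:-1, 1:-1], K[1:-1, 2:])
    kn = face_k(K[1:-1, 1:-1], K[:-2, 1:-1])
    ks = face_k(K[1:-1, 1:-1], K[2:, 1:-1])
    h[1:-1, 1:-1] = (kw * h[1:-1, :-2] + ke * h[1:-1, 2:]
                     + kn * h[:-2, 1:-1] + ks * h[2:, 1:-1]) / (kw + ke + kn + ks)
    h[0, 1:-1] = h[1, 1:-1]     # no-flow across the top edge
    h[-1, 1:-1] = h[-2, 1:-1]   # no-flow across the bottom edge

# Heads fall from 10 m toward 8 m; the clay lens bends the gradient around it.
print(h[10, ::10].round(2))
```

The harmonic-mean conductivity between cells is what lets a thin clay lens block flow even when its neighbors are highly permeable; production simulators solve the same balance, just with far larger grids, three dimensions, transient storage terms, and contaminant transport coupled on top.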
The challenge is calibration. Hydraulic conductivity alone can vary by three to five orders of magnitude within a single site.(6) Fine-grained clay lenses sitting next to coarse gravel channels, buried infrastructure nobody mapped, fill material with unknown properties - all create flow pathways the model needs to capture while the sparse well network can barely detect them.
A typical contaminated site model contains 50,000 to 500,000 grid cells but is calibrated against data from only 10 to 30 monitoring wells.(5) That ratio - vastly more unknowns than observations - makes calibration a fundamentally underdetermined problem, not a solvable one in the traditional sense.
Why a Single "Best Fit" Model Is the Wrong Answer
The standard response to this underdetermined problem is to calibrate once and report a single result. A consultant adjusts model parameters until simulated water levels match measured values, runs one simulation, and delivers one prediction.
That prediction looks precise. It isn't.
With sparse data - twenty wells spread across a multi-hectare site - many different parameter combinations can produce equally good fits to the observations. A high-conductivity zone in one location can be offset by a low-conductivity zone elsewhere; both configurations match the measured water levels. This is what hydrogeologists call the curse of equifinality: a single "best fit" model conceals many other equally plausible representations of the subsurface.
"The plume will reach the property boundary in 12 years" sounds actionable. But it hides the question that actually drives capital planning and regulatory negotiation: what is the confidence level of that assessment?
Ensemble methods address this by treating calibration as a probabilistic exercise rather than a point estimation problem. Instead of calibrating once, calibration is performed across hundreds of different parameter sets. Each produces a slightly different model. The spread across all predictions generates uncertainty bounds at every grid cell - a quantified range reflecting how much the data actually constrains the answer.

The resulting output - "there is a 90% probability the plume reaches the property boundary between 8 and 16 years" - is less tidy than a single number. It is far more useful for decisions that involve real money and real regulatory consequences.
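The ensemble logic can be sketched with a toy problem: a single unknown plume velocity instead of a full parameter field. Every number below is invented for illustration, and the acceptance step is a simple rejection filter rather than the iterative ensemble methods used in practice:

```python
import numpy as np

# Toy ensemble calibration: many parameter sets fit sparse observations
# equally well, so keep every acceptable set and report the spread of
# its predictions instead of a single "best fit".
rng = np.random.default_rng(42)

# "Observed" plume front positions (m from source) at years 1..5,
# generated from a hidden true velocity of 20 m/yr plus sampling noise.
years = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
obs = 20.0 * years + rng.normal(0.0, 6.0, size=5)

# Prior: plume velocity anywhere from 5 to 60 m/yr.
candidates = rng.uniform(5.0, 60.0, size=20000)

# Accept any candidate that matches the data about as well as the noise
# allows (root-mean-square misfit under 10 m).
sim = candidates[:, None] * years[None, :]
rmse = np.sqrt(((sim - obs[None, :]) ** 2).mean(axis=1))
ensemble = candidates[rmse < 10.0]

# Forward prediction: years until the front reaches a fence line 300 m away.
arrival = 300.0 / ensemble
lo, hi = np.percentile(arrival, [5, 95])
print(f"{ensemble.size} acceptable models; 90% reach the fence line "
      f"in {lo:.1f}-{hi:.1f} years")
```

The point of the sketch is the last two lines: the deliverable is a range with a probability attached, not a single arrival year, and the width of that range is dictated by how tightly the observations pin down the parameters.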
Where Most Industry Practice Sits Today
Most contaminated site work operates at what might be called Level 0 or Level 1 on a groundwater modeling maturity spectrum:
Level 0 - Static model. Built once by a consultant, delivered as a PDF, never updated. The model reflects conditions at the time of construction. When conditions change - new sampling data arrives, remediation progresses, seasonal patterns shift - the model doesn't.
Level 1 - Periodic refresh. Model recalibrated annually or at regulatory five-year reviews. Still manual. Still delivers periodic static snapshots rather than continuously updated predictions.
Level 2 - Continuous shadow. Real-time sensor data displayed alongside model predictions. Humans compare the two and decide when recalibration is warranted.
Level 3 - Predictive shadow. Automated recalibration incorporating new data as it arrives. Ensemble-based predictions with uncertainty quantification and forward projections under multiple remediation scenarios simultaneously.

The tools required to reach Level 3 already exist and are free. MODFLOW, the USGS groundwater simulation standard accepted by regulators across North America, has been freely available since its first release in 1984; MODFLOW6, the current version, is open source.(7) PEST++, the parameter estimation software that enables ensemble-based calibration, is also free and open source. The barrier is not technology. It is operational: nobody has automated the pipeline that turns raw site data into a calibrated, uncertainty-quantified groundwater model and keeps it current as new data arrives.
The Data Problem: What Quarterly Sampling Actually Produces
Standard practice across the roughly 60,000 sites in the EPA's LUST backlog is quarterly grab sampling - producing two to four concentration measurements per monitoring well per year.(1) At a 40-well site, that equals 80 to 160 total data points annually. Large contaminated sites can spend $200,000 per year on this sampling;(8) smaller sites typically spend $10,000 to $50,000.(9)
That data poverty has a direct statistical consequence for plume management.
The Mann-Kendall trend test - the standard non-parametric method regulators use to determine whether a contaminant plume is stable, growing, or shrinking - requires substantial data to reach meaningful conclusions. With two to four samples per well per year, the test cannot reliably distinguish real trends from sampling noise for 10 to 12 years.(8) It is not a flaw in the test. It is a statistical power problem. You simply don't have enough observations.
The consequence is that plume stability determinations - the metric that tells regulators a site is heading toward closure - take a decade by design under standard monitoring practice.
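The statistical power problem is easy to demonstrate. The sketch below implements the core Mann-Kendall statistic (without tie corrections) and simulates how many years of monitoring a given sampling frequency needs before a genuine declining trend is flagged reliably; the decline rate and noise level are invented for illustration, not taken from any site:

```python
import numpy as np

# Mann-Kendall trend test (no tie correction) and a power simulation:
# how many years of record before a real trend is detected 80% of the time?
def mann_kendall_z(x):
    x = np.asarray(x, dtype=float)
    n = x.size
    # S = concordant pairs minus discordant pairs over all i < j.
    s = np.triu(np.sign(x[None, :] - x[:, None]), 1).sum()
    var = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        return (s - 1) / np.sqrt(var)
    if s < 0:
        return (s + 1) / np.sqrt(var)
    return 0.0

rng = np.random.default_rng(0)
true_decline = -2.0   # mg/L per year: a genuine shrinking-plume trend
noise_sd = 12.0       # sampling noise much larger than the annual change

def years_to_detect(samples_per_year, max_years=15, z_crit=1.96, trials=200):
    # Smallest record length (years) at which simulated monitoring
    # histories flag the trend in at least 80% of trials.
    for yrs in range(2, max_years + 1):
        t = np.arange(yrs * samples_per_year) / samples_per_year
        hits = sum(
            abs(mann_kendall_z(100.0 + true_decline * t
                               + rng.normal(0.0, noise_sd, t.size))) > z_crit
            for _ in range(trials)
        )
        if hits / trials >= 0.8:
            return yrs
    return None

d = years_to_detect(4)   # quarterly sampling: 4 measurements per year
print("quarterly sampling needs roughly", d, "years to confirm the trend")
```

With only four samples per year, the signal accumulates slowly relative to the noise; increasing the sampling frequency shortens the required record because the same test sees far more pairs of observations per year of monitoring.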

Continuous sensor data changes this math entirely. LiORA's NDIR (Non-Dispersive Infrared) subsurface gas sensors produce approximately 17,000 measurements per sensor per year, sampling every 30 minutes.(8) Across a 40-sensor network, that equals 680,000 data points annually versus 160 from quarterly sampling at the same site. With that data density feeding an automated groundwater model, plume stability can be determined within approximately six months rather than ten to twelve years.(8)
The model isn't smarter. The data is denser.
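The arithmetic behind those figures is easy to verify (the text rounds 17,520 readings per sensor-year down to "approximately 17,000"):

```python
# Back-of-envelope check on the data-density figures quoted above.
samples_per_day = (24 * 60) // 30         # one reading every 30 minutes
per_sensor_year = samples_per_day * 365   # 17,520: the "~17,000" figure
network_year = per_sensor_year * 40       # a 40-sensor network
quarterly_year = 4 * 40                   # 4 grab samples x 40 wells
print(per_sensor_year, network_year, network_year // quarterly_year)
# -> 17520 700800 4380
```

Even after rounding, the continuous network produces on the order of four thousand times more observations per year than quarterly sampling of the same site.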

What Automated Groundwater Modeling Produces
PlumeFutures, LiORA's automated groundwater modeling platform, is built on MODFLOW6 - the same simulator regulators already trust - with an automated pipeline that handles grid construction, parameter estimation, ensemble calibration, and forward projection.
What goes in: Historical sampling records, Phase II Environmental Site Assessment reports, and digital elevation models. No sensors are required to initialize a model. When LiORA's continuous sensors are deployed, the platform additionally ingests 17,000 measurements per sensor per year to continuously refine calibration.(8)
What comes out: Twenty-year forward projections of contaminant plume migration, evaluated simultaneously under three remediation scenarios:
- Baseline - no intervention. The "do nothing" trajectory used as a reference for comparison.
- NSZD (Natural Source Zone Depletion) - enhanced natural biodegradation. When LiORA sensors are deployed, NSZD rates are calculated from measured soil gas concentration gradients rather than literature estimates, making the scenario site-specific rather than generic.
- Biostimulation - active remediation via nutrient or electron acceptor injection to stimulate microbial degradation of petroleum hydrocarbons.
Every prediction includes uncertainty bounds - not a single line on a graph, but a distribution of outcomes reflecting parameter uncertainty across the ensemble of calibrated models.
Three core decision metrics drive the outputs:
- Time-to-stability. When will the plume stop changing? A plume is considered stable when concentration and area change by less than 1% per year.(10) This determines whether a site is heading toward closure or still evolving.
- Time-to-fence-line. When will contamination reach the property boundary or nearest sensitive receptor? This is the metric that triggers regulatory intervention and drives capital allocation decisions.
- Portfolio triage. In a typical portfolio of contaminated sites, approximately 90% are stable or naturally attenuating and need monitoring but not intervention. The remaining 10% have migrating or growing plumes that require action.(8) Identifying that 10% without commissioning individual studies for every property is the portfolio-scale problem automated modeling was built for.
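The time-to-stability criterion in the first metric above is mechanically simple to apply. Here is a sketch using an invented annual series of plume metrics (the <1%-per-year threshold is from the text; the data values are not):

```python
import numpy as np

# Apply the <1%-per-year stability criterion to an invented annual
# series of plume metrics: area (m^2) and mean concentration (mg/L).
area = np.array([5200.0, 5050.0, 4990.0, 4975.0, 4971.0])
conc = np.array([14.0, 13.1, 12.6, 12.55, 12.51])

def annual_change_pct(x):
    # Year-over-year change as a percentage of the prior year's value.
    return np.abs(np.diff(x) / x[:-1]) * 100.0

# Stable when BOTH metrics changed by less than 1% in the latest year.
stable = (annual_change_pct(area)[-1] < 1.0
          and annual_change_pct(conc)[-1] < 1.0)
print("latest-year change:",
      round(annual_change_pct(area)[-1], 2), "% area,",
      round(annual_change_pct(conc)[-1], 2), "% concentration ->",
      "stable" if stable else "still evolving")
```

The criterion itself is trivial; the hard part, as the Mann-Kendall discussion above shows, is collecting enough data to demonstrate that the year-over-year changes are genuinely below the threshold rather than sampling noise.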
What This Means for Site Managers
When sensors are deployed, PlumeFutures performs monthly data fusion. Each month, new sensor data is incorporated into the existing calibrated model. The model recalibrates against both historical and new observations. Forward projections regenerate with updated parameters.
Over time, uncertainty generally decreases in areas with good sensor coverage. Where it increases - where new data conflicts with prior assumptions - that is the model surfacing what the original site characterization missed. Buried infrastructure deflecting flow. Geological heterogeneity not captured in initial boring logs. That increase in uncertainty may be the most valuable output in some cases: it tells you where to look next.
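The monthly fusion loop can be sketched conceptually: hold an ensemble of plausible models, and each month re-screen the whole ensemble against the growing observation record. As before, this toy uses a single velocity parameter and invented numbers, and a simple rejection filter stands in for production data-assimilation methods:

```python
import numpy as np

# Conceptual monthly data-fusion loop: new observations arrive over time,
# and the ensemble of acceptable models is re-screened against the full
# record each month. All numbers are invented for illustration.
rng = np.random.default_rng(7)
true_v, noise = 20.0, 5.0                      # hidden plume velocity (m/yr)
ensemble = rng.uniform(5.0, 60.0, size=50000)  # prior ensemble of velocities

t_obs, x_obs = [], []
for month in (6, 12, 18, 24):                  # data arrives as time passes
    t_obs.append(month / 12.0)
    x_obs.append(true_v * t_obs[-1] + rng.normal(0.0, noise))
    ts, xs = np.array(t_obs), np.array(x_obs)
    # Re-screen the full prior ensemble against ALL observations so far.
    misfit = np.sqrt(((ensemble[:, None] * ts - xs) ** 2).mean(axis=1))
    keep = ensemble[misfit < 2.0 * noise]
    lo, hi = np.percentile(keep, [5, 95])
    print(f"month {month:2d}: {keep.size:6d} models survive, "
          f"velocity 90% range {lo:.1f}-{hi:.1f} m/yr")
```

Each pass through the loop typically narrows the surviving velocity range, which is the toy-model analogue of uncertainty shrinking where sensor coverage is good; a month where the range widens instead would correspond to new data contradicting the prior ensemble.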
For site managers and environmental engineers, the practical shift is from "model it once and hope" to "model it continuously and know." Plume stability determination moves from a decade-long waiting game to a six-month assessment, which means closure conversations with regulators can start years earlier. For anyone managing a portfolio of 20 or 50 or 200 contaminated sites, the question "which ones actually need attention this quarter?" becomes answerable without a separate study for each property.
What Automated Groundwater Modeling Cannot Do (Yet)
Environmental professionals detect overselling immediately. Automated groundwater modeling has clear limits that are worth stating directly.
It is not a digital twin. Automated modeling at Level 3 - ensemble predictions with uncertainty quantification - does not close the loop. It does not recommend remediation actions and does not execute them. The predictions inform human decisions. Moving to Level 4, prescriptive models that recommend specific interventions, requires regulatory frameworks for model-driven decision-making that do not yet exist in most jurisdictions.
It does not eliminate subsurface uncertainty. More data reduces epistemic uncertainty - things you could know but don't yet. Some uncertainty is irreducible. The subsurface will always have more heterogeneity than any practical number of sensors can measure. The ensemble approach quantifies this honestly rather than hiding it behind a single prediction that implies more precision than the data supports.
Spatial resolution is bounded by well locations. Continuous sensors provide extraordinary temporal density - 17,000 measurements per year per location - but spatial density does not change. The space between wells is interpolated. More wells improve spatial resolution. Sensors solve the temporal problem, not the spatial one.
Not every site type is equally well-served. The automated pipeline is built for petroleum hydrocarbon contamination at sites with sufficient historical data to initialize a model. Complex mixed-contaminant sites, fractured bedrock geology, or sites with minimal data histories require more manual expert interpretation than the automated pipeline currently handles.
Regulatory acceptance is evolving. MODFLOW-based models are widely accepted because MODFLOW is the USGS standard. Continuously updated, automated models are newer territory. In most jurisdictions today, they serve in a supplemental role: the monitoring requirement is not going away, but the monitoring data becomes more useful and the resulting predictions more defensible.
Frequently Asked Questions
What is groundwater contamination modeling?
Groundwater contamination modeling uses numerical simulation to predict how contaminants move through the subsurface over time. A model subdivides the site into a three-dimensional grid, assigns hydraulic and transport properties to each cell, and solves physics equations to calculate contaminant concentrations at every location and time step. This enables prediction of where a plume will be in the future - answering the regulatory questions that measurement alone cannot address.
What is MODFLOW and why do regulators accept it?
MODFLOW is a modular finite-difference groundwater flow model developed and maintained by the US Geological Survey since 1984.(7) It is peer-reviewed, open source, and the de facto standard for groundwater simulation in North America and internationally. Regulators accept MODFLOW-based models because the underlying science is established, the software is publicly available for independent review, and decades of case studies support its application across a wide range of site conditions.
How long does it take to determine plume stability?
With traditional quarterly sampling - two to four data points per well per year - the Mann-Kendall trend test requires 10 to 12 years to reach statistical significance for plume stability determination.(8) With continuous sensor data providing 17,000 measurements per sensor per year, the same determination can be made in approximately six months.(8) The difference is data density, not modeling sophistication.
What is the difference between a deterministic and ensemble groundwater model?
A deterministic model calibrates once and produces a single prediction. An ensemble model calibrates multiple times across hundreds of different parameter sets and produces a distribution of predictions. The ensemble approach quantifies uncertainty at every grid cell rather than hiding it behind a single point estimate. This produces outputs like "90% probability the plume reaches the property boundary between 8 and 16 years" rather than "the plume will reach the boundary in 12 years" - less tidy, but far more useful for capital planning and regulatory negotiation.
What is plume stability and why does it matter for site closure?
Plume stability means concentration and area are changing by less than 1% per year.(10) It is the primary metric regulators use to assess whether a site is heading toward closure or still evolving. Demonstrating plume stability is typically required before risk-based closure conversations can begin. Because traditional quarterly sampling requires a decade to establish stability statistically, accelerating that determination has direct impact on closure timelines and the cost of carrying a site through remediation.
Can groundwater models be used for portfolio-level decisions?
Yes. Portfolio triage is one of the most valuable applications of automated modeling. In a typical portfolio of contaminated sites, approximately 90% are stable or naturally attenuating and need monitoring but not active intervention. The remaining 10% have migrating or growing plumes that require attention.(8) Identifying that 10% without commissioning individual studies for every property is the portfolio-scale problem that automated modeling at scale is designed to solve.
Does automated groundwater modeling replace hydrogeologists?
No. Automated modeling handles the mechanical work: grid construction, parameter estimation, calibration iterations, scenario projection. That frees hydrogeologists to do what they are trained for - interpreting results in geological context, exercising professional judgment about remediation strategy, and explaining model outputs to regulators and stakeholders. The bottleneck has always been the hydrogeologist hand-tuning parameters in a desktop GUI one site at a time. Automation removes that bottleneck without removing the professional judgment that makes model outputs defensible.
References
- US Environmental Protection Agency, Office of Underground Storage Tanks. Approximately 60,000 leaking underground storage tank (LUST) releases remain in the active remediation backlog as of 2025. EPA, "UST Performance Measures," updated annually. https://www.epa.gov/ust/ust-performance-measures
- US Government Accountability Office (GAO), 2025. There are 1,343 active sites on the EPA's National Priorities List (Superfund). https://www.gao.gov/products/gao-20-73
- US Government Accountability Office (GAO), reports on Superfund and LUST program performance. Average remediation timelines exceed 10 years for both Superfund NPL sites and LUST sites.
- Study of Swedish groundwater modelling practices, published 2024, PubMed Central (PMC). Surveyed operational groundwater models and found most used fewer than 25 parameters, many calibrated manually with fewer than 10. No automated parameter estimation or ensemble methods employed. None of the analyzed workflows included automated data assimilation.
- Standard hydrogeological modelling practice. Grid cell counts vary by site complexity; 50,000 to 500,000 cells is representative of typical contaminated site models using MODFLOW. See also: Freeze, R.A. and Cherry, J.A., Groundwater (Prentice-Hall, 1979).
- Hydraulic conductivity variation. See Freeze, R.A. and Cherry, J.A., Groundwater (Prentice-Hall, 1979). Within a single contaminated site, three to five orders of magnitude variation is common due to geological heterogeneity.
- US Geological Survey (USGS). MODFLOW first released in 1984 as a modular finite-difference groundwater flow model. MODFLOW6 is the current version, open source. https://www.usgs.gov/software/modflow-6-usgs-modular-hydrologic-model
- PlumeFutures webinar, "Predictive Risk Assessment in Groundwater Contamination," Steve Siciliano (CEO, LiORA), 2024. Source for: 17,000 measurements/sensor/year (30-minute interval: 48 readings/day × 365 days); plume stability in 6 months vs. 10-12 years; Mann-Kendall statistical power limitations; $200,000/year sampling cost at large sites; 500,000+ data points/month from sensor network; 90% portfolio stability figure.
- US Environmental Protection Agency, cost analysis of LUST program site monitoring. Smaller contaminated sites typically spend $10,000 to $50,000 per year on quarterly groundwater sampling programs.
- PlumeFutures scenario planning demo, LiORA, 2024. Plume stability defined as concentration and area changing by less than 1% per year.
For groundwater contamination modeling, plume stability assessment, and portfolio-scale monitoring optimization, contact LiORA at joinliora.com/contact.
Author

As Head of Software Engineering at LiORA, Allan MacGregor brings 18 years of experience scaling platforms for massive datasets in regulated environments. Previously CTO at Humi, where he led technology strategy through rapid growth and a successful acquisition, Allan serves as LiORA's primary technical lead for AI model architecture, training infrastructure, and deployment. He is focused on closing the gap between academic advances in groundwater modeling and what actually runs in production at contaminated sites.
