Implications of sample size, rareness, and commonness for derivation of environmental benchmarks and criteria from field and laboratory data.


Exponent, 1150 Connecticut Avenue NW, Suite 1100, Washington, DC, 20036, USA.


Tabulations of numerical concentration-based environmental benchmarks are commonly used to inform decisions on managing chemical exposures. Benchmarks are usually set at levels below which there is a low likelihood of adverse effects. Given the widespread use of tables of benchmarks, it is reasonable to expect that they are adequately reliable and fit for purpose. The degree to which a derived benchmark reflects an actual effect level or statistical randomness is critically important for the reliability of a numerical benchmark value. These expectations may not be met for commonly used benchmarks examined in this study. Computer simulations of field sampling and toxicity testing reveal that small sample size and confounding from uncontrolled factors that affect the interpretation of toxic effects contribute to uncertainties that might go unrecognized when deriving benchmarks from data sets. The simulations of field data show that it is possible to derive a benchmark even when no toxicity is present. When toxicity is explicitly included in simulations, imposed effect threshold levels could not always be accurately determined. Simulations were also used to examine the influence of mixtures of chemicals on the determination of toxicity thresholds of chemicals within the mixtures. The simulations showed that data sets that appear large and robust can contain many smaller data sets associated with specific biota or chemicals. The subsets of data with small sample sizes can contribute to considerable statistical uncertainty in the determination of effect thresholds and can indicate that effects are present when they are absent. The simulations also show that less toxic chemicals may appear toxic when they are present in mixtures with more toxic chemicals. Because of confounding in the assignment of toxicity to individual chemicals within mixtures, simulations showed that derived toxicity thresholds can be less than the actual toxicity thresholds.
A set of best practices is put forward to guard against the potential problems identified by this work. These include adequately determining and implementing Data Quality Objectives (DQOs), evaluating the implications of sample size, designing appropriate sampling and evaluation programs based on this information, using an appropriate tiered evaluation strategy that considers the uncertainties, and employing a weight-of-evidence approach to narrow the uncertainties to manageable and identified levels. The work underscores the importance of communicating the uncertainties associated with numerical values commonly included in tables for screening and risk assessment purposes to better inform decisions.
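The point that a benchmark can be derived even when no toxicity is present can be illustrated with a minimal sketch. It is not the simulation used in this study. It assumes lognormally distributed concentrations, an "effect" flag assigned at random and independently of concentration (so there is no toxicity by construction), and a hypothetical derivation rule that takes a low percentile of the concentrations in effect-flagged samples. The function names, the effect rate, and the percentile are all illustrative assumptions.

```python
import random


def derive_benchmark(n_samples, effect_rate=0.2, percentile=0.10, seed=None):
    """Derive a 'benchmark' from simulated field data that contain no toxicity.

    All parameters are illustrative assumptions, not values from the study.
    """
    rng = random.Random(seed)
    # Simulated chemical concentrations (arbitrary units).
    concentrations = [rng.lognormvariate(0.0, 1.0) for _ in range(n_samples)]
    # Effects are assigned independently of concentration: no real toxicity.
    effect_flags = [rng.random() < effect_rate for _ in range(n_samples)]
    effect_concs = sorted(c for c, e in zip(concentrations, effect_flags) if e)
    if not effect_concs:
        return None  # no effect-flagged samples, so nothing to derive
    # Hypothetical rule: a low percentile of effect-flagged concentrations.
    index = int(percentile * (len(effect_concs) - 1))
    return effect_concs[index]


def benchmark_spread(n_samples, n_trials=200, seed=0):
    """Return the 5th and 95th percentiles of benchmarks over repeated surveys."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_trials):
        b = derive_benchmark(n_samples, seed=rng.getrandbits(32))
        if b is not None:
            values.append(b)
    values.sort()
    low = values[int(0.05 * (len(values) - 1))]
    high = values[int(0.95 * (len(values) - 1))]
    return low, high


if __name__ == "__main__":
    for n in (10, 50, 500):
        low, high = benchmark_spread(n)
        print(f"n={n:4d}  5th-95th percentile of derived benchmark: "
              f"{low:.3f} to {high:.3f}")
```

Under these assumptions, every survey with at least one effect-flagged sample yields a numerical value, and the range of values across repeated surveys is much wider at small sample sizes than at large ones, even though the data contain no concentration-response relationship.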


Environmental benchmarks, Randomness, Sample size, Statistical artifacts, Thresholds