What is non-target analysis?
Jul 21 2022
From gas chromatography and HPLC to ion-exchange, gel permeation and affinity, there are many types of chromatography to choose from. But these methods can also be categorised in other ways. One is the distinction between targeted and non-target analysis. Read on as we discuss what exactly non-target analysis means…
Targeted vs non-target analysis
The best way to define non-target analysis is relative to the targeted analysis most people are more familiar with. Put simply, targeted analysis refers to the process of looking for certain, known chemicals. Researchers will know their characteristics, such as retention times and mass spectra, so they can target them specifically.
As a result, targeted analysis does not assess other chemicals that are present in a given sample. By limiting the amount of data being measured, researchers can increase selectivity and sensitivity.
So what about non-target analysis? Well, in short, it’s quite the opposite. Non-target analysis (NTA) aims to identify all chemicals present in a given sample. While it’s not possible to detect everything, researchers can use NTA to identify both known and unknown components for a better idea of a sample’s makeup.
One of the key differences is that NTA aims to maximise the number of components being detected and to observe changes, while targeted analysis looks at a small number of components. As a result, NTA requires far more data to be processed.
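To make the contrast concrete, here is a minimal sketch of the two approaches applied to the same list of detected features. The `Feature` class, the tolerances, the noise threshold and the example values are all assumptions made for illustration, not part of any real instrument software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    mz: float         # measured mass-to-charge ratio
    rt_min: float     # retention time in minutes
    intensity: float  # peak height or area, arbitrary units


def targeted(features, targets, mz_tol=0.01, rt_tol=0.2):
    """Keep only features that match a known target list (name -> (mz, rt))."""
    hits = {}
    for name, (mz, rt) in targets.items():
        for f in features:
            if abs(f.mz - mz) <= mz_tol and abs(f.rt_min - rt) <= rt_tol:
                hits[name] = f
                break
    return hits


def non_target(features, min_intensity=1000.0):
    """Keep every feature above an assumed noise threshold, known or not."""
    return [f for f in features if f.intensity >= min_intensity]


# Hypothetical feature list from one run (values invented for illustration).
features = [
    Feature(195.0877, 3.10, 52000.0),
    Feature(152.0706, 2.40, 18000.0),
    Feature(301.1410, 7.85, 4300.0),
    Feature(88.0400, 0.90, 250.0),
]
targets = {"caffeine": (195.0877, 3.1)}

print(sorted(targeted(features, targets)))  # names of the targets that were found
print(len(non_target(features)))            # number of features carried forward
```

The targeted path returns one answer per compound on the list and ignores everything else, while the non-target path hands every feature above the noise threshold to later identification steps, which is why it generates so much more data.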
Applications of non-target analysis
The central benefit of non-target analysis is that it gives a more complete picture of what makes up a sample. That lends it to a wide range of applications, including food testing, environmental analysis, metabolomics and the oil industry.
If researchers want to find a biomarker for a certain disease, for example, they’ll need to conduct non-target analysis to get a broad idea of the components within patient samples – and lots of them.
Another common example is the presence of harmful chemicals in the environment. Non-target analysis can be used to produce a more comprehensive sample set for future work, as discussed in the article ‘Comprehensive, Non-Target Characterisation of Blinded Environmental Exposome Standards Using GCxGC and High Resolution Time-of-Flight Mass Spectrometry’.
Demand going forward
Non-target analysis is expected to become more in-demand and dominant in the coming years, with new technologies developed specifically with NTA in mind. That includes high-speed and high-resolution mass spectrometry as well as multidimensional chromatographic separations. Most importantly, that technology will need to be able to turn large volumes of data into useful chemical information.
That doesn’t mean, however, that targeted analysis will become obsolete or even less important. Once chemicals have been identified, targeted analysis is critical to identify them in new samples and continually monitor them. That could be the identification of known biomarkers for a disease in a given patient’s sample, to use one of the scenarios above.
Non-targeted analysis (NTA) and suspect screening analysis (SSA): a review of examining the chemical exposome
- 1 School of Engineering, Brown University, Providence, RI, 02912, USA.
- 2 Agricultural & Environmental Chemistry Graduate Group, University of California, Davis, Davis, CA, 95616, USA.
- 3 Department of Epidemiology, Brown University, Providence, RI, 02912, USA.
- 4 Exposure and Biomonitoring Division, Environmental Health Science and Research Bureau, Health Canada, Ottawa, ON, Canada.
- 5 Department of Environmental Health Sciences, Yale School of Public Health, New Haven, CT, 06520, USA.
- 6 Department of Chemistry, Faculty of Science, University of Chile, Santiago, RM, Chile.
- 7 School of Public Health, San Diego State University, San Diego, CA, USA.
- 8 Office of Research and Development, U.S. Environmental Protection Agency, Washington, DC, USA.
- 9 School of Engineering, Brown University, Providence, RI, 02912, USA.
- 10 National Institute of Standards and Technology, 100 Bureau Dr, Gaithersburg, MD, 20899, USA.
- 11 Department of Environmental Health & Engineering, Johns Hopkins University, Baltimore, MD, 21205, USA.
- 12 Risk Sciences and Public Policy Institute, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, 21205, USA.
- 13 Division of Biology, Chemistry and Materials Science, Office of Science and Engineering Laboratories, Center for Devices and Radiological Health, Food and Drug Administration, Silver Spring, MD, 20993, USA.
- PMID: 37380877
- PMCID: PMC10403360
- DOI: 10.1038/s41370-023-00574-6
Non-targeted analysis (NTA) and suspect screening analysis (SSA) are powerful techniques that rely on high-resolution mass spectrometry (HRMS) and computational tools to detect and identify unknown or suspected chemicals in the exposome. Fully understanding the chemical exposome requires characterization of both environmental media and human specimens. As such, we conducted a review to examine the use of different NTA and SSA methods in various exposure media and human samples, including the results and chemicals detected. The literature review was conducted by searching literature databases, such as PubMed and Web of Science, for keywords, such as "non-targeted analysis", "suspect screening analysis" and the exposure media. Sources of human exposure to environmental chemicals discussed in this review include water, air, soil/sediment, dust, and food and consumer products. The use of NTA for exposure discovery in human biospecimens is also reviewed. The chemical space that has been captured using NTA varies by media analyzed and analytical platform. In each medium, the chemicals that were frequently detected using NTA were: per- and polyfluoroalkyl substances (PFAS) and pharmaceuticals in water, pesticides and polyaromatic hydrocarbons (PAHs) in soil and sediment, volatile and semi-volatile organic compounds in air, flame retardants in dust, plasticizers in consumer products, and plasticizers, pesticides, and halogenated compounds in human samples. Some studies reviewed herein used both liquid chromatography (LC) and gas chromatography (GC) HRMS to increase the detected chemical space (16%); however, the majority (51%) only used LC-HRMS and fewer used GC-HRMS (32%). Finally, we identify knowledge and technology gaps that must be overcome to fully assess potential chemical exposures using NTA. Understanding the chemical space is essential to identifying and prioritizing gaps in our understanding of exposure sources and prior exposures.
IMPACT STATEMENT: This review examines the results and chemicals detected by analyzing exposure media and human samples using high-resolution mass spectrometry based non-targeted analysis (NTA) and suspect screening analysis (SSA).
Keywords: Chemical space; Environmental media; Exposome; High-resolution mass spectrometry; Non-targeted analysis; Suspect screening analysis.
© 2023. The Author(s).
Present-Day Practice of Non-Target Chemical Analysis
- Open access
- Published: 01 June 2022
- volume 77, pages 537–549 (2022)
- B. L. Milman 1,2 &
- I. K. Zhurkovich 2
We review the main techniques, procedures, and information products used in non-target analysis (NTA) to reveal the composition of substances. The preferred sampling and sample preparation methods are those that extract analytes with a wide range of properties from test samples with minimal loss. The necessary analytical techniques are variants of chromatography combined with high-resolution tandem mass spectrometry (HRMS), which yield individual characteristics of analytes (mass spectra, retention properties) for their accurate identification. Prioritization within the analytical strategy discards unnecessary measurements and thereby increases the performance of NTA. Chemical databases, collections of reference mass spectra and retention characteristics, algorithms, and software for processing HRMS data are indispensable in NTA.
Non-target chemical analysis is the determination of components of test samples that are unknown to the analyst (“known unknowns” and “unknown unknowns,” Table 1). In general, the analytes most likely to be detected belong to a set of several million common individual compounds [1, 2]. Non-target chemical analysis is taking an increasingly important place in the scientific research of chemists and biochemists and in the practical work of technologists and engineers. Three factors determine this trend. The first is the growing need for such analytical determinations, driven by emerging environmental pollutants, more thorough control of food quality, steadily increasing attention to human health, and so on. The other two factors make such analyses feasible. One is the current high level of analytical methodology, resulting from the development of new types and models of chromatographs and mass spectrometers and new options for extracting compounds from test samples. The other is the rapid development of computer science, accompanied by improved characteristics of computers and their networks, the emergence of new databases, and the creation of algorithms and software that enable efficient handling of large volumes of acquired data and reference information. These factors have caused a sharp increase in the number of publications in the field of NTA: more than half of the scientific articles have been published in the last 5 years (Fig. 1).
Fig. 1. Dynamics of the number of publications in the field of non-target chemical analysis. Data as of the beginning of 2021, obtained by summing the number of articles and other documents found in Google Scholar for various English terms denoting this type of analysis (see Table 1).
The NTA methodology, used in almost all fields of chemical analytics, has been considered in numerous reviews, often devoted to individual types of test samples (Table 2). The techniques and methods of NTA are constantly being improved, and it makes sense to capture the current level of its general development, which is typical for most objects of analysis. This article is a concise review of such general characteristics of NTA. Along with the techniques of analysis, sample preparation, and processing of information, the performance of NTA and the level of its errors are discussed. However, in most cases there is no way to estimate such errors reliably. The correctness of NTA results can often be judged only by whether modern good practice has been implemented, that is, whether the work includes the main necessary stages of analysis, instruments, software, and databases, which are briefly considered in this review.
The review deals with low-molecular-weight compounds; the most significant publications of recent years, which contain references to previous studies, are predominantly cited. We note new guidelines for carrying out NTA in specific scientific fields [16, 18, 21]; these publications are of particular value for those starting work in this field of analytics.
SUMMARY OF NON-TARGET ANALYSIS
In the broadest sense, NTA includes the determination of all components of a sample whose composition is unknown to the analyst prior to the experiment. Prioritization (see below) may adjust the number and nature of the compounds to be determined. The NTA stages formally coincide for most test samples (Fig. 2). The result of the analysis covers the compounds found and identified in the sample. Quantitative determinations are not usually associated with NTA but may follow the identification of analytes. Absolute quantitation requires suitable analytical methods and reference materials. Semiquantitative determination is possible when compounds with a similar structure are used as standards or when the relative sensitivity of techniques to different compounds is assessed [12].
Fig. 2. Schematic representation of NTA. (a) Sample; analytes (c) are separated by extraction (b) and injected into the chromatograph-mass spectrometer (d). The operation of the instrument is controlled by a computer (e) equipped with numerous programs (f) and linked to various databases (g). Analytical standards (h) are provided for the final identification of analytes.
The main techniques of NTA are variants of chromatography–mass spectrometry (CMS). A combination of gas chromatography and electron ionization mass spectrometry (GC–EI-MS) using a single quadrupole mass analyzer is indispensable for volatile compounds. For nonvolatile analytes, including most biologically important compounds, high-performance or ultra-performance liquid chromatography (HPLC/UPLC) is used in combination with high-resolution tandem mass spectrometry (HRMS²), with electrospray ionization (ESI) as the ion source. Other versions of mass spectrometry are also helpful but, as a rule, less advantageous. Recently, ion mobility spectrometry has been developed as a valuable addition to CMS [22]. For the detection and structure elucidation of new compounds not described in scientific publications (“unknown unknowns”), it is advisable to isolate them from the test samples and use NMR spectroscopy for identification (in addition to MS).
Non-target chemical analysis is predominantly concerned with organic (bio-organic) compounds; this subject of NTA is mainly considered in the review. Nevertheless, it may also include the determination of elements if the molecular form of their presence is considered (speciation analysis); in this case, a combination of HPLC and inductively coupled plasma mass spectrometry (ICP–MS) is used [23]. Other types of “inorganic” NTA are significant mostly in a historical sense.
PERFORMANCE OF NON-TARGET ANALYSIS
We propose to define the indicator named in the title of this section as the proportion of compounds detected and/or correctly identified by NTA. For screening, the performance of NTA can be expressed by the value P_s:
P_s = (number of components detected and preliminarily characterized) / (total number of components in the sample).
The screening result consists of the detected components of the test sample and a preliminary conclusion about their structures. A “full-fledged” NTA, required to identify the components of the sample as completely and reliably as possible, is characterized by the value P_NTA:
P_NTA = (number of components correctly identified) / (total number of components in the sample).
In general, it is difficult to estimate the P_s and P_NTA values because the composition of the test sample and the number of its components (the denominator of the fractions) are unknown. The numerator of these fractions is easier to calculate, although the reliability of the corresponding identification may be unclear (see below). P_NTA can be estimated in a model situation, for example, using special artificial mixtures of compounds of interest for the field of analysis under consideration [24, 25] and counting the identified and unidentified compounds. The value of P_NTA, in this case, is the indicator of true-positive results [26, 27], which in test experiments with mixtures produced by the U.S. Environmental Protection Agency did not exceed 65% for methods based on HPLC–HRMS² [24].
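The model situation described above can be sketched in a few lines: with an artificial mixture of known composition, the denominator is known and the performance reduces to a true-positive rate. The function name and the compound lists below are assumptions made for illustration; they do not reproduce any of the cited test experiments.

```python
def performance(spiked, reported):
    """Return (P, false_negatives, false_positives) for a mixture of known composition.

    spiked   -- names of compounds deliberately placed in the test mixture
    reported -- names of compounds the NTA workflow claims to have identified
    """
    spiked, reported = set(spiked), set(reported)
    if not spiked:
        raise ValueError("the test mixture must contain at least one compound")
    true_positives = spiked & reported
    p = len(true_positives) / len(spiked)
    return p, spiked - reported, reported - spiked


# Hypothetical test mixture and hypothetical workflow output.
spiked = {"atrazine", "caffeine", "carbamazepine", "PFOA", "triclosan"}
reported = {"atrazine", "caffeine", "carbamazepine", "ibuprofen"}

p, missed, spurious = performance(spiked, reported)
print(f"P_NTA = {p:.2f}")
print("false negatives:", sorted(missed))
print("false positives:", sorted(spurious))
```

For a real sample the same ratio cannot be computed directly, because the set playing the role of `spiked` is exactly what is unknown.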
The quality of sampling, transportation, and storage of samples must exclude any loss of the desired analytes (leading to false-negative results, FN; Table 3) or contamination of a sample with foreign substances (false-positive results, FP). The necessary standard requirements for such procedures have been formulated [16, 18, 21, 30], although it is difficult to assert that they are met in all current studies.
The sampling technique affects the representation of different analytes in the sampled substance and the occurrence of false results (Table 3). In passive sampling of air or water, the ratio between the analytes may depend on the type of adsorbent. In single grab samples of water, there is no such discrimination between the components of mixtures [7].
Foreign compounds (resulting in false-positive determinations) in the test sample are relatively easy to detect by analyzing blank samples (prepared from matrices, materials, solvents, reagents, etc. [16]). The identification of false negatives is more complicated. It is necessary to establish the loss/decomposition/transformation (including biotransformation) of analytes before or during analysis using internal standards, which is challenging in NTA because the analytes are unknown. It is recommended to spike the test samples with additives, that is, one compound from each class (group) of expected substances [21] or a group representing a particular range of physicochemical characteristics of analytes, for example, the n-octanol–water distribution coefficient K_ow [8]. Similarly, test mixtures are prepared to ensure the quality of the analysis [18].
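The use of blank samples to reveal foreign compounds can be illustrated with a minimal sketch. The rule used here (a feature is suspect when its intensity in the sample is less than `min_ratio` times its intensity in the blank) and the default ratio of 3 are assumptions for illustration, not a criterion taken from the cited guidelines.

```python
def flag_blank_contaminants(sample, blank, min_ratio=3.0):
    """Split sample features into (kept, suspected_contaminants).

    sample, blank -- dicts mapping a feature identifier to its intensity
    min_ratio     -- assumed minimum sample-to-blank intensity ratio
    """
    kept, suspect = {}, {}
    for feature_id, intensity in sample.items():
        blank_intensity = blank.get(feature_id, 0.0)
        if blank_intensity > 0 and intensity < min_ratio * blank_intensity:
            suspect[feature_id] = intensity   # likely comes from solvents, materials, etc.
        else:
            kept[feature_id] = intensity      # absent from the blank or clearly enriched
    return kept, suspect


# Hypothetical intensities (arbitrary units).
sample = {"F1": 9000.0, "F2": 1200.0, "F3": 500.0}
blank = {"F2": 1000.0, "F3": 50.0}

kept, suspect = flag_blank_contaminants(sample, blank)
print(sorted(kept), sorted(suspect))
```

No comparably simple check exists for false negatives, which is why the spiked additives described above are needed.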
Sample preparation is the separation of analytes from the collected samples, accompanied by their preconcentration. Most analyses of liquid (aqueous) samples use liquid–liquid extraction or solid-phase extraction (SPE). Different analytes have different physicochemical properties; therefore, varying selectivity of these procedures and further losses of analytes are inevitable. Chemical reactions (decomposition, dehydration, oxidation, polymerization) that unintentionally accompany sample preparation lead in some cases to false results. Incomplete chemical procedures included in sample preparation also contribute to them (Table 3) [16].
Water–methanol and water–acetonitrile mixtures have been used as extractants for blood, blood plasma, and other biological matrices, ensuring similar recoveries of many metabolites. By changing the composition of the extractants, one can alternately extract and then determine the components of biological matrices with different polarities [18]. The selection of ternary systems as extractants for blood plasma is also promising; for example, an acetonitrile–isopropanol–water mixture was used to extract polar and medium-polar analytes in the range of 25 orders of magnitude of their K_ow values [17]. In biological matrices, rapid transformations (enzymatic reactions) are inevitable, which in one way or another affect the recovery of the analyte [18]. These processes are stopped by freezing samples or by adding cold solvents.
Homogenization, another procedure implemented at the beginning of the preparation of solid samples, can affect the analysis results if a representative sample is not formed [21]. In these cases, samples are ground in special mills, with special measures taken to suppress the activity of enzymes toward the components of samples of plant and animal origin. Analytes are extracted from such ground samples, for example, using the popular QuEChERS procedure [11, 12].
In some cases, at relatively high concentrations of analytes, a direct analysis (direct injection into a chromatograph) of liquid samples, such as wastewater and urine, can be performed without significant loss of many analytes. In such cases, preliminary dilution of the initial samples (dilute-and-shoot methods) is often practiced [8]. Nevertheless, false-negative results remain very likely. A comparison of such procedures with SPE demonstrated that SPE allows a larger number of polar compounds to be detected in water than direct injection of relatively large volumes (0.5–1 mL) of a water sample into a liquid chromatograph [9].
As in target analysis, chromatography is the primary separation method in NTA, although the use of capillary electrophoresis [31] and ion mobility spectrometry (in addition to liquid chromatography) [32] can be successful in some situations. Incomplete chromatographic separation of analytes distorts the retention parameters and mass spectra, leading to errors in identifying major analytes and to the loss of minor components of the test mixtures (Table 3). Complex samples containing tens, hundreds, or thousands of components are the most difficult to analyze because the corresponding chromatograms contain many peaks, a significant part of which overlap to some extent (Fig. 3). Therefore, a deconvolution procedure is implemented in chromatography–mass spectrometry, that is, separation into individual chromatograms corresponding to individual components of mixtures, using their mass spectrometric signals (Fig. 4; for software, see [16]).
Fig. 3. Example of a complex chromatogram of a brain tissue sample; UPLC–HRMS²; horizontal axis, retention time (min). Adapted from [33].
Fig. 4. Example of deconvolution of a chromatographic peak in GC–MS. Mass spectra: horizontal axis, m/z values; vertical axis, relative intensity (%). The complex chromatographic peak is divided into three Gaussian signals. The second and third mass spectra are very similar. Reference retention parameters and, ultimately, analytical standards are needed to distinguish between these analytes. An alternative is the use of MS² and HRMS. Adapted from [34].
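The idea behind deconvolution, assigning mass spectrometric signals to individual co-eluting components, can be illustrated with a deliberately simplified sketch. It groups extracted-ion traces by the retention time of their maxima; real deconvolution software also models peak shapes (for example, the Gaussian signals mentioned in the caption above), which this sketch does not attempt. The function names, the tolerance, and the data are assumptions.

```python
def apex_time(times, intensities):
    """Retention time at which an extracted-ion trace reaches its maximum."""
    i = max(range(len(intensities)), key=intensities.__getitem__)
    return times[i]


def group_by_apex(times, traces, tol=0.02):
    """Group m/z traces whose apexes lie within `tol` minutes of the group's first apex.

    traces -- dict mapping an m/z value to a list of intensities sampled at `times`
    Returns a list of (apex_time, sorted list of m/z values), one entry per component.
    """
    apexes = sorted((apex_time(times, ys), mz) for mz, ys in traces.items())
    groups = []  # each entry: [reference apex time, [m/z, ...]]
    for t, mz in apexes:
        if groups and t - groups[-1][0] <= tol:
            groups[-1][1].append(mz)
        else:
            groups.append([t, [mz]])
    return [(t, sorted(mzs)) for t, mzs in groups]


# Hypothetical overlapping peak: two components eluting 0.04 min apart.
times = [5.00, 5.02, 5.04, 5.06, 5.08, 5.10]
traces = {
    91.05: [10, 80, 200, 90, 20, 5],
    119.08: [5, 40, 110, 50, 10, 2],
    105.07: [2, 10, 40, 120, 300, 100],
    77.04: [1, 5, 20, 60, 150, 40],
}

components = group_by_apex(times, traces)
for apex, mzs in components:
    print(apex, mzs)
```

Each group approximates the mass spectrum of one component, which can then be compared with reference spectra.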
Two main types of chromatography, gas and liquid, are used to determine volatile and nonvolatile compounds, respectively.
In GC, nonvolatile compounds are lost. Derivatization (silylation and other reactions) can improve the chances of their determination, although incomplete derivatization or evaporation of the most volatile derivatives can lead to false-negative results [16]. Two-dimensional gas chromatography (GC–GC) results in a more efficient separation of mixtures and solves some basic analytical problems of qualitative analysis II [27], namely, identification (characterization, authentication, etc.) of the test samples themselves [35, 36]. The conventional combination of GC–MS¹ has been supplemented in recent years by GC–GC with more advanced versions of mass spectrometry: HRMS and MS² [35, 36].
Reversed-phase (RP) liquid chromatography is conventionally used to determine nonvolatile compounds, including a version that ensures better separation (RP–UPLC). In recent years, hydrophilic interaction liquid chromatography (HILIC) has gained popularity, demonstrating better separation characteristics for highly polar analytes than RP-HPLC/UPLC, where these analytes are not retained by the column [17, 18]. Nevertheless, HILIC is characterized by larger variations in retention times and a more frequent manifestation of matrix effects, namely the ionization suppression of individual analytes in the mass spectrometer that is typical for CMS [18]. This can lead to false-negative results due to the poor prediction of the retention times or elution order of analytes and to insignificant mass spectrometric signals, respectively.
A mass spectrometer is the primary instrument for identifying analytes in NTA [26, 27]. Determining volatile “known unknowns” is a relatively simple task solved by comparing experimental EI mass spectra with the corresponding reference/library spectra. Spectra of this type are reproduced fairly well and have been recorded for the vast majority of common volatile compounds (see below).
For nonvolatile compounds, the greatest possibilities of NTA are ensured by high-resolution mass spectrometry and tandem mass spectrometry, primarily in the HPLC/UPLC–ESI-HRMS² combination. In this tandem, the first mass spectrometer is a quadrupole mass analyzer or an ion trap, and the second is a time-of-flight mass analyzer or an Orbitrap ion trap. Between the two mass analyzers, there is a cell in which ions (precursor ions) collide with a gas target, forming the fragment ions necessary for identifying analytes.
Note that the highest mass resolution is achieved in ion cyclotron resonance instruments [37], which are inaccessible to most chemical laboratories due to their high cost. Orbitrap technique and methodology have recently been referred to as “high-resolution accurate-mass mass spectrometry” [38]. The use of mass spectrometers with the usual “unity” resolution can, to some extent, lead to success [39] in solving less complex NTA problems. Various analytical methods are based on a combination of different variants of chromatography (RP or HILIC, HPLC or UPLC) and mass spectrometry (different mass analyzers, positive or negative ions, etc.).
These high-resolution instruments measure the masses (m/z values) of precursor ions with an accuracy at the parts-per-million level, while the accuracy for fragment ions is somewhat worse. Precursor ions are sampled from the full set of their peaks in the MS¹ spectra (data-independent acquisition) or can be preset for the expected compounds (data-dependent acquisition). Overlapping mass peaks of precursor ions may require their deconvolution and separation of the MS² spectra [40]. One should keep in mind that, in the ESI process, analyte molecules form both protonated molecules of the main isotopic forms and adduct ions, including cationized molecules and charged particles of other isotopic forms (“isotopologues”). The exclusion of the latter ions from consideration (“componentization” [8]) significantly simplifies the processing of chromatography–mass spectrometry data. For identification, one often resorts to comparing experimental spectra with reference ones. Instruments incorporating conventional ion traps produce MS² spectra that are less comparable with spectra recorded using other types of tandem mass spectrometers [27, 41].
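Two routine calculations follow from this paragraph: the mass error in parts per million and the expected m/z values of a protonated or cationized molecule. The sketch below is illustrative only; the function names, the 5 ppm tolerance, and the restriction to two singly charged positive-mode ions are assumptions, while the mass shifts are the commonly tabulated values for [M+H]+ and [M+Na]+.

```python
# Mass added to a neutral monoisotopic mass to give a singly charged positive ion, in Da.
ADDUCT_SHIFTS = {
    "[M+H]+": 1.007276,    # protonated molecule
    "[M+Na]+": 22.989218,  # sodium-cationized molecule
}


def ppm_error(measured, theoretical):
    """Relative mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def expected_ions(neutral_mass):
    """Expected m/z for each ion type in ADDUCT_SHIFTS."""
    return {name: neutral_mass + shift for name, shift in ADDUCT_SHIFTS.items()}


def annotate(measured_mz, neutral_mass, tol_ppm=5.0):
    """Return the ion type that explains measured_mz within tol_ppm, or None."""
    for name, mz in expected_ions(neutral_mass).items():
        if abs(ppm_error(measured_mz, mz)) <= tol_ppm:
            return name
    return None


caffeine = 194.080376  # monoisotopic mass of C8H10N4O2, Da

print(annotate(195.0880, caffeine))  # signal near the protonated molecule
print(annotate(217.0694, caffeine))  # signal near the sodium adduct
print(annotate(196.0910, caffeine))  # an isotopologue peak, not covered by this table
```

Grouping all signals that can be explained by one neutral mass in this way is the essence of the componentization step mentioned above; a fuller version would also include isotopologue spacings and negative-mode ions.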
The nature of false results that may be a consequence of the use of MS is reflected in Table 3 (see also [18]).
The trueness and reliability of identification are determined by the compliance of the obtained data with particular criteria. In target analysis, these criteria based on chromatographic and mass spectrometric data are well defined [26, 27, 42]. In NTA, the situation is much less clear.
The most reliable identification, yielding true-positive results, is achieved by (a) the matching of chromatographic and mass peaks in the co-analysis of a sample containing the analyte and the corresponding analytical standard [26, 27, 42, 43]. (b) Identification based on the similarity of the experimental and reference mass spectra and chromatographic retention parameters is in second place in terms of reliability. The similarity can be expressed by its conventional index (see below) or by the degree of closeness of the intensities of several main peaks of the same (differing only within the error) masses. The trueness of identifying volatile compounds from the libraries of reference EI mass spectra is generally ~80%; for nonvolatile analytes and ESI-MS² mass spectra, the situation is more uncertain, and the proportion of true-positive results varies widely [26, 27, 41].
The second identification method (b) can be as reliable as the first option (a), but only under certain conditions: the values of the indicated quantities must be obtained under similar experimental conditions (the same types and models of instruments, similar modes of data acquisition), and these values must be unique for the identified compound. Other methods of identification (interpretation of data, comparison with predicted spectra and chromatographic characteristics [27, 43, 44]) are less reliable but, apparently, applicable to the selection of candidates for identification.
If several such candidate compounds have common substructures, the identification at this stage is called group identification [26, 27]. Such a group of compounds can be identified by “molecular networks,” that is, graphs of structurally similar compounds constructed from similar mass spectra [45, 46].
The term “identification level” is popular in English publications ( Table 1 ). Its meaning is interpreted in detail in  and clarified in Table 4 with our comments. The identification level correlates with the confidence of identification (proportion of true-positive results).
There are particular quantitative indicators of reliability [26, 27, 43]. These include α- and β-criteria for accepting statistical hypotheses when identification is considered as a procedure for testing them. The concept of identification points, that is, matching values of the measured mass spectrometric and chromatographic quantities weighted by their different significance, is also popular. The indices of the similarity of mass spectra, such as the dot-product function or the probability of their matching, can also be considered as a particular measure of reliability. Recently, a general reliability scale has been proposed that considers identification levels, the degree of matching of retention characteristics, and the number of identification points [47].
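As an illustration of a spectral similarity index, the sketch below computes a normalized dot product (cosine similarity) between two spectra after binning their peaks by m/z. This is a plain, unweighted variant chosen for brevity; library search software typically applies additional intensity and mass weighting, and the function names, bin width, and spectra here are assumptions.

```python
import math


def bin_spectrum(peaks, bin_width=1.0):
    """Sum the intensities of (m/z, intensity) peaks into bins of the given width."""
    binned = {}
    for mz, intensity in peaks:
        key = round(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(spectrum_a, spectrum_b, bin_width=1.0):
    """Normalized dot product of two binned spectra; 1.0 means identical patterns."""
    a = bin_spectrum(spectrum_a, bin_width)
    b = bin_spectrum(spectrum_b, bin_width)
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical spectra as lists of (m/z, intensity) pairs.
experimental = [(91.0, 100.0), (119.0, 40.0), (65.0, 12.0)]
reference = [(91.0, 50.0), (119.0, 20.0), (65.0, 6.0)]   # same pattern, half the intensity
unrelated = [(57.0, 100.0), (71.0, 60.0)]

print(round(cosine_similarity(experimental, reference), 3))
print(round(cosine_similarity(experimental, unrelated), 3))
```

Because the measure is normalized, it compares relative peak patterns rather than absolute intensities, so a score on its own says nothing about whether the spectra were recorded under comparable conditions.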
Several types of information products are indispensable for carrying out NTA.
Mass spectral libraries are primary sources of reference information. The largest of them are presented in Table 5 . The situation is relatively good with the EI-MS 1 spectra of volatile compounds: the libraries contain most of the known and most important compounds of this class. Libraries of MS 2 spectra and even HRMS 2 spectra, predominantly related to nonvolatile analytes, that is, most biologically essential compounds, began to be created much later. The mass spectra of many compounds are not available in the libraries, and the MS 2 spectra as a whole are not reproduced well enough; they depend on the type of tandem mass spectrometer and the collision energy leading to the fragmentation of precursor ions [ 26 , 27 , 41 ]. The need to upgrade and improve the quality of tandem mass spectra libraries is widely recognized [ 41 , 43 ].
Retention parameter databases. The NIST database includes 447 285 gas chromatographic retention indices (RI) for 139 693 compounds [ 49 ]. In HPLC/UPLC, the concept of retention indices is less applicable than in GC, but related works and methods for estimating these quantities are emerging [ 55 ].
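For readers unfamiliar with how a GC retention index is obtained, the following sketch computes a linear (temperature-programmed) retention index by interpolating between the bracketing n-alkanes. The function name and input layout are assumptions made for illustration; the retention times in the usage note are invented and are not taken from the NIST database.

```python
def linear_retention_index(t_x, alkanes):
    """Linear retention index of an analyte eluting at time t_x.

    `alkanes` is a list of (carbon_number, retention_time) pairs for the
    n-alkane reference series measured under the same conditions.
    RI = 100 * (n + (n_next - n) * (t_x - t_n) / (t_next - t_n)),
    where the alkanes with n and n_next carbons bracket the analyte.
    """
    pts = sorted(alkanes, key=lambda p: p[1])
    for (n, t_n), (n_next, t_next) in zip(pts, pts[1:]):
        if t_n <= t_x <= t_next:
            return 100.0 * (n + (n_next - n) * (t_x - t_n) / (t_next - t_n))
    raise ValueError("retention time is outside the n-alkane range")
```

For example, with hypothetical reference times of 5.0 min for decane and 7.0 min for undecane, an analyte eluting at 6.0 min gets an index of 1050.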
Chemical databases. The largest databases containing information about chemical compounds are listed in Table 6 . The information contained in them supplementing the experimental CMS data is called a priori or meta information. It is practical and even necessary in selecting and ranking compounds, candidates for identification.
In using these databases, three circumstances should be considered. First, the ChemSpider database allows searching for molecular formulas using the experimentally found masses of molecular ions (MS 1 ) or protonated (cationized) molecules (MS 2 ). Second, these databases enable estimation of the popularity/abundance of chemical compounds by the number of sources of information about them, the amount of such meta information, etc. [ 1 , 2 , 27 ]. In testing identification hypotheses [ 26 , 27 ], starting with the most prevalent compounds (ceteris paribus) makes sense. Third, the information available in the database on the preparation and properties of chemical compounds and their presence in various objects is also helpful in selecting candidates for identification.
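The first two circumstances can be sketched in a few lines: candidates are selected by the mass derived from a protonated molecule and then ordered by the number of information sources. This is only an illustration. The candidate records, the `n_sources` field, and the 5 ppm tolerance are hypothetical and do not reflect the interface or contents of ChemSpider or any other database.

```python
PROTON_MASS = 1.007276  # Da, rounded mass of a proton


def neutral_mass_from_protonated(mz):
    """Neutral monoisotopic mass from the m/z of a singly charged [M+H]+ ion."""
    return mz - PROTON_MASS


def ppm_error(measured, theoretical):
    """Relative mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def rank_candidates(mz, candidates, tol_ppm=5.0):
    """Keep candidates within `tol_ppm` of the measured neutral mass and
    order them by descending number of information sources (assumed field)."""
    neutral = neutral_mass_from_protonated(mz)
    hits = [c for c in candidates
            if abs(ppm_error(neutral, c["mass"])) <= tol_ppm]
    return sorted(hits, key=lambda c: c["n_sources"], reverse=True)
```

Testing the most frequently documented candidates first mirrors the "most prevalent compounds first" heuristic described above; it is a prioritization aid, not evidence of identity.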
Prediction of mass spectra and retention parameters. Several methods, algorithms, and corresponding software for predicting mass spectra (in silico mass spectra) are based on machine learning, heuristics (ion fragmentation rules), combinatorics (enumeration and estimation of the probabilities of the appearance of various combinations of atoms of the initial ion), quantum chemical calculations, and mixed principles [ 59 ]. On average, the results of spectrum predictions turn out to be moderately correct. For example, we studied the possibilities of distinguishing structural isomers using one of the machine learning methods. The rate of true-positive results in comparing the predicted and experimental spectra was ~50–60% [ 44 ], which is far from the worst result in the considered area of calculations [ 59 ]. Nevertheless, methods for predicting mass spectra are rapidly developing, and one can expect an improvement in their efficiency. Even now, this methodology can be used in NTA for (a) calculating the mass spectra of candidates for identification, which are selected by the mass of precursor ions and a search in chemical databases, and (b) discarding the most dissimilar in silico spectra.
Although an extensive collection of experimental GC retention indices (see above) is available, studies are underway to predict these indices as well, at least to test the effectiveness of computational methods. The application of the machine learning methodology enables calculating their values with satisfactory accuracy [ 60 ]. Similar prediction algorithms have been used for calculating HPLC retention indices; the obtained data did not have substantial independent significance for identification but still improved its results based on the prediction of mass spectra [ 55 ].
In the case of liquid chromatography (LC), relative retention times are more often predicted rather than retention indices [ 61 – 63 ]. Some prediction results are rather satisfactory. Predictions of relative retention indices for 80 000 compounds included in the METLIN database ( Table 5 ) show that in 70% of cases, the corresponding analytes are among the three most probable candidates for identification [ 62 ].
Various software. Chromatography–mass spectrometry data processing programs are indispensable in analyzing complex samples that yield numerous chromatographic peaks. Good NTA practice implies carrying out the following procedures in automatic mode [ 4 , 6 , 8 , 16 , 40 , 64 , 65 ]:
• Deconvolution of chromatographic peaks with separation of signals of individual components and their mass spectra;
• Filtration of mass peaks to remove background, weak peaks, and outliers;
• Annotations of peaks in mass spectra: assigning mass values and even chemical structures of corresponding ions derived from exact masses ( m / z values) and isotopic pattern, to peaks;
• Comparison of mass spectra and retention characteristics with the corresponding reference data; evaluation of their similarity;
• Formation of in-house libraries of mass spectra;
• Mutual adjustment of different chromatograms in retention times and/or masses of ions of reference compounds for comparing different samples.
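One of the listed procedures, filtration of mass peaks, can be sketched as a relative-intensity threshold. This is a deliberately simple example; real data processing programs use more elaborate noise models, and the function name and the 1% default threshold are assumptions made for illustration.

```python
def filter_peaks(peaks, min_rel_intensity=0.01):
    """Drop peaks weaker than a fraction of the base peak.

    `peaks` is a list of (mz, intensity) pairs; `min_rel_intensity` is
    the assumed cutoff relative to the most intense (base) peak.
    """
    if not peaks:
        return []
    base = max(inten for _, inten in peaks)
    if base <= 0:
        return []
    return [(mz, inten) for mz, inten in peaks
            if inten / base >= min_rel_intensity]
```

Raising the threshold removes more background at the risk of discarding genuine low-abundance ions, so the cutoff is a trade-off to be tuned per data set.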
Commercially produced equipment is equipped with appropriate software; programs are also supplied by other companies or organizations [ 16 , 64 ].
Algorithms and software for multivariate statistical analysis (chemometrics) should also be mentioned; they help group and classify the studied samples, for example, food samples, based on NTA results [ 10 , 12 , 66 ].
Numerous innovations in analytical instrumentation and computer science have made it possible to determine many dozens, hundreds, and even thousands of organic (bioorganic) compounds simultaneously, including those unknown to the analyst before the experiment, in the most complex matrices (biological and medical objects, food, environmental objects, etc.). Among analytical instruments, high-resolution tandem mass spectrometers coupled with chromatographs, which have entered analytical laboratories in a significant number in the last 10–15 years, have become paramount. Simultaneously, significant progress has been made in informatics, which has led to the emergence of large databases and new software for processing CMS data. Advances in instrumentation and computer science materialized into an explosive growth of works in the field of NTA.
Two aspects of these publications should be highlighted. First, many relevant subject reviews have been devoted to particular methods and/or objects of analysis. A review of these works makes it possible to outline the general practice of NTA and to identify its standard techniques, including specific methods of sample preparation, analysis, and the extraction and processing of information. Second, the publications report new, fairly complete analyses of a variety of objects in all their geographical, biological, natural, industrial, and other diversity. Such studies using the NTA methodology are ongoing, and new evidence is expected regarding the previously unknown composition of substances. The data obtained in recent years, and those still expected, probably need a broad generalization, which would be of interest to chemists in various fields.
Milman, B.L. and Zhurkovich, I.K., Molecules , 2021, vol. 26, no. 8, p. 2394. https://doi.org/10.3390/molecules26082394
Mil’man, B.L. and Zhurkovich, I.K., Analitika , 2020, vol. 10, no. 6, p. 464. https://doi.org/10.22184/2227-572X.2020.10.6.464.469
Schymanski, E.L., Singer, H.P., Slobodnik, J., Ipolyi, I.M., Oswald, P., Krauss, M., Schulze, T., Haglund, P., Letzel, T., Grosse, S., Thomaidis, N.S., Bletsou, A., Zwiener, C., Ibanez, M., Portolés, T., De Boer, R., Reid, M.J., Onghena, M., Kunkel, U., Schulz, W., Guillon, A., Noyon, N., Leroy, G., Bados, P., Bogialli, S., Stipaničev, D., Rostkowski, P., and Hollender, J., Anal. Bioanal. Chem. , 2015, vol. 407, no. 21, p. 6237. https://doi.org/10.1007/s00216-015-8681-7
Hollender, J., Schymanski, E.L., Singer, H.P., and Ferguson, P.L., Environ. Sci. Technol. , 2017, vol. 51, no. 20, p. 11505. https://doi.org/10.1021/acs.est.7b02184
Ccanccapa-Cartagena, A., Pico, Y., Ortiz, X., and Reiner, E.J., Sci. Total Environ. , 2019, vol. 687, p. 355. https://doi.org/10.1016/j.scitotenv.2019.06.057
Ljoncheva, M., Stepišnik, T., Džeroski, S., and Kosjek, T., Trends Environ. Anal. Chem. , 2020, vol. 28, e00099. https://doi.org/10.1016/j.teac.2020.e00099
Menger, F., Gago-Ferrero, P., Wiberg, K., and Ahrens, L., Trends Environ. Anal. Chem. , 2020, vol. 28, e00102. https://doi.org/10.1016/j.teac.2020.e00102
Schulze, B., Jeon, Y., Kaserzon, S., Heffernan, A.L., Dewapriya, P., O’Brien, J., Ramos, M.J.G., Gorji, S.G., Mueller, J.F., Thomas, K.V., and Samanipour, S., TrAC, Trends Anal. Chem. , 2020, vol. 133, 116063. https://doi.org/10.1016/j.trac.2020.116063
Kutlucinar, K.G. and Hann, S., Electrophoresis , 2021, vol. 42, no. 4, p. 490. https://doi.org/10.1002/elps.202000256
Riedl, J., Esslinger, S., and Fauhl-Hassek, C., Anal. Chim. Acta , 2015, vol. 885, p. 17. https://doi.org/10.1016/j.aca.2015.06.003
Knolhoff, A.M. and Croley, T.R., J. Chromatogr. A , 2016, vol. 1428, p. 86. https://doi.org/10.1016/j.chroma.2015.08.059
Fisher, C.M., Croley, T.R., and Knolhoff, A.M., TrAC, Trends Anal. Chem. , 2021, vol. 136. https://doi.org/10.1016/j.trac.2021.116188
Chen, C., Wohlfarth, A., Xu, H., Su, D., Wang, X., Jiang, H., Feng, Y., and Zhu, M., Anal. Chim. Acta , 2016, vol. 944, p. 37. https://doi.org/10.1016/j.aca.2016.09.034
Oberacher, H. and Arnhard, K., TrAC, Trends Anal. Chem. , 2016, vol. 84, p. 94. https://doi.org/10.1016/j.trac.2015.12.019
Mollerup, C.B., Dalsgaard, P.W., Mardal, M., and Linnet, K., Drug Test. Anal. , 2017, vol. 9, no. 7, p. 1052. https://doi.org/10.1002/dta.2120
Mastrangelo, A., Ferrarini, A., Rey-Stolle, F., Garcia, A., and Barbas, C., Anal. Chim. Acta , 2015, vol. 900, p. 21. https://doi.org/10.1016/j.aca.2015.10.001
Cajka, T. and Fiehn, O., Anal. Chem. , 2016, vol. 88, no. 1, p. 524. https://doi.org/10.1021/acs.analchem.5b04491
Pezzatti, J., Boccard, J., Codesido, S., Gagnebin, Y., Joshi, A., Picard, D., Gonzalez-Ruiz, V., and Rudaz, S., Anal. Chim. Acta , 2020, vol. 1105, p. 28. https://doi.org/10.1016/j.aca.2019.12.062
Hubert, J., Nuzillard, J.M., and Renault, J.H., Phytochem. Rev ., 2017, vol. 16, no. 1, p. 55. https://doi.org/10.1007/s11101-015-9448-7
Aydoğan, C., Anal. Bioanal. Chem. , 2020, vol. 412, no. 9, p. 1973. https://doi.org/10.1007/s00216-019-02328-6
Caballero-Casero, N., Belova, L., Vervliet, P., Antignac, J.P., Castaño, A., Debrauwer, L., López, M.E., Huber, C., Klanova, J., Krauss, M., Lommen, A., Mol, H.G.J., Oberacher, H., Pardo, O., Price, E.J., Reinstadler, V., Vitale, C.M., Van Nuijs, A.L.N., and Covaci, A., TrAC, Trends Anal. Chem. , 2021, vol. 136, 116201. https://doi.org/10.1016/j.trac.2021.116201
Mairinger, T., Causon, T.J., and Hann, S., Curr. Opin. Chem. Biol ., 2018, vol. 42, p. 9. https://doi.org/10.1016/j.cbpa.2017.10.015
Lorenc, W., Hanć, A., Sajnóg, A., and Barałkiewicz, D., Mass Spectrom. Rev. , 2020, no. 1, p. 32. https://doi.org/10.1002/mas.21662
Sobus, J.R., Grossman, J.N., Chao, A., Singh, R., Williams, A.J., Grulke, C.M., Richard, A.M., Newton, S.R., McEachran, A.D., and Ulrich, E.M., Anal. Bioanal. Chem. , 2019, vol. 411, no. 4, p. 835. https://doi.org/10.1007/s00216-018-1526-4
Knolhoff, A.M., Premo, J.H., and Fisher, C.M., Anal. Chem. , 2021, vol. 93, no. 3, p. 1596. https://doi.org/10.1021/acs.analchem.0c04036
Mil’man, B.L., Vvedenie v khimicheskuyu identifikatsiyu (Introduction to Chemical Identification), St. Petersburg: VVM, 2008. 180 s.
Milman, B.L., Chemical Identification and Its Quality Assurance , Berlin: Springer, 2011.
Milman, B.L. and Zhurkovich, I.K., J. Anal. Chem. , 2020, vol. 75, no. 4, p. 443. https://doi.org/10.1134/S1061934820020124
NORMAN Database System. http://www.norman-network.com/nds. Accessed June 5, 2021.
Stevens, V.L., Hoover, E., Wang, Y., and Zanetti, K.A., Metabolites , 2019, vol. 9, no. 8, p. 156. https://doi.org/10.3390/metabo9080156
García, A., Godzien, J., López-Gonzálvez, Á., and Barbas, C., Bioanalysis , 2017, vol. 9, no. 1, p. 99. https://doi.org/10.4155/bio-2016-0216
Mairinger, T., Causon, T.J., and Hann, S., Curr. Opin. Chem. Biol. , 2018, vol. 42, p. 9. https://doi.org/10.1016/j.cbpa.2017.10.015
Geng, C., Guo, Y., Qiao, Y., Zhang, J., Chen, D., Han, W., Yang, M., and Jiang, P., Neuropsychiatr. Dis. Treat. , 2019, vol. 15, p. 1939. https://doi.org/10.2147/NDT.S203870
Koek, M.M., Jellema, R.H., van der Greef, J., Tas, A.C., and Hankemeier, T., Metabolomics , 2011, vol. 7, no. 3, p. 307. https://doi.org/10.1007/s11306-010-0254-3
Aspromonte, J., Wolfs, K., and Adams, E., J. Pharm. Biomed. Anal. , 2019, vol. 176, 112817. https://doi.org/10.1016/j.jpba.2019.112817
Franchina, F.A., Zanella, D., Dubois, L.M., and Focant, J.F., J. Sep. Sci. , 2021, vol. 44, no. 1, p. 188. https://doi.org/10.1002/jssc.202000855
Ghaste, M., Mistrik, R., and Shulaev, V., Int. J. Mol. Sci. , 2016, vol. 17, no. 6, p. 816. https://doi.org/10.3390/ijms17060816
Strupat, K., Scheibner, O., and Bromirski, M., High-resolution, accurate-mass orbitrap mass spectrometry-definitions, opportunities, and advantages, Thermo Technical Note, 2013, no. 64287. https://assets.thermofisher.com/TFS-Assets/CMD/Technical-Notes/tn-64287-hram-orbitrap-ms-tn64287-en.pdf. Accessed June 6, 2021.
Alon, T. and Amirav, A., J. Am. Soc. Mass Spectrom. , 2021, vol. 32, no. 4, p. 929. https://doi.org/10.1021/jasms.0c00419
Samanipour, S., Reid, M.J., Bæk, K., and Thomas, K.V., Environ. Sci. Technol. , 2018, vol. 52, no. 8, p. 4694. https://doi.org/10.1021/acs.est.8b00259
Oberacher, H., Sasse, M., Antignac, J.P., Guitton, Y., Debrauwer, L., Jamin, E.L., Schulze, T., Krauss, M., Covaci, A., Caballero-Casero, N., Rousseau, K., Damont, A., Fenaille, F., Lamoree, M., and Schymanski, E.L., Environ. Sci. Eur ., 2020, vol. 32, no. 1, p. 1. https://doi.org/10.1186/s12302-020-00314-9
Mil’man, B.L. and Zhurkovich, I.K., Anal. Kontrol’ , 2020, vol. 24, no. 3, p. 164. https://doi.org/10.15826/analitika.2020.24.3.003
Milman, B.L., TrAC, Trends Anal. Chem. , 2015, vol. 69, p. 24. https://doi.org/10.1016/j.trac.2014.12.009
Milman, B.L., Ostrovidova, E.V., and Zhurkovich, I.K., Mass Spectrom. Lett ., 2019, vol. 10, no. 3, p. 93. https://doi.org/10.5478/MSL.2019.10.3.93
Global Natural Products Social Molecular Networking. https://gnps.ucsd.edu/ProteoSAFe/static/gnps-splash.jsp. Accessed June 6, 2021.
Vincenti, F., Montesano, C., Di Ottavio, F., Gregori, A., Compagnone, D., Sergi, M., and Dorrestein, P., Front. Chem. , 2020, vol. 8, 572952. https://doi.org/10.3389/fchem.2020.572952
Rochat, B., J. Am. Soc. Mass Spectrom. , 2017, vol. 28, no. 4, p. 709. https://doi.org/10.1007/s13361-016-1556-0
Wiley Registry of Mass Spectral Data, 12th ed. http://www.sisweb.com/software/wiley-registry.htm#1. Accessed June 7, 2021.
The NIST 20 Mass spectral library. http://www.sisweb.com/software/ms/nist.htm#stats. Accessed June 7, 2021.
METLIN. https://metlin.scripps.edu/landing_page.php?pgcontent=mainPage. Accessed June 7, 2021.
MONA – MassBank of North America. https://mona.fiehnlab.ucdavis.edu/spectra/statistics?tab=0. Accessed June 7, 2021.
MassBank. https://massbank.eu/MassBank/Contents. Accessed June 7, 2021.
mzCloud. http://www.mzcloud.org. Accessed June 7, 2021.
The human metabolome database (HMDB). https://hmdb.ca. Accessed June 7, 2021.
Samaraweera, M.A., Hall, L.M., Hill, D.W., and Grant, D.F., Anal. Chem ., 2018, vol. 90, no. 21, p. 12752. https://doi.org/10.1021/acs.analchem.8b03118
CAS. http://www.cas.org/about/cas-content. Accessed June 7, 2021.
PubChem. https://pubchem.ncbi.nlm.nih.gov. Accessed June 7, 2021.
ChemSpider. http://www.chemspider.com. Accessed June 7, 2021.
Krettler, C.A. and Thallinger, G.G., Briefings Bioinf ., 2021, vol. 22, no. 6, bbab073. https://doi.org/10.1093/bib/bbab073
Matyushin, D.D. and Buryak, A.K., IEEE Access , 2020, vol. 8, p. 223140. https://doi.org/10.1109/ACCESS.2020.3045047
McEachran, A.D., Mansouri, K., Newton, S.R., Beverly, B.E., Sobus, J.R., and Williams, A.J., Talanta , 2018, vol. 182, p. 371. https://doi.org/10.1016/j.talanta.2018.01.022
Domingo-Almenara, X., Guijas, C., Billings, E., Montenegro-Burke, J.R., Uritboonthai, W., Aisporna, A.E., Chen, E., Benton, H.P., and Siuzdak, G., Nat. Commun. , 2019, vol. 10, 5811. https://doi.org/10.1038/s41467-019-13680-7
Witting, M. and Bocker, S., J. Sep. Sci. , 2020, vol. 43, nos. 9–10, p. 1746. https://doi.org/10.1002/jssc.202000060
Kind, T., Tsugawa, H., Cajka, T., Ma, Y., Lai, Z., Mehta, S.S., Wohlgemuth, G., Barupal, D.K., Showalter, M.R., Arita, M., and Fiehn, O., Mass Spectrom. Rev. , 2018, vol. 37, no. 4, p. 513. https://doi.org/10.1002/mas.21535
Helmus, R., ter Laak, T.L., Van Wezel, A.P., De Voogt, P., and Schymanski, E.L., J. Cheminf ., 2021, vol. 13, no. 1, 1. https://doi.org/10.1186/s13321-020-00477-w
Cavanna, D., Righetti, L., Elliott, C., and Suman, M., Trends Food Sci. Technol ., 2018, vol. 80, p. 223. https://doi.org/10.1016/j.tifs.2018.08.007
Authors and affiliations.
Institute of Experimental Medicine, 197376, St. Petersburg, Russia
B. L. Milman
Clinical Research Center of Toxicology, Federal Medical-Biological Agency of Russia, 192019, St. Petersburg, Russia
B. L. Milman & I. K. Zhurkovich
Correspondence to B. L. Milman .
The authors declare that they have no conflicts of interest.
Translated by O. Zhukova
Rights and permissions
Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .
About this article
Milman, B.L., Zhurkovich, I.K. Present-Day Practice of Non-Target Chemical Analysis. J Anal Chem 77 , 537–549 (2022). https://doi.org/10.1134/S1061934822050070
Received : 19 June 2021
Revised : 08 August 2021
Accepted : 09 August 2021
Published : 01 June 2022
Issue Date : May 2022
DOI : https://doi.org/10.1134/S1061934822050070
- non-target analysis
- mass spectrometry
- Review Article
- Open access
- Published: 29 December 2017
Integrating tools for non-targeted analysis research and chemical safety evaluations at the US EPA
- Jon R. Sobus 1 ,
- John F. Wambaugh 2 ,
- Kristin K. Isaacs 1 ,
- Antony J. Williams 2 ,
- Andrew D. McEachran 3 ,
- Ann M. Richard 2 ,
- Christopher M. Grulke 2 ,
- Elin M. Ulrich 1 ,
- Julia E. Rager 3 ,
- Mark J. Strynar 1 &
- Seth R. Newton 1
Journal of Exposure Science & Environmental Epidemiology, volume 28, pages 411–426 (2018)
Tens of thousands of chemicals are registered in the U.S. for use in countless processes and products. Recent evidence suggests that many of these chemicals are measurable in environmental and/or biological systems, indicating the potential for widespread exposures. Traditional public health research tools, including in vivo studies and targeted analytical chemistry methods, have been unable to meet the needs of screening programs designed to evaluate chemical safety. As such, new tools have been developed to enable rapid assessment of potentially harmful chemical exposures and their attendant biological responses. One group of tools, known as “non-targeted analysis” (NTA) methods, allows the rapid characterization of thousands of never-before-studied compounds in a wide variety of environmental, residential, and biological media. This article discusses current applications of NTA methods, challenges to their effective use in chemical screening studies, and ways in which shared resources (e.g., chemical standards, databases, model predictions, and media measurements) can advance their use in risk-based chemical prioritization. A brief review is provided of resources and projects within EPA’s Office of Research and Development (ORD) that provide benefit to, and receive benefits from, NTA research endeavors. A summary of EPA’s Non-Targeted Analysis Collaborative Trial (ENTACT) is also given, which makes direct use of ORD resources to benefit the global NTA research community. Finally, a research framework is described that shows how NTA methods will bridge chemical prioritization efforts within ORD. This framework exists as a guide for institutions seeking to understand the complexity of chemical exposures, and the impact of these exposures on living systems.
The last decade has witnessed pronounced transformations in approaches for linking chemical exposures to human and ecological health. Toxicity testing methods that support chemical safety evaluations have evolved rapidly, ushering in an era defined by high-throughput screening (HTS) and chemical prioritization [ 1 , 2 ]. Two US-based testing programs—the Toxicity Testing in the 21st Century (Tox21) Federal Consortium and the EPA Toxicity Forecaster (ToxCast) project—have together evaluated over 8000 chemical substances across hundreds of bioassays [ 3 , 4 , 5 ]. Efforts are underway to map the derived bioactivity data to key events along adverse outcome pathways (AOPs) in support of 21st century risk assessments and regulatory decisions [ 6 , 7 ]. Risk-based decisions, however, are weakened without quantitative knowledge of exposure, processes that link exposure and target dose, and the impact of target dose on AOPs [ 8 , 9 , 10 , 11 ]. Noting this challenge, the exposure science community has mirrored recent advances in toxicity testing, developing both predictive and empirical methods for rapid acquisition of chemical exposure data [ 8 , 9 , 12 , 13 ]. Many measurement-based methods are born out of successes in the metabolomics field. For example, high-resolution mass spectrometry (HRMS), a common metabolomics tool, now allows rapid characterization of hundreds to thousands of compounds in a given environmental (e.g., surface water), residential (e.g., house dust), or biological (e.g., serum) sample. Whereas metabolomics has mostly eyed endogenous compounds, the emerging field of “exposomics” has broadened the analytical focus to include xenobiotic compounds [ 14 , 15 ]. Popular open metabolomics databases, in fact, are expanding to include large lists of man-made compounds, as well as known and predicted metabolites of xenobiotics and naturally-occurring compounds [ 16 , 17 , 18 , 19 ].
Furthermore, software developers are adapting existing tools, and developing new tools, to better meet the needs of the growing exposomics community [ 20 , 21 ]. In time, these adaptations will enable fully integrated research workflows that seamlessly bridge empirical knowledge of stressors and biological adaptations to those stressors [ 10 , 18 ].
The concept of the “exposome” was introduced in 2005 by Dr. Christopher Wild as a way to represent all life-course environmental exposures from the prenatal period onwards [ 22 ]. Since that time, exposomics, like any nascent field, has evolved in concept, definition, and practice. While multiple definitions now exist, it is generally agreed upon that the exposome represents the totality of exposures experienced by an individual (human or other), and that these exposures reflect exogenous and endogenous stressors originating from chemical and non-chemical sources [ 23 , 24 ]. By definition, chemical components of the exposome are measurable in media with which a receptor comes into contact. For humans, these media include—but are not limited to—food, air, water, consumer products (e.g., lotions), articles (e.g., clothing), house dust, and building materials. Biological media further offer a window into the exposome, and have been a focus of many analytical efforts [ 25 , 26 , 27 , 28 ].
In most instances, analytical chemistry-based exposome research has moved away from “targeted” methods and towards suspect screening analysis (SSA) and non-targeted analysis (NTA) methods. Suspect screening studies are those in which observed but unknown features (generally defined in HRMS experiments by an accurate mass, retention time [RT], and mass spectrum) are compared against a database of chemical suspects to identify plausible hits [ 21 , 29 ]. True NTA (also called “untargeted”) studies are those in which chemical structures of unknown compounds are postulated without the aid of suspect lists [ 21 , 29 ]. While clear differences exist in the methods used for SSA and NTA, the term “non-targeted analysis” is commonly used to describe both SSA and NTA experiments. As such, the abbreviation “NTA” is used here in a general sense to describe this entire genre of research. Within this NTA realm, emphasis is generally placed on characterizing compounds that are unknown or poorly studied, and, more importantly, on examining compounds that are significantly related to an exposure source (environmental forensics), health status, or some other measure of interest. NTA studies are gaining in popularity [ 30 ], but the rapid and accurate characterization of large suites of chemical unknowns remains challenging. Appropriate resources and efficient methods must therefore be identified to propel NTA methods away from a niche field and into mainstream public health laboratories.
EPA’s Office of Research and Development (ORD) has pioneered many HTS strategies for toxicity testing, exposure forecasting, and risk-based prioritization over the past decade. In support of these efforts, EPA’s ToxCast project, administered within the National Center for Computational Toxicology (NCCT), has procured and manages a rich library of individual chemicals [ 5 ]. NCCT further develops, curates, and manages databases and dashboards that house information on these and many other compounds of relevance to environmental health. Whereas these collective tools are the basis for EPA’s HTS activities (designed to potentially inform regulatory decisions), they have seldom been considered as resources for the exposomics research community, and remain underutilized in NTA experiments. In a recent article [ 31 ], we demonstrated the power of ORD resources for guiding novel NTA workflows. Our pilot-scale study showed that ORD tools can be effectively used to identify, prioritize, and confirm novel compounds in samples of house dust. It further indicated that certain novel compounds (i.e., those never before measured in house dust) are ubiquitous environmental contaminants and likely to activate specific biological pathways. Additional studies have reported similar findings based on analyses of house dust and other media [ 32 , 33 , 34 ]. Together, these studies underscore a limited understanding of the compounds present in our environments. Yet, they also highlight a need for, and clear advantage of, integrating NTA research efforts with those already established to support risk-based chemical prioritization.
The purpose of this article is to provide a clear road map for integrating NTA research with current chemical screening initiatives. The article first discusses NTA methods as tools for discovering the exposome. It then provides a brief history and synopsis of current activities within ORD, with specific emphasis on activities that relate to NTA research. A summary of an EPA-led collaborative trial is then presented, which exploits ORD resources to advance NTA research efforts. A multi-step framework is finally offered, which is being used by EPA scientists to maximize data used in, and knowledge gained from, NTA experiments. The information provided herein will enable NTA practitioners to make greater use of valuable resources that service 21st century chemical testing programs. It will further allow scientists and decision makers to make direct use of NTA data when performing risk-based chemical prioritizations. Together these actions will enable more efficient, comprehensive, and relevant evaluations of chemical safety.
Methods, results, and discussion
NTA as a tool for exposome research
The concept of the exposome has been in existence for more than a decade. During this period, a number of modified definitions have been proposed to place emphasis on: 1) external vs. internal exposure sources (e.g., the “eco-exposome” [ 8 ] and the “endogenous exposome” [ 35 ]); 2) research applications for specific media (e.g., the “blood exposome” [ 24 , 36 ] and the “tooth exposome” [ 37 ]); and 3) general analytical strategies (e.g., “top-down exposomics” vs. “bottom-up exposomics” [ 23 , 38 ]). Regardless of the definition and application, it is generally agreed that NTA methods are a key to discovering the breadth of all exposures, and more importantly, which exposures are associated with disease. Different portions of the exposome have now been characterized using suites of analytical tools, which range from low resolution gas chromatography mass spectrometry (GC/MS) platforms, to ultra-high resolution Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR/MS) platforms. Many recent investigations have focused on polar organic compounds, which are often readily detected using liquid chromatography (LC) coupled with high resolution Orbitrap or time-of-flight mass spectrometry (TOF/MS) [ 28 ]. Hybrid systems, such as quadrupole-Orbitrap and quadrupole-TOF mass spectrometers (Q-TOF/MS), further enable compound identification using both precursor ion detection in full-scan MS mode, and product ion detection in MS/MS mode. These HRMS hybrid systems are quickly becoming the most commonly used tools in NTA laboratories [ 28 ].
High-resolution MS instruments generate data on thousands of molecular features, which represent unknown compounds generally described in terms of their monoisotopic masses, retention times, and isotope distributions. In some cases, these data are accompanied by fragmentation spectra (via MS/MS analysis) and predicted molecular formulae. The job of the analyst is to proffer chemical structures that are consistent with these observed features. Current guidance recommends binning structures based upon the certainty of assignment [ 39 ]. “Tentative candidates” are proposed structures that are consistent with experimental data, but not necessarily unequivocal top hits. “Probable structures” are those not confirmed with standards but named as top candidates using library spectrum matches and other diagnostic evidence (e.g., RTs associated with a specific method). Finally, “confirmed structures” are those that have been verified using a reference standard.
Multiple tentative candidates can exist for a given molecular feature. As such, it is expected that, for a given NTA experiment, the number of compounds within each bin will be ordered as follows: tentative candidates > probable structures > confirmed structures. Exact ratios across bins vary from lab-to-lab and medium-to-medium based on available resources (e.g., authentic standards), tools (e.g., MS/MS-enabled platforms), and experience/expertise. Yet, it is clear that the number of unknowns will continue to outweigh the number of knowns for the foreseeable future. The goal, then, is to enable knowledge-based ascension, for any feature of interest (e.g., those associated with measures of biological perturbation), from labeling as a tentative candidate, to probable structure, to confirmed compound.
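The three bins described above can be expressed as a simple decision rule, shown here as a rough sketch. The function, its boolean inputs, and the "unassigned" fallback are hypothetical simplifications; the cited guidance [ 39 ] weighs considerably more evidence than three flags.

```python
from collections import Counter


def assign_bin(matches_reference_standard, has_library_spectrum_match,
               consistent_with_data):
    """Assign a proposed structure to a confidence bin.

    The strongest available evidence decides the bin: a reference standard
    gives a confirmed structure, a library spectrum match gives a probable
    structure, and mere consistency with the data gives a tentative candidate.
    """
    if matches_reference_standard:
        return "confirmed structure"
    if has_library_spectrum_match:
        return "probable structure"
    if consistent_with_data:
        return "tentative candidate"
    return "unassigned"


def count_bins(evidence_rows):
    """Count proposed structures per bin; each row is a tuple of three flags."""
    return Counter(assign_bin(*row) for row in evidence_rows)
```

Counting structures per bin across an experiment is one way to track the expected ordering of tentative candidates, probable structures, and confirmed structures as new standards and spectra become available.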
The rise from tentative candidate to probable structure is conditional upon the availability of sufficient diagnostic evidence. Probable structures are generally those that have high-scoring library spectrum matches, relatively large numbers of sources or references in public databases, and predicted retention behavior that is consistent with observations about the unknowns [ 39 ]. A number of open access tools exist for ranking tentative candidates and naming probable structures (e.g., MetFrag [ http://c-ruttkies.github.io/MetFrag/ ] and STOFF-IDENT [ https://www.lfu.bayern.de/stoffident/#!home ]). These tools, as well as those available from instrument vendors, often rely on large public databases (e.g., ChemSpider [ http://www.chemspider.com/ ] and PubChem [ https://pubchem.ncbi.nlm.nih.gov/ ]) for the initial identification of tentative candidates, and subsequent ranking based on data sources/references. Some tools predict and evaluate retention behavior using logP-based or logD-based models that vary in sophistication [ 40 ]. Finally, to enable spectral matching, most tools utilize existing reference spectra, which are available via vendors and open databases (e.g., mzCloud™ [ https://www.mzcloud.org/ ], MassBank [ http://www.massbank.jp/index.html?lang=en ], and MoNA [ http://mona.fiehnlab.ucdavis.edu/ ]), or theoretical spectra, which are generated from fragmentation prediction tools such as CFM-ID ( http://cfmid.wishartlab.com/ ), MetFrag ( http://c-ruttkies.github.io/MetFrag/ ), and MAGMa [ 41 ].
The combination of these approaches has proven successful in the characterization of unknowns in a variety of media. Yet, opportunities exist to further enhance these tools for future investigations. For example, there is a growing need to extend screening libraries to include not just known parent chemicals, but predicted metabolites and environmental degradants, compounds which are believed to comprise a substantial portion of the exposome [ 16 ]. Indeed, as stated in a recent review by Escher and colleagues “…a very small number of the thousands of compounds detectable in a sample can actually be identified, leaving the largest fraction of chemicals at the level of a known accurate mass (or molecular formula) and retention time. Any improvements here rely strongly on a better assignment of likely structures… based on a prediction of fragmentation, ionization, or chromatographic retention times supported by more comprehensive mass spectra databases” [ 10 ]. From these statements it is clear that significant improvements to NTA workflows are needed, as are appropriate resources (e.g., chemicals on which to build reference databases and model training sets) that can enable these improvements. The following section details projects and resources within ORD that are now being used, by EPA scientists and the broader scientific community, to enhance NTA methods and workflows.
Highlights from EPA’s Office of Research and Development
High throughput bioactivity screening and the ToxCast project
In 2007, the National Research Council (NRC) of the National Academies of Science (NAS) published “Toxicity Testing in the 21st Century”, a report calling for greater focus on mechanistic (i.e., pathway-based) understanding of toxicity [ 2 ]. At that time, the advent of HTS had enabled the pharmaceutical industry to: 1) rapidly screen many hundreds or thousands of chemicals; 2) screen against targets having greater relevance to humans; and 3) make specific inferences pertaining to the biological pathways involved with toxicity [ 42 ]. In many cases, the potential for bioactivity within human or specific ecological species could be targeted using in vitro methods, along with proteins and cells derived from tissues of the species in question. Noting these advancements, and the recommendations of the NRC, the National Institutes of Health (NIH) National Toxicology Program (NTP), the NIH National Center for Advancing Translational Sciences (NCATS), and EPA formed the Federal Tox21 consortium, which was soon joined by the US Food and Drug Administration (FDA). The goal of this consortium was to use modern HTS approaches to better assess chemical toxicity, especially for many thousands of chemicals for which little or no toxicity data were available [ 1 ]. To date, over 8000 chemical substances (including pharmaceuticals, plasticizers, pesticides, fragrances, and food additives) have been tested, robotically and uniformly at the NCATS intramural testing facility, across over 100 HTS assays (consisting of nuclear receptor target assays and cell-based viability assays) [ 3 ].
The EPA-contributed portion of Tox21 includes more than 3800 unique compounds. Many of these compounds have undergone additional HTS across more than 800 assay endpoints as part of a separate EPA testing program, known as the ToxCast project [ 4 ]. This EPA testing program has expanded in tandem with the Tox21 program, enlisting a number of contract-administered, commercially available assay systems, many of which were originally developed to service the pharmaceutical industry’s drug discovery programs. ToxCast assay technologies span a broad suite of high and medium-throughput screening targets and cell-based systems, and provide for more extensive biological screening of EPA’s ToxCast library, effectively complementing the available Tox21 assays. ToxCast testing has been conducted in phases. The Phase I library included 310 compounds, which were primarily pesticides that have been well characterized by animal toxicity studies, along with small sets of high-priority environmental chemicals (e.g., bisphenol A [BPA]) and toxicologically active metabolites (e.g., mono(2-ethylhexyl)phthalate [MEHP]). Phase II testing examined Phase I chemicals across new assays. It further broadened the chemical library to include more than 700 industrial chemicals, known toxicants and carcinogens, alternative “green” chemicals, food additives, and failed pharmaceuticals. Phase II testing also included ~800 additional chemicals that underwent limited testing in endocrine-relevant assays only. A rolling “Phase III” is ongoing with the goals of: 1) broadening assay endpoint coverage across the nearly 1800 compounds in the Phase II library, 2) expanding upon the Phase II library with newly added priority chemicals, and 3) applying strategic testing to the larger EPA Tox21 library [ 5 ].
ToxCast HTS is typically conducted in concentration-response format, with statistical analysis used to estimate the concentration of chemical needed to cause bioactivities in any given assay [ 4 ]. Many chemical-assay combinations are inactive at even the highest tested concentration [ 43 , 44 ]. Those assays that show systematic response with concentration are referred to as “hits”, with a portion of assay hits occurring at concentrations below ranges of cytotoxicity. A series of statistically-derived and biologically-derived models for predicting in vivo effects have been developed using ToxCast HTS hits as predictors, and archival in vivo animal studies as evaluation data. To date, some pathways are better covered than others due to available technologies and EPA priorities (e.g., there are 18 assays that indicate activity related to estrogen receptor alpha (ERα) activation [ 45 ]). ToxCast assay results have been made publicly available by multiple means at the conclusion of each testing phase, and at regular intervals [ 4 ].
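To make the concentration-response idea concrete, the following minimal sketch defines a Hill curve and estimates a half-maximal activity concentration by log-linear interpolation. It is not the statistical procedure used by ToxCast, which fits models to the data. The function names, the `cutoff` parameter, and the example concentrations are all assumptions made for illustration.

```python
import math


def hill(conc, top, ac50, slope=1.0):
    """Hill curve: response rises from 0 toward `top`, equal to top/2 at `ac50`."""
    return top / (1.0 + (ac50 / conc) ** slope)


def estimate_ac50(concs, responses, cutoff):
    """Interpolate the concentration giving half of the maximum observed response.

    Returns None when the maximum response is below `cutoff`, i.e. the
    chemical-assay pair is treated as inactive at all tested concentrations.
    """
    points = sorted(zip(concs, responses))
    top = max(r for _, r in points)
    if top < cutoff:
        return None
    half = top / 2.0
    if points[0][1] >= half:
        # Already at or above half-maximum at the lowest tested concentration.
        return points[0][0]
    for (c_lo, r_lo), (c_hi, r_hi) in zip(points, points[1:]):
        if r_lo < half <= r_hi:
            # Interpolate linearly in log10(concentration).
            frac = (half - r_lo) / (r_hi - r_lo)
            log_c = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    return None


# Hypothetical tested concentrations (micromolar) and a simulated active response.
tested = [0.1, 1.0, 10.0, 100.0]
active = [hill(c, top=100.0, ac50=10.0) for c in tested]
estimate = estimate_ac50(tested, active, cutoff=20.0)
```

With only four tested concentrations the interpolated value is a rough estimate, which is one reason a fitted model is preferable in practice.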
Most ToxCast and Tox21 assays have focused on parent compound effects [ 46 ]. There are two primary reasons for a lack of testing on metabolites, degradants, and transformation products. First, some of these compounds are highly reactive and cannot be effectively assessed until metabolically competent systems are created. Second, sufficient quantities of these compounds are needed to provide to multiple testing facilities–many metabolites, degradants, and transformation products are not available on the market, and have therefore not been tested [ 5 ]. An extensive library of thousands of ToxCast chemicals does exist, however, allowing independent laboratories to perform experiments on matched chemical samples [ 5 ]. Nominations for new test chemicals are welcomed, with the biggest limitation being the ability to acquire sufficient quantities of the compounds of interest. We note that while most of the in vitro assays do not have metabolic competency, some assays using primary human hepatocytes or pluripotent liver cells (e.g., HepaRG) do allow the assessment of metabolic effects on the liver [ 47 ]. Additional research is ongoing to apply structure-based metabolism prediction methods and to augment other important assays with metabolic competency.
High throughput exposure screening and the ExpoCast project
While thousands of chemicals have been profiled for bioactivity using HTS, many of these chemicals are lacking data on exposure [ 48 ], which hinders risk-based evaluation. Many more chemicals exist without exposure or bioactivity data, and are in need of “exposure-based prioritization” prior to HTS and risk assessment [ 49 ]. The EPA’s exposure forecaster (ExpoCast) project was therefore developed to generate the data, tools, and evaluation methods required to produce rapid and scientifically-defensible estimates of exposure [ 11 ], and to confidently relate these estimates to concentrations that exhibit bioactivity (identified via HTS) [ 50 , 51 , 52 ]. Since the inception of ExpoCast, EPA has organized and analyzed extant data; collected new data on chemical properties, uses, and occurrence [ 53 , 54 ]; and evaluated/developed mathematical models for predicting exposures across thousands of compounds [ 55 ]. With regard to mathematical modeling, a meet-in-the-middle approach has proven valuable. Using this approach, forward modeling predictions (e.g., those from mechanistic exposure models) have been compared against exposure estimates inferred from down-stream monitoring data (e.g., human biomarker measures, which cover only a small fraction of the overall chemicals of interest). Statistical comparisons of forward model predictions vs. biomarker-based estimates allow global examination of model performance and the impact of specific modeling assumptions on final exposure predictions [ 56 ]. The concepts and strategies for this meet-in-the-middle approach have been described elsewhere and implemented at EPA as part of a Systematic Empirical Evaluation of Models (SEEM) framework [ 57 ].
The SEEM framework allows for crude extrapolation from chemicals with monitoring data to chemicals without such data. To date, this approach has relied upon exposures inferred from urinary biomarker data as reported in the Centers for Disease Control and Prevention’s (CDC) National Health and Nutrition Examination Survey (NHANES). Notable findings of SEEM work include: 1) fate and transport models–that can predict exposure for thousands of chemicals following industrial releases (i.e., “far-field” sources) and migration through the environment [ 58 , 59 ]–are limited in their ability to describe urinary biomarker data [ 57 ]; 2) chemicals present in urine often reflect “near-field” sources in the home, such as consumer products and articles of commerce (e.g., furniture and flooring) [ 57 ]; and 3) five factors (production volume, use in consumer products, use in industrial processes, use as a pesticidal active, and use as a pesticidal inert) are able to explain roughly half of the chemical-to-chemical variance in median exposure rates inferred from NHANES urine data [ 60 ].
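The meet-in-the-middle comparison and the notion of "variance explained" can be illustrated with an ordinary least-squares fit on a log scale. This is only a stand-in under stated assumptions: it is not the SEEM framework itself, it uses a single predictor rather than the five factors listed above, and the function names and exposure values are hypothetical.

```python
import math


def fit_log_linear(predicted, inferred):
    """Least-squares line on a log10 scale; returns (slope, intercept, r_squared).

    `predicted` holds forward-model exposure predictions and `inferred` holds
    biomarker-based exposure estimates for the same chemicals.
    """
    x = [math.log10(p) for p in predicted]
    y = [math.log10(v) for v in inferred]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # r_squared is the fraction of chemical-to-chemical variance explained.
    return slope, intercept, (sxy ** 2) / (sxx * syy)


def extrapolate(prediction, slope, intercept):
    """Calibrated exposure estimate for a chemical that lacks monitoring data."""
    return 10 ** (intercept + slope * math.log10(prediction))


# Hypothetical exposure rates for chemicals that have biomarker data.
forward = [2e-7, 1e-6, 4e-5, 3e-4]
biomarker = [5e-6, 8e-6, 9e-5, 2e-3]
slope, intercept, r2 = fit_log_linear(forward, biomarker)
estimate = extrapolate(1e-5, slope, intercept)
```

An `r2` near 0.5 would correspond to predictors explaining roughly half of the variance, as reported for the five-factor analysis.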
Consistent with these findings, new mechanistic models have been developed with a focus on near-field exposure pathways [ 61 , 62 ]; the incorporation of predictions from these new models into the SEEM framework has the potential to refine consensus exposure predictions for data-poor chemicals. In order to parameterize these models, however, information is needed on product formulation—that is, the concentration of chemicals in a product. Goldsmith and colleagues addressed this need by cataloging thousands of Material Safety Data Sheets (MSDS) for products sold by a major U.S. retailer, allowing searches for chemical presence in reviewed products [ 53 ]. Dozens of similar product ingredient databases now exist from other sources and were recently aggregated into EPA’s Chemical and Product Categories (CPCat) database [ 54 ]. Listings within this aggregated database include chemicals declared by the manufacturer or observed through laboratory analysis. It is noteworthy that certain formulated products (e.g., personal care products) have specific labeling guidelines that make ingredient information more prevalent, whereas other products (e.g., household cleaning products, and “durable goods” such as apparel or furniture) are governed by narrow (or non-existent) chemical reporting requirements, and therefore have limited formulation data [ 49 ].
A challenge in using product ingredient databases for mechanistic exposure modeling is the qualitative nature of the formulation data. Even when chemicals are listed as being present in a product, concentration values are often not provided. National production volume data are available for many chemicals, but typically binned into category ranges that can span an order of magnitude, and not directly linked to specific releases or intended use. Further, many chemicals determined to be present in urine by NHANES (generally as metabolites) do not even appear on lists of highly produced chemicals, indicating that they are produced at low levels (less than 25,000 lb/year) or do not emanate from monitored production processes. Finally, while some data exist for chemicals deliberately added to objects, many chemicals are introduced to products through packaging, and are therefore present despite not being explicitly labeled [ 54 ]. Noting these limitations, machine learning models have been developed at EPA to fill knowledge gaps related to product chemistries. These models utilize physico-chemical properties [ 63 ] and/or chemical structure information [ 64 ] to predict functional uses for individual compounds. Functional use estimates are then combined with consumer product ingredient databases (described above) to develop screening-level concentration estimates (“generic formulations”) for select products. These screening-level estimates are appropriate for some applications (e.g., chemical prioritization), but may not be well-suited for rigorous quantitative analyses. Additional product composition data are therefore needed to expand coverage across additional products and non-intentional ingredients, and to support the development of exposure predictions fit for higher-tier safety assessments.
Based on existing product/product-use information, along with environmental and biological monitoring data, it’s clear that chemical exposures often co-occur, leading to the potential for mixture effects on biological systems. To date, limited bioactivity-based HTS has been performed on chemical mixtures, owing, in part, to the vast number of mixtures that could conceivably be tested. Exposure information, however, is now being used to address this limitation. In particular, knowledge of chemical co-occurrence in media [ 65 ] and formulations [ 54 ] is being used to reduce the number of permutations considered for HTS. As an example, in a chemical library of 1000 unique compounds, there are more than 10^300 combinations of compounds that could be evaluated using HTS assays. The role of exposure-based priority setting is to identify known or possible (i.e., those that are likely to occur) chemical mixtures that first require screening, and to set aside mixtures that may never occur. A recent ExpoCast analysis demonstrated the value of this approach using existing measures from CDC’s NHANES. Specifically, Kapraun and colleagues considered chemical co-occurrence using urine and blood measures, and ultimately identified a tractable number of chemical combinations that occurred in greater than 30% of the U.S. population [ 66 ]. The techniques utilized by Kapraun and colleagues now make it possible to readily evaluate chemicals for potentially hazardous synergies. Yet, analyses to date are beholden to limited datasets of target analytes. As such, broad measurement-based datasets are now required to further examine the extent to which chemical exposures co-occur in a consistent, predictable, and biologically-relevant manner.
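Both the size of the mixture problem and the idea of prevalence-based filtering can be shown in a few lines. The sketch below counts subsets of two or more compounds, then enumerates co-occurring combinations by brute force. It is not the method of Kapraun and colleagues; the function names and the four example samples are hypothetical.

```python
from collections import Counter
from itertools import combinations


def count_possible_mixtures(n_compounds):
    """Number of subsets containing at least two compounds: 2^n - n - 1."""
    return 2 ** n_compounds - n_compounds - 1


def prevalent_combinations(samples, size, min_prevalence):
    """Return combinations of `size` chemicals detected together in more than
    `min_prevalence` (a fraction) of samples, mapped to their prevalence.

    `samples` is a list of sets, one set of detected chemicals per person.
    """
    counts = Counter()
    for detected in samples:
        for combo in combinations(sorted(detected), size):
            counts[combo] += 1
    n = len(samples)
    return {combo: c / n for combo, c in counts.items() if c / n > min_prevalence}


# A 1000-compound library yields more than 10^300 candidate mixtures.
library_mixtures = count_possible_mixtures(1000)

# Hypothetical biomonitoring results for four people.
people = [{"A", "B"}, {"A", "B", "C"}, {"A"}, {"B", "C"}]
frequent_pairs = prevalent_combinations(people, size=2, min_prevalence=0.30)
```

Brute-force enumeration is only workable for small combination sizes and modest analyte lists; it is used here because it makes the prevalence criterion easy to follow.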
The Distributed Structure-Searchable Toxicity (DSSTox) database
Data generated from EPA’s ToxCast and ExpoCast programs are now stored within EPA’s DSSTox database. The original DSSTox web site was launched in 2004, providing a common access point for several thousand environmental chemicals associated with four publicly available toxicity databases pertaining to carcinogenicity, aquatic toxicity, water disinfection by-products, and estrogen-receptor binding activity. This collection of DSSTox data files offered a highly-curated, standardized set of chemical structures that was well-suited for structure-activity modeling [ 67 , 68 ]. The quality of mappings between chemical identifiers (names, registry numbers, etc.) and their corresponding structures provided the community with a comprehensive set of mappings to a unified DSSTox structure index. This structure index became the underpinning of the current DSSTox chemical database.
DSSTox continued to expand over the next decade with additional chemical structure files of interest to the toxicology and environmental science communities, including lists of high-production volume (HPV) chemicals, indexed lists of public microarray experiment databases, FDA drugs, and risk assessment lists (e.g., EPA’s Integrated Risk Information System [ https://cfpub.epa.gov/ncea/iris2/atoz.cfm ]). From 2007 onward, the database was enlisted to serve as the cheminformatics backbone of the ToxCast and Tox21 programs, with DSSTox curators registering all chemicals entering both screening libraries [ 5 ]. This enabled the mapping of in vitro and in vivo data to chemical structures, the latter through indexing of the NTP bioassay database and EPA’s Toxicity Reference Database (ToxRefDB) [ 69 ]. By mid-2014, the manually curated DSSTox database had grown to over 20,000 chemical substances (spanning more than a dozen inventories) of high priority to EPA research programs (archived DSSTox content available for download at ftp://ftp.epa.gov/dsstoxftp ).
Despite the growth of DSSTox from 2007–2014, coverage did not extend to larger EPA inventories (e.g., the Toxic Substances Control Act [TSCA] inventory, https://www.epa.gov/tsca-inventory and the Endocrine Disruption Screening Program universe, https://www.epa.gov/endocrine-disruption ), which were beginning to define a putative “chemical exposure landscape” [ 48 , 70 ]. The focused nature of DSSTox stemmed from rate-limiting manual curation efforts, which ensured high quality structure-identifier mappings, but limited opportunities for DSSTox to more broadly support EPA research and regulatory efforts. A number of large chemically-indexed databases (such as PubChem, ChemSpider, ChEMBL, ChemIDPlus, and ACToR) eventually provided access points for additional chemical structures and identifiers. Curation efforts, however, demonstrated high rates of inaccuracies and mis-mapped chemical identifiers in these public domain chemical databases (e.g., a name or registry number incorrectly mapped to one or more structures), a common situation that has previously been reported [ 71 , 72 ]. As such, the decision was ultimately made to expand DSSTox using publicly available resources, while also recognizing the limitations of those resources, and preserving the aspects of quality curation upon which DSSTox was built.
The product of database expansion efforts, known as DSSTox version 2 (V2), was developed using algorithmic curation techniques, both alone and in support of focused, ongoing manual curation efforts. A key constraint applied to the construction of DSSTox_V2 was the requirement for a 1:1:1 mapping among the preferred name for a chemical (chosen to be unique), the active (or current) Chemical Abstracts Service Registry Number (CAS-RN), and the chemical structure, as could be uniquely rendered in mol file format. Subject to these constraints (i.e., disallowing conflicts), chemical structures and uniquely mapped identifiers were sequentially loaded into DSSTox_V2 from the following public databases: the EPA Substance Registry Services (SRS) database (containing the public TSCA chemical inventory, accessed at https://iaspub.epa.gov/sor_internet/registry/substreg/ ); the National Library of Medicine’s (NLM) ChemIDPlus (part of the TOXNET suite of databases, accessed at https://chem.nlm.nih.gov/chemidplus/ ); and the National Center for Biotechnology Information’s (NCBI) PubChem database (the portion containing registry number identifiers along with other chemical identifiers, accessed at https://pubchem.ncbi.nlm.nih.gov/ ). Based on the number of sources that agreed on mappings of identifiers to structures, these public data were loaded with a quality control annotation (qc_level) ranging from low to high. Publicly indexed substances containing structures and identifiers that conflicted with existing DSSTox information were not registered; they were either queued for manual curation if considered important to EPA research programs, or were set aside to be loaded at a later date with appropriate documentation of the conflict.
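The 1:1:1 constraint and conflict handling described above can be sketched as a small in-memory registry. This is a hypothetical illustration, not the DSSTox loading code: the class, its method, and the "low"/"medium" thresholds are assumptions, and structures are represented by placeholder strings rather than mol files. Only the "high" tier for agreement across three or more sources reflects the text.

```python
from dataclasses import dataclass, field


def qc_level(n_sources):
    """Map the number of agreeing sources to an annotation (assumed thresholds)."""
    if n_sources >= 3:
        return "high"
    return "medium" if n_sources == 2 else "low"


@dataclass
class Registry:
    """Enforces a 1:1:1 mapping among preferred name, CAS-RN, and structure."""
    by_name: dict = field(default_factory=dict)
    by_casrn: dict = field(default_factory=dict)
    by_structure: dict = field(default_factory=dict)
    qc: dict = field(default_factory=dict)
    conflicts: list = field(default_factory=list)

    def load(self, name, casrn, structure, n_sources):
        """Register a record; return False and set it aside if it conflicts."""
        record = (name, casrn, structure)
        existing = {
            self.by_name.get(name),
            self.by_casrn.get(casrn),
            self.by_structure.get(structure),
        } - {None}
        if existing and existing != {record}:
            # An identifier is already mapped elsewhere: do not register.
            self.conflicts.append(record)
            return False
        self.by_name[name] = record
        self.by_casrn[casrn] = record
        self.by_structure[structure] = record
        self.qc[record] = qc_level(n_sources)
        return True


registry = Registry()
ok = registry.load("Bisphenol A", "80-05-7", "structure-A", n_sources=3)
# Same name and structure but a different registry number: a conflict.
rejected = registry.load("Bisphenol A", "0000-00-0", "structure-A", n_sources=1)
```

Keeping rejected records in `conflicts` mirrors the practice of setting such substances aside for later curation rather than discarding them.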
In addition to the programmatic incorporation of non-conflicting portions of SRS, ChemIDPlus and PubChem into DSSTox_V2, both manual and programmatically assisted curation has continued to address critical gaps in coverage of high-interest environmental lists, including pesticides, food additives, chemicals of potential concern for endocrine disruption [ 73 ], chemicals with known functional use in products [ 54 ], and substances on the public EPA hydraulic fracturing chemicals list ( https://cfpub.epa.gov/ncea/hfstudy/recordisplay.cfm?deid=332990 ). With these latest additions, the DSSTox database now has over 750,000 records, with more than 60,000 manually curated or having consistent identifier assignments in three or more public databases constituting the highest qc_level content. The clean mapping of structural identifiers (names, CAS-RN) to chemical structures provides an essential underpinning to robust and accurate cheminformatics workflows. Elements of such workflows, designed to support quantitative structure-activity relationship (QSAR) modeling as part of EPA’s ToxCast and ExpoCast programs, are now being surfaced through EPA’s CompTox Chemistry Dashboard.
The CompTox Chemistry Dashboard
The CompTox Chemistry Dashboard (hereafter referred to as the “Dashboard”), developed at NCCT, is a freely accessible web-based application and data hub. Chemical substances surfaced via the Dashboard are hosted in the DSSTox database with associated identifiers (e.g., CAS-RN, systematic and trivial names). The Dashboard is used to search DSSTox using a simple alphanumeric text entry box (Fig. 1a). A successful search will result in a chemical page header (Fig. 1b) that provides:
a chemical structure image (with ability to download in mol file format);
intrinsic properties (e.g., molecular formula and monoisotopic mass);
chemical identifiers (e.g., systematic name, SMILES string, InChI string, and InChIKey);
related compounds (based on molecular skeleton search, molecular similarity search, and chemical presence in various mixtures and salt forms);
a listing of databases in which the chemical is present (e.g., ToxCast and Tox21); and
a record citation including a unique DSSTox substance identifier (DTXSID).
Fig. 1 The CompTox Chemistry Dashboard home page (a) and an example chemical page header (b)
Below the header is a series of individual data tabs (Fig. 1b). The “Chemical Properties” and “Environmental Fate and Transport” tabs contain experimental properties assembled from various sources; presented values reflect recent efforts of NCCT to curate specific datasets in support of prediction algorithms [ 74 , 75 ]. The “Synonyms” tab lists all associated systematic and trivial names, and various types of CAS-RN (i.e., active, deleted, and alternate, with the associated flags). The “External Links” tab lists a series of external resources associated with the chemical in question. The “Exposure” tab includes information regarding chemical weight fractions in consumer products, product use and functional use categories, NHANES monitoring data, and predicted exposure using the ExpoCast models. The “Bioassays” tab provides access to details of the ToxCast data and bioassay data available in PubChem. The “Toxicity Values” tab includes data gathered from multiple EPA databases and documents, and various online open data sources. The “Literature” tab allows a user to choose from a series of queries, and perform searches against Google Scholar and PubMed. It further integrates PubChem widgets for articles and patents. In general, all tabular data surfaced on the Dashboard can be downloaded as either tab-separated value files or Excel files, or included into an SDF file with the chemical structure.
An advanced search on the Dashboard (Fig. 2a) allows for mass searching, molecular formula searching, and molecular formula generation (based on a mass input). A batch search (Fig. 2b) further allows users to input lists of chemical names, CAS numbers, InChIKeys and other identifiers, and to retrieve formulae, masses, DTXSIDs, and other data related to chemical bioactivity and exposure. Various slices of data associated with the Dashboard are available as open data and can be obtained via the downloads page ( https://comptox.epa.gov/dashboard/downloads ). A detailed help file regarding how to use the Dashboard is also available online ( https://comptox.epa.gov/dashboard/help ).
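A mass search of the kind described above amounts to a tolerance window around an observed mass. The following minimal sketch runs against a local list of records. It does not call the Dashboard, and the record layout, identifiers, and default tolerance are assumptions. The observed value is taken to be a neutral monoisotopic mass, i.e. already corrected for the adduct.

```python
def mass_window(mass, ppm):
    """Lower and upper bounds of a +/- ppm tolerance window around `mass`."""
    delta = mass * ppm / 1e6
    return mass - delta, mass + delta


def search_by_mass(records, observed_mass, ppm=5.0):
    """Return records whose monoisotopic mass lies inside the ppm window."""
    low, high = mass_window(observed_mass, ppm)
    return [r for r in records if low <= r["monoisotopic_mass"] <= high]


# Hypothetical local records; the identifiers are placeholders, not real DTXSIDs.
records = [
    {"id": "EXAMPLE-1", "name": "Bisphenol A", "formula": "C15H16O2",
     "monoisotopic_mass": 228.11503},
    {"id": "EXAMPLE-2", "name": "Caffeine", "formula": "C8H10N4O2",
     "monoisotopic_mass": 194.08038},
]

hits = search_by_mass(records, observed_mass=228.1152, ppm=5.0)
```

Widening `ppm` returns more tentative candidates per feature, so the tolerance should reflect the mass accuracy of the instrument used.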
Fig. 2 The CompTox Chemistry Dashboard advanced search menu (a) and batch search menu (b)
Summary of EPA’s NTA workshop and collaborative trial
In August 2015, ORD’s National Exposure Research Laboratory (NERL) and Chemical Safety for Sustainability (CSS) research program jointly hosted an NTA-focused workshop in Research Triangle Park, North Carolina. The purpose of the workshop was to bring together world experts in exposure science, toxicology, cheminformatics, and analytical chemistry to discuss opportunities for collaboration and research integration. Invited presentations focused on research and regulatory drivers; existing data, tools, and resources that are being used to support HTS programs (as described in the previous sections); and NTA methods that are being developed and applied to characterize the exposome. Presentations from EPA science leaders called for engagement among research communities and highlighted how individual groups stand to benefit from shared knowledge and resources. Needs of the exposure scientists (representing the ExpoCast project), toxicologists (representing the ToxCast project), and analytical chemists (representing NTA projects) were articulated during the workshop as follows:
Needs of exposure scientists (ExpoCast) to support HT exposure screening:
Measurements of chemicals in consumer products and articles of commerce
Measurements of chemicals in environmental/residential media
Measurements of chemicals in biological media
Needs of toxicologists (ToxCast) to support HT bioactivity screening:
Prioritized lists of candidate parent (registered) chemicals
Prioritized lists of candidate degradants/metabolites
Prioritized lists of candidate chemical mixtures
Needs of analytical chemists (NTA) to support exposome research:
Large, relevant, curated, and open chemical databases for compound identification
Informatics tools for candidate selection and prioritization
Large inventories of chemical standards and reference spectra for candidate confirmation
Laboratory networks to support comprehensive analyses and standardized methods
The needs of the exposure scientists reflect the general lack of measurement data that are required to parameterize and ultimately evaluate exposure models. The needs of the toxicology community reflect the challenge of utilizing HTS methods to characterize bioactivity across tens-of-thousands of known compounds, and many more possible degradants, metabolites, and mixtures. Finally, the needs of the analytical chemistry community reflect the resources that are required for a holistic examination of the exposome.
Two days of discussion on these needs led to the planning and development of a research collaboration that will benefit all invested parties. A primary goal of the research collaboration is to answer the following questions:
How can resources procured for HTS research in support of chemical safety evaluations be used to advance NTA methods?
How can measurement data generated from NTA methods be used to direct HTS research and strengthen chemical safety evaluations?
EPA’s Non-Targeted Analysis Collaborative Trial (ENTACT) was developed in direct response to these questions. ENTACT makes full use of EPA’s ToxCast library of approximately 4000 compounds, is designed to be conducted in three parts, and involves international participants spanning more than 25 government, academic, and private/vendor laboratories. For part I of ENTACT, approximately 1200 compounds from the ToxCast library were combined into a series of synthetic mixtures, with ~100 to ~400 compounds included in each mixture. Laboratories participating in ENTACT will perform blinded analyses of these mixtures using their state-of-the-art NTA methods. Individual methods will span a variety of separation and detection techniques, instruments, software, databases, and workflows. Results will be compiled by EPA and used to determine which NTA tools are best suited for the detection of specific compounds or groups of compounds. They will further indicate the extent to which sample complexity affects NTA method performance. Finally, they will serve as the basis for future QSAR models that predict the likelihood of a given compound being detected by a selected analytical method.
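One way to picture how blinded results might be compiled is to score each laboratory's reported identifications against the known contents of a synthetic mixture. The metrics and function names below are assumptions made for illustration. The source does not specify how EPA evaluates submissions, and the compound and laboratory labels are hypothetical.

```python
def score_submission(spiked, reported):
    """Compare one laboratory's reported compounds with the spiked list."""
    spiked, reported = set(spiked), set(reported)
    found = spiked & reported
    return {
        "true_positives": len(found),
        "false_positives": len(reported - spiked),
        "false_negatives": len(spiked - reported),
        "recall": len(found) / len(spiked) if spiked else 0.0,
        "precision": len(found) / len(reported) if reported else 0.0,
    }


def detection_rates(spiked, submissions):
    """Fraction of laboratories that reported each spiked compound."""
    n_labs = len(submissions)
    return {
        compound: sum(compound in set(r) for r in submissions.values()) / n_labs
        for compound in spiked
    }


# Hypothetical mixture contents and two laboratory submissions.
mixture = ["cmpd-1", "cmpd-2", "cmpd-3", "cmpd-4"]
submissions = {
    "lab-A": ["cmpd-1", "cmpd-2", "cmpd-9"],
    "lab-B": ["cmpd-1", "cmpd-3"],
}
lab_a = score_submission(mixture, submissions["lab-A"])
rates = detection_rates(mixture, submissions)
```

Per-compound detection rates of this kind are the sort of outcome variable that a model predicting detectability by method could be trained on.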
Part I of ENTACT evaluates NTA method performance using samples of fully synthetic mixtures. Part II, on the other hand, evaluates NTA method performance using extracts of true environmental and biological samples. Here, extracts of reference material house dust (National Institute of Standards and Technology [NIST] Standard Reference Material [SRM] 2585), human serum (NIST SRM 1957), and silicone passive air samplers were shared across laboratories to determine the region of chemical space that can be characterized using specific NTA methods, by sample type. To explore the extent to which the matrices affect extraction and other method performance parameters, each sample has also been fortified with a mixture of ToxCast chemicals prior to extraction. As such, laboratories participating in ENTACT have received two extracts of each medium—one based on a fortified reference sample and one based on an unaltered reference sample. Results of part II analyses will identify the most suitable methods for characterizing specific chemicals within a given medium. Perhaps more importantly, they will indicate how comprehensively a concerted effort of top laboratories can characterize compounds within house dust, human serum, and passive air samplers.
Parts I and II of ENTACT have been open to all interested laboratories, resources permitting. Part III, however, has been open only to instrument vendors, and select institutions that manage large open databases/software in support of NTA workflows. For part III, the full ToxCast chemical library, totaling ~4000 unique substances, is being shared for the purpose of generating reference mass spectra across a variety of instruments and analytical conditions (e.g., ionization source, ionization mode, collision energy, MS level). Institutions receiving these compounds will generate individual spectral records and make them available to EPA for further public use. Institutions may also make spectral records available to the public, or their customers, via addition to existing databases or development of compound libraries. Collectively, these efforts will enable users of many MS and HRMS platforms to rapidly screen for the presence of ToxCast chemicals in samples of their choosing. Results from these screening-level analyses will then provide provisional measurement data (e.g., presence/absence in a given medium) across thousands of compounds for which exposure data are currently lacking. These data will ultimately allow an improved understanding of aggregate exposures (i.e., one compound, multiple exposure pathways), cumulative exposures (i.e., multiple compounds, multiple exposure pathways, one biological target), and the contribution of ToxCast chemicals to the exposome.
Framework for research integration
A formalized framework is needed to ensure that ENTACT delivers maximum benefit to both exposome and chemical screening research programs. The primary function of the framework, shown in Fig. 3, is to highlight how and where existing chemical screening tools (i.e., ToxCast, ExpoCast, DSSTox, and the CompTox Chemistry Dashboard) can be leveraged to enhance NTA efforts, and how NTA data can in turn allow for more informed chemical screening.
Fig. 3. A framework for integrating NTA methods and data with HTS tools (ToxCast, ExpoCast, and the CompTox Chemistry Dashboard [with the underlying DSSTox database]) available from EPA’s Office of Research and Development.
The first step within the framework is the physical analysis of products/articles, environmental samples, and/or biological samples using NTA methods (Fig. 3). Irrespective of the medium in question, no single analysis method, no matter how refined, is able to characterize the full chemical contents of a given sample. The use of multiple methods and analytical platforms, however, can greatly extend surveillance capabilities. A goal of ENTACT is to determine the chemical space applicability domain for a given method. Trial results will inform the breadth of approaches required to adequately characterize a given medium, or to address a given research, public health, or regulatory need. For example, trial results will indicate the number and types of methods required to screen for all ToxCast chemicals in a suite of consumer products. Trial results will also identify compounds that have yet to be considered as part of ToxCast/Tox21 but are present in select environmental and biological media. As described in detail below, later steps of the framework determine which of these compounds, if any, should be prioritized for bioactivity screening.
Candidate identification and evaluation
Within ORD, the CompTox Chemistry Dashboard and the underlying DSSTox database are the primary NTA tools for candidate identification and evaluation (Fig. 3). Initial work has determined that the Dashboard can effectively identify “known unknowns” in samples using the data source ranking techniques developed by Little et al. [76]. Here, the Dashboard is used to search unidentified features from HRMS experiments within a mass range, or by an exact formula, and the most likely candidate chemicals are those with the highest data source counts [77]. Data source ranking alone, however, does not provide sufficient evidence for a “probable” compound classification [39]. The Dashboard is therefore incorporating additional data streams, models, and functionality to increase certainty when assigning structures to unknown compounds. For example, chemical functional use data from EPA’s CPCat database are now available through the Dashboard and can be incorporated into workflows to filter lists of tentative structures. An enhanced version of CPCat, the Consumer Products Database (CPDat), was made available as a beta release in the March 2017 update to the Dashboard, and additionally provides predicted functional uses for chemicals with no known use data. This information can help determine the likelihood that a given compound would be present in a given sample (e.g., a textile dye is more likely than a drug to be found in house dust) [77]. In addition, physicochemical properties of candidate chemicals are available within the Dashboard, and can be used to predict the likelihood of occurrence in environmental media, and the suitability of a selected laboratory method (e.g., extraction solvent, separation technique, ionization mode) for detection.
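The data source ranking idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Dashboard's implementation: the `Candidate` record, the `rank_candidates` function, the 5 ppm mass tolerance, and all masses and source counts below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str                 # hypothetical label, not a real record
    monoisotopic_mass: float  # in Da
    data_source_count: int    # number of data sources listing the chemical


def rank_candidates(observed_mass, library, ppm_tolerance=5.0):
    """Return library entries inside the mass window, most data sources first."""
    window = observed_mass * ppm_tolerance / 1e6
    hits = [c for c in library
            if abs(c.monoisotopic_mass - observed_mass) <= window]
    return sorted(hits, key=lambda c: c.data_source_count, reverse=True)


# Made-up library entries, for illustration only.
library = [
    Candidate("candidate_A", 228.1150, 12),
    Candidate("candidate_B", 228.1152, 87),
    Candidate("candidate_C", 228.1400, 150),  # falls outside a 5 ppm window
]

ranked = rank_candidates(228.1151, library)
for candidate in ranked:
    print(candidate.name, candidate.data_source_count)
```

As the text notes, a ranking of this kind only yields tentative candidates; functional use and physicochemical property filters would be applied to the ranked list afterwards.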
The use of relevant data streams within the Dashboard can improve confidence in structural assignments, but a true one-pass analysis requires the ability to search large lists of unidentified features exported from an HRMS instrument. Batch search capability within the Dashboard (Fig. 2b) now enables users to search thousands of instrument-generated molecular formulae at once and receive back the top ten most likely candidate chemicals with associated chemical data (e.g., identifiers, properties, structures). A further enhancement to this search capability is the inclusion of “MS-ready” structures, whereby all chemicals within the database have been desalted, desolvated, and had stereochemistry removed to represent the forms of chemicals observed via HRMS. In addition to this feature, and the aforementioned features for data source ranking and functional use filtering, spectral matching capabilities will eventually provide supporting evidence for compound identification. Specifically, linking Dashboard records to those from open spectral libraries (e.g., MassBank and MoNA) and fragmentation prediction resources (e.g., MetFrag and CFM-ID) will allow for further confidence in probable identifications. Finally, incorporation of empirical reference spectra from vendors participating in ENTACT will allow rapid screening for a large suite of ToxCast compounds.
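A batch search of this kind can be pictured as a lookup over many formulae at once, keeping the top ten candidates for each. The sketch below is a toy stand-in, not the Dashboard's batch search: the `batch_search` function, the in-memory `index` (assumed to be keyed by the formula of the MS-ready structure), and the candidate names and counts are all hypothetical.

```python
def batch_search(formulae, index, top_n=10):
    """Map each query formula to its top_n candidate names by data source count."""
    results = {}
    for formula in formulae:
        candidates = index.get(formula, [])  # no match gives an empty list
        ranked = sorted(candidates, key=lambda pair: pair[1], reverse=True)
        results[formula] = [name for name, _count in ranked[:top_n]]
    return results


# Hypothetical index: formula -> list of (candidate name, data source count).
index = {
    "C15H16O2": [("cand_1", 40), ("cand_2", 95)],
    "C8H10N4O2": [("cand_3", 120)],
}

# Formulae as they might be exported from an instrument; one has no match.
results = batch_search(["C15H16O2", "C8H10N4O2", "C2H6O"], index)
for formula, names in results.items():
    print(formula, names)
```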
Once probable structures have been proposed, chemical standards are used for feature confirmation, and in some cases, quantitation. As additional standards become available, incremental advances are to be expected in the percentages of probable and confirmed structures relative to tentative candidates. By definition, however, the ability to confirm compounds will always be limited by the availability of chemical standards. This limitation is likely to persist given the cost and time associated with standard synthesis. As such, focus must be given to prioritization tools: methods that help determine which tentative candidates require further study, and which are potentially of little health consequence.
In a previous pilot study [31], we identified molecular features in house dust samples using LC-TOF HRMS, proposed tentative candidates by screening observed molecular features against the DSSTox database (which included, at the time, ~33,000 compounds), and prioritized tentative candidates for further analysis using data from ToxCast and ExpoCast. Priority candidates (those predicted to have high bioactivity, exposure potential, or both) were examined to identify which could be further classified as probable structures. ToxCast standards were ultimately used to confirm a manageable list of compounds. About half of the confirmed chemicals, according to a review of the published literature, had never before been measured in house dust. This pilot study paved the way for a number of NTA studies now being conducted by EPA/ORD, and serves as the basis for the framework proposed here. It was further featured in the recent NRC report “Using 21st Century Science to Improve Risk-Related Evaluations” as an example of an “…innovative approach for identifying and setting priorities among chemicals for additional exposure assessment, hazard testing, and risk assessment that complements the current hazard-oriented paradigm” [9].
ToxCast and ExpoCast data exist for thousands of DSSTox chemicals, and are freely available to the public via the Dashboard. The Dashboard can therefore be used to identify tentative candidates (via formula or mass-based searching), and then sort these candidates based on potential for human (or ecological) contact and biological response. Figure 3 depicts how ToxCast and ExpoCast data were used in our previous dust analysis, and are now integrated into the research framework. As shown in Fig. 3, exposure and bioactivity estimates for tentative candidates are combined into a prioritization algorithm, along with estimates of feature abundance (i.e., average peak intensity across samples) and detection frequency. EPA’s Toxicological Prioritization Index (ToxPi) software is then used to generate graphical displays for each tentative candidate [78]. Here, each pie wedge represents a weighted and normalized value for the selected variable. The scoring algorithm and ToxPi graphical representation are completely customizable: new variables and different weighting schemes can be easily applied. To date, our internal analyses have given more weight to candidates with elevated detection frequency and evidence of bioactivity.
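The weighted, normalized scoring described above can be illustrated with a short Python sketch. It is not the ToxPi software and does not reproduce its algorithm: min-max scaling is an assumed normalization, and the variable names, weights, and candidate values are hypothetical. The weights merely echo the stated preference for detection frequency and bioactivity.

```python
def min_max(values):
    """Scale values to the 0-1 range; a constant column maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def priority_scores(candidates, weights):
    """Weighted sum of min-max normalized variables, scaled to 0-1."""
    names = list(candidates)
    total_weight = sum(weights.values())
    scores = {name: 0.0 for name in names}
    for variable, weight in weights.items():
        scaled = min_max([candidates[name][variable] for name in names])
        for name, value in zip(names, scaled):
            scores[name] += weight * value / total_weight
    return scores


# Made-up inputs: each variable corresponds to one "pie wedge".
candidates = {
    "cand_X": {"exposure": 0.2, "bioactivity": 0.9,
               "abundance": 1e5, "detection_frequency": 0.8},
    "cand_Y": {"exposure": 0.7, "bioactivity": 0.1,
               "abundance": 5e5, "detection_frequency": 0.3},
    "cand_Z": {"exposure": 0.2, "bioactivity": 0.1,
               "abundance": 1e5, "detection_frequency": 0.3},
}
# Hypothetical weighting that favors bioactivity and detection frequency.
weights = {"exposure": 1, "bioactivity": 2,
           "abundance": 1, "detection_frequency": 2}

scores = priority_scores(candidates, weights)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Changing the `weights` dictionary or adding a new variable to each candidate is all that is needed to try a different weighting scheme, which mirrors the customizability described in the text.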
Exposure and bioactivity evaluation
While exposure and bioactivity data are available for thousands of chemicals, the majority of DSSTox compounds (~99%) are without these data. With regard to priority scoring, compounds with data are considered separately from those without data. A bifurcation of the research workflow is shown in Fig. 3 to depict this differentiation. Here, compounds with data are shown to undergo a series of steps to enable exposure evaluation, whereas compounds without data are further considered as part of a bioactivity evaluation.
EPA’s HT exposure models and ExpoCast framework make use of and predict environmental and biological concentrations of known compounds. Often, limited data are available as model inputs and for model parameterization, which can lead to large uncertainties in media concentration or final exposure estimates. Chemical measurements are therefore needed to help parameterize, evaluate, and refine existing models. A major goal of the proposed research framework is to enable NTA data to meet these needs. Here, the initial focus is on compounds classified as probable structures, and ranked as high-priority using the ToxPi approach. As a first step, to the extent that resources allow, these compounds are confirmed using existing standards; provisional concentrations may then be estimated using a variety of techniques [79]. These concentration estimates are then compared to predictions from HT exposure models. Agreement between predicted and estimated concentrations provides confidence in model performance. Sizable disparities between model-predicted and laboratory-estimated values, however, may prompt re-evaluation of model structures and parameters, and/or follow-up laboratory analyses. Specifically, targeted methods may be developed and applied in instances where NTA-estimated concentrations significantly exceed model-predicted values and encroach on exposure thresholds that are consistent with predicted biological activity. The final product of these steps is strengthened assessments of potential risk for confirmed high-priority compounds.
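The comparison logic in this step can be written as a simple decision rule. The sketch below is a hypothetical reading of the text, not an EPA procedure: the `triage` function, the ten-fold disparity cutoff, the 10% threshold fraction, and all concentrations (in arbitrary, shared units) are assumptions chosen for illustration.

```python
def triage(estimated, predicted, threshold,
           fold_cutoff=10.0, threshold_fraction=0.1):
    """Classify one compound from NTA-estimated vs. model-predicted values."""
    if estimated > fold_cutoff * predicted:
        # Estimate far above prediction: check proximity to the threshold
        # associated with predicted biological activity.
        if estimated >= threshold_fraction * threshold:
            return "develop targeted method"
        return "re-evaluate model"
    if predicted > fold_cutoff * estimated:
        return "re-evaluate model"
    return "agreement"


# Made-up values: (NTA estimate, model prediction, exposure threshold).
compounds = {
    "cand_1": (5.0, 4.0, 100.0),
    "cand_2": (50.0, 1.0, 200.0),
    "cand_3": (50.0, 1.0, 5000.0),
    "cand_4": (0.1, 10.0, 100.0),
}

decisions = {name: triage(*values) for name, values in compounds.items()}
for name, decision in decisions.items():
    print(name, decision)
```

In practice the cutoffs would need to reflect the uncertainty of semi-quantitative NTA estimates, which the concluding section notes is larger than for targeted measurements.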
As DSSTox increases in size, so does the number of probable structures for which exposure and bioactivity data are unavailable. For a given experiment, it is not uncommon to have ten times as many probable structures without exposure and bioactivity data as probable structures with these data. It is critical that these compounds are not excluded from further analysis because of existing data limitations. Rather, these compounds must pass a cursory evaluation for bioactivity before being exempted from further consideration. QSAR modeling has been applied to determine which compounds are most likely to be bioactive, and therefore higher priority. For example, the Collaborative Estrogen Receptor Activity Prediction Project recently predicted ER activity across a set of over 32,000 chemical structures [73]. Using these predictions, candidate compounds can be prioritized, and attempts made at confirmation using standards and/or additional targeted analysis procedures. Confirmed high-priority compounds are eventually nominated for in vitro screening through the ToxCast program. Results of the ToxCast assays, as well as any new ExpoCast predictions, are ultimately collated within the DSSTox database and Chemistry Dashboard, and used to support Agency prioritization efforts and eventual decisions.
Conclusions and outlook
Studies at EPA are now being planned and executed with this integrated research framework in mind. Analyses as part of ENTACT are underway (as of January 2017) and will be a source of measurement data for thousands of ToxCast compounds, and chemicals, degradants, metabolites, and mixtures not currently considered by ToxCast/Tox21. The content of DSSTox and the functionality of the Dashboard are constantly expanding, including the addition of chemical datasets provided by other parties interested in NTA, thereby allowing better access to chemistry data and tools for supporting cheminformatics applications and NTA workflows. Also expanding are the exposure and bioactivity data being generated by the ExpoCast and ToxCast projects, respectively. Semi-quantitative NTA measures across a variety of media will soon enable evaluation and refinement of ExpoCast predictions. When examined using bioactivity prediction models, these NTA measures will further yield prioritized lists of compounds that can be considered for ToxCast screening.
It is worth noting that measurement data from NTA studies will not parallel those from targeted studies in terms of accuracy and precision. The NTA community will surely face challenges when comparing semi-quantitative data over time, and across analytical platforms and labs. Standardized approaches will therefore be needed to ensure the appropriate generation, communication, and use of NTA measurement data. Numeric results from ENTACT are expected to shed light on the severity of this issue (i.e., the amount of variability in semi-quantitative measures from one experiment to the next) and act as a large training set for future concentration prediction models. Ultimately, NTA data are intended to be fit-for-purpose, that is, to support screening-level activities. Targeted measures will always be the benchmark for risk-based decisions and actions, and therefore must be generated in tandem, as needed, with NTA measures (Fig. 3). Such a combined measurement scheme will provide a solid foundation for 21st century chemical safety evaluations, and an improved understanding of the chemical composition of the exposome.
References
Collins FS, Gray GM, Bucher JR. Toxicology. Transforming environmental health protection. Science. 2008;319:906–7.
NRC. Toxicity testing in the 21st century: a vision and a strategy. Washington, DC: National Academies Press; 2007.
Tice RR, Austin CP, Kavlock RJ, Bucher JR. Improving the human hazard characterization of chemicals: a Tox21 update. Environ Health Perspect. 2013;121:756–65.
Kavlock R, Chandler K, Houck K, Hunter S, Judson R, Kleinstreuer N, et al. Update on EPA’s ToxCast program: providing high throughput decision support tools for chemical risk management. Chem Res Toxicol. 2012;25:1287–302.
Richard AM, Judson RS, Houck KA, Grulke CM, Volarath P, Thillainadarajah I, et al. ToxCast chemical landscape: paving the road to 21st century toxicology. Chem Res Toxicol. 2016;29:1225–51.
Edwards SW, Tan YM, Villeneuve DL, Meek ME, McQueen CA. Adverse outcome pathways-organizing toxicological information to improve decision making. J Pharmacol Exp Ther. 2016;356:170–81.
Kleinstreuer NC, Sullivan K, Allen D, Edwards S, Mendrick DL, Embry M, et al. Adverse outcome pathways: From research to regulation scientific workshop report. Regul Toxicol Pharmacol: RTP. 2016;76:39–50.
NRC. Exposure science in the 21st century: a vision and a strategy. Washington, DC: National Academies Press; 2012.
NRC. Using 21st century science to improve risk-related evaluations. Washington, DC: National Academies Press; 2017.
Escher BI, Hackermuller J, Polte T, Scholz S, Aigner A, Altenburger R, et al. From the exposome to mechanistic understanding of chemical-induced adverse effects. Environ Int. 2017;99:97–106.
Hubal EA. Biologically relevant exposure science for 21st century toxicity testing. Toxicol Sci. 2009;111:226–32.
Cohen Hubal EA, Richard AM, Shah I, Gallagher J, Kavlock R, Blancato J, et al. Exposure science and the U.S. EPA National Center for Computational Toxicology. J Expo Sci Environ Epidemiol. 2010;20:231–6.
Egeghy PP, Sheldon LS, Isaacs KK, Ozkaynak H, Goldsmith MR, Wambaugh JF, et al. Computational exposure science: an emerging discipline to support 21st-century risk assessment. Environ Health Perspect. 2016;124:697–702.
Wishart D, Arndt D, Pon A, Sajed T, Guo AC, Djoumbou Y, et al. T3DB: the toxic exposome database. Nucleic Acids Res. 2015;43:D928–34.
Neveu V, Moussy A, Rouaix H, Wedekind R, Pon A, Knox C, et al. Exposome-Explorer: a manually-curated database on biomarkers of exposure to dietary and environmental factors. Nucleic Acids Res. 2017;45:D979–84.
Menikarachchi LC, Hill DW, Hamdalla MA, Mandoiu II, Grant DF. In silico enzymatic synthesis of a 400,000 compound biochemical database for nontargeted metabolomics. J Chem Inf Model. 2013;53:2483–92.
Rothwell JA, Urpi-Sarda M, Boto-Ordonez M, Llorach R, Farran-Codina A, Barupal DK, et al. Systematic analysis of the polyphenol metabolome using the Phenol-Explorer database. Mol Nutr Food Res. 2016;60:203–11.
Warth B, Spangler S, Fang M, Johnson C, Forsberg E, Granados A, et al. Exposome-scale investigations guided by global metabolomics, pathway analysis, and cognitive computing. Anal Chem. 2017;89:11505–13.
Wishart DS, Jewison T, Guo AC, Wilson M, Knox C, Liu Y, et al. HMDB 3.0–The Human Metabolome Database in 2013. Nucleic Acids Res. 2013;41:D801–7.
Edmands WM, Petrick L, Barupal DK, Scalbert A, Wilson MJ, Wickliffe JK, et al. compMS2Miner: an automatable metabolite identification, visualization, and data-sharing R package for high-resolution LC-MS data sets. Anal Chem. 2017;89:3919–28.
Schymanski EL, Singer HP, Longree P, Loos M, Ruff M, Stravs MA, et al. Strategies to characterize polar organic contamination in wastewater: exploring the capability of high resolution mass spectrometry. Environ Sci Technol. 2014;48:1811–8.
Wild CP. Complementing the genome with an “exposome”: the outstanding challenge of environmental exposure measurement in molecular epidemiology. Cancer Epidemiol Biomark Prev. 2005;14:1847–50.
Rappaport SM. Implications of the exposome for exposure science. J Expo Sci Environ Epidemiol. 2011;21:5–9.
Rappaport SM, Smith MT. Epidemiology. Environment and disease risks. Science. 2010;330:460–1.
Rappaport SM. Biomarkers intersect with the exposome. Biomarkers. 2012;17:483–9.
Miller GW, Jones DP. The nature of nurture: refining the definition of the exposome. Toxicol Sci. 2014;137:1–2.
Pleil JD, Stiegel MA. Evolution of environmental exposure science: using breath-borne biomarkers for “discovery” of the human exposome. Anal Chem. 2013;85:9984–90.
Andra SS, Austin C, Patel D, Dolios G, Awawda M, Arora M. Trends in the application of high-resolution mass spectrometry for human biomonitoring: An analytical primer to studying the environmental chemical space of the human exposome. Environ Int. 2017;100:32–61.
Krauss M, Singer H, Hollender J. LC-high resolution MS in environmental analysis: from target screening to the identification of unknowns. Anal Bioanal Chem. 2010;397:943–51.
Enders JR, Phillips MB, Clewell HJ, Clewell RA, Strynar MJ, Ulrich EM, et al. Application of non-targeted exposure analysis in assessment: opportunities and challenges. In preparation.
Rager JE, Strynar MJ, Liang S, McMahen RL, Richard AM, Grulke CM, et al. Linking high resolution mass spectrometry data with exposure and toxicity forecasts to advance high-throughput environmental monitoring. Environ Int. 2016;88:269–80.
Brack W, Ait-Aissa S, Burgess RM, Busch W, Creusot N, Di Paolo C, et al. Effect-directed analysis supporting monitoring of aquatic environments–An in-depth overview. Sci Total Environ. 2016;544:1073–118.
Fang M, Webster TF, Stapleton HM. Activation of human peroxisome proliferator-activated nuclear receptors (PPARgamma1) by semi-volatile compounds (SVOCs) and chemical mixtures in indoor dust. Environ Sci Technol. 2015;49:10057–64.
Phillips KA, Yau A, Favela KA, Isaacs K, Grulke CM, Richard AM, et al. Suspect screening analysis of chemicals in consumer products. Submitted.
Nakamura J, Mutlu E, Sharma V, Collins L, Bodnar W, Yu R, et al. The endogenous exposome. DNA Repair (Amst). 2014;19:3–13.
Rappaport SM, Barupal DK, Wishart D, Vineis P, Scalbert A. The blood exposome and its role in discovering causes of disease. Environ Health Perspect. 2014;122:769–74.
Andra SS, Austin C, Arora M. The tooth exposome in children’s health research. Curr Opin Pediatr. 2016;28:221–7.
Lioy PJ, Rappaport SM. Exposure science and the exposome: an opportunity for coherence in the environmental health sciences. Environ Health Perspect. 2011;119:A466–7.
Schymanski EL, Jeon J, Gulde R, Fenner K, Ruff M, Singer HP, et al. Identifying small molecules via high resolution mass spectrometry: communicating confidence. Environ Sci Technol. 2014;48:2097–8.
McEachran AD, Mansouri K, Newton SR, Beverly B, Sobus JR, Williams AJ. A comparison of three chromatographic retention time prediction models. Submitted.
Ridder L, van der Hooft JJ, Verhoeven S. Automatic compound annotation from mass spectrometry data using MAGMa. Mass Spectrom. 2014;3:S0033.
Judson R, Richard A, Dix D, Houck K, Elloumi F, Martin M, et al. ACToR–aggregated computational toxicology resource. Toxicol Appl Pharmacol. 2008;233:7–13.
Judson R, Houck K, Martin M, Richard AM, Knudsen TB, Shah I, et al. Analysis of the effects of cell stress and cytotoxicity on in vitro assay activity across a diverse chemical and assay space. Toxicol Sci. 2016;153:409.
Judson RS, Houck KA, Kavlock RJ, Knudsen TB, Martin MT, Mortensen HM, et al. In vitro screening of environmental chemicals for targeted testing prioritization: the ToxCast project. Environ Health Perspect. 2010;118:485–92.
Browne P, Judson RS, Casey WM, Kleinstreuer NC, Thomas RS. Screening chemicals for estrogen receptor bioactivity using a computational model. Environ Sci Technol. 2015;49:8804–14.
Pinto CL, Mansouri K, Judson R, Browne P. Prediction of estrogenic bioactivity of environmental chemical metabolites. Chem Res Toxicol. 2016;29:1410–27.
Rotroff DM, Beam AL, Dix DJ, Farmer A, Freeman KM, Houck KA, et al. Xenobiotic-metabolizing enzyme and transporter gene expression in primary cultures of human hepatocytes modulated by ToxCast chemicals. J Toxicol Environ Health B Crit Rev. 2010;13:329–46.
Egeghy PP, Judson R, Gangwal S, Mosher S, Smith D, Vail J, et al. The exposure data landscape for manufactured chemicals. Sci Total Environ. 2012;414:159–66.
Egeghy PP, Vallero DA, Hubal EAC. Exposure-based prioritization of chemicals for risk assessment. Environ Sci Policy. 2011;14:950–64.
Wambaugh JF, Wetmore BA, Pearce R, Strope C, Goldsmith R, Sluka JP, et al. Toxicokinetic triage for environmental chemicals. Toxicol Sci. 2015;147:55–67.
Wetmore BA, Wambaugh JF, Allen B, Ferguson SS, Sochaski MA, Setzer RW, et al. Incorporating high-throughput exposure predictions with dosimetry-adjusted in vitro bioactivity to inform chemical toxicity testing. Toxicol Sci. 2015;148:121–36.
Wetmore BA, Wambaugh JF, Ferguson SS, Sochaski MA, Rotroff DM, Freeman K, et al. Integration of dosimetry, exposure, and high-throughput screening data in chemical toxicity assessment. Toxicol Sci. 2012;125:157–74.
Goldsmith MR, Grulke CM, Brooks RD, Transue TR, Tan YM, Frame A, et al. Development of a consumer product ingredient database for chemical exposure screening and prioritization. Food Chem Toxicol. 2014;65:269–79.
Dionisio KL, Frame AM, Goldsmith M-R, Wambaugh JF, Liddell A, Cathey T, et al. Exploring consumer exposure pathways and patterns of use for chemicals in the environment. Toxicol Rep. 2015;2:228–37.
Mitchell J, Arnot JA, Jolliet O, Georgopoulos PG, Isukapalli S, Dasgupta S, et al. Comparison of modeling approaches to prioritize chemicals based on estimates of exposure and exposure potential. Sci Total Environ. 2013;458-460:555–67.
Chadeau-Hyam M, Athersuch TJ, Keun HC, De Iorio M, Ebbels TM, Jenab M, et al. Meeting-in-the-middle using metabolic profiling–a strategy for the identification of intermediate biomarkers in cohort studies. Biomarkers. 2011;16:83–8.
Wambaugh JF, Setzer RW, Reif DM, Gangwal S, Mitchell-Blackwood J, Arnot JA, et al. High-throughput models for exposure-based chemical prioritization in the ExpoCast project. Environ Sci Technol. 2013;47:8479–88.
Arnot JA, Brown TN, Wania F, Breivik K, McLachlan MS. Prioritizing chemicals and data requirements for screening-level exposure and risk assessment. Environ Health Perspect. 2012;120:1565–70.
Arnot JA, Mackay D, Webster E, Southwood JM. Screening level risk assessment model for chemical fate and effects in the environment. Environ Sci Technol. 2006;40:2316–23.
Wambaugh JF, Wang A, Dionisio KL, Frame A, Egeghy P, Judson R, et al. High throughput heuristics for prioritizing human exposure to environmental chemicals. Environ Sci Technol. 2014;48:12760–7.
Csiszar SA, Ernstoff AS, Fantke P, Meyer DE, Jolliet O. High-throughput exposure modeling to support prioritization of chemicals in personal care products. Chemosphere. 2016;163:490–8.
Isaacs KK, Glen WG, Egeghy P, Goldsmith MR, Smith L, Vallero D, et al. SHEDS-HT: an integrated probabilistic exposure model for prioritizing exposures to chemicals with near-field and dietary sources. Environ Sci Technol. 2014;48:12750–9.
Isaacs KK, Goldsmith MR, Egeghy P, Phillips K, Brooks R, Hong T, et al. Characterization and prediction of chemical functions and weight fractions in consumer products. Toxicol Rep. 2016;3:723–32.
Phillips KA, Wambaugh JF, Grulke CM, Dionisio KL, Isaacs KK. High-throughput screening of chemicals as functional substitutes using structure-based classification models. Green Chem. 2017;19:1063–74.
Tornero-Velez R, Egeghy PP, Cohen Hubal EA. Biogeographical analysis of chemical co-occurrence data to identify priorities for mixtures research. Risk Anal. 2012;32:224–36.
Kapraun DF, Wambaugh JF, Ring CL, Tornero-Velez R, Setzer RW. A method for identifying prevalent chemical combinations in the US population. Environ Health Perspect. 2017;125:087017.
Richard AM, Yang C, Judson RS. Toxicity data informatics: supporting a new paradigm for toxicity prediction. Toxicol Mech Methods. 2008;18:103–18.
Richard AM. DSSTox Website launch: Improving public access to databases for building structure-toxicity prediction models. Preclinica. 2004;2:103–8.
Martin MT, Judson RS, Reif DM, Kavlock RJ, Dix DJ. Profiling chemicals based on chronic toxicity results from the U.S. EPA ToxRef Database. Environ Health Perspect. 2009;117:392–9.
Judson R, Richard A, Dix DJ, Houck K, Martin M, Kavlock R, et al. The toxicity data landscape for environmental chemicals. Environ Health Perspect. 2009;117:685–95.
Williams AJ, Ekins S. A quality alert and call for improved curation of public chemistry databases. Drug Discov Today. 2011;16:747–50.
Williams AJ, Ekins S, Tkachenko V. Towards a gold standard: regarding quality in public domain chemistry databases and approaches to improving the situation. Drug Discov Today. 2012;17:685–701.
Mansouri K, Abdelaziz A, Rybacka A, Roncaglioni A, Tropsha A, Varnek A, et al. CERAPP: Collaborative Estrogen Receptor Activity Prediction Project. Environ Health Perspect. 2016;124:1023–33.
Mansouri K, Grulke CM, Richard AM, Judson RS, Williams AJ. An automated curation procedure for addressing chemical errors and inconsistencies in public datasets used in QSAR modelling. SAR QSAR Environ Res. 2016;27:939–65.
Zang Q, Mansouri K, Williams AJ, Judson RS, Allen DG, Casey WM, et al. In silico prediction of physicochemical properties of environmental chemicals using molecular fingerprints and machine learning. J Chem Inf Model. 2017;57:36–49.
Little JL, Williams AJ, Pshenichnov A, Tkachenko V. Identification of “known unknowns” utilizing accurate mass data and ChemSpider. J Am Soc Mass Spectrom. 2012;23:179–85.
McEachran AD, Sobus JR, Williams AJ. Identifying known unknowns using the US EPA’s CompTox chemistry dashboard. Anal Bioanal Chem. 2017;409:1729–35.
Reif DM, Sypa M, Lock EF, Wright FA, Wilson A, Cathey T, et al. ToxPi GUI: an interactive visualization tool for transparent integration of data from diverse sources of evidence. Bioinformatics. 2013;29:402–3.
Go YM, Walker DI, Liang Y, Uppal K, Soltow QA, Tran V, et al. Reference Standardization for mass spectrometry and high-resolution metabolomics applications to exposome research. Toxicol Sci. 2015;148:531–43.
The authors thank Adam Biales, Myriam Medina-Vera, John Kenneke, Sania Tong-Argao, Timothy Buckley, Annette Guiseppi-Elie, Jennifer Orme-Zavaleta, Tina Bahadori, Russell Thomas, Robert Kavlock, and Thomas Burke from EPA for their guidance and support. The authors also thank Barbara Wetmore, Peter Egeghy, and Jeffre Johnson from EPA for their thoughtful reviews of this manuscript. The United States Environmental Protection Agency through its Office of Research and Development funded and managed the research described here. It has been subjected to Agency administrative review and approved for publication. Julia Rager and Andrew McEachran were supported by an appointment to the Internship/Research Participation Program at the Office of Research and Development, U.S. Environmental Protection Agency, administered by the Oak Ridge Institute for Science and Education through an interagency agreement between the U.S. Department of Energy and EPA.
Julia E. Rager
Present address: ToxStrategies, Inc., 9390 Research Blvd., Suite 100, Austin, TX, 78759, USA
Authors and Affiliations
U.S. Environmental Protection Agency, Office of Research and Development, National Exposure Research Laboratory, 109 T.W. Alexander Drive, Research Triangle Park, NC, 27709, USA
Jon R. Sobus, Kristin K. Isaacs, Elin M. Ulrich, Mark J. Strynar & Seth R. Newton
U.S. Environmental Protection Agency, Office of Research and Development, National Center for Computational Toxicology, 109 T.W. Alexander Drive, Research Triangle Park, NC, 27709, USA
John F. Wambaugh, Antony J. Williams, Ann M. Richard & Christopher M. Grulke
Oak Ridge Institute for Science and Education (ORISE) Participant, 109 T.W. Alexander Drive, Research Triangle Park, NC, 27709, USA
Andrew D. McEachran & Julia E. Rager
Correspondence to Jon R. Sobus.
Conflict of interest
The authors declare that they have no conflict of interest.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, and provide a link to the Creative Commons license. You do not have permission under this license to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Sobus, J.R., Wambaugh, J.F., Isaacs, K.K. et al. Integrating tools for non-targeted analysis research and chemical safety evaluations at the US EPA. J Expo Sci Environ Epidemiol 28, 411–426 (2018). https://doi.org/10.1038/s41370-017-0012-y
Received: 27 May 2017
Revised: 04 August 2017
Accepted: 25 August 2017
Published: 29 December 2017
Issue Date: September 2018
DOI: https://doi.org/10.1038/s41370-017-0012-y
- Non-targeted analysis
- Suspect screening
An Introduction to the Benchmarking and Publications for Non-Targeted Analysis Working Group
Benjamin J. Place
1. National Institute of Standards and Technology, Gaithersburg, MD, USA 20899
Elin M. Ulrich
2. U.S. Environmental Protection Agency, Office of Research and Development, Center for Computational Toxicology and Exposure, Research Triangle Park, NC, USA 27711
Jonathan K. Challis
3. Toxicology Centre, University of Saskatchewan, Saskatoon, Canada S7N 5B3
4. Southern California Coastal Water Research Project Authority, Costa Mesa, CA, USA 92626
5. Southwest Research Institute, San Antonio, TX, USA 78238
6. Exposure and Biomonitoring Division, Environmental Health Science and Research Bureau, Health Canada, Ottawa, Ontario, Canada, K1A 0K9
Christine M. Fisher
7. U.S. Food and Drug Administration, Center for Food Safety and Applied Nutrition, College Park, MD, USA 20740.
8. Institute of Environment & Department of Chemistry and Biochemistry, Florida International University, North Miami, FL 33181
9. U.S. Food and Drug Administration, Center for Devices and Radiological Health, Silver Spring, MD, USA 20993
Ann M. Knolhoff
Andrew D. McEachran
10. Agilent Technologies, Inc. Santa Clara, CA, USA 95051
Sara L. Nason
11. Connecticut Agricultural Experiment Station, New Haven, CT, USA 06511
Seth R. Newton
12. Pacific Northwest National Laboratory, Richland, WA, USA 99352
Katherine T. Peter
13. National Institute of Standards and Technology, Charleston, SC, USA 29412
Allison L. Phillips
14. U.S. Environmental Protection Agency, Office of Research and Development, Center for Public Health and Environmental Assessment, Research Triangle Park, NC, USA 27711.
Ryan Renslow, Jon R. Sobus, Eric M. Sussman, Benedikt Warth
15. Department of Food Chemistry and Toxicology, Faculty of Chemistry, University of Vienna, 1090 Vienna, Austria
Antony J. Williams
Non-targeted analysis (NTA) encompasses a rapidly evolving set of mass spectrometry techniques aimed at characterizing the chemical composition of complex samples, identifying unknown compounds, and/or classifying samples, without prior knowledge regarding the chemical content of the samples. Recent advances in NTA are the result of improved and more accessible instrumentation for data generation, and analysis tools for data evaluation and interpretation. As researchers continue to develop NTA approaches in various scientific fields, there is a growing need to identify, disseminate, and adopt community-wide method reporting guidelines. In 2018, NTA researchers formed the Benchmarking and Publications for Non-Targeted Analysis Working Group (BP4NTA) to address this need. Consisting of participants from around the world and representing fields ranging from environmental science and food chemistry to ‘omics and toxicology, BP4NTA provides resources addressing a variety of challenges associated with NTA. Thus far, BP4NTA group members have aimed to establish consensus on NTA-related terms and concepts, and to create consistency in reporting practices by providing resources on a public website, including consensus definitions, reference content, and lists of available tools. Moving forward, BP4NTA will provide a setting for NTA researchers to continue discussing emerging challenges and contribute to additional harmonization efforts.
Non-targeted analysis (NTA), also referred to as “non-target screening” and “untargeted screening,” among several other related terms, is a theoretical concept that can be broadly defined as the characterization of the chemical composition of any given sample without the use of a priori knowledge regarding the sample’s chemical content. Some NTA experiments focus on the discovery of unknown chemicals using first principles and careful evaluation of experimental data. Other NTA experiments (often termed “suspect screening analyses”) aim to rapidly identify known chemicals using suspect lists with experimental data (e.g., reference spectra and metadata). Yet others aim to classify samples using detected chemical profiles (containing both unknown and identified chemicals). For the purposes of this Perspective, we will focus on non-targeted analysis conducted with gas or liquid chromatography coupled to mass spectrometry, with high resolution mass spectrometers being the most commonly used instrumentation. Applications of NTA include, but are not limited to, analysis of naturally occurring materials, 1 manufactured chemicals and materials, 2 manufactured consumer products (e.g., drugs 3 and their interaction with environmental exposures in precision medicine, 4 medical devices, 5 and tobacco 6 ), environmental media, 7 food, 8 and biological samples. 9 A harmonized NTA framework will help facilitate various research objectives, including discovery, 10 forensics, 11 hazard-based prioritization, 12 and support of regulatory decision-making. 13
Interest in NTA applications has grown rapidly in response to the commercial availability of advanced mass spectrometers and novel data analysis tools, as evidenced by the steady increase in published studies on this topic over the past two decades ( Figure 1 ). Accompanying its rapid development, there is growing recognition of the need to develop approaches and methodologies to both promote and assess the quality and confidence of NTA results. For example, Hites and Jobst recently wrote that “the criteria for reproducibility routinely applied to quantitative analyses are not as well defined for non-targeted screening,” and noted that more evaluation is necessary to assess the accuracy, precision, sensitivity, selectivity, and reproducibility of NTA. 14
Figure 1. Google Scholar trend analysis with two sets of search terms: 1) “nontarget analysis” OR “nontargeted analysis” OR “non-targeted analysis” OR “non-targeted screening” OR “nontargeted screening” AND “mass spectrometry” (bottom; blue striped) and 2) “untargeted screening” OR “untargeted analysis” AND “mass spectrometry” (top; orange solid). The search analysis was performed on February 8th, 2021.
For targeted, quantitative analysis, there are existing guidelines for detection and quantification that are accepted by the research community. Compared to traditional targeted analysis of specific compounds, NTA is an emerging field with a lack of standardization and limited capabilities to assess and communicate performance, both of which impede the broader implementation and acceptance of NTA data. Meaningful evaluation of an NTA study’s performance is predicated on the existence of harmonized terminology and clear guidance about best practices for analysis and reporting results. Therefore, harmonized NTA guidance is imperative to promote high quality data and allow inter-study comparisons. Measuring how well a method identifies chemicals from the vast chemical universe is challenging; accurately communicating performance is arguably more difficult, since varying levels of identification confidence can be assigned. 15 Moreover, in contrast to targeted methods, the presence or absence of individual compounds across a sample set is difficult to validate, limiting evaluation using a traditional confusion matrix (i.e., true/false positives/negatives) and associated performance metrics (e.g., precision, accuracy, false discovery rate). There are examples of NTA applications, such as sample classification, where sample classes can be clearly bounded and therefore confusion matrices and other performance metrics are calculable. 16 However, there are still challenges associated with developing robust data analysis strategies and quality assurance/quality control (QA/QC) models for all types of NTA studies. 17 For example, variability in NTA results can be artifacts driven by the selected preparation techniques, instrumentation, software, and user settings, rather than true sample differences, making it difficult to compare methods and results between instruments and/or laboratories. 18 Although some recent studies have developed and implemented sample QA/QC procedures, there are no generally accepted community-wide QA/QC guidelines for NTA performance evaluation. 19–22
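To make the confusion-matrix terminology concrete, the following minimal Python sketch computes the associated metrics for a hypothetical blinded sample of known composition. The compound names, the function name `confusion_metrics`, and the assumption that a finite "universe" (e.g., a suspect list) is available to define true negatives are all illustrative assumptions; in most NTA studies that universe is not well bounded, which is exactly the limitation described above.

```python
def confusion_metrics(reported, spiked, universe):
    """Compare reported identifications with a known sample composition.

    reported -- set of compounds the (hypothetical) NTA workflow reported
    spiked   -- set of compounds known to be present in the sample
    universe -- set of all compounds considered, e.g. a suspect list
                (assumed finite here so that true negatives can be counted)
    """
    tp = len(reported & spiked)              # reported and truly present
    fp = len(reported - spiked)              # reported but not present
    fn = len(spiked - reported)              # present but missed
    tn = len(universe - reported - spiked)   # correctly not reported
    total = tp + fp + fn + tn
    nan = float("nan")
    return {
        "TP": tp, "FP": fp, "FN": fn, "TN": tn,
        "precision": tp / (tp + fp) if tp + fp else nan,
        "recall": tp / (tp + fn) if tp + fn else nan,
        "accuracy": (tp + tn) / total if total else nan,
        "false_discovery_rate": fp / (tp + fp) if tp + fp else nan,
    }


# Hypothetical ten-compound suspect list, four of which were spiked.
universe = {"cmpd_A", "cmpd_B", "cmpd_C", "cmpd_D", "cmpd_E",
            "cmpd_F", "cmpd_G", "cmpd_H", "cmpd_I", "cmpd_J"}
spiked = {"cmpd_A", "cmpd_B", "cmpd_C", "cmpd_D"}
reported = {"cmpd_A", "cmpd_B", "cmpd_C", "cmpd_E"}

metrics = confusion_metrics(reported, spiked, universe)
for name, value in metrics.items():
    print(name, value)
```

Note that accuracy depends on the size of the assumed universe, whereas precision, recall, and false discovery rate do not, which is one reason the latter are easier to compare across studies.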
Previous efforts by other professional organizations in related research fields (e.g., metabolomics, 23–25 mass spectral data generation, 26 and non-targeted screening of water 27 ) have established guidelines and protocols for reporting results and determining method performance. Alternative methods of quantifying NTA method performance have been proposed by various researchers 28–31 but have not been widely adopted by research communities. Many government agencies have provided NTA guidelines for reporting compound identification. 32–34 However, it is challenging to establish guidelines generalizable to all NTA studies, as they can be specific to certain methods, matrices, and/or analytes. There remains a need for consistent definitions, as well as broadly applicable guidance for creating NTA studies and reporting method performance. Easily accessible, centralized recommendations and their widespread adoption are critical to enable accurate and reliable reporting and will facilitate the implementation of NTA beyond the research community.
Formation of the Working Group
In August 2018, the U.S. Environmental Protection Agency (EPA) convened a meeting to discuss the interim results from EPA’s Non-Targeted Analysis Collaborative Trial (ENTACT). 35 During the workshop, participants engaged in breakout discussions around several topic areas. Two topics (“Publication Issues” and “Proficiency Testing/Reference Methods”) merged into one discussion group, with members continuing to meet on a monthly basis. The group, now called the Benchmarking and Publications for Non-Targeted Analysis Working Group, or BP4NTA, had 100 North American and European members as of February 2021. Members hail from three employment sectors, with about 25% each from academia and industry and 50% from government. Membership is voluntary and requires no formal process beyond a written expression of interest, sent to group members or submitted via the website. The group has garnered interest through presentations at relevant scientific conferences, including the American Society for Mass Spectrometry (ASMS), American Chemical Society (ACS), Society for Environmental Toxicology and Chemistry (SETAC), and Pittcon Conference & Expo, and through individual members’ research networks.
The overarching goals for BP4NTA are directly related to the community’s needs to:
- harmonize and/or standardize approaches and reporting practices, to the degree possible and practical;
- improve determination, calculation, and communication of performance metrics;
- share best practices (including QA/QC) within the NTA community; and
- improve the transparency and reproducibility of peer reviewed NTA studies.
During the workshop, and in subsequent meetings, the working group established short-term and long-term goals to address these needs. The short-term goals include:
- publishing a list of commonly used NTA terms, concepts, and performance calculations, with accompanying consensus definitions;
- designing and releasing a public study reporting tool to aid the design of NTA studies and the review of research proposals and manuscripts; and
- collating resources for new NTA researchers traversing the learning curve.
Long-term goals for the working group include continuing to address gaps in data, methods, and computational tools within the community and moving the NTA field toward measurable standards for proficiency testing of non-targeted analytical laboratories. To facilitate these aims, the group plans to continuously build and maintain coalitions and communications with other groups that have similar interests, such as the NORMAN Network [ https://www.norman-network.net/ ], the Metabolomics QA & QC Consortium [mQACC; https://epi.grants.cancer.gov/Consortia/mQACC/ ], and the Metabolomics Standards Initiative in Toxicology [MERIT; https://doi.org/10.1038/s41467-019-10900-y ].
Development of NTA Reference Content and Tools
To achieve the short-term goals conceived at the first BP4NTA meeting, the members focused on the production of four primary deliverables:
- a glossary of useful NTA-relevant terms, based on current literature and personal NTA experience;
- reference content to support concepts mentioned in the definitions;
- an interactive tool to aid researchers in consistently and transparently reporting NTA methods and results; and
- a public website for novice and veteran NTA researchers to access the products described above and to gain knowledge of current and emerging NTA concepts.
In this paper, overviews of these four products are provided, although we refer readers to a concurrent manuscript on the Study Reporting Tool (the third deliverable) 37 for additional details, and the website (the fourth deliverable), 36 which can be accessed directly. The formats of these deliverables were designed to be dynamic and evolve with feedback and technological advances from the NTA community.
To support a clear communication of methods and results by NTA researchers, reviewers, and journal editors, the members identified a collection of NTA terms for which harmonized definitions were needed. Arriving at consensus definitions proved challenging (even the term “non-targeted analysis” itself is often debated within the community, see Figure 1 caption), further demonstrating the need for harmonization. In total, the group defined 34 high-level terms that span all aspects of NTA workflows including sample collection/preparation, data acquisition/processing, results, and performance assessment. A subset of these terms is listed in Table 1 . The final product of this effort is a comprehensive glossary of clear and concise definitions that have been reviewed by the working group and can be found on the BP4NTA website. 36 The BP4NTA definitions and reference content (subsequently described) are applicable to a variety of research fields. While experimental aspects such as sample types, complexity, and data quality may vary, the underlying framework is similar.
Table 1. A subset of consensus definitions determined by BP4NTA using current literature and NTA expertise.
NTA Reference Content.
To support the use of cohesive terminology across diverse NTA research groups, the working group developed reference content that addresses key study design considerations and supplements the consensus definitions. The reference content was organized according to the chronology of a typical NTA study. For each section, working group members gathered and distilled relevant literature to describe current, published NTA practices. In addition, recommendations regarding good practices for performing and reporting NTA research are offered. For example, the Study Design section includes recommendations for describing and using blanks and quality control samples to enable assessment of background detections and analytical performance. Likewise, the Data Processing & Analysis section provides detailed recommendations on method aspects that, if reported, should promote reproducibility, including software tools, versions, and settings used to detect features and a comprehensive description (e.g., size, content, access date) of the mass spectral database(s) used for annotating features.
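As a rough illustration of the kind of data-processing metadata recommended above (software tool, version, settings, and a description of the mass spectral database), the following Python sketch stores these items in a small structured record and serializes it for inclusion in supplementary material. All class names, field names, and example values are hypothetical; they are not taken from the BP4NTA reference content or the Study Reporting Tool.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class SoftwareRecord:
    """Software used for a processing step (hypothetical fields)."""
    name: str
    version: str
    settings: dict = field(default_factory=dict)


@dataclass
class DatabaseRecord:
    """Mass spectral database used for annotation (hypothetical fields)."""
    name: str
    n_entries: int     # size of the database
    content: str       # short description of what it contains
    access_date: str   # ISO 8601 date on which it was accessed


@dataclass
class ProcessingReport:
    feature_detection: SoftwareRecord
    annotation_database: DatabaseRecord

    def to_json(self) -> str:
        # Sorted keys keep the output stable between runs.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Placeholder values for illustration only.
report = ProcessingReport(
    feature_detection=SoftwareRecord(
        name="ExampleFeatureFinder",
        version="1.2.3",
        settings={"mass_tolerance_ppm": 5, "min_peak_height": 1000},
    ),
    annotation_database=DatabaseRecord(
        name="ExampleSpectralLibrary",
        n_entries=25000,
        content="MS/MS spectra of small molecules",
        access_date="2021-02-08",
    ),
)

print(report.to_json())
```

Recording these details in a machine-readable form is one possible design choice; a plain table in the methods section would convey the same information.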
Study Reporting Tool.
The working group also developed the NTA Study Reporting Tool (SRT), a stand-alone, interactive tool for assessing the quality of reporting in NTA studies. The SRT was created to help: (1) current NTA researchers report their study methods and results in a consistent manner, and (2) reviewers and editors consistently and rigorously evaluate the content of NTA proposals and research manuscripts. The SRT was organized in the same structure as the reference content. Additionally, the SRT can be used to guide study design or as an educational framework for less-experienced NTA researchers. By aligning the organizational structure of the SRT and the reference content described above, researchers can readily use the two resources in tandem. The SRT is further described and evaluated by Peter, Phillips, et al. 37 in a concurrent manuscript.
For the aforementioned information and tools to be useful, they must be widely available and amenable to updates given the rapid advancements in NTA technology. Therefore, the BP4NTA website [ https://nontargetedanalysis.org/ ] was developed ( Figure 2 ), 36 to provide public access to the glossary, reference content, SRT, and other NTA-related information without requirements for working group membership. In the online reference content, users can click on the organizational headers (identical to those in the SRT) to directly access detailed reference content on the relevant topic. The SRT webpage includes downloadable versions of the interactive tool and portals for community feedback. Beyond these materials, the website contains an extensive list of external NTA resources (including NTA software tools and online databases), a page for NTA job opportunities, information about the membership and history of the BP4NTA group, and the BP4NTA blog (which contains occasional updates on NTA-relevant news and publications). In addition to its accessibility, the website format allows ongoing updates to BP4NTA materials by members of the working group as the field evolves and provides a more interactive user experience than a single, static document. We believe this compilation of online resources will be extremely helpful for both beginning and experienced NTA researchers, as well as journal editors and peer reviewers who encounter manuscripts and proposals that include NTA.
Figure 2. Screenshot of the Additional Resources/Software Tools page of the BP4NTA website 36 showing the website’s main menu (top) and a list of NTA software that is available for NTA researchers to explore (access date: 11/10/2021).
Challenges, Recommendations, and Outlook
The resources developed thus far by BP4NTA provide valuable information for experienced and novice NTA researchers alike. As new NTA methods (both analytical and computational) are developed, it is critical to assess their performance against current best practices. Such assessments can require running samples on multiple instruments or through multiple data analysis pipelines, which adds both time and cost. To meet this challenge, the community should push for (1) open sharing of raw NTA data so new computational approaches can be compared using identical data, (2) additional assessment studies of blinded samples with known composition (akin to ENTACT 39,40 and CASMI 41 ), (3) adoption of common reporting formats and transparent sharing of metadata, (4) adoption of common QA/QC practices, and (5) the development of comparison tools to support automated performance assessment.
Additionally, the NTA community would benefit from a better understanding of the relationship between identification confidence and false discovery rates or the probability of presence. However, to date, there has not been a rigorous assessment to correlate these concepts. To help meet this challenge, the community should (1) openly report the analytical, computational, and subject-matter expert based evidence used for each compound identification, (2) determine methods to directly calculate false discovery rate (FDR) and identification probabilities, and (3) add dimensions of evidence that may ultimately lead to feature sets that are unambiguous (i.e., unique to each compound in the entire molecular universe). The community is shifting to address this gap quickly, with decoy libraries and approaches similar to proteomics-inspired FDR assessments now appearing in the literature. 28–31 Furthermore, the combination of ion mobility spectrometry (including gas-phase chiral separation 42 ) and cryogenic infrared spectroscopy 43–45 with traditional LC-MS/MS, as well as microcrystal electron diffraction 46 methods, are emerging tools for providing unique identifiers for molecules.
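The decoy-library idea mentioned above can be sketched in a few lines of Python. This is a deliberately simplified target-decoy estimate of the kind used in proteomics, not the specific procedure of any cited study: it assumes each query spectrum has been scored once against a real ("target") library and once against a decoy library of implausible entries, and that decoy hits above a score threshold approximate the number of false target hits. The scores and function names are hypothetical.

```python
def estimate_fdr(target_scores, decoy_scores, threshold):
    """Estimate FDR at a score threshold as (decoy hits) / (target hits)."""
    targets = sum(score >= threshold for score in target_scores)
    decoys = sum(score >= threshold for score in decoy_scores)
    if targets == 0:
        return 0.0  # nothing accepted, so nothing falsely discovered
    return min(1.0, decoys / targets)


def threshold_for_fdr(target_scores, decoy_scores, max_fdr):
    """Return the lowest observed target score whose estimated FDR
    does not exceed max_fdr, or None if no threshold qualifies."""
    for candidate in sorted(set(target_scores)):
        if estimate_fdr(target_scores, decoy_scores, candidate) <= max_fdr:
            return candidate
    return None


# Hypothetical best-match scores (0 to 1) for ten query spectra.
target_scores = [0.95, 0.91, 0.88, 0.80, 0.72, 0.65, 0.60, 0.55, 0.40, 0.35]
decoy_scores = [0.70, 0.58, 0.45, 0.38, 0.30, 0.22, 0.20, 0.15, 0.10, 0.05]

fdr_at_060 = estimate_fdr(target_scores, decoy_scores, 0.60)
cutoff = threshold_for_fdr(target_scores, decoy_scores, 0.10)
print(f"Estimated FDR at score >= 0.60: {fdr_at_060:.3f}")
print(f"Lowest cutoff with estimated FDR <= 10%: {cutoff}")
```

With so few spectra the estimate is coarse and not monotonic in the threshold; real assessments rely on much larger query sets and on careful construction of the decoy library.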
Moreover, we need to continue developing novel methods for identifying compounds. “Standard-free” identification makes use of computational prediction of molecular properties to build identification libraries without the use of chemical standards. This approach is an improvement on traditional approaches, such as the use of acquired mass spectral libraries, because authentic chemical standards are not available for the vast majority of small molecules. 47 , 48 However, the community has yet to standardize a consistent approach to include in silico libraries as part of identification evidence and to implement these libraries into existing databases. Currently, there is not a clear method to combine information from multiple instruments and multiple libraries into a single identification score or probability. To address this challenge, the community should move to (1) determine the best approach for utilizing this information to bolster the evidence for molecular presence, (2) continually improve predicted property accuracy, and (3) form a deeper understanding of experimental error associated with this evidence (computational or empirical).
Finally, one of the shortcomings of NTA approaches is the inability to provide fully quantitative data (i.e., concentration) for identified compounds, when relative abundance is not sufficient. This is largely due to the absence of authentic standards and methodological compromises necessary to detect a broad swath of chemical space. Accurate semi-quantitative approaches have proven challenging due to the wide disparity in analytical response factors. 49,50 Existing studies have focused on using modeling 51 or surrogate compounds (i.e., isotopically labelled spiked compounds) to predict response factors for compounds identified with NTA. 52,53 To address this challenge, the community should: (1) develop semi-quantitative estimation models or quantitative measurement models, (2) understand the estimation error and determine proper guidelines for reporting how concentration was determined, and (3) standardize the detection limit concept in NTA (e.g., instrument specific limit of detection (LOD) vs. NTA method LOD) when detected compounds are not available as standards.
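To illustrate why disparate response factors make semi-quantitation difficult, the sketch below estimates a concentration for an identified compound from its peak area using response factors measured for surrogate compounds. It is a simplified illustration under stated assumptions (a linear response through the origin, and surrogates whose response factors bracket that of the analyte), not the method of the cited studies; all values and names are hypothetical.

```python
from statistics import median


def response_factors(surrogates):
    """Response factor = peak area per unit concentration for each
    surrogate, given (peak_area, known_concentration) pairs."""
    return [area / conc for area, conc in surrogates]


def estimate_concentration(peak_area, rfs):
    """Central estimate from the median response factor, with bounds
    from the most and least responsive surrogates."""
    return {
        "central": peak_area / median(rfs),
        "low": peak_area / max(rfs),    # strongest response, lowest estimate
        "high": peak_area / min(rfs),   # weakest response, highest estimate
    }


# Hypothetical surrogates: (peak area, spiked concentration in ng/mL).
surrogates = [(2.0e5, 10.0), (8.0e5, 10.0), (5.0e6, 10.0)]
rfs = response_factors(surrogates)

estimate = estimate_concentration(peak_area=4.0e5, rfs=rfs)
print(f"Central estimate: {estimate['central']:.1f} ng/mL "
      f"(range {estimate['low']:.1f} to {estimate['high']:.1f} ng/mL)")
```

In this made-up example the surrogate response factors span a factor of 25, so the bounds on the estimate span the same factor, which is why reporting how a concentration was determined, along with its estimation error, matters.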
Because the BP4NTA working group is organized and operated by volunteers, we feel it is wise to fit ourselves within a larger scientific society structure that will allow a mechanism for recruiting efforts and maintenance of membership rolls. While no official collaborative relationship has been established, representatives from BP4NTA have engaged in discussions with other groups (e.g., mQACC, NORMAN, Compound Identification Development Cores (CIDC)), and our efforts are intended to complement ongoing harmonization efforts by other NTA-related organizations.
Representing a significant effort by working group members, the SRT is an important tool that we believe, with widespread adoption, will benefit a variety of roles within the field. To that end, in addition to this manuscript and a thorough evaluation of the SRT, 37 we are reaching out to journal editors to explain the tool, its intent and limitations, and the potential benefit to authors, reviewers, and editors.
Emerging technologies for NTA are routinely developed and there is no intention that the reference content initially presented by this communication or the BP4NTA website will remain static. As the working group receives feedback from the NTA community, and as needs for NTA harmonization evolve, the website will be regularly maintained and updated. To provide input or suggestions, researchers are encouraged to use the form on the website (or the comment box on the SRT page for SRT-specific feedback) or contact the corresponding author of this communication. As NTA techniques advance, it is expected that recommendations for comprehensive reporting will transform accordingly. Therefore, the SRT will remain flexible and evolve with changing technology. The SRT available on the website will be updated annually, although static versions of the document will remain available for download.
While it seems to be a herculean effort, the harmonization of NTA methods and results reporting should have a substantial impact on the quality and outcomes of NTA research. These powerful and ever-advancing capabilities, data processing tools, and molecular databases/libraries will encourage more researchers to enter the field and apply NTA methodologies to unique research problems, making it important to create and share resources such as those from BP4NTA and related groups. While much work remains, the information and tools provided by this group can improve the communication of work being accomplished by the community. Through our efforts, we hope to empower researchers with tools and knowledge for presenting reproducible, transparent, and impactful NTA research.
We acknowledge partial support from the Connecticut Agricultural Experiment Station, and specifically USDA NIFA Hatch funds (CONH00789), which covered the cost of website hosting and the domain name for the first year, and supported contributions from author S.L.N. We thank all past and present members of BP4NTA for their energy and efforts in bringing these deliverables to fruition, in particular Charlie Lowe and Natalia Quinete for their complete review of the reference content. This work was performed while the author K.T.P. held a National Research Council Associateship award with the National Institute of Standards and Technology. J.K.C. conducted this work while holding a Banting Postdoctoral Fellowship from the Natural Sciences and Engineering Research Council of Canada.
The views expressed in this article are those of the author(s) and do not necessarily represent the views or policies of the U.S. Environmental Protection Agency, the National Institute of Standards & Technology, and the U.S. Food and Drug Administration. The mention of commercial products, their sources, or their use in connection with material reported herein is not to be construed as either an actual or implied endorsement of such products.