statistical methods for case control studies

Handbook of Statistical Methods for Case-Control Studies (Chapman & Hall/CRC Handbooks of Modern Statistical Methods)

Handbook of Statistical Methods for Case-Control Studies is written by leading researchers in the field. It provides an in-depth treatment of up-to-date and currently developing statistical methods for the design and analysis of case-control studies, as well as a review of classical principles and methods. The handbook is designed to serve as a reference text for biostatisticians and quantitatively oriented epidemiologists who work on the design and analysis of case-control studies or on related statistical methods research. Although not specifically intended as a textbook, it may also be used as a supplementary reference text for graduate-level courses.

Book Sections

  • Classical designs and causal inference, measurement error, power, and small-sample inference
  • Designs that use full-cohort information
  • Time-to-event data
  • Genetic epidemiology

About the Editors

Ørnulf Borgan is Professor of Statistics, University of Oslo. His book with Andersen, Gill and Keiding on counting processes in survival analysis is a world classic.

Norman E. Breslow was, at the time of his death, Professor Emeritus in Biostatistics, University of Washington. For decades, his book with Nick Day has been the authoritative text on case-control methodology.

Nilanjan Chatterjee is Bloomberg Distinguished Professor, Johns Hopkins University. He leads a broad research program in statistical methods for modern large scale biomedical studies.

Mitchell H. Gail is a Senior Investigator at the National Cancer Institute. His research includes modeling absolute risk of disease, intervention trials, and statistical methods for epidemiology.

Alastair Scott was, at the time of his death, Professor Emeritus of Statistics, University of Auckland. He was a major contributor to using survey sampling methods for analyzing case-control data.

Chris J. Wild is Professor of Statistics, University of Auckland. His research includes nonlinear regression and methods for fitting models to response-selective data.

"This book is essential reading and reference for any statistical methodologist with interest in case-control studies...This book is a very good place to start on the next leg of our statistical journey in this field." ~Nicholas P. Jewell, ISCB Newsletter

" . . . as a handbook, it is designed to address specific methodological issues, more like a toolbox. And this is done well. All chapters come with an introduction and a worked example using sample data, with ample reference to further details. Occasional chapters on unconventional study designs provide food for thought. Overall, the book is well written and very comprehensive; it provides help for many situations, and for situations of greater complexity it points to further references." ~Anika Hüsing, Biometrical Journal

  • ISBN-10 149876858X
  • ISBN-13 978-1498768580
  • Edition 1st
  • Publisher Chapman and Hall/CRC
  • Publication date July 2, 2018
  • Part of series Chapman & Hall/CRC Handbooks of Modern Statistical Methods
  • Language English
  • Dimensions 7.99 x 10 x 1.85 inches
  • Print length 536 pages

Analysis of matched case-control studies

  • Neil Pearce, professor 1, 2
  • 1 Department of Medical Statistics and Centre for Global NCDs, Faculty of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine, London WC1E 7HT, UK
  • 2 Centre for Public Health Research, Massey University, Wellington, New Zealand
  • neil.pearce{at}lshtm.ac.uk
  • Accepted 30 December 2015

There are two common misconceptions about case-control studies: that matching in itself eliminates (controls) confounding by the matching factors, and that if matching has been performed, then a “matched analysis” is required. However, matching in a case-control study does not control for confounding by the matching factors; in fact it can introduce confounding by the matching factors even when it did not exist in the source population. Thus, a matched design may require controlling for the matching factors in the analysis. However, it is not the case that a matched design requires a matched analysis. Provided that there are no problems of sparse data, control for the matching factors can be obtained, with no loss of validity and a possible increase in precision, using a “standard” (unconditional) analysis, and a “matched” (conditional) analysis may not be required or appropriate.

Summary points

Matching in a case-control study does not control for confounding by the matching factors

A matched design may require controlling for the matching factors in the analysis

However, it is not the case that a matched design requires a matched analysis

A “standard” (unconditional) analysis may be most valid and appropriate, and a “matched” (conditional) analysis may not be required or appropriate

Matching on factors such as age and sex is commonly used in case-control studies. 1 This can be done for convenience (eg, choosing a control admitted to hospital on the same day as the case), to improve study efficiency by improving precision (under certain conditions) when controlling for the matching factors (eg, age, sex) in the analysis, or to enable control in the analysis of unquantifiable factors such as neighbourhood characteristics (eg, by choosing neighbours as controls and then controlling for neighbourhood in the analysis). The increase in efficiency occurs because it ensures similar numbers of cases and controls in confounder strata. For example, in a study of lung cancer, if controls are sampled at random from the source population, their age distribution will be much younger than that of the lung cancer cases. Thus, when age is controlled in the analysis, the young age stratum may contain mostly controls and few cases, whereas the old age stratum may contain mostly cases and fewer controls. Thus, statistical precision may be improved if controls are age matched to ensure roughly equal numbers of cases and controls in each age stratum.

There are two common misconceptions about case-control studies: that matching in itself eliminates confounding by the matching factors; and that if matching has been performed, then a “matched analysis” is required.

Matching in the design does not control for confounding by the matching factors. In fact, it can introduce confounding by the matching factors even when it did not exist in the source population. 1 The reasons for this are complex and will only be discussed briefly here. In essence, the matching process makes the controls more similar to the cases not only for the matching factor but also for the exposure itself. This introduces a bias that needs to be controlled in the analysis. For example, suppose we were conducting a case-control study of poverty and death (from any cause), and we chose siblings as controls (that is, for each person who died, we matched on family or residence by choosing a sibling who was still alive as a control). In this situation, since poverty runs in families we would tend to select a disadvantaged control for each disadvantaged person who had died and a wealthy control for each wealthy person who had died. We would find roughly equal percentages of disadvantaged people among the cases and controls, and we would find little association between poverty and mortality. The matching has introduced a bias, which fortunately (as we will illustrate) can be controlled by controlling for the matching factor in the analysis.
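The mechanism described above can be checked with expected counts rather than a random simulation. The sketch below is hypothetical throughout: instead of siblings it uses a binary matching factor F that raises exposure prevalence (20% versus 80%) but has no effect on disease risk, and it selects one control per case from the non-cases with the same level of F. In the source population F is therefore not a confounder, and the crude odds ratio equals the true value of 2. In the matched sample the crude odds ratio is pulled toward 1 (to roughly 1.57), and stratifying on F recovers 2. The function names are illustrative only.

```python
def mh_odds_ratio(tables):
    """Mantel-Haenszel summary odds ratio.

    Each table is (a, b, c, d) = (exposed cases, exposed controls,
    unexposed cases, unexposed controls).
    """
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


def crude_odds_ratio(tables):
    """Odds ratio after collapsing all strata into a single 2x2 table."""
    a, b, c, d = (sum(t[i] for t in tables) for i in range(4))
    return a * d / (b * c)


# Hypothetical source population: binary matching factor F, 500,000 people per level.
# F changes exposure prevalence only; disease risk depends on exposure alone.
N_PER_STRATUM = 500_000
EXPOSURE_PREV = {0: 0.2, 1: 0.8}   # P(E=1 | F=f)
RISK_UNEXPOSED = 0.01              # P(D=1 | E=0), identical in both strata
TRUE_OR = 2.0                      # disease odds ratio for exposure, identical in both strata

odds_unexposed = RISK_UNEXPOSED / (1 - RISK_UNEXPOSED)
risk_exposed = TRUE_OR * odds_unexposed / (1 + TRUE_OR * odds_unexposed)

population_tables, matched_tables = [], []
for f, prev in EXPOSURE_PREV.items():
    n_exp = N_PER_STRATUM * prev
    n_unexp = N_PER_STRATUM - n_exp
    cases_exp = n_exp * risk_exposed
    cases_unexp = n_unexp * RISK_UNEXPOSED
    noncases_exp = n_exp - cases_exp
    noncases_unexp = n_unexp - cases_unexp
    # Whole population, cross-classified by disease and exposure within stratum f.
    population_tables.append((cases_exp, noncases_exp, cases_unexp, noncases_unexp))
    # Matching: one control per case, drawn from non-cases with the same F, so the
    # controls inherit the exposure prevalence of non-cases in that stratum.
    n_cases = cases_exp + cases_unexp
    frac_exp = noncases_exp / (noncases_exp + noncases_unexp)
    matched_tables.append((cases_exp, n_cases * frac_exp,
                           cases_unexp, n_cases * (1 - frac_exp)))

print(f"population crude OR:       {crude_odds_ratio(population_tables):.2f}")
print(f"matched sample, crude OR:  {crude_odds_ratio(matched_tables):.2f}")
print(f"matched sample, MH by F:   {mh_odds_ratio(matched_tables):.2f}")
```

Because each control is drawn from the non-cases in the case's own stratum, the controls' exposure distribution is shifted toward that of the cases, which is exactly why the crude comparison is biased and the stratified one is not.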

Thus, a matched design will (almost always) require controlling for the matching factors in the analysis. However, this does not necessarily mean that a matched analysis is required or appropriate, and it will often be sufficient to control for the matching factors using simpler methods. Although this is well recognised in both recent 2 3 and historical 4 5 texts, other texts 6 7 8 9 do not discuss this issue and present the matched analysis as the only option for analysing matched case-control studies. In fact, the more standard analysis may not only be valid but may be much easier in practice, and yield better statistical precision.

In this paper I explore and illustrate these problems using a hypothetical pair matched case-control study.

Options for analysing case-control studies

Unmatched case-control studies are typically analysed using the Mantel-Haenszel method 10 or unconditional logistic regression. 4 The former involves the familiar method of producing a 2×2 (exposure-disease) stratum for each level of the confounder (eg, if there are five age groups and two sex groups, then there will be 10 2×2 tables, each showing the association between exposure and disease within a particular stratum), and then producing a summary (average) effect across the strata. The Mantel-Haenszel estimates are robust and not affected by small numbers in specific strata (provided that the overall numbers of exposed or non-exposed cases or controls are adequate), although it can be difficult or impossible to control for factors other than the matching factors if some strata involve small numbers (eg, just one case and one control). Furthermore, the Mantel-Haenszel approach works well when there are only a few confounder strata, but will experience problems of small numbers (eg, strata with only cases and no controls) if there are too many confounders to adjust for. In this situation, logistic regression may be preferred, since this uses maximum likelihood methods, which enable the adjustment (given certain assumptions) of more confounders.
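The Mantel-Haenszel summary described above, together with the Robins-Greenland-Breslow variance estimator (reference 11), can be sketched in a few lines. This is a minimal illustration, not the author's code: the function name and the (a, b, c, d) cell ordering are assumptions, and the two strata are hypothetical counts chosen so that each has an odds ratio of exactly 2.

```python
import math


def mantel_haenszel(tables, z=1.96):
    """Mantel-Haenszel summary odds ratio with a Robins-Greenland-Breslow CI.

    Each stratum is (a, b, c, d) = (exposed cases, exposed controls,
    unexposed cases, unexposed controls). Returns (OR, lower, upper).
    """
    sum_r = sum_s = sum_pr = sum_ps_qr = sum_qs = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n == 0:
            continue  # empty strata carry no information
        r, s = a * d / n, b * c / n          # MH numerator / denominator terms
        p, q = (a + d) / n, (b + c) / n      # weights used by the RGB variance
        sum_r += r
        sum_s += s
        sum_pr += p * r
        sum_ps_qr += p * s + q * r
        sum_qs += q * s
    or_mh = sum_r / sum_s
    var_log = (sum_pr / (2 * sum_r ** 2)
               + sum_ps_qr / (2 * sum_r * sum_s)
               + sum_qs / (2 * sum_s ** 2))
    se = math.sqrt(var_log)
    return or_mh, or_mh * math.exp(-z * se), or_mh * math.exp(z * se)


# Two hypothetical confounder strata (e.g. young and old), each with OR = 2.
strata = [(20, 10, 10, 10), (10, 10, 5, 10)]
or_mh, lo, hi = mantel_haenszel(strata)
print(f"MH OR = {or_mh:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

With a single stratum the RGB variance reduces to the familiar Woolf variance 1/a + 1/b + 1/c + 1/d, which is a convenient check on the implementation.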

Suppose that for each case we have chosen a control who is in the same five year age group (eg, if the case is aged 47 years, then a control is chosen who is aged 45-49 years). We can then perform a standard analysis, which adjusts for the matching factor (age group) by grouping all cases and controls into five year age groups and using unconditional logistic regression 4 (or the Mantel-Haenszel method 10 ); if there are eight age groups then this analysis will just have eight strata (represented by seven age group dummy variables), each with multiple cases and controls. Alternatively we can perform a matched analysis (that is, retaining the pair matching of one control for each case) using conditional logistic regression (or the matched data methods, which are equivalent to the Mantel-Haenszel method); if there are 100 case-control pairs, this analysis will then have 100 strata.
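For a 1:1 pair-matched design with a binary exposure, the matched analysis has a simple closed form. The sketch below (hypothetical pair counts; the helper names are illustrative) builds one 2×2 stratum per pair, as in the matched analysis with 100 strata described above, and applies the Mantel-Haenszel formula. Concordant pairs contribute nothing, so the estimate reduces to the ratio of the two kinds of discordant pairs, n10/n01, which is also the conditional maximum likelihood estimate in this setting.

```python
import math


def pair_strata(n11, n10, n01, n00):
    """Expand pair counts into one 2x2 stratum per matched pair.

    n10 = pairs with an exposed case and an unexposed control, n01 the reverse,
    n11 and n00 = concordant pairs. Each stratum is
    (exposed cases, exposed controls, unexposed cases, unexposed controls).
    """
    return ([(1, 1, 0, 0)] * n11 + [(1, 0, 0, 1)] * n10
            + [(0, 1, 1, 0)] * n01 + [(0, 0, 1, 1)] * n00)


def mh_odds_ratio(tables):
    """Mantel-Haenszel summary odds ratio across 2x2 strata."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


# Hypothetical study: 100 case-control pairs.
n11, n10, n01, n00 = 20, 30, 15, 35
or_matched = mh_odds_ratio(pair_strata(n11, n10, n01, n00))

# Standard error of log(OR) based on the discordant pairs only.
se = math.sqrt(1 / n10 + 1 / n01)
lo, hi = or_matched * math.exp(-1.96 * se), or_matched * math.exp(1.96 * se)
n_pairs = n11 + n10 + n01 + n00
print(f"matched OR = {or_matched:.2f} (95% CI {lo:.2f} to {hi:.2f}); "
      f"{n10 + n01} of {n_pairs} pairs are informative")
```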

The main reason for using conditional (rather than unconditional) logistic regression is that when the analysis strata are very small (eg, with just one case and one control for each stratum), problems of sparse data will occur with unconditional methods. 11 For example, if there are 100 strata, this requires 99 dummy variables to represent them, even though there are only 200 study participants. In this extreme situation, unconditional logistic regression is biased and produces an odds ratio estimate that is the square of the conditional (true) estimate of the odds ratio. 5 12
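The squaring result can be reproduced numerically. The sketch below uses hypothetical counts, assumes numpy is available, and defines `fit_logistic` as an illustrative helper rather than a library function. It fits unconditional logistic regression by Newton-Raphson with one indicator variable per pair (100 pairs, 200 participants), as described above, and compares the exposure odds ratio with the conditional estimate n10/n01 for the same pairs. With 30 and 15 discordant pairs the conditional estimate is 2, and the unconditional fit should return its square, 4.

```python
import numpy as np


def log_likelihood(X, y, beta):
    """Bernoulli log-likelihood of a logistic model with linear predictor X @ beta."""
    eta = X @ beta
    return float(np.sum(y * eta - np.logaddexp(0.0, eta)))


def fit_logistic(X, y, tol=1e-10, max_iter=100):
    """Unconditional maximum-likelihood logistic regression via Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    ll = log_likelihood(X, y, beta)
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hess, grad)
        # Step-halving guards against an update that lowers the log-likelihood.
        t = 1.0
        while log_likelihood(X, y, beta + t * step) < ll and t > 1e-8:
            t /= 2
        beta = beta + t * step
        ll = log_likelihood(X, y, beta)
        if np.max(np.abs(t * step)) < tol:
            break
    return beta


# Hypothetical 1:1 matched data: (case exposure, control exposure) per pair.
n11, n10, n01, n00 = 20, 30, 15, 35
pairs = [(1, 1)] * n11 + [(1, 0)] * n10 + [(0, 1)] * n01 + [(0, 0)] * n00

n_pairs = len(pairs)
rows, outcomes = [], []
for i, (case_exp, ctrl_exp) in enumerate(pairs):
    for outcome, exposed in ((1, case_exp), (0, ctrl_exp)):
        dummies = np.zeros(n_pairs)
        dummies[i] = 1.0  # one indicator per matched pair, no global intercept
        rows.append(np.append(dummies, exposed))
        outcomes.append(outcome)
X, y = np.array(rows), np.array(outcomes, dtype=float)

beta = fit_logistic(X, y)
or_unconditional = float(np.exp(beta[-1]))
or_conditional = n10 / n01  # conditional MLE for 1:1 pairs with a binary exposure
print(f"conditional OR:   {or_conditional:.3f}")
print(f"unconditional OR: {or_unconditional:.3f}")
```

Analytically, each discordant pair's nuisance intercept is profiled out at minus half the log odds ratio, which leaves a likelihood whose maximum is at the square of n10/n01.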

Example of age matching

Table 1 gives an example of age matching in a population based case-control study, and shows the “true” findings for the total population, the findings for the corresponding unmatched case-control study, and the findings for an age matched case-control study using the standard analysis. Table 2 presents the findings for the same age matched case-control study using the matched analysis. All analyses were performed using the Mantel-Haenszel method, which yields similar results to the corresponding (unconditional or conditional) logistic regression analyses.

Table 1: Hypothetical study population and case-control study with unmatched and matched standard analyses

Table 2: Hypothetical matched case-control study with matched analysis

Table 1 shows that the crude odds ratio in the total population is 0.86 (0.70 to 1.05), but this changes to 2.00 (1.59 to 2.51) when the analysis is adjusted for age (using the Mantel-Haenszel method). This occurs because there is strong confounding by age: the cases are mostly old, and old people have a lower exposure than young people. Overall, there are 390 cases, and when 390 controls are selected at random from the non-cases in the total population (which is half exposed and half not exposed), this yields the same crude (0.86) and adjusted (2.00) odds ratios, but with wider confidence intervals, reflecting the smaller numbers of non-cases (controls) in the case-control study.

Why matching factors need to be controlled in the analysis

Now suppose that we repeat the case-control study, matching for age, using two very broad age groups: old and young (table 1). The numbers of cases and controls in each age group are now equal. However, the crude odds ratio (1.68, 95% confidence interval 1.25 to 2.24) differs from both the crude (0.86) and the adjusted (2.00) odds ratios in the total population. In contrast, the adjusted odds ratio (2.00) is the same as that in the total population and in the unmatched case-control study (both of these adjusted odds ratios were estimated using the standard approach). Thus, matching has not removed age confounding and it is still necessary to control for age (this occurs because the matching process in a case-control study changes the association between the matching factor and the outcome and can create an association even if there was none before the matching was conducted). However, there is a small increase in precision in the matched case-control study compared with the unmatched case-control study (95% confidence intervals of 1.42 to 2.81 compared with 1.38 to 2.89) because there are now equal numbers of cases and controls in each age group (table 1).

A pair matched study does not necessarily require a pair matched analysis

However, control for simple matching factors such as age does not require a pair matched analysis. Table 2 gives the findings that would have been obtained from a pair matched analysis (this is created by assuming that in each age group, and for each case, the control was selected at random from all non-cases in the same age group). The standard adjusted (Mantel-Haenszel) analysis (table 1) yields an odds ratio of 2.00 (95% confidence interval 1.42 to 2.81); the matched analysis (table 2) yields the same odds ratio (2.00) but with a slightly wider confidence interval (1.40 to 2.89).

Advantages of the standard analysis

So for many matched case-control studies, we have a choice of doing a standard analysis or a matched analysis. In this situation, there are several possible advantages of using the standard approach.

The standard analysis can actually yield slightly better statistical precision. 13 This may apply, for example, if two or more cases and their matched controls all have identical values for their matching factors; combining them into a single stratum then produces an estimator with lower variance and no less validity, 14 as indicated by the slightly narrower confidence interval for the standard adjusted analysis (table 1) compared with the pair matched analysis (table 2). This occurs particularly because combining strata with identical values for the matching factors (eg, if two case-control pairs all concern women aged 55-59 years) may mean that fewer data are discarded (that is, fewer participants are prevented from contributing to the analysis) because of strata in which the case and control have the same exposure status. Further gains in precision may be obtained if combining strata means that cases with no corresponding control (or controls without a corresponding case) can be included in the analysis. When such strata are combined, a conditional analysis may still be required if the resulting strata are still “small,” 13 but an unconditional analysis will be valid and yield similar findings if the resulting strata are sufficiently large. This may often be the case when matching has only been performed on standard factors such as sex and age group.
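A small hypothetical calculation shows where this precision gain can come from. It assumes 90 pairs that all share identical matching-factor values (so pooling them into one stratum is legitimate), with counts constructed so that both analyses target an odds ratio of 2. The pair matched analysis uses only the 45 discordant pairs, whereas the pooled standard analysis uses all 180 participants; the Woolf variance of the pooled log odds ratio comes out slightly smaller.

```python
import math

# Hypothetical pairs sharing the same matching-factor values (e.g. women aged 55-59).
# Counts by (case, control) exposure: both exposed, case only, control only, neither.
n11, n10, n01, n00 = 30, 30, 15, 15

# Pair matched (conditional) analysis: concordant pairs are discarded.
or_matched = n10 / n01
var_matched = 1 / n10 + 1 / n01

# Standard analysis: pool every case and control into a single 2x2 table.
exp_cases, unexp_cases = n11 + n10, n01 + n00
exp_controls, unexp_controls = n11 + n01, n10 + n00
or_pooled = (exp_cases * unexp_controls) / (exp_controls * unexp_cases)
var_pooled = 1 / exp_cases + 1 / unexp_cases + 1 / exp_controls + 1 / unexp_controls


def ci(or_, var, z=1.96):
    """95% CI on the odds ratio scale from the variance of log(OR)."""
    se = math.sqrt(var)
    return or_ * math.exp(-z * se), or_ * math.exp(z * se)


print("pair matched: OR %.2f, CI %.2f to %.2f" % (or_matched, *ci(or_matched, var_matched)))
print("pooled:       OR %.2f, CI %.2f to %.2f" % (or_pooled, *ci(or_pooled, var_pooled)))
```

If the pairs were matched more finely than the pooled stratum (for example on exact family or neighbourhood), pooling would no longer be valid, which is the situation the conclusions below single out for a conditional analysis.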

The standard analysis may also enhance the clarity of the presentation, particularly when analysing subgroups of cases and controls selected for variables on which they were not matched, since it involves standard 2×2 tables for each subgroup. 15

A further advantage of the standard analysis is that it makes it easier to combine different datasets that have involved matching on different factors (eg, if some have matched for age, some for age and sex, and some for nothing, then all can be combined in an analysis adjusting for age, sex, and study centre). In contrast, one multicentre study 16 (of which I happened to be a coauthor) attempted to (unnecessarily) perform a matched analysis across centres. Because not all centres had used pair matching, this involved retrospective pair matching in those centres that had not matched as part of the study design. This resulted in the unnecessary discarding of the unmatched controls, thus resulting in a likely loss of precision.

Conclusions

If matching is carried out on a particular factor such as age in a case-control study, then controlling for it in the analysis must be considered. This control should involve just as much precision as was used in the original matching 14 (eg, if exact age in years was used in the matching, then exact age in years should be controlled for in the analysis), although in practice such rigorous precision may not always be required (eg, five year age groups may suffice to control confounding by age, even if age matching was done more precisely than this). In some circumstances, this control may make no difference to the main exposure effect estimate—eg, if the matching factor is unrelated to exposure. However, if there is an association between the matching factor and the exposure, then matching will introduce confounding that needs to be controlled for in the analysis.

So when is a pair matched analysis required? The answer is, when the matching was genuinely at (or close to) the individual level. For example, if siblings have been chosen as controls, then each stratum would have just one case and the sibling control; in this situation, an unconditional logistic regression analysis would suffer from problems of sparse data, and conditional logistic regression would be required. Similar situations might arise if controls were neighbours or from the same general practice (if each general practice only had one or a few cases), or if matching was performed on many factors simultaneously so that most strata (in the standard analysis) had just one case and one control.

Provided, however, that there are no problems of sparse data, such control for the matching factors can be obtained using an unconditional analysis, with no loss of validity and a possible increase in precision.

Thus, a matched design will (nearly always) require controlling for the matching factors in the analysis. It is not the case, however, that a matched design requires a matched analysis.

I thank Simon Cousens, Deborah Lawlor, Lorenzo Richiardi, and Jan Vandenbroucke for their comments on the draft manuscript. The Centre for Global NCDs is supported by the Wellcome Trust Institutional Strategic Support Fund, 097834/Z/11/B.

Competing interests: I have read and understood the BMJ policy on declaration of interests and declare the following: none.

Provenance and peer review: Not commissioned; externally peer reviewed.

This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 3.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/3.0/ .

  1. Rothman KJ, Greenland S, Lash TL, eds. Design strategies to improve study accuracy. In: Modern epidemiology. 3rd ed. Lippincott Williams & Wilkins, 2008.
  2. Rothman KJ. Epidemiology: an introduction. Oxford University Press, 2012.
  3. Rothman KJ, Greenland S, Lash TL, eds. Modern epidemiology. 3rd ed. Lippincott Williams & Wilkins, 2008.
  4. Breslow NE, Day NE. Statistical methods in cancer research. Vol I: the analysis of case-control studies. IARC, 1980.
  5. Kleinbaum DG, Kupper LL, Morgenstern H. Epidemiologic research: principles and quantitative methods. Lifetime Learning Publications, 1982.
  6. Dos Santos Silva I. Cancer epidemiology: principles and methods. IARC, 1999.
  7. Keogh RH, Cox DR. Case-control studies. Cambridge University Press, 2014. doi:10.1017/CBO9781139094757.
  8. Lilienfeld DE, Stolley PD. Foundations of epidemiology. 3rd ed. Oxford University Press, 1994.
  9. MacMahon B, Trichopoulos D. Epidemiology: principles and methods. 2nd ed. Little Brown, 1996.
  10. Mantel N, Haenszel W. Statistical aspects of the analysis of data from retrospective studies of disease. J Natl Cancer Inst 1959;22:719-48. PMID: 13655060.
  11. Robins J, Greenland S, Breslow NE. A general estimator for the variance of the Mantel-Haenszel odds ratio. Am J Epidemiol 1986;124:719-23. PMID: 3766505.
  12. Pike MC, Hill AP, Smith PG. Bias and efficiency in logistic analyses of stratified case-control studies. Int J Epidemiol 1980;9:89-95. doi:10.1093/ije/9.1.89. PMID: 7419334.
  13. Brookmeyer R, Liang KY, Linet M. Matched case-control designs and overmatched analyses. Am J Epidemiol 1986;124:693-701. PMID: 3752063.
  14. Greenland S. Applications of stratified analysis methods. In: Rothman KJ, Greenland S, Lash TL, eds. Modern epidemiology. 3rd ed. Lippincott Williams & Wilkins, 2008.
  15. Vandenbroucke JP, Koster T, Briët E, Reitsma PH, Bertina RM, Rosendaal FR. Increased risk of venous thrombosis in oral-contraceptive users who are carriers of factor V Leiden mutation. Lancet 1994;344:1453-7. doi:10.1016/S0140-6736(94)90286-0. PMID: 7968118.
  16. Cardis E, Richardson L, Deltour I, et al. The INTERPHONE study: design, epidemiological methods, and description of the study population. Eur J Epidemiol 2007;22:647-64. doi:10.1007/s10654-007-9152-z. PMID: 17636416.
  17. Mansournia MA, Hernán MA, Greenland S. Matched designs and causal diagrams. Int J Epidemiol 2013;42:860-9. doi:10.1093/ije/dyt083. PMID: 23918854.
  18. Hernán MA, Hernández-Díaz S, Robins JM. A structural approach to selection bias. Epidemiology 2004;15:615-25. doi:10.1097/01.ede.0000135174.63482.43. PMID: 15308962.

Handbook of Statistical Methods for Case-Control Studies

The Handbook of Statistical Methods for Case-Control Studies is written by leading researchers in the field and published by Chapman & Hall/CRC Press (2018). The handbook provides an in-depth treatment of up-to-date and currently developing statistical methods for the design and analysis of case-control studies, as well as a review of classical principles and methods. The handbook is designed to serve as a reference text for biostatisticians and quantitatively-oriented epidemiologists who are working on the design and analysis of case-control studies or on related statistical methods research.

This website provides supplementary materials for some of the chapters of the handbook. The website is maintained by Ørnulf Borgan.

SUPPLEMENTARY MATERIALS 

Chapter 8: Small Sample Methods

Chapter 12: Multi-Phase Sampling

Chapter 13: Calibration in Case-Control Studies

Chapter 17: Survival Analysis of Case-Control Data: A Sample Survey Approach

Chapter 18: Nested Case-Control Studies: A Counting Process Approach

Chapter 19: Inverse Probability Weighting in Nested Case-Control Studies

Chapter 20: Multiple Imputation for Sampled Cohort Data

Chapter 21: Maximum Likelihood Estimation for Case-Cohort and Nested Case-Control Studies

Chapter 22: The Self-Controlled Case Series Method

Statistical analyses of case-control studies

Introduction

A case-control study is used to examine whether an exposure is linked to a particular outcome (i.e., a disease or condition of interest). Case-control research is retrospective by definition, because it starts with the outcome and then looks back at exposures: the investigator already knows each participant's outcome when enrolling them into the case or control group. Case-control studies are retrospective for this reason, not because the investigator often uses previously collected data. This article discusses statistical analysis in case-control studies.

Advantages and Disadvantages of Case-Control Studies

Study Design

Participants in a case-control study are selected according to their outcome status: some individuals have the outcome of interest (cases) and others do not (controls). The investigator then assesses exposure in both groups. A case-control study therefore requires that the outcome has occurred in at least some individuals. As shown in Figure 1, some enrolled participants have the outcome and others do not.

Figure 1. Example of a case-control study [1]

Selection of cases

The investigator should define the cases as precisely as feasible. A disease's definition is sometimes based on several criteria, so every criterion should be fully specified in the case definition.

Selection of controls

Controls should be chosen to be comparable to the cases in a number of ways. The matching criteria are the characteristics (e.g., age, sex, and time of hospitalization) used to determine how controls should resemble cases. For instance, it would be inappropriate to compare patients undergoing elective intraocular surgery with a control group of patients with traumatic corneal lacerations. Another key feature of a case-control study is that exposure must be measured in the same way in cases and controls.

Although controls should resemble cases in many respects, it is possible to over-match. Over-matching can make it harder to find enough controls. Furthermore, once a variable is used for matching, it cannot be analyzed as a risk factor. Enrolling more than one control per case is an effective way to increase a study's power. However, including more than two controls per case adds little statistical value.
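The diminishing returns from adding controls can be put in numbers with a common approximation: near the null, the efficiency of a design with m controls per case, relative to one with unlimited controls, is about m/(m+1). The sketch below simply tabulates that approximation; the function name is illustrative, and this is a rough guide rather than a power calculation, since actual gains also depend on exposure prevalence and effect size.

```python
from fractions import Fraction


def relative_efficiency(m):
    """Approximate efficiency of 1:m matching relative to unlimited controls (m/(m+1))."""
    return Fraction(m, m + 1)


for m in range(1, 7):
    eff = relative_efficiency(m)
    gain = eff - relative_efficiency(m - 1)  # improvement from adding the m-th control
    print(f"{m} control(s) per case: efficiency {float(eff):.0%}, "
          f"gain from the last control {float(gain):.1%}")
```

Each additional control buys less than the one before it, which is the practical basis for limiting the number of controls per case.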

Data collection

Decide on the data to be gathered after precisely identifying the cases and controls; both groups must have the same data obtained in the same method. If the search for primary risk variables is not conducted objectively, the study may suffer from researcher bias, especially because the conclusion is already known. It’s crucial to try to hide the outcome from the person collecting risk factor data or interviewing patients, even if it’s not always practicable. Patients may be asked questions concerning historical issues (such as smoking history, food, usage of conventional eye medications, and so on). For some people, precisely recalling all of this information may be challenging.

Furthermore, cases are more likely than controls to recall the details of unfavourable past experiences. This phenomenon is called recall bias. Any steps the researcher takes to reduce it will strengthen the study.

In the analysis, the frequency of each measured variable is computed in each of the two groups. Case-control studies use the odds ratio to measure the strength of the association between exposure and outcome. The odds ratio is the odds of exposure in the case group divided by the odds of exposure in the control group. A confidence interval should be calculated for every odds ratio. If the interval includes 1.0, the association could plausibly have arisen by chance alone and is not statistically significant at the chosen level. Without a confidence interval, an odds ratio is of limited use. These computations are usually done with statistical software. The investigator fixes the numbers of cases and controls rather than sampling from a defined population, so case-control studies cannot directly estimate the incidence or prevalence of a disease.
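The calculation described above can be sketched as follows, assuming a 2×2 table of exposure by case status. The odds ratio is (a/b)/(c/d) = ad/bc. The 95% confidence interval uses Woolf's log-based method, which is one common choice; statistical packages also offer exact and other intervals. The counts are hypothetical.

```python
import math

def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio with a Woolf (log-scale) confidence interval from a 2x2 table.

    a = exposed cases,    b = unexposed cases
    c = exposed controls, d = unexposed controls
    """
    if min(a, b, c, d) == 0:
        raise ValueError("a zero cell makes the Woolf interval undefined")
    # Odds of exposure among cases divided by odds of exposure among controls
    or_ = (a / b) / (c / d)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper

# Hypothetical counts, for illustration only
or_, lo, hi = odds_ratio_ci(a=40, b=60, c=20, d=80)
excludes_one = not (lo <= 1.0 <= hi)
print(f"OR = {or_:.2f}, 95% CI {lo:.2f} to {hi:.2f}, interval excludes 1: {excludes_one}")
```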

Risk Factors and Sampling

Case-control studies can also be used to investigate risk factors for a rare disease. Cases are often obtained from hospital records. However, patients who present to a hospital may not be typical of the general community. Selecting an appropriate control group can also be challenging. A common source of controls is patients from the same hospital who do not have the outcome. Hospitalized patients may not reflect the broader population, though, because they are more likely to have other health problems and to use the healthcare system.

Examples of recent case-control studies and their statistical analyses

i) Risk factors related to multiple sclerosis in Kuwait

This matched case-control study in Kuwait examined how several factors were associated with the risk of multiple sclerosis (MS). The factors were family history, stressful life events, exposure to tobacco smoke, vaccination history, and comorbidity. Cases were recruited from the neurology clinics of Ibn Sina Hospital and the MS clinic of the Dasman Diabetes Institute. Controls were chosen from among Kuwait University's faculty and students. A generalized questionnaire was used to collect socio-demographic, possibly genetic, and environmental data from each patient and his or her pair-matched control. Descriptive statistics were produced, including means and standard deviations for quantitative variables and frequencies for qualitative variables. Variables associated with MS status at p ≤ 0.15 in univariable conditional logistic regression were considered for inclusion in the final multivariable conditional logistic regression model. Of the 112 MS patients invited to participate, 110 (98.2%) agreed. Therefore, 110 MS patients and 110 controls were enrolled. Each control was individually matched (1:1) to a case on age (within 5 years), gender, and nationality (Figure 2). A family history of MS was significantly associated with an increased risk of developing MS. In contrast, vaccination against influenza A and B viruses was significantly associated with a reduced risk of MS.

Figure 2. Flow chart of the enrolment of MS cases and controls [2]
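With 1:1 matching and a single binary exposure, conditional logistic regression has a closed form. Only discordant pairs contribute. The estimated odds ratio is the number of pairs in which only the case was exposed divided by the number in which only the control was exposed. The minimal sketch below uses hypothetical counts, not data from the Kuwait study. It adds a log-scale confidence interval and McNemar's χ² statistic. The multivariable models in that study would need a full conditional logistic regression routine.

```python
import math

def matched_pair_or(pairs, z=1.96):
    """Conditional (matched-pair) odds ratio for 1:1 matched data with a binary exposure.

    pairs: iterable of (case_exposed, control_exposed) booleans.
    Concordant pairs carry no information in the conditional likelihood.
    """
    pairs = list(pairs)  # allow generators; we iterate more than once
    b = sum(1 for case_exp, ctrl_exp in pairs if case_exp and not ctrl_exp)  # only case exposed
    c = sum(1 for case_exp, ctrl_exp in pairs if ctrl_exp and not case_exp)  # only control exposed
    if b == 0 or c == 0:
        raise ValueError("need discordant pairs of both kinds")
    or_ = b / c
    se_log = math.sqrt(1 / b + 1 / c)  # standard error of log(OR)
    ci = (math.exp(math.log(or_) - z * se_log), math.exp(math.log(or_) + z * se_log))
    mcnemar = (b - c) ** 2 / (b + c)   # McNemar chi-square statistic, 1 df
    return or_, ci, mcnemar

# 110 hypothetical pairs: (case exposed?, matched control exposed?)
pairs = ([(True, False)] * 20 + [(False, True)] * 8 +
         [(True, True)] * 10 + [(False, False)] * 72)
or_, ci, chi2 = matched_pair_or(pairs)
print(f"matched OR = {or_:.2f}, 95% CI {ci[0]:.2f} to {ci[1]:.2f}, McNemar chi2 = {chi2:.2f}")
```

Adding more concordant pairs leaves the estimate unchanged, because those pairs carry no information in the conditional likelihood.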

ii) Relation between periodontitis and COVID-19 infection

COVID-19 is associated with a heightened inflammatory response, which can be fatal, and periodontitis is associated with systemic inflammation. In Qatar, patients with COVID-19 were identified from the national electronic health records of Hamad Medical Corporation (HMC). Patients with COVID-19 complications (death, ICU admission, or assisted ventilation) were classified as cases. COVID-19 patients discharged without severe complications were classified as controls. Controls were not matched, because all controls were included in the analysis. Periodontal status was assessed on dental radiographs from the same database. The associations between periodontitis and COVID-19 complications were examined with logistic regression models adjusted for demographic, medical, and behavioural variables. Of the 568 participants, 258 had periodontitis. Of the 258 patients with periodontitis, 33 had COVID-19 complications, compared with 7 of the 310 patients without periodontitis. Table 2 shows the unadjusted and adjusted odds ratios and 95% confidence intervals for the association between periodontitis and COVID-19 complications. Periodontitis was significantly associated with a higher risk of COVID-19 complications, including ICU admission, the need for assisted ventilation, and death. It was also associated with higher blood levels of markers linked to poor COVID-19 outcomes, such as D-dimer, white blood cell count (WBC), and C-reactive protein (CRP).

Table 2. Associations between periodontal condition and COVID-19 complications [3]

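The counts in the text can be arranged as a 2×2 table. Of the 568 participants, 258 had periodontitis, so 310 did not. 33 patients with periodontitis and 7 without had complications. From this table a crude (unadjusted) odds ratio follows directly. The sketch below uses the Woolf log-scale interval as an illustrative choice. It is not the authors' adjusted logistic regression, and its interval need not match Table 2 exactly.

```python
import math

# Counts from the text: "cases" = COVID-19 complications, "controls" = discharged without them
perio_total, no_perio_total = 258, 310
perio_cases, no_perio_cases = 33, 7

a, b = perio_cases, no_perio_cases                                  # exposed / unexposed cases
c, d = perio_total - perio_cases, no_perio_total - no_perio_cases   # exposed / unexposed controls

crude_or = (a * d) / (b * c)
se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)                   # Woolf SE of log(OR)
ci = tuple(math.exp(math.log(crude_or) + s * 1.96 * se_log) for s in (-1, 1))

print(f"complications: {a / perio_total:.1%} with vs {b / no_perio_total:.1%} without periodontitis")
print(f"crude OR = {crude_or:.2f}, 95% CI {ci[0]:.2f} to {ci[1]:.2f}")
```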

iii) Menstrual, reproductive and hormonal factors and thyroid cancer

This study investigated the associations between menstrual, reproductive, and hormonal factors and thyroid cancer in a population of Chinese women. A 1:1 matched, hospital-based case-control study was conducted in 7 counties of Zhejiang Province to investigate the associations of diabetes mellitus and other factors with thyroid cancer. Patients were eligible as cases if they were diagnosed with primary thyroid cancer for the first time in a hospital between July 2015 and December 2017. Cases and controls were selected at random. At enrolment, interviewers collected all essential information face to face using a customized questionnaire. Descriptive statistics (frequencies and percentages) were used to summarize the baseline characteristics of the female participants. Univariable conditional logistic regression models were used to examine the association between each variable and thyroid cancer. Four multivariable conditional logistic regression models, adjusted for covariates, were then used to investigate the associations of menstrual, reproductive, and hormonal factors with thyroid cancer. In total, 2937 matched pairs took part in the study. A later age at first pregnancy and a longer duration of breastfeeding were significantly associated with a lower occurrence of thyroid cancer. These findings may shed light on the aetiology, monitoring, and prevention of thyroid cancer in Chinese women [4].
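The models above are conditional logistic regressions. For 1:1 matched pairs, the conditional likelihood contribution of pair i reduces to exp(βx_case) / (exp(βx_case) + exp(βx_control)) = 1 / (1 + exp(−β d_i)), where d_i = x_case − x_control. With one covariate, the model is therefore a logistic regression on within-pair differences with no intercept. The sketch below fits this by Newton-Raphson. The differences are hypothetical, not data from the Zhejiang study. The study's models had several covariates, for which a statistical package would normally be used.

```python
import math

def fit_clogit_1to1(diffs, max_iter=50, tol=1e-10):
    """Conditional logistic regression for 1:1 matched pairs with one covariate.

    diffs[i] = x_case - x_control for pair i. Returns (beta, standard error).
    """
    diffs = [float(d) for d in diffs]
    if all(d >= 0 for d in diffs) or all(d <= 0 for d in diffs):
        raise ValueError("no finite estimate: within-pair differences all have the same sign")
    beta = 0.0
    for _ in range(max_iter):
        score = info = 0.0
        for d in diffs:
            # Conditional probability that the observed case is the case, given both covariates
            p = 1.0 / (1.0 + math.exp(-beta * d))
            score += d * (1.0 - p)          # first derivative of the log-likelihood
            info += d * d * p * (1.0 - p)   # observed information (minus second derivative)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta, 1.0 / math.sqrt(info)

# Hypothetical within-pair differences (case minus control), e.g. age at first pregnancy in years
diffs = [-3, -1, 2, -2, 0, -4, 1, -1, -2, 3]
beta, se = fit_clogit_1to1(diffs)
print(f"beta = {beta:.3f} (SE {se:.3f}); OR per 1-year increase = {math.exp(beta):.2f}")
```

Pairs with a zero difference contribute nothing, mirroring the concordant pairs in a binary analysis.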

The term "case-control study" is commonly misused. Consider a study that starts with a group of people exposed to something and a comparison group who have not been exposed, then follows both groups over time to see what happens. Such a study is sometimes called a case-control study, but it is in fact a cohort study. Case-control studies are often regarded as less valuable because they are retrospective. They can, however, be a highly effective way to detect an association between an exposure and an outcome, and they are sometimes the only ethical way to study such an association. Case-control studies can provide useful information if case definitions, the choice of controls, and the potential for bias are carefully considered.

[1] Setia MS. Methodology series module 2: case-control studies. Indian Journal of Dermatology. 2016;61(2):146–151. doi:10.4103/0019-5154.177773

[2] El-Muzaini H, Akhtar S, Alroughani R. A matched case-control study of risk factors associated with multiple sclerosis in Kuwait. BMC Neurology. 2020;20:64. https://doi.org/10.1186/s12883-020-01635-1

[3] Marouf N, Cai W, Said KN, Daas H, Diab H, Chinta VR, Ait Hssain A, Nicolau B, Sanz M, Tamimi F. Association between periodontitis and severity of COVID-19 infection: a case–control study. Journal of Clinical Periodontology. 2021;48(4):483–491.

[4] Wang M, Gong WW, He QF, Hu RY, Yu M. Menstrual, reproductive and hormonal factors and thyroid cancer: a hospital-based case-control study in China. BMC Women's Health. 2021;21(1):1–8.


Statistical Methods in Cancer Research Volume I: The Analysis of Case-Control Studies

IARC Scientific Publication No. 32

Authors: Breslow NE, Day NE

ISBN 978-92-832-0132-8


The case–control study is the major epidemiological approach used to identify risk factors for cancer. This textbook explains the statistical methods and theory behind this design, and the practical application to specific sets of data. It includes chapters on fundamental measures of disease occurrence, analysis of grouped and ungrouped data, and use of unconditional and conditional logistic regressions.


Methoden der Statistik und Informatik in Epidemiologie und Diagnostik, pp. 97–109

Statistical Methods for Cohort and Case-Control Studies

  • N. E. Breslow
  • Conference paper


Part of the Medizinische Informatik und Statistik book series (MEDINFO, volume 40)

Traditional methods of occupational cohort analysis have used the standardized mortality ratio (SMR) as the fundamental measure of association between risk factor and disease. The SMR is shown here to result from maximum likelihood estimation in a multiplicative statistical model involving known national death rates. The same model permits regression analysis of variations in the SMR according to the intensity, type, or duration of exposure to environmental agents.
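In the simplest case, one can check that the SMR is a maximum likelihood estimate. Suppose the observed number of deaths O is Poisson with mean θE. Here E = Σ_j PY_j λ_j sums, over strata j, the cohort person-years times the known national death rate. The log-likelihood O log(θE) − θE is maximized at θ̂ = O/E, which is the SMR. The sketch below uses hypothetical person-years and rates. It also gives an approximate log-scale interval based on SE[log SMR] ≈ 1/√O. It does not reproduce the regression extension described in the paper.

```python
import math

def smr(observed, person_years, reference_rates, z=1.96):
    """SMR as the MLE of theta in O ~ Poisson(theta * E), with an approximate CI.

    person_years[j]    : cohort person-years in stratum j (e.g. age group x calendar period)
    reference_rates[j] : known national death rate in stratum j, per person-year
    """
    if len(person_years) != len(reference_rates):
        raise ValueError("person_years and reference_rates must align by stratum")
    expected = sum(py * rate for py, rate in zip(person_years, reference_rates))
    theta = observed / expected                 # maximizes O*log(theta*E) - theta*E
    half_width = z / math.sqrt(observed)        # approximate SE of log(SMR) is 1/sqrt(O)
    return theta, expected, (theta * math.exp(-half_width), theta * math.exp(half_width))

# Hypothetical cohort with three age strata
theta, expected, ci = smr(observed=45,
                          person_years=[12_000, 8_000, 3_000],
                          reference_rates=[0.0005, 0.0015, 0.004])
print(f"E = {expected:.1f}, SMR = {theta:.2f}, approx. 95% CI {ci[0]:.2f} to {ci[1]:.2f}")
```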

A second method of analysis (Cox, 1972) results when the underlying death rates are treated as an unknown nuisance function. Case-control sampling from the "risk sets" formed during this analysis leads to a third technique, which is computationally more efficient than the other two.
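A minimal sketch of risk-set sampling. At each failure time, a few controls are drawn at random from the cohort members still under observation at that time. The cohort, the function name `sample_risk_sets`, and the choice of m are hypothetical. Follow-up is assumed to start at time 0 for everyone; delayed entry is not handled. The resulting matched sets would then be analysed with conditional logistic regression.

```python
import random

def sample_risk_sets(cohort, m, seed=0):
    """Risk-set (nested case-control) sampling from a cohort.

    cohort: list of (subject_id, exit_time, failed) tuples; follow-up starts at time 0.
    For each failure at time t, up to m controls are drawn at random from the other
    subjects still under observation at t (exit_time >= t).
    """
    rng = random.Random(seed)  # seeded so the example is reproducible
    matched_sets = []
    for case_id, t, failed in cohort:
        if not failed:
            continue
        risk_set = [sid for sid, exit_time, _ in cohort if sid != case_id and exit_time >= t]
        controls = rng.sample(risk_set, min(m, len(risk_set)))
        matched_sets.append((case_id, t, controls))
    return matched_sets

# Hypothetical cohort: (id, exit time in years, died of the disease of interest?)
cohort = [("A", 2.0, True), ("B", 5.0, False), ("C", 3.5, True),
          ("D", 7.0, False), ("E", 1.0, False), ("F", 6.0, True)]
for case_id, t, controls in sample_risk_sets(cohort, m=2):
    print(f"case {case_id} at t={t}: controls {controls}")
```

A subject can serve as a control in several sets and can later become a case; here C is eligible as a control for A. This reuse is part of risk-set sampling, not an error.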

All three methods yield roughly equivalent measures of the relative risk of respiratory cancer associated with arsenic trioxide exposure among a cohort of Montana smelter workers. Questions of efficiency, bias and cost in the selection of a method of analysis are discussed.

Research supported in part by USPHS grant 1 K07 CA00723 and the Alexander von Humboldt Foundation


Baker RJ and Nelder JA (1978). The GLIM System: Release 3. Oxford: Numerical Algorithms Group.


Berry G, Gilson JC, Holmes S, Lewisohn HC and Roach SA (1979). Asbestosis: a study of dose-response relationship in an asbestos textile factory. British Journal of Industrial Medicine 36, 98–112.

Breslow NE and Day NE (1980). Statistical Methods in Cancer Research I: The Analysis of Case-Control Studies. Lyon: IARC.

Cox DR (1972). Regression models and life tables (with discussion). Journal of the Royal Statistical Society Series B 34, 187–220.


Enterline PE (1976). Pitfalls in epidemiological research: an examination of the asbestos literature. Journal of Occupational Medicine 18, 150–156.


Fox AJ and Collier PF (1976). Low mortality rates in industrial cohort studies due to selection for work and survival in the industry. British Journal of Preventive and Social Medicine 30, 225–230.

Kalbfleisch JD and Prentice RL (1980). The Statistical Analysis of Failure Time Data. New York: Wiley.

Knox EG (1973). Computer simulation of industrial hazards. British Journal of Industrial Medicine 30, 54–63.

Lee AM and Fraumeni JF (1969). Arsenic and respiratory cancer in man. Journal of the National Cancer Institute 42, 1045–1052.

Lubin JH and Breslow NE (1983). Application of survival data methodology to occupational mortality studies. (Unpublished manuscript).

Mancuso TF and El-Attar AA (1967). Mortality pattern in a cohort of asbestos workers. Journal of Occupational Medicine 9, 147–162.

Mosteller F and Tukey JW (1977). Data Analysis and Regression. Reading: Addison-Wesley.

Prentice RL and Breslow NE (1978). Retrospective studies and failure time models. Biometrika 65, 153–158.


Rao CR (1965). Linear Statistical Inference and its Applications. New York: Wiley.

Yule GU (1934). On some points relating to vital statistics, more especially statistics of occupational mortality. Journal of the Royal Statistical Society 94, 1–84.


Author information

Authors and affiliations.

Department of Biostatistics, University of Washington, Seattle, USA

N. E. Breslow

Institute for Documentation, Information, and Statistics, German Cancer Research Center, Heidelberg, Germany


Editor information

Editors and affiliations.

Universitäts-Krankenhaus Eppendorf, Institut für Mathematik und Datenverarbeitung in der Medizin, Universität Hamburg, Martinistraße 52, 2000, Hamburg 20, Germany

J. Berger & K. H. Höhne

Additional information

Dedicated to Professor Dr. Otto Westphal on the occasion of his 70th birthday.


Copyright information

© 1983 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper.

Breslow, N.E. (1983). Statistical Methods for Cohort and Case-Control Studies. In: Berger, J., Höhne, K.H. (eds) Methoden der Statistik und Informatik in Epidemiologie und Diagnostik. Medizinische Informatik und Statistik, vol 40. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-81938-4_12


DOI : https://doi.org/10.1007/978-3-642-81938-4_12

Publisher Name : Springer, Berlin, Heidelberg

Print ISBN : 978-3-540-12007-0

Online ISBN : 978-3-642-81938-4

eBook Packages : Springer Book Archive


Statistical methods for biomarker data pooled from multiple nested case-control studies

Affiliations.

  • 1 Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA.
  • 2 Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, MA, USA.
  • 3 Department of Nutrition, Harvard T. H. Chan School of Public Health, Boston, MA, USA.
  • 4 Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA.
  • 5 Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, MA, USA.
  • 6 Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, and Harvard Medical School, Boston, MA, USA.
  • PMID: 31750898
  • PMCID: PMC8286552
  • DOI: 10.1093/biostatistics/kxz051

Pooling biomarker data across multiple studies allows for examination of a wider exposure range than generally possible in individual studies, evaluation of population subgroups and disease subtypes with more statistical power, and more precise estimation of biomarker-disease associations. However, circulating biomarker measurements often require calibration to a single reference assay prior to pooling due to assay and laboratory variability across studies. We propose several methods for calibrating and combining biomarker data from nested case-control studies when reference assay data are obtained from a subset of controls in each contributing study. Specifically, we describe a two-stage calibration method and two aggregated calibration methods, named the internalized and full calibration methods, to evaluate the main effect of the biomarker exposure on disease risk and whether that association is modified by a potential covariate. The internalized method uses the reference laboratory measurement in the analysis when available and otherwise uses the estimated value derived from calibration models. The full calibration method uses calibrated biomarker measurements for all subjects, including those with reference laboratory measurements. Under the two-stage method, investigators complete study-specific analyses in the first stage followed by meta-analysis in the second stage. Our results demonstrate that the full calibration method is the preferred aggregated approach to minimize bias in point estimates. We also observe that the two-stage and full calibration methods provide similar effect and variance estimates but that their variance estimates are slightly larger than those from the internalized approach. 
As an illustrative example, we apply the three methods in a pooling project of nested case-control studies to evaluate (i) the association between circulating vitamin D levels and risk of stroke and (ii) how body mass index modifies the association between circulating vitamin D levels and risk of cardiovascular disease.
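The difference between the internalized and full calibration rules can be sketched for a single study. The sketch assumes a simple linear calibration model, reference = a + b × local, fitted by least squares in the controls with both measurements. That model form is our assumption for illustration; the paper's calibration models may differ. The function names and data are hypothetical. The subsequent conditional logistic regression and the two-stage meta-analysis are not shown.

```python
def fit_calibration(local, reference):
    """Least-squares fit of reference = a + b * local on the calibration subset."""
    n = len(local)
    if n != len(reference) or n < 2:
        raise ValueError("need at least two paired measurements")
    mean_x, mean_y = sum(local) / n, sum(reference) / n
    sxx = sum((x - mean_x) ** 2 for x in local)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(local, reference))
    b = sxy / sxx
    return mean_y - b * mean_x, b

def calibrate(subjects, a, b, method):
    """Return one biomarker value per subject under the 'internalized' or 'full' rule."""
    values = []
    for s in subjects:
        calibrated = a + b * s["local"]
        if method == "internalized" and s.get("reference") is not None:
            values.append(s["reference"])   # reference-lab value used when it was measured
        elif method in ("internalized", "full"):
            values.append(calibrated)       # calibrated value otherwise ('full': always)
        else:
            raise ValueError(f"unknown method: {method!r}")
    return values

# Hypothetical study: four controls measured by both the local and the reference laboratory
a, b = fit_calibration(local=[20, 25, 30, 35], reference=[22, 26, 31, 35])
subjects = [{"local": 20, "reference": 22}, {"local": 40}]   # second subject: local assay only
print("internalized:", calibrate(subjects, a, b, "internalized"))
print("full:        ", calibrate(subjects, a, b, "full"))
```

The two rules differ only for subjects who have a reference measurement. The internalized rule keeps the measured reference value, while the full rule replaces it with the model prediction.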

Keywords: Aggregation; Calibration; Conditional logistic regression; Nested case–control study; Pooling.

© The Author 2019. Published by Oxford University Press. All rights reserved.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, N.I.H., Intramural

MeSH terms

  • Calibration
  • Case-Control Studies
  • Research Design*

Grants and funding

  • R01 CA152071/CA/NCI NIH HHS/United States
  • R03 CA212799/CA/NCI NIH HHS/United States
  • T32 CA009337/CA/NCI NIH HHS/United States
  • T32 NS048005/NS/NINDS NIH HHS/United States

Proton pump inhibitors and the risk of inflammatory bowel disease: a Mendelian randomisation study

  • Hongjin An 1,
  • Min Zhong 1,
  • Huatian Gan 2, 3 (ORCID: http://orcid.org/0000-0002-5736-1283)
  • 1 Department of Gastroenterology and Hepatology, West China Hospital, Sichuan University, Chengdu, China
  • 2 Department of Geriatrics and National Clinical Research Center for Geriatrics, West China Hospital, Sichuan University, Chengdu, China
  • 3 Department of Gastroenterology and Laboratory of Inflammatory Bowel Disease, the Center for Inflammatory Bowel Disease, Clinical Institute of Inflammation and Immunology, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu, China
  • Correspondence to Dr Huatian Gan, West China Hospital of Sichuan University, Chengdu, Sichuan, China; ganhuatian123{at}163.com

https://doi.org/10.1136/gutjnl-2024-331904


  • INFLAMMATORY BOWEL DISEASE

We read with great interest the population-based cohort study by Abrahami D et al,1 which found that the use of proton pump inhibitors (PPIs) was not associated with an increased risk of inflammatory bowel disease (IBD). However, assessing causality in observational studies is often difficult because of multiple confounding factors, and whether PPIs have a causal effect on IBD remains unclear. Mendelian randomisation (MR) assesses causality using exposure-related genetic variants as instruments, which limits bias from confounding.2 We therefore used a two-sample MR analysis to investigate the association between PPI use and IBD, including Crohn's disease (CD) and ulcerative colitis (UC).


Here, we used the inverse-variance weighted (IVW) method8 as the main MR analysis, with the weighted median,9 MR-Egger10 and MR-PRESSO5 methods as complementary approaches. We also performed a series of sensitivity analyses to check the robustness of the results. Cochran's Q test was used to assess heterogeneity and the intercept of an MR-Egger regression to assess horizontal pleiotropy. After exclusion of pleiotropic SNPs, genetically predicted use of omeprazole, esomeprazole, lansoprazole and rabeprazole showed no significant association with an increased risk of IBD (figure 1; omeprazole, OR 1.05, 95% CI 0.88 to 1.25, p=0.587; esomeprazole, OR 0.99, 95% CI 0.92 to 1.07, p=0.865; lansoprazole, OR 1.06, 95% CI 0.89 to 1.26, p=0.537; rabeprazole, OR 1.00, 95% CI 0.95 to 1.04, p=0.862). The IBD subtype analyses likewise showed no evidence of an increased risk of CD or UC associated with PPI use (figure 1). The complementary methods yielded similar point estimates (figure 1). Sensitivity analyses showed no evidence of heterogeneity (all P for heterogeneity >0.05) or horizontal pleiotropy (all P for pleiotropy >0.05), further supporting the robustness of the conclusions (figure 1).
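The IVW estimator combines the per-SNP Wald ratios β_Y/β_X. Each ratio is weighted by the inverse of its first-order variance, se_Y²/β_X². The sketch below uses the fixed-effect form and also computes Cochran's Q, the heterogeneity statistic mentioned above. The SNP summary statistics are hypothetical. Whether the letter used a fixed- or random-effects IVW model is not stated here. MR-Egger, the weighted median and MR-PRESSO are not shown.

```python
import math

def ivw_estimate(beta_x, beta_y, se_y):
    """Fixed-effect inverse-variance weighted MR estimate from per-SNP summary statistics.

    beta_x: SNP-exposure associations; beta_y, se_y: SNP-outcome log odds ratios and SEs.
    Returns (estimate, standard error, Cochran's Q, degrees of freedom).
    """
    if not (len(beta_x) == len(beta_y) == len(se_y)) or len(beta_x) < 2:
        raise ValueError("need matching summary statistics for at least two SNPs")
    ratios = [by / bx for bx, by in zip(beta_x, beta_y)]          # per-SNP Wald ratios
    weights = [(bx / s) ** 2 for bx, s in zip(beta_x, se_y)]      # first-order inverse variances
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total
    se = 1 / math.sqrt(total)
    q = sum(w * (r - estimate) ** 2 for w, r in zip(weights, ratios))  # Cochran's Q
    return estimate, se, q, len(ratios) - 1

# Hypothetical summary statistics for four SNPs
est, se, q, df = ivw_estimate(beta_x=[0.10, 0.08, 0.12, 0.05],
                              beta_y=[0.010, 0.006, 0.011, 0.004],
                              se_y=[0.005, 0.004, 0.006, 0.003])
print(f"OR = {math.exp(est):.2f} "
      f"(95% CI {math.exp(est - 1.96 * se):.2f} to {math.exp(est + 1.96 * se):.2f}); "
      f"Q = {q:.2f} on {df} df")
```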


Figure 1 Mendelian randomisation estimates of the associations between the use of different types of proton pump inhibitors and inflammatory bowel disease. IBD, inflammatory bowel disease; CD, Crohn's disease; UC, ulcerative colitis; PPIs, proton pump inhibitors; IVW, inverse-variance weighted; MR, Mendelian randomisation.

In conclusion, the MR results corroborate the finding of Abrahami D et al that PPI use was not associated with an increased risk of IBD. Nonetheless, further research is needed to clarify how other PPI types, dosage, frequency and duration of use affect the risk of IBD.

Ethics statements

Patient consent for publication.

Not applicable.

Ethics approval

  • Abrahami D ,
  • Pradhan R ,
  • Yin H , et al
  • Kathiresan S
  • Fang H , et al
  • van Sommeren S ,
  • Huang H , et al
  • Verbanck M ,
  • Neale B , et al
  • Tilling K ,
  • Davey Smith G
  • Brion M-JA ,
  • Shakhbazov K ,
  • Visscher PM
  • Burgess S ,
  • Timpson NJ , et al
  • Davey Smith G ,
  • Haycock PC , et al

Supplementary materials

Supplementary data.

This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.

  • Data supplement 1

HA and MZ contributed equally.

Contributors All authors conceived and designed the study. HA and MZ did the statistical analyses and wrote the manuscript. HG revised the manuscript and is the guarantor.

Funding The present work was supported by the National Natural Science Foundation of China (No. 82070560) and 1.3.5 Project for Disciplines of Excellence, West China Hospital, Sichuan (No. ZYGD23013).

Competing interests None declared.

Provenance and peer review Not commissioned; externally peer reviewed.

Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.


Disclaimer: Early release articles are not considered as final versions. Any changes will be reflected in the online version in the month the article is officially released.

Volume 30, Number 3—March 2024

Geographic Variation and Environmental Predictors of Nontuberculous Mycobacteria in Laboratory Surveillance, Virginia, USA, 2021–2023


Because epidemiologic and environmental risk factors for nontuberculous mycobacteria (NTM) have been reported only infrequently, little information exists about those factors. The state of Virginia, USA, requires certain ecologic features to be included in reports to the Virginia Department of Health, presenting a unique opportunity to study those variables. We analyzed laboratory reports of Mycobacterium avium complex (MAC) and M. abscessus infections in Virginia during 2021–2023. MAC/M. abscessus was isolated from 6.19/100,000 persons, and 2.37/100,000 persons had MAC/M. abscessus lung disease. M. abscessus accounted for 17.4% and MAC for 82.6% of cases. Saturated vapor pressure was associated with MAC/M. abscessus prevalence (prevalence ratio 1.414, 95% CI 1.011–1.980; p = 0.043). Self-supplied water use was a protective factor (incidence rate ratio 0.304, 95% CI 0.098–0.950; p = 0.041). Our findings suggest that a better understanding of geographic clustering and environmental water exposures could help develop future targeted prevention and control efforts.

Nontuberculous mycobacteria (NTM) infections are increasing globally and have thus become pathogens of substantial public health concern ( 1 ). However, because of scarce public health reporting, little is known about epidemiologic and environmental risk factors for NTM. Virginia is one of the few states in the United States where NTM infections are reported to a statewide public health agency ( 2 ); those data are uniquely suited to studying NTM. Virginia is also a particularly advantageous location for studying the environmental epidemiology of NTM. It has areas of varying population density and a relatively large population using self-supplied domestic water (e.g., well water, rainwater captured in cisterns). It also lies in the southeastern United States, a region previously described as having a relatively high burden of NTM disease. Finally, it spans several geographic and climatic regions: the Coastal Plains (Tidewater), Piedmont, Blue Ridge Mountains, Valley and Ridge, and Appalachian Plateau ( 3 , 4 ).

Exposure to environmental and in-home water sources, soil conditions and metallic content, climate, and coexisting medical conditions are thought to play complex roles in the acquisition and development of NTM infection ( 5 ). Numerous risk factors for NTM disease have been identified, including coexisting conditions such as compromised immunity, cystic fibrosis, prior cavitary lung disease, and bronchiectasis; atmospheric water vapor content has also been identified as a predictor of NTM rates across cystic fibrosis centers ( 6 , 7 ).

Previous studies of NTM epidemiology, often relying on data from retrospective review of electronic medical record databases, suggest NTM are increasing in incidence; the most common pathogens of clinical respiratory disease belong to Mycobacterium avium complex (MAC) and Mycobacterium abscessus ( 8 – 10 ). To date, laboratory surveillance data for NTM such as MAC and M. abscessus have been used for epidemiologic research less often than data for some other pathogens of public health concern ( 11 – 15 ). Nevertheless, population-based studies of NTM have found that 86% of patients meeting the American Thoracic Society/Infectious Diseases Society of America microbiologic definition of NTM lung disease also met full clinical criteria for that disease. This suggests that microbiologic laboratory-based data could be used for public health surveillance ( 16 ). We aimed to characterize the geographic distribution across Virginia of MAC/M. abscessus isolates that met microbiologic criteria for NTM lung disease. We also aimed to assess geographic clustering and to model population-level determinants of prevalence at the county level. For this epidemiologic study, we used demographic and microbiologic data from routine electronic laboratory reports made to the Virginia Department of Health during June 2021–March 2023. The work was part of a prospective surveillance study approved by human subject review boards at the University of Virginia (#HSR 200234) and the Virginia Department of Health.

The time period for our study encompassed multiple years of inherent seasonality inclusive of all months for which complete data were available from the state health department. These reports included any culture positive for MAC or M. abscessus from any laboratory within the state of Virginia. For all positive cultures, we obtained the person’s age, sex, and residential ZIP (postal) code, as well as the anatomic site of sample isolation and date of test result. Case counts were aggregated to the county level based on residential postal codes.

To investigate potential climatic and geographic factors associated with MAC/M. abscessus prevalence, we obtained data from Weather Source ( https://weathersource.com ) for each county in Virginia during 2021–2022. These data were mean annual saturated vapor pressure, mean daily maximum temperature, and mean annual precipitation. We extracted the percentage of each county's population using self-supplied groundwater from 2018 US Geological Survey data, the most recent available ( 4 ). According to a recent US Geological Survey analysis, water source data from Virginia have been reliably recorded and relatively stable over time ( 17 ).

Case Definitions

We defined cases of MAC/M. abscessus lung disease using the 2020 American Thoracic Society/Infectious Diseases Society of America microbiologic criteria for NTM pulmonary disease ( 18 ). Case-patients had either a single MAC or M. abscessus culture isolated from bronchoalveolar lavage, pleural fluid, or lung tissue, or ≥2 positive cultures from sputum. For persons with multiple cultures collected over time, we included case data only from the earliest culture meeting these criteria. We excluded data from mixed MAC and M. abscessus cultures and from successive cultures testing positive for one organism and then the other. We excluded persons who did not meet the microbiologic criteria for lung disease because only 1 sputum culture contained MAC or M. abscessus. We excluded data from lung disease cases diagnosed on the basis of nonrespiratory samples. We also excluded data from persons residing outside of Virginia.
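The inclusion rules above can be expressed as a small classification function. This sketch is illustrative only, and several of its details are hypothetical. These are the `Culture` record and its specimen labels, and the coding of a mixed culture as organism "mixed". For persons qualifying through sputum, we assume the qualifying date is that of the second positive sputum culture. Residence outside Virginia would be filtered separately.

```python
from dataclasses import dataclass
from datetime import date

LOWER_RESPIRATORY = {"bronchoalveolar lavage", "pleural fluid", "lung tissue"}
SINGLE_ORGANISMS = {"MAC", "M. abscessus"}

@dataclass
class Culture:
    person_id: str
    collected: date
    specimen: str   # e.g. "sputum", "bronchoalveolar lavage", "blood"
    organism: str   # "MAC", "M. abscessus", or "mixed" (hypothetical coding)

def classify(cultures):
    """Date on which a person first meets the microbiologic criteria, or None if not a case."""
    organisms = {c.organism for c in cultures}
    if len(organisms) != 1 or not organisms <= SINGLE_ORGANISMS:
        return None  # no cultures, mixed cultures, or MAC and M. abscessus in succession
    sputum_positives = 0
    for c in sorted(cultures, key=lambda c: c.collected):
        if c.specimen in LOWER_RESPIRATORY:
            return c.collected           # a single positive culture suffices
        if c.specimen == "sputum":
            sputum_positives += 1
            if sputum_positives >= 2:
                return c.collected       # second positive sputum completes the criteria
        # nonrespiratory specimens (e.g. blood) do not count toward lung disease
    return None

p1 = [Culture("p1", date(2021, 7, 1), "sputum", "MAC"),
      Culture("p1", date(2021, 9, 1), "sputum", "MAC")]
print(classify(p1))       # second sputum date
print(classify(p1[:1]))   # a single sputum culture does not qualify
```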

Statistical Analyses

We analyzed differences in age between MAC and M. abscessus case-patients using Mann-Whitney U tests and differences in sex using χ² tests. We obtained US Census Bureau data on population size, median age, and population density for each Virginia county from 2022, the midpoint of the study period (19). We calculated average annual prevalence of MAC/M. abscessus lung disease captured by laboratory surveillance during 2021–2023 for the entire state of Virginia and for each county and independent city. We reported average annual prevalence as a rate per 100,000 population.
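Because person-years are defined in the text as the population multiplied by the 3-year study length, average annual prevalence reduces to one division. A minimal sketch with hypothetical function names and a hypothetical county:

```python
STUDY_YEARS = 3  # 2021-2023, the study period defined in the text

def person_years(population: float, years: int = STUDY_YEARS) -> float:
    """Person-years = population x length of the study period."""
    return population * years

def prevalence_per_100k(cases: int, population: float, years: int = STUDY_YEARS) -> float:
    """Average annual prevalence per 100,000 population over the study period."""
    return cases / person_years(population, years) * 100_000

# Hypothetical county: 30 cases among 50,000 residents over 3 years.
rate = prevalence_per_100k(30, 50_000)
```

Because infection and lung-disease rates share the same denominator, their ratio equals the ratio of case counts: the reported 6.19 and 2.37 per 100,000 (ratio about 2.61) are consistent with the 864 and 331 persons counted (ratio about 2.61).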

We generated choropleth maps to visualize county-level MAC/M. abscessus, MAC, and M. abscessus infections, as well as saturated vapor pressure and the percentage of each county’s population using self-supplied water. Self-supplied water comes from nonpublic groundwater or surface water sources, such as wells or rainwater captured in cisterns. To assess clustering, we calculated Moran I for each map as a measure of spatial autocorrelation. We analyzed factors potentially associated with county-level prevalence of MAC/M. abscessus infections using negative binomial regression, a generalization of Poisson regression that accounts for overdispersion. To account for differences in population size, we included the natural log of person-years as an offset. We defined person-years as the relevant population (e.g., statewide or county) multiplied by 3 years, the length of the study period. As potentially relevant epidemiologic confounders and environmental factors noted in previous investigations of NTM, we included sex, median age, population density, mean saturated vapor pressure, mean maximum temperature, mean daily precipitation, and percentage of population using self-supplied water in the final model (3,6,8,10). We reported exponentiated model coefficients as prevalence ratios. We analyzed data using SPSS Statistics 28.0 (IBM, https://www.ibm.com) and generated maps using ArcGIS 3.0 (Environmental Systems Research Institute, https://www.esri.com).
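A negative binomial model with a log person-years offset can be sketched without a statistics package by fitting it with iteratively reweighted least squares (IRLS). This is an illustration, not the authors' SPSS analysis: `nb_glm_irls` is a hypothetical name, and the dispersion parameter alpha is held fixed (variance = mu + alpha * mu^2), whereas in practice it is usually estimated; the source does not say how it was handled. Exponentiated coefficients are the prevalence ratios described above.

```python
import numpy as np

def nb_glm_irls(X, y, offset, alpha=1.0, tol=1e-10, max_iter=100):
    """Negative binomial GLM with log link, fitted by IRLS with a FIXED
    dispersion alpha (Var[y] = mu + alpha * mu**2).

    X: (n, k) design matrix including an intercept column.
    y: (n,) case counts per county.
    offset: (n,) log person-years, entering the linear predictor with coefficient 1.
    Returns the coefficient vector; np.exp(coef) gives prevalence ratios.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    offset = np.asarray(offset, dtype=float)
    mu = y + 0.5                 # conventional GLM starting values
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        w = mu / (1.0 + alpha * mu)            # IRLS weight: (dmu/deta)^2 / Var
        z = eta - offset + (y - mu) / mu       # working response, offset removed
        XtW = X.T * w                          # scales each observation by its weight
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        eta = X @ beta_new + offset
        mu = np.exp(eta)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    raise RuntimeError("IRLS did not converge")
```

Fitting counts generated exactly from known coefficients is a quick sanity check, since IRLS should recover them. With real county data, X would hold an intercept column plus the covariates listed above, and `np.exp(beta[1:])` would give one prevalence ratio per covariate.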

Statewide Results

We identified 874 persons with ≥1 MAC or M. abscessus pulmonary culture during the 2021–2023 data collection period. After excluding 10 persons who resided outside of Virginia, data from 864 persons remained for evaluation. Of those, 714 (82.6%) had MAC and 150 (17.4%) had M. abscessus; 331/864 (38.3%) met microbiologic criteria for NTM lung disease.

Case Demographics

Figure 1. Prevalence of Mycobacterium avium complex (MAC), M. abscessus, or both (MAC/M. abscessus), categorized by age and sex, Virginia, USA, 2021–2023.

Median age was 69 (interquartile range [IQR] 58–76) years among all case-patients with MAC/M. abscessus infections, 64 (IQR 46–75) years among those with M. abscessus, and 69 (IQR 60–77) years among those with MAC. Only 18 case-patients (2.1%) were <18 years of age, and 534 (61.8%) were >65 years of age. Of all case-patients, 497 (57.5%) were female and 366 (42.5%) were male (Table 1). Sex distribution did not differ between MAC and M. abscessus case-patients of all ages (p = 0.934). Prevalences of MAC, M. abscessus, and total MAC/M. abscessus cases were higher among female than male case-patients >65 years of age but were similar between sexes among case-patients <65 years of age (Figure 1).

Geographic Distribution

Figure 2. Geographic distribution and variables of interest for Mycobacterium avium complex (MAC) and Mycobacterium abscessus infections, Virginia, USA, 2021–2023. County-level prevalence (cases/100,000 person-years) of A) MAC/M. abscessus; C) MAC; and E) M. abscessus. B) M. abscessus distribution as a percentage of total MAC/M. abscessus infections. D) Percentage of residents using self-supplied water. F) Saturated water vapor pressure in millibars.

Rates of MAC/M. abscessus infections varied substantially by locality, driven by differences in the distribution of MAC infections (Figure 2). MAC/M. abscessus cases clustered throughout the state (Moran I = 0.219, p<0.001), as did MAC cases alone (Figure 2, panel C; Moran I = 0.210, p<0.001), especially in the central counties of the Piedmont region and on several peninsulas on Chesapeake Bay in the Tidewater region (Figure 2, panels A, C). We found no clear clustering of M. abscessus cases (Moran I = 0.01, p = 0.663) (Figure 2, panel E). We did find clustering in rates of self-supplied water use (Moran I = 0.189, p<0.001) and mean annual saturated vapor pressure (Moran I = 0.820, p<0.001) (Figure 2, panels D, F). Self-supplied water use appeared to cluster in the more rural south-central parts of the Piedmont region; saturated vapor pressure was highest in the Tidewater region in the southeastern part of the state.
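Moran I compares each county's deviation from the statewide mean with its neighbors' deviations: positive values indicate that similar rates sit next to each other. A minimal pure-Python sketch, assuming binary contiguity weights (1 for counties that share a border, 0 otherwise) and a one-sided permutation test. The source does not state which spatial weights ArcGIS was configured with or how its p values were obtained, so both choices are assumptions, and the function names are hypothetical.

```python
import random

def morans_i(values, neighbors):
    """Global Moran I with binary (0/1) spatial weights.

    values: one rate per county.
    neighbors: dict mapping a county index to the set of indices it borders
    (symmetric, no county listed as its own neighbor).
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(len(neighbors.get(i, ())) for i in range(n))  # total weight
    num = sum(dev[i] * dev[j] for i in range(n) for j in neighbors.get(i, ()))
    den = sum(d * d for d in dev)
    if s0 == 0 or den == 0:
        raise ValueError("need at least one neighbor pair and non-constant values")
    return (n / s0) * num / den

def permutation_p_value(values, neighbors, n_perm=999, seed=0):
    """One-sided p value for positive autocorrelation: the share of random
    reassignments of rates to counties giving an I at least as large."""
    rng = random.Random(seed)
    observed = morans_i(values, neighbors)
    shuffled = list(values)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, neighbors) >= observed:
            at_least_as_large += 1
    return (at_least_as_large + 1) / (n_perm + 1)
```

For a chain of four counties 0-1-2-3, rates [1, 1, 0, 0] give I = 1/3 (neighbors alike), while the alternating pattern [1, 0, 1, 0] gives I = -1 (neighbors unlike).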

A regression model of county-level prevalence of MAC/M. abscessus infections (Table 2) showed saturated vapor pressure to be associated with prevalence of MAC/M. abscessus infections. Each 1-millibar increase in mean annual saturated vapor pressure was associated with a 41.4% increase in the expected count of MAC/M. abscessus infections (prevalence ratio [PR] 1.414, 95% CI 1.011–1.980; p = 0.043), whereas each 1% increase in the proportion of the county population using self-supplied water was associated with a 69.6% decrease in expected MAC/M. abscessus infections (PR 0.304, 95% CI 0.098–0.950; p = 0.041). Other population-level variables included in the model were not significantly associated with MAC/M. abscessus prevalence. We constructed similar models to evaluate the effects of median age, sex, population density, saturated vapor pressure, temperature, precipitation, and proportion of self-supplied water use separately on prevalence of MAC and of M. abscessus infections. Saturated vapor pressure was positively associated and self-supplied water use was negatively associated with MAC infection prevalence, but none of those factors was significantly associated with M. abscessus infection prevalence. A model assessing relationships between those factors and prevalence of MAC/M. abscessus pulmonary disease identified no significant association.
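The percentages in this paragraph follow from the prevalence ratios: a PR of r corresponds to a (r - 1) x 100% change in the expected count per 1-unit increase in the covariate. The sketch below, with hypothetical helper names, also checks that each reported PR sits at the geometric midpoint of its 95% CI, assuming Wald intervals computed on the log scale (the usual construction for negative binomial coefficients, though the source does not say so).

```python
import math

def percent_change(pr: float) -> float:
    """Percentage change in the expected count per 1-unit covariate increase."""
    return (pr - 1.0) * 100.0

def log_scale_summary(ci_low: float, ci_high: float, z: float = 1.959964):
    """Point estimate and standard error implied by a 95% Wald CI that is
    symmetric on the log scale: the log-PR is the midpoint of the log bounds."""
    log_lo, log_hi = math.log(ci_low), math.log(ci_high)
    coef = (log_lo + log_hi) / 2
    se = (log_hi - log_lo) / (2 * z)
    return math.exp(coef), se
```

Applied to the reported intervals, the geometric midpoints are about 1.415 (CI 1.011–1.980) and 0.305 (CI 0.098–0.950), matching the reported PRs of 1.414 and 0.304 to within rounding.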

Discussion

We report results of our evaluation of local and statewide rates of MAC/M. abscessus infection in Virginia using real-time, laboratory-based monitoring. Average annual prevalence in Virginia over the study period was 6.19 cases of MAC/M. abscessus infection and 2.37 cases of MAC/M. abscessus lung disease per 100,000 population. More case-patients were female than male, and most were older persons (median age 69 years), consistent with known demographics of NTM infection. Of note, we demonstrated significant geographic clustering of MAC/M. abscessus. At the county level, higher saturated water vapor pressure was positively associated with prevalence and greater self-supplied water use was negatively associated with prevalence, independent of population density and the other covariates in the model.

Characterizing the epidemiology of NTM remains challenging, often because of underreporting. Multiple studies have demonstrated the limitations of using diagnostic billing (International Classification of Diseases [ICD]) codes to identify rates of NTM disease. Barriers include clinicians’ limited familiarity with the diagnostic characteristics of NTM and the variable need for active antimicrobial therapy; unlike many other infectious diseases, NTM lung disease does not always require treatment (20,21). Several recent studies have evaluated laboratory-based surveillance of NTM, including 1 study from a CDC surveillance program (22). Our study differed from that study in several ways. Of note, we included data from a state in the southeastern United States, a region not represented in the CDC surveillance data, and gathered comprehensive surveillance data for the entire state from statewide laboratories rather than from individual sentinel laboratories. Our prevalence estimate for MAC/M. abscessus pulmonary disease (2.37/100,000 population) was lower than the overall NTM incidence in the CDC study (6.1/100,000 population). That difference might reflect our inclusion of only MAC and M. abscessus rather than all NTM, or our inclusion of all laboratories statewide rather than only laboratories serving referral centers. Other recent studies based on statewide data from Missouri (23) and Wisconsin (24) have used laboratory-based surveillance. Comparing our prevalence rates with rates from those studies was difficult because of differences in methodology and inclusion criteria. The Missouri study (23) reported aggregate period rates. The Wisconsin study (24) reported an overall average annual NTM incidence of 22.1–22.4 cases/100,000 persons but counted repeat positive samples from the same person as separate cases.
In multivariable modeling, socioeconomic factors were associated with NTM rates in the Wisconsin study but not in the Missouri study; we lacked socioeconomic data for patients in our cohort. Our study also differed from the Missouri and Wisconsin studies in that it was set in the southeastern rather than midwestern United States and included environmental exposure variables not evaluated in those studies (23,24).

We found a higher percentage of M. abscessus (17.4%) among total MAC/M. abscessus infections than other studies of NTM distribution based on aggregate data (25), possibly because we excluded NTM species other than MAC and M. abscessus. Still, a recent study reported that the proportion of M. abscessus varied widely across the United States, ranging from 4.5% to 21.7% (26). The Southeast had the highest proportion of M. abscessus among NTM species of any US region (26). Given the clinical severity of M. abscessus lung disease, its considerable antimicrobial resistance, and the difficulty of managing antimycobacterial therapy, further research is needed to understand why M. abscessus appears to be so prevalent in that region.

Our study explored associations between MAC/M. abscessus infections and local-level environmental exposures. Previous data have shown that variations between locations in temperature, rainfall, flooding, and drought are associated with prevalence of NTM (27). Saturated vapor pressure has been shown to be the climate variable most closely associated with NTM prevalence (6,7). In our study, mean annual saturated vapor pressure was highest in the Tidewater region in the southeastern part of the state and correlated with higher local prevalence of MAC/M. abscessus. Of note, saturated vapor pressure is expected to increase globally with ongoing trends in climate change, highlighting the need to understand how those changes might relate to risks of developing NTM lung disease.
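Saturated vapor pressure depends only on temperature, which is why warming is expected to raise it. The sketch below uses a standard Magnus-type approximation with Bolton (1980) coefficients, giving hPa (numerically equal to millibars) from temperature in degrees Celsius. It is illustrative only: the source does not say how Weather Source computes the variable, and the function name is hypothetical.

```python
import math

def saturation_vapor_pressure_hpa(temp_c: float) -> float:
    """Saturation vapor pressure over liquid water in hPa (= millibars),
    Magnus-type approximation with Bolton (1980) coefficients."""
    return 6.112 * math.exp(17.67 * temp_c / (temp_c + 243.5))
```

At 20 °C this gives about 23.4 hPa and at 21 °C about 24.9 hPa, so near typical temperatures a 1 °C warming adds roughly 1.5 millibars, about 6%.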

We also examined the relationship between drinking water sources and MAC/M. abscessus prevalence. NTM have been more commonly isolated from central water distribution systems than from groundwater sources, but this comparison has not been tested epidemiologically (28). However, several studies have shown household plumbing supplied by central water sources to be a pathway for NTM infection (29,30). The source of household water is thought to be critical; NTM are rarely found in samples of clean groundwater (31). Here, we found greater use of self-supplied water (mostly well water) to be associated with lower rates of MAC/M. abscessus infections in a given locality, even after adjusting for population density. In our data, the effect size associated with water source was even larger than that for climatic variables, suggesting that water source might be a substantial factor in acquiring NTM.

As with many studies based on laboratory surveillance, our study was limited by a lack of individual-level data on water sources and behavioral variables, and we assumed that residential postal codes best reflect the location of a person’s greatest exposure to water for drinking and bathing. However, environmental (31) and household (29,32) surveillance data from previous studies support that water vapor pressure and types of water source might be factors in acquiring NTM. We also considered that the location of referral centers, particularly the cluster of counties surrounding a large academic hospital in central Virginia, might have biased our observation of geographic clustering. However, 1 study of NTM clustering across the United States found that neither physician-to-patient ratio nor proximity to a referral center was associated with local variations in clustering of NTM prevalence (33). In addition to the modest underestimate of NTM lung disease when considering only laboratory-based microbiologic criteria (16), MAC and M. abscessus represented only 73.6% of pathogenic pulmonary NTM isolates in Virginia in earlier data from our group (34); thus, NTM lung disease likely carries a greater total population burden than we report. Furthermore, given our study design, we could not conclusively establish causation with regard to the associations between exposure variables and outcomes of interest. Finally, although recent data were available, we matched covariates only spatially, not temporally.

In summary, we found that most MAC/M. abscessus isolates in Virginia were MAC. Local clustering of MAC/M. abscessus infections within Virginia during the study period might be explained by differences in household water sources and saturated water vapor pressure. Future studies of the geographic distribution of NTM should examine variations in the distribution of different NTM species; additional controlled studies are needed to explore those factors and assess the effects of other individual-level exposures that might be related to developing NTM lung disease. Our findings suggest that a better understanding of geographic clustering and environmental water exposures related to NTM could help inform future monitoring activities and the development of prevention and control efforts targeted to populations most at risk.

Dr. Mullen is a resident physician within the Department of Internal Medicine at the University of Virginia in Charlottesville. His research interests include epidemiology and treatment of mycobacterial infections and HIV.

Acknowledgment

This work was supported by funding from National Institutes of Health grant R01 HL 155547.

  • Dahl VN, Mølhave M, Fløe A, van Ingen J, Schön T, Lillebaek T, et al. Global trends of pulmonary infections with nontuberculous mycobacteria: a systematic review. Int J Infect Dis. 2022;125:120–31.
  • Winthrop KL, Henkle E, Walker A, Cassidy M, Hedberg K, Schafer S. On the reportability of nontuberculous mycobacterial disease to public health authorities. Ann Am Thorac Soc. 2017;14:314–7.
  • Strollo SE, Adjemian J, Adjemian MK, Prevots DR. The burden of pulmonary nontuberculous mycobacterial disease in the United States. Ann Am Thorac Soc. 2015;12:1458–64.
  • Dieter CA, Maupin MA, Caldwell RR, Harris MA, Ivahnenko TI, Lovelace JK, et al. Estimated use of water in the United States in 2015 (circular 1441). Reston, VA: US Geological Survey; 2018 [cited 2023 Jul 12]. https://pubs.er.usgs.gov/publication/cir1441
  • Johnson MM, Odell JA. Nontuberculous mycobacterial pulmonary infections. J Thorac Dis. 2014;6:210–20.
  • Adjemian J, Olivier KN, Prevots DR. Nontuberculous mycobacteria among patients with cystic fibrosis in the United States: screening practices and environmental risk. Am J Respir Crit Care Med. 2014;190:581–6.
  • Prevots DR, Adjemian J, Fernandez AG, Knowles MR, Olivier KN. Environmental risks for nontuberculous mycobacteria. Individual exposures and climatic factors in the cystic fibrosis population. Ann Am Thorac Soc. 2014;11:1032–8.
  • Adjemian J, Frankland TB, Daida YG, Honda JR, Olivier KN, Zelazny A, et al. Epidemiology of nontuberculous mycobacterial lung disease and tuberculosis, Hawaii, USA. Emerg Infect Dis. 2017;23:439–47.
  • Adjemian J, Olivier KN, Seitz AE, Holland SM, Prevots DR. Prevalence of nontuberculous mycobacterial lung disease in U.S. Medicare beneficiaries. Am J Respir Crit Care Med. 2012;185:881–6.
  • Winthrop KL, Varley CD, Ory J, Cassidy PM, Hedberg K. Pulmonary disease associated with nontuberculous mycobacteria, Oregon, USA. Emerg Infect Dis. 2011;17:1760–1.
  • Cheng Q, Collender PA, Heaney AK, McLoughlin A, Yang Y, Zhang Y, et al. Optimizing laboratory-based surveillance networks for monitoring multi-genotype or multi-serotype infections. PLOS Comput Biol. 2022;18:e1010575.
  • Huang JH, Kao PN, Adi V, Ruoss SJ. Mycobacterium avium-intracellulare pulmonary infection in HIV-negative patients without preexisting lung disease: diagnostic and management limitations. Chest. 1999;115:1033–40.
  • Chou MP, Clements AC, Thomson RM. A spatial epidemiological analysis of nontuberculous mycobacterial infections in Queensland, Australia. BMC Infect Dis. 2014;14:279.
  • Donohue MJ, Wymer L. Increasing prevalence rate of nontuberculous mycobacteria infections in five states, 2008–2013. Ann Am Thorac Soc. 2016;13:2143–50.
  • Mejia-Chew C, Chavez MA, Lian M, McKee A, Garrett L, Bailey TC, et al. Spatial epidemiologic analysis and risk factors for nontuberculous mycobacteria infections, Missouri, USA, 2008–2019. Emerg Infect Dis. 2023;29:1540–6.
  • Winthrop KL, McNelley E, Kendall B, Marshall-Olson A, Morris C, Cassidy M, et al. Pulmonary nontuberculous mycobacterial disease prevalence and clinical features: an emerging public health disease. Am J Respir Crit Care Med. 2010;182:977–82.
  • US Geological Survey. Factors affecting uncertainty of public supply, self-supplied domestic, irrigation, and thermoelectric water-use data, 1985–2015—evaluation of information sources, estimation methods, and data variability [cited 2023 Dec 12]. https://pubs.usgs.gov/sir/2021/5082/sir20215082.pdf
  • Daley CL, Iaccarino JM, Lange C, Cambau E, Wallace RJ Jr, Andrejak C, et al. Treatment of nontuberculous mycobacterial pulmonary disease: an official ATS/ERS/ESCMID/IDSA clinical practice guideline. Eur Respir J. 2020;56:2000535.
  • US Census Bureau. QuickFacts: Virginia [cited 2023 Jul 11]. https://www.census.gov/quickfacts/VA
  • Winthrop KL, Baxter R, Liu L, McFarland B, Austin D, Varley C, et al. The reliability of diagnostic coding and laboratory data to identify tuberculosis and nontuberculous mycobacterial disease among rheumatoid arthritis patients using anti-tumor necrosis factor therapy. Pharmacoepidemiol Drug Saf. 2011;20:229–35.
  • Mejia-Chew C, Yaeger L, Montes K, Bailey TC, Olsen MA. Diagnostic accuracy of health care administrative diagnosis codes to identify nontuberculous mycobacteria disease: a systematic review. Open Forum Infect Dis. 2021;8:ofab035.
  • Grigg C, Jackson KA, Barter D, Czaja CA, Johnston H, Lynfield R, et al. Epidemiology of pulmonary and extrapulmonary nontuberculous mycobacteria infections at 4 US emerging infections program sites: a 6-month pilot. Clin Infect Dis. 2023;77:629–37.
  • Vonasek BJ, Gusland D, Hash KP, Wiese AL, Tans-Kersten J, Astor BC, et al. Nontuberculous mycobacterial infection in Wisconsin adults and its relationship to race and social disadvantage. Ann Am Thorac Soc. 2023;20:1107–15.
  • Prevots DR, Marras TK. Epidemiology of human pulmonary infection with nontuberculous mycobacteria: a review. Clin Chest Med. 2015;36:13–34.
  • Marshall J, Mercaldo R, Lipner E, Prevots R. Nontuberculous mycobacteria testing and culture positivity in the United States based on Labcorp Data. In: American Thoracic Society International Conference Abstracts, May 19–24, 2023, Washington DC, USA. p. A2955.
  • Thomson RM, Furuya-Kanamori L, Coffey C, Bell SC, Knibbs LD, Lau CL. Influence of climate variables on the rising incidence of nontuberculous mycobacterial (NTM) infections in Queensland, Australia 2001-2016. Sci Total Environ. 2020;740:139796.
  • Falkinham JO III, Norton CD, LeChevallier MW. Factors influencing numbers of Mycobacterium avium, Mycobacterium intracellulare, and other Mycobacteria in drinking water distribution systems. Appl Environ Microbiol. 2001;67:1225–31.
  • Lande L, Alexander DC, Wallace RJ Jr, Kwait R, Iakhiaeva E, Williams M, et al. Mycobacterium avium in community and household water, suburban Philadelphia, Pennsylvania, USA, 2010–2012. Emerg Infect Dis. 2019;25:473–81.
  • Whiley H, Keegan A, Giglio S, Bentham R. Mycobacterium avium complex—the role of potable water in disease transmission. J Appl Microbiol. 2012;113:223–32.
  • Martin EC, Parker BC, Falkinham JO III. Epidemiology of infection by nontuberculous mycobacteria. VII. Absence of mycobacteria in southeastern groundwaters. Am Rev Respir Dis. 1987;136:344–8.
  • Falkinham JO III. Nontuberculous mycobacteria from household plumbing of patients with nontuberculous mycobacteria disease. Emerg Infect Dis. 2011;17:419–24.
  • Adjemian J, Olivier KN, Seitz AE, Falkinham JO III, Holland SM, Prevots DR. Spatial clusters of nontuberculous mycobacterial lung disease in the United States. Am J Respir Crit Care Med. 2012;186:553–8.
  • Satyanarayana G, Heysell SK, Scully KW, Houpt ER. Mycobacterial infections in a large Virginia hospital, 2001-2009. BMC Infect Dis. 2011;11:113.
  • Table 1. Demographic characteristics of case-patients with MAC and Mycobacterium abscessus, by isolate, Virginia, USA, 2021–2023
  • Table 2. Negative binomial regression model of county-level factors associated with county Mycobacterium avium complex and M. abscessus case prevalence, Virginia, USA, 2021–2023

Suggested citation for this article: Mullen B, Houpt ER, Colston J, Becker L, Johnson S, Young L, et al. Geographical variation and environmental predictors of nontuberculous mycobacteria in laboratory surveillance, Virginia, USA, 2021–2023. Emerg Infect Dis. 2024 Mar [date cited]. https://doi.org/10.3201/eid3003.231162

DOI: 10.3201/eid3003.231162

Original Publication Date: February 15, 2024

1 Preliminary results from this study were presented at the Union-North America Region (NAR) conference, February 22–25, 2023, Vancouver, British Columbia, Canada.

Table of Contents – Volume 30, Number 3—March 2024

Address for correspondence:

Scott Heysell, Division of Infectious Diseases and International Health, University of Virginia, 345 Crispell Dr, Charlottesville, VA 22908, USA


Designing Case-Control Studies: Decisions About the Controls

The authors quantified, first, the effect of misclassified controls (i.e., individuals who are affected with the disease under study but who are classified as controls) on the ability of a case-control study to detect an association between a disease and a genetic marker, and second, the effect of leaving misclassified controls in the study, as opposed to removing them (thus decreasing sample size). The authors developed an informativeness measure of a study’s ability to identify real differences between cases and controls. They then examined this measure’s behavior when there are no misclassified controls, when there are misclassified controls, and when misclassified controls have been removed from the study. The results show that if, for example, 10% of controls are misclassified, the study’s informativeness is reduced to approximately 81% of what it would have been in a sample with no misclassified controls, whereas if these misclassified controls are removed from the study, informativeness is reduced only to about 90%, despite the smaller sample size. If 25% are misclassified, those figures become approximately 56% and 75%, respectively. Thus, leaving misclassified controls in the control sample is worse than removing them altogether. Finally, the authors illustrate that insufficient power is not necessarily overcome by collecting an unlimited number of controls. The formulas provided enable investigators to make rational decisions about whether to remove misclassified controls or leave them in.

Epidemiologists developed case-control designs to aid them in searching for factors that might cause a given illness by comparing rates of exposure to potential risk factors in persons who have the illness (the case subjects) with those in the same population who presumably are not ill (the control subjects). The finding that people with lung cancer were more likely to have smoked than controls without lung cancer was the first evidence for the potential causative effects of smoking on cancer. Results of a case-control study are not considered as definitive as those of a randomized controlled trial, in which a comparison is made prospectively between two comparable populations, one exposed to the factor of interest and the other not, as when Walter Reed exposed some soldiers to mosquitoes and placed others in a mosquito-free room to see whether mosquitoes carried yellow fever. However, case-control studies are invaluable for several reasons: 1) many factors, such as patients’ genotypes, cannot be assigned randomly; 2) several factors, such as genetic and environmental risks, can be examined simultaneously; and 3) case-control studies, in which subjects are observed once, generally cost less than a prospective randomized controlled trial. Investigators of mental disorders appropriately put much effort into defining caseness by rigorous diagnostic criteria. However, deciding who should be a control is equally important. In this article, as guidance for investigators who intend to design such a study, we illustrate how a study’s results are affected by the decision to include in, or exclude from, the control group persons who might have the targeted illness.

While it may seem obvious that control groups should include only disease-free subjects, several factors may induce investigators to consider unscreened controls. First, persons with illness are more likely to agree to participate in research than those without. Second, large databases of people who agreed to have their DNA anonymously genotyped, with the results available for study, already exist, and using them spares investigators considerable expense. Third, without a definitive test for the absence of a targeted mental disorder, any claim that controls do not have the disorder is of limited validity.

The psychiatric genetic literature contains ongoing discussions about the advantages and disadvantages of including affected individuals in control groups. For example, Tsuang et al. (1) noted that for relatively common conditions, one could reach different conclusions depending on whether one used screened controls or not; in their study of major depression, the morbid risk among relatives of controls was 8.1% when the controls came from the general population but 7.6% when only screened controls were used. Wickramaratne (2) showed that using population (i.e., unscreened) controls in a familial aggregation study did not affect validity (type I error) but did weaken statistical power. Moskvina et al. (3) have provided mathematical formulas for calculating power when unscreened controls are used.

In this article, we provide simple rules of thumb for evaluating the effect of misclassified, disease-bearing controls on the ability of a case-control study to detect real differences between cases and controls. We also show that removing misclassified controls is better than leaving them in, even though doing so reduces total sample size. To do so, we consider case-control studies of association between a disease and a genetic marker. We ask what happens when a specified proportion of the controls are misclassified. By “misclassified controls” we mean individuals who are classified as controls but who actually have the disease under study, that is, who should have been classified as cases. We then compare the strength of association obtained with misclassified controls with that obtained using ideal controls.

Some investigators have proposed that using a very large number of controls can compensate for reduced power. Counterintuitively, this is not the case: beyond a certain point, collecting more and more controls does not improve a study’s statistical ability to detect a true association in a case-control design, as we illustrate.
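The plateau follows from the chi-square statistic of a 2 × 2 table filled with expected ("perfect") counts. With n cases, m controls, and pooled marker frequency p̄, the statistic equals (p - q)² / [p̄(1 - p̄)(1/n + 1/m)]; as m grows, p̄ approaches q and 1/m vanishes, so the statistic approaches n(p - q)² / [q(1 - q)], a ceiling set by the number of cases. A minimal sketch of that limit; the derivation and the function names are ours, not the authors', and the example values are hypothetical.

```python
def chi2_perfect(n_cases, p, n_controls, q):
    """Pearson chi-square (no continuity correction) for a 2x2 table filled
    with expected counts; equals the squared pooled two-proportion z statistic."""
    pooled = (n_cases * p + n_controls * q) / (n_cases + n_controls)
    return (p - q) ** 2 / (pooled * (1 - pooled) * (1 / n_cases + 1 / n_controls))

def chi2_limit(n_cases, p, q):
    """Limit of chi2_perfect as the number of controls grows without bound."""
    return n_cases * (p - q) ** 2 / (q * (1 - q))
```

With 100 hypothetical cases, p = 0.3 and q = 0.2, 100 controls give a chi-square of about 2.67 and 1,000 controls about 5.50, but no number of controls can push it past 6.25; only more cases raise that ceiling.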

We assume an association study with a case-control design in which investigators are studying a possible association between a genetic marker—say, a single-nucleotide polymorphism—and a disease, with no comorbid conditions. We assume further that the case sample consists solely of correctly classified patients with the disease. However, the control sample may include some subjects who, unbeknownst to the investigator, are actually affected with the disease being studied. We refer to these subjects as “misclassified” controls.

We let p represent the true proportion of affected individuals who have the genetic marker in question, and q the same proportion among unaffected individuals. We assume that there is a true association between the disease and the genetic marker, with the marker more common among affected individuals (i.e., p > q). We then define α as the proportion of misclassified controls in the control sample; for example, an α of 0.10 means that 10% of individuals in the control sample actually have the disease, whereas an α of zero indicates that no one in the control sample has the disease.

As a measure of informativeness, we use the chi-square statistic as it would be calculated in a “perfect” sample. Say the true proportion of cases who have the genetic marker is 30%, and imagine a sample with 100 cases. Then, for the calculations in this article, we let exactly 30 of those individuals have the marker. (This is in contrast to a real-life sample, in which, because of sampling variation, one might observe only 26 of the 100 cases having the marker, or perhaps 33 of the 100.) Similar reasoning applies to the control sample.
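To make the idea of a “perfect” sample concrete, here is a minimal Python sketch that fills a 2×2 table with exactly the expected counts and computes Pearson's chi-square on it. We assume the uncorrected Pearson statistic (no Yates continuity correction); under that assumption the sketch reproduces the factors used in the worked examples below. The function name `perfect_chi_square` is our own, hypothetical choice.

```python
def perfect_chi_square(n_cases: float, n_controls: float,
                       p_case: float, p_control: float) -> float:
    """Pearson chi-square (assumed: no continuity correction) for a 2x2 table
    whose cells hold exactly the expected counts, i.e. a "perfect" sample."""
    a = n_cases * p_case              # cases with the marker
    b = n_cases * (1 - p_case)        # cases without the marker
    c = n_controls * p_control        # controls with the marker
    d = n_controls * (1 - p_control)  # controls without the marker
    total = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: no association can be measured
    return total * (a * d - b * c) ** 2 / denom


# 120 cases with 20% marker carriers vs. 120 controls with 10% carriers
print(round(perfect_chi_square(120, 120, 0.20, 0.10), 2))
```

This prints 4.71; the text reports about 4.70 because it multiplies the rounded factor 0.0392 by 120.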

Below we give values of chi-square statistics for different association strengths and different proportions of misclassified controls. We then show how to interpret the tabular results, with examples. Next, we describe revealing patterns in the results and the useful rule of thumb we can derive from those patterns. Finally, we show what happens when the investigator can collect many controls but has no access to any more cases.

Numerical Results

We consider three situations:

  • Situation 1: There are no misclassified controls; that is, no one in the control sample has the disease being studied. In this case, our measure of informativeness, the chi-square statistic, represents the gold standard for that sample size. We call this $\chi^2_{\mathrm{CC}}$ (where CC stands for “correctly classified”).
  • Situation 2: A proportion (α) of the controls in the sample are misclassified; that is, they actually have the disorder being studied. Now the chi-square statistic falls from the gold-standard value of situation 1 to a lower value, which we call $\chi^2_{\mathrm{MC}}$ (“misclassified”).
  • Situation 3: This is the same as situation 2, except that the investigator identifies and excludes the misclassified controls. Now the control sample is again uniformly correctly classified, but at the price of a smaller size. The measure of informativeness for situation 3 is called $\chi^2_{\mathrm{reduced}}$.

Table 1 illustrates the behavior of all three types of $\chi^2$ for some representative values of p and q, with proportions of misclassified controls in the control sample of α=0.1 and α=0.25. We consider setups with equal numbers of cases and controls (denoted by t=1, where t is the ratio of controls to cases) and setups with twice as many controls as cases (t=2). The table gives a “factor” for each combination of p, q, α, and t; the user multiplies that factor by the number of cases, N, to calculate the corresponding $\chi^2$.

Table 1. Chi-Square Factors to Use for the Correctly Classified (CC), Misclassified (MC), and Reduced Chi-Square Values, for Selected Values of p and q

Examples Illustrating How to Use Table 1

Consider a sample with equal numbers of cases and controls (120 of each), and suppose that 10% of the control sample have the disease (i.e., α=0.1). Say that the true prevalence of the marker is 20% in cases, as opposed to 10% in unaffected individuals (i.e., p=0.2, q=0.1). The upper half of Table 1 shows results for α=0.1, and the first part of that section shows results for equal numbers of cases and controls (t=1). Look in the cells corresponding to p=0.2, q=0.1. The first cell gives the factor for $\chi^2_{\mathrm{CC}}$, which is 0.0392. To apply that factor to our data set, multiply it by the number of cases (0.0392×120), which shows that the chi-square statistic for an ideal sample of that size, with no misclassified controls, would be about 4.70. This is statistically significant at the 5% level, for which the critical value of a chi-square test with 1 degree of freedom is 3.84. Now imagine that 10% of the 120 controls (i.e., 12 controls) are misclassified and actually have the disease. The next cell in the results for p=0.2 and q=0.1 gives the factor for $\chi^2_{\mathrm{MC}}$, which is 0.0309. Multiplying this factor by 120 yields 3.71, which is no longer significant. Finally, if we remove the 12 misclassified controls from the sample, we use the third cell in that box, a factor of 0.0366, yielding $\chi^2_{\mathrm{reduced}}$=120×0.0366=4.39, which is again significant even though the sample is now smaller.
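The significance calls in this example rest on the chi-square distribution with 1 degree of freedom. For 1 degree of freedom the upper-tail probability has a closed form, P(χ² > x) = erfc(√(x/2)), because a chi-square(1) variable is the square of a standard normal. The minimal sketch below uses that identity to check the three values from the Python standard library alone; `chi2_1df_pvalue` is a hypothetical helper name, and the factors are the Table 1 values quoted above.

```python
import math


def chi2_1df_pvalue(x: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom.

    X = Z**2 for standard normal Z, so P(X > x) = P(|Z| > sqrt(x)),
    which equals erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2))


n_cases = 120
# Table 1 factors for p=0.2, q=0.1, t=1, alpha=0.1 (quoted in the text above)
factors = {"CC": 0.0392, "MC": 0.0309, "reduced": 0.0366}
for label, factor in factors.items():
    chi2 = factor * n_cases
    p_value = chi2_1df_pvalue(chi2)
    verdict = "significant" if p_value < 0.05 else "not significant"
    print(f"{label:8s} chi2={chi2:.2f} p={p_value:.3f} {verdict}")
```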

Now consider a sample with twice as many controls as cases (100 cases, 200 controls), and see what happens when 25% of the controls in the sample have the disease (α=0.25). Use the same p=0.2 and q=0.1 as in the first example. The lower half of Table 1 gives results for α=0.25, and the lower section of that half covers samples with twice as many controls as cases (t=2). Again find the results for p=0.2 and q=0.1; the factor for $\chi^2_{\mathrm{CC}}$ is 0.0577. To determine the value of $\chi^2_{\mathrm{CC}}$, multiply this factor by the number of cases (not the number of controls), which yields $\chi^2_{\mathrm{CC}}$=100×0.0577=5.77 (significant). Following the same steps as in the first example, we see that $\chi^2_{\mathrm{MC}}$=100×0.0294=2.94 (not significant) and $\chi^2_{\mathrm{reduced}}$=100×0.0498=4.98 (significant). This example also illustrates how a proportion of 25% misclassified controls has a much more serious effect than one of 10%.

These two examples illustrate how to use Table 1. Readers who wish to calculate the chi-square factors for values of p, q, α, and t other than those listed in the table can refer to part 1 of the data supplement that accompanies the online edition of this article.
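For readers who want a quick way to explore other values, the sketch below computes all three factors under assumptions we state explicitly, because they are our reading of the setup and not the supplement's formulas: Pearson's chi-square without continuity correction on a perfect sample; misclassified controls carry the marker at the case rate p, so the marker frequency among unscreened controls is αp + (1−α)q; and removing them leaves t(1−α)N correctly classified controls. Under these assumptions the function returns, after rounding, the factors used in both examples above (0.0392, 0.0309, 0.0366 and 0.0577, 0.0294, 0.0498). The names `chi_square_factors` and `_factor` are hypothetical.

```python
def _factor(p: float, q: float, t: float) -> float:
    """Chi-square per case for a perfect sample: N cases (marker rate p) and
    t*N controls (marker rate q); assumed Pearson statistic, no correction."""
    return (1 + t) * t * (p - q) ** 2 / ((p + t * q) * ((1 - p) + t * (1 - q)))


def chi_square_factors(p: float, q: float, alpha: float, t: float) -> dict:
    """Factors to multiply by the number of cases N, one per situation."""
    # Assumption: misclassified controls carry the marker at the case rate p.
    q_mixed = alpha * p + (1 - alpha) * q
    return {
        "CC": _factor(p, q, t),                     # situation 1: all controls correct
        "MC": _factor(p, q_mixed, t),               # situation 2: misclassified kept
        "reduced": _factor(p, q, t * (1 - alpha)),  # situation 3: misclassified removed
    }


example_1 = chi_square_factors(p=0.2, q=0.1, alpha=0.10, t=1)
example_2 = chi_square_factors(p=0.2, q=0.1, alpha=0.25, t=2)
for name, ex in (("example 1", example_1), ("example 2", example_2)):
    print(name, {k: round(v, 4) for k, v in ex.items()})
```

Keeping the closed form in one helper makes the assumptions easy to vary; a different marker rate for misclassified controls, for instance, would change only `q_mixed`.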

Patterns and Rule of Thumb

The numerical results in Table 1 reveal two interesting patterns. First, as expected, the greater the difference between the proportions of the genetic marker in cases and controls, the easier the association is to detect. We see this by comparing $\chi^2$ values between different p-q combinations: the greater the difference between p and q, the greater the $\chi^2$ factor. For example, in any one of the subsections of the table, the $\chi^2$ factors are greatest when p=0.30 and q=0.05. Second, the information lost by removing the misclassified controls, represented by $\chi^2_{\mathrm{reduced}}$, is far less than that lost by leaving them in the control sample. We see this by comparing the three $\chi^2$ values within each p-q combination. Consistently, $\chi^2_{\mathrm{MC}}$ (misclassified) is markedly less than $\chi^2_{\mathrm{CC}}$ (correctly classified), whereas $\chi^2_{\mathrm{reduced}}$ is only slightly less than $\chi^2_{\mathrm{CC}}$.

Theoretical calculations (see part 2 of the online data supplement) reveal that the ratio of $\chi^2_{\mathrm{MC}}$ to $\chi^2_{\mathrm{CC}}$, which we can call the “including misclassified controls ratio,” is around $(1-\alpha)^2$. Thus, if 10% of controls are misclassified, the $\chi^2$ drops to about $(0.9)^2$, or 81%, of the value it would have had if all controls had been correctly classified, and if 25% are misclassified, it drops to about $(0.75)^2$, or 56%. In contrast, the ratio of $\chi^2_{\mathrm{reduced}}$ to $\chi^2_{\mathrm{CC}}$, which we can call the “removing misclassified controls ratio,” is about $1-\alpha$, a much smaller loss. If 10% of controls are misclassified, this ratio is about 90%, and if 25% are misclassified, it is about 75%.
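Part of the $(1-\alpha)^2$ pattern can be seen directly. If we assume, for illustration, that misclassified controls carry the marker at the case rate p, the marker frequency among unscreened controls is a mixture, and the case-control difference shrinks by exactly the factor $1-\alpha$:

$$
q_{\mathrm{MC}} = \alpha p + (1-\alpha)\,q
\quad\Longrightarrow\quad
p - q_{\mathrm{MC}} = (1-\alpha)\,(p - q).
$$

Assuming further Pearson's statistic without continuity correction, a perfect sample with N cases and tN controls gives

$$
\chi^2 = N\,\frac{t\,(1+t)\,(p-q)^2}{(p+tq)\,\bigl[(1-p)+t\,(1-q)\bigr]}.
$$

Substituting $q_{\mathrm{MC}}$ for q multiplies the numerator by exactly $(1-\alpha)^2$; the denominator also shifts, but only modestly when p and q are close, which is why the observed ratio is near, rather than equal to, $(1-\alpha)^2$. This is a sketch of the intuition, not the derivation given in part 2 of the supplement.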

These results lead to a simple rule of thumb: If the proportion of misclassified controls in the sample is α, then the study’s informativeness will be reduced to about $(1-\alpha)^2$ of its ideal value if the misclassified controls are left in the study, but only to about $1-\alpha$ if they are removed.
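Applying the rule of thumb takes two multiplications. The sketch below starts from the ideal-sample value in the first worked example ($\chi^2_{\mathrm{CC}}$ = 0.0392×120 with α=0.1); `rule_of_thumb` is a hypothetical helper name.

```python
def rule_of_thumb(chi2_cc: float, alpha: float) -> tuple:
    """Approximate chi-square when misclassified controls are kept vs. removed."""
    kept = chi2_cc * (1 - alpha) ** 2   # informativeness ~ (1 - alpha)^2
    removed = chi2_cc * (1 - alpha)     # informativeness ~ 1 - alpha
    return kept, removed


# First worked example: factor 0.0392, 120 cases, 10% misclassified controls
kept, removed = rule_of_thumb(0.0392 * 120, 0.10)
print(f"kept: {kept:.2f}, removed: {removed:.2f}")
```

The rule gives about 3.81 if the misclassified controls are kept and about 4.23 if they are removed, compared with the table-based 3.71 and 4.39. Against the 5% critical value of 3.84 for 1 degree of freedom, both routes lead to the same significance calls in this example.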

Table S2 in the online data supplement lists values of these two ratios for the same setups examined above in Table 1 and shows that the actual ratios are reasonably close to those from the rule of thumb.

When Increasing the Number of Controls Does Not Improve Power

Whether or not one’s sample contains misclassified controls, the sample may not be large enough to achieve statistical significance. In that situation, one can try to increase the sample size so as to improve statistical power. Unfortunately, if one has a limited number of cases and can collect only more controls, there is an upper limit on statistical power (4). We illustrate this fact by showing the maximum value that $\chi^2$ can achieve in the following example.

Imagine you are conducting a study in which the true prevalence of the genetic marker is 10% in cases and 5% in controls (thus, p=0.10, q=0.05), and say your initial sample contains 50 cases and 50 controls (N=50, t=1). You can collect more controls if needed, but not more cases. Assume in this example that all controls are correctly classified. Table 1 yields a $\chi^2$ factor of 0.0180. Multiplying by N yields 0.0180×50=0.90 for the approximate $\chi^2$, nowhere near sufficient for statistical significance. Intuitively, you might think that if you could collect enough additional controls, you could raise that $\chi^2$ to an acceptable value, but that is not the case. In this example, the $\chi^2$ cannot be made larger than 2.63, no matter how many controls you collect, and 2.63 itself falls short of the 3.84 needed for significance at the 5% level. Figure 1 illustrates this: if you increase the number of controls from 50 to 100, the $\chi^2$ rises from 0.90 to 1.34, a worthwhile improvement. However, even 1,000 controls will raise the $\chi^2$ only to 2.40, and beyond 2,000 controls the curve practically levels off, slowly approaching its maximum value of 2.63.
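The curve in Figure 1 can be traced with a few lines of Python. This is a minimal sketch assuming Pearson's chi-square without continuity correction on a perfect sample, an assumption under which it reproduces the values 0.90, 1.34, 2.40, and 2.63 quoted above. The ceiling uses N(p−q)²/[q(1−q)], the value the perfect-sample chi-square approaches as the number of controls grows without bound. Function names are hypothetical.

```python
def chi2_perfect(n_cases: int, n_controls: int, p: float, q: float) -> float:
    """Perfect-sample Pearson chi-square (assumed: no continuity correction)."""
    t = n_controls / n_cases  # ratio of controls to cases
    return n_cases * (1 + t) * t * (p - q) ** 2 / ((p + t * q) * ((1 - p) + t * (1 - q)))


def chi2_limit(n_cases: int, p: float, q: float) -> float:
    """Value approached as the number of controls grows without bound."""
    return n_cases * (p - q) ** 2 / (q * (1 - q))


n_cases, p, q = 50, 0.10, 0.05
ceiling = chi2_limit(n_cases, p, q)
for n_controls in (50, 100, 500, 1000, 2000, 10000):
    chi2 = chi2_perfect(n_cases, n_controls, p, q)
    print(f"{n_controls:>6} controls: chi2 = {chi2:.2f} "
          f"({chi2 / ceiling:.0%} of the ceiling {ceiling:.2f})")
```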

Figure 1. One Example of a Chi-Square Value as a Function of an Increasing Number of Controls, in a Perfect Sample With 50 Cases

In this example, p=0.10 and q=0.05; that is, there is a true association, with the genetic marker occurring in 10% of cases and 5% of controls. There are no misclassified controls in this example. The graph shows how the $\chi^2$ value approaches its maximum possible value of 2.63 as the number of controls increases. Increasing the number of controls to 10–20 times the number of cases will raise the $\chi^2$ value to about 80%–90% of the maximum value, but beyond that, increasing the number of controls has little effect.

To calculate the maximum possible $\chi^2$ value for other numbers of cases and other values of p and q, see equation 4 in the online data supplement.
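For the case with no misclassified controls, the ceiling can also be obtained from the perfect-sample formula by letting the control-to-case ratio t grow without bound. This is our own sketch, assuming Pearson's statistic without continuity correction, and its form may differ from equation 4 in the supplement:

$$
\chi^2(t) = N\,\frac{t\,(1+t)\,(p-q)^2}{(p+tq)\,\bigl[(1-p)+t\,(1-q)\bigr]}
\;\longrightarrow\;
\chi^2_{\max} = \frac{N\,(p-q)^2}{q\,(1-q)}
\quad\text{as } t \to \infty .
$$

With N=50, p=0.10, and q=0.05, this gives $50 \times 0.0025 / (0.05 \times 0.95) \approx 2.63$, the ceiling shown in Figure 1. Intuitively, with unlimited controls the marker frequency q is known essentially exactly, so the remaining uncertainty comes only from the N cases; the ceiling grows with the number of cases, not controls.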

We have shown that if 10% of the controls in a sample are misclassified, that is, are actually affected with the disease under study, the sample’s informativeness falls to about 75%–80% of what it would have been if all controls had been correctly classified; and if 25% of controls are misclassified, informativeness falls to around 50%, where we measure informativeness via the chi-square value from a “perfect” sample. These results are robust and do not depend on the true proportions of the genetic marker in the cases and controls or on whether there are equal numbers of cases and controls. Removing the ill controls from the control sample restores much of that lost informativeness and more than compensates for the reduced sample size. In this sense, the misclassified controls are “worse than useless” for analysis.

We have illustrated the effects when α is as high as 10% or 25%. If α is very low, the effect of misclassification is minor. For example, if α is only 1%, then $(1-\alpha)^2$ is about 98% and $1-\alpha$ is 99%; the study’s informativeness is hardly reduced at all, whether the misclassified controls are left in or not. Thus, these issues may be of less concern for rare psychiatric conditions such as schizophrenia, for which the proportion of affected individuals among unscreened population controls should be correspondingly small.

Readers should bear in mind that these chi-square values do not measure statistical power directly. A user who wants to estimate power should use appropriate power formulas (see reference 3, for example) or run computer simulations to do so.
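As an illustration of the simulation route, the sketch below estimates power by Monte Carlo for the first worked example. It is illustrative only and rests on assumptions of ours: marker carriers are drawn independently (binomial sampling), an affected control carries the marker at the case rate p, and each simulated study is tested with Pearson's chi-square without continuity correction at the 5% level. `simulated_power` and `chi2_2x2` are hypothetical names, and the sketch is no substitute for the published power formulas.

```python
import math
import random


def chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (no continuity correction) for counts [[a, b], [c, d]]."""
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else (a + b + c + d) * (a * d - b * c) ** 2 / denom


def simulated_power(n_cases, n_controls, p, q, alpha=0.0,
                    level=0.05, reps=2000, seed=1):
    """Share of simulated studies whose 1-df chi-square test has p-value < level.

    Each control's marker status is drawn with probability alpha*p + (1-alpha)*q,
    which assumes misclassified controls carry the marker at the case rate p.
    """
    rng = random.Random(seed)
    q_obs = alpha * p + (1 - alpha) * q
    hits = 0
    for _ in range(reps):
        a = sum(rng.random() < p for _ in range(n_cases))         # carrier cases
        c = sum(rng.random() < q_obs for _ in range(n_controls))  # carrier controls
        stat = chi2_2x2(a, n_cases - a, c, n_controls - c)
        if math.erfc(math.sqrt(stat / 2)) < level:  # 1-df upper-tail p-value
            hits += 1
    return hits / reps


power_cc = simulated_power(120, 120, 0.2, 0.1)             # all controls correct
power_mc = simulated_power(120, 120, 0.2, 0.1, alpha=0.1)  # 10% misclassified
print(f"power, correct controls: {power_cc:.2f}; 10% misclassified: {power_mc:.2f}")
```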

Additionally, we have illustrated how, if one has a limited number of cases available, increasing the number of controls past a certain point no longer adds statistical power to one’s study. This fact is well known in biostatistics (see reference 4, for example) but has not been widely recognized in psychiatric genetics. One implication is that consortia or repositories with very large numbers of controls may be of limited usefulness for some studies.

The reader may ask, “If I can identify which of my controls are misclassified, couldn’t I simply move them into the ‘cases’ category? Wouldn’t that be better than removing them from the study altogether?” Yes, in the ideal situation in which one may be certain that the misclassified controls actually meet one’s diagnostic criteria for the disease of interest, counting them as cases will increase statistical power. However, if there is uncertainty about their diagnoses, it is better simply to remove them from the study (5). Our results show that the loss in informativeness from doing so is not as severe as leaving them in as controls would be.
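The three options can be compared on the first worked example (120 cases, 120 controls, 12 of them actually affected). This minimal sketch assumes Pearson's chi-square without continuity correction on a perfect sample and, for the “moved” option, that the 12 reclassified subjects truly have the disease and so carry the marker at the case rate p. The function name `chi2_perfect` is hypothetical.

```python
def chi2_perfect(n_cases, n_controls, p, q):
    """Perfect-sample Pearson chi-square (assumed: no continuity correction)."""
    a, b = n_cases * p, n_cases * (1 - p)        # cases with / without marker
    c, d = n_controls * q, n_controls * (1 - q)  # controls with / without marker
    return (a + b + c + d) * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


p, q, n, n_misclassified = 0.2, 0.1, 120, 12
alpha = n_misclassified / n  # 10% of controls are actually affected
options = {
    "kept as controls": chi2_perfect(n, n, p, alpha * p + (1 - alpha) * q),
    "removed": chi2_perfect(n, n - n_misclassified, p, q),
    "moved to cases": chi2_perfect(n + n_misclassified, n - n_misclassified, p, q),
}
for label, chi2 in options.items():
    print(f"{label:17s} chi2 = {chi2:.2f}")
```

Under these assumptions, moving the 12 subjects gives about 4.54, above the 4.39 obtained by removing them and below the 4.71 of an ideal sample with no misclassified controls; keeping them as controls gives about 3.71.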

Ongoing discussions in psychiatric genetics concern just how damaging misclassified controls may be to a case-control study. Some have argued that it is all right to have misclassified controls in one’s sample as long as one collects a sufficiently large sample to “counteract” their effect (see reference 6, for example). However, we have also illustrated how simply collecting more and more controls does not necessarily solve the problem, since beyond a certain point, additional controls add no more statistical power. Schwartz and Susser (7, 8) have argued that using “well” (i.e., screened) controls actually undermines validity. However, their argument addresses the situation in which investigators use stricter criteria for the controls than for the cases, such that cases and controls are no longer comparable. They do not address the more general situation in which comparable criteria are used for both groups, which is our concern here.

Acknowledgments

Dr. Weissman has received research support from NIMH, the National Institute on Drug Abuse, NARSAD, the Sackler Foundation, the Templeton Foundation, and the Interstitial Cystitis Association and receives royalties from Perseus Books, American Psychiatric Press, Oxford University Press, and Multi-Health Systems.

Supported by NIMH grants MH60912 (to Dr. Weissman), MH37592 (to Dr. Donald F. Klein and Dr. Fyer), MH65213 (to Drs. Subaran and Hodge), MH48858 (to Dr. Hodge), and MH090966 (to Drs. Jay Gingrich, Weissman, and Hodge).

The other authors report no financial relationships with commercial interests.

Designing Case-Control Studies: Decisions About the Controls
