What Is a Case Study?
When you’re performing research as part of your job or for a school assignment, you’ll probably come across case studies that help you learn more about the topic at hand. But what is a case study, and why is it helpful? Read on to learn all about case studies.
Deep Dive into a Topic
At face value, a case study is a deep dive into a topic. Case studies can be found in many fields, particularly across the social sciences and medicine. When you conduct a case study, you create a body of research based on an inquiry and related data from analysis of a group, individual or controlled research environment.
As a researcher, you can benefit from the analysis of case studies similar to inquiries you’re currently studying. Researchers often rely on case studies to answer questions that basic information and standard diagnostics cannot address.
Study a Pattern
One of the main objectives of a case study is to find a pattern that answers whatever the initial inquiry seeks to find. This might be a question about why college students are prone to certain eating habits or what mental health problems afflict house fire survivors. The researcher then collects data, either through observation or data research, and starts connecting the dots to find underlying behaviors or impacts of the sample group’s behavior.
During the study period, the researcher gathers evidence to back the observed patterns and future claims that’ll be derived from the data. Since case studies are usually presented in the professional environment, it’s not enough to simply have a theory and observational notes to back up a claim. Instead, the researcher must provide evidence to support the body of study and the resulting conclusions.
As the study progresses, the researcher develops a solid case to present to peers or a governing body. Case study presentation is important because it legitimizes the body of research and opens the findings to a broader analysis that may end up drawing a conclusion that’s more true to the data than what one or two researchers might establish. The presentation might be formal or casual, depending on the case study itself.
Once the body of research is established, it’s time to draw conclusions from the case study. As with all social sciences studies, conclusions from one researcher shouldn’t necessarily be taken as gospel, but they’re helpful for advancing the body of knowledge in a given field. For that purpose, they’re an invaluable way of gathering new material and presenting ideas that others in the field can learn from and expand upon.
7.2 - Advanced Case-Control Designs: Nested Case-Control Study
This is a case-control study within a cohort study. At the beginning of the cohort study \((t_0)\), members of the cohort are assessed for risk factors. Cases and controls are identified subsequently at time \(t_1\). The control group is selected from the risk set (cohort members who do not meet the case definition at \(t_1\)). Typically, the nested case-control study includes less than 20% of the parent cohort.
Advantages of nested case-control
- Efficient – not all members of the parent cohort require diagnostic testing
- Flexible – allows testing of hypotheses not anticipated when the cohort was drawn (at \(t_0\))
- Reduces selection bias – cases and controls sampled from the same population
- Reduces information bias – risk factor exposure can be assessed with investigator blind to case status
Disadvantages of nested case-control
- Reduces power (relative to the parent cohort) because of the smaller sample size; the reduction is 1/(c+1), where c = number of controls per case
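The 1/(c+1) term can be tabulated for common numbers of controls per case. The sketch below assumes the usual approximation that a design with c controls per case retains about c/(c+1) of the efficiency of the full cohort; the function name is illustrative only.

```python
def efficiency_loss(c: int) -> float:
    """Approximate fraction of efficiency lost with c controls per case."""
    if c < 1:
        raise ValueError("c must be at least 1")
    return 1 / (c + 1)


# Tabulate the loss and the retained efficiency for 1 to 4 controls per case.
for c in range(1, 5):
    loss = efficiency_loss(c)
    print(f"{c} control(s) per case: loss ~ {loss:.2f}, retained ~ {1 - loss:.2f}")
```

Under this approximation the gain from each extra control shrinks quickly, which is one reason ratios beyond about 1:4 are seldom used.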
Nested case-control studies can be matched, not matched, or counter-matched.
Matching cases to controls on baseline measurements of one or several confounding variables is done to control for the effect of those confounders. A counter-matched study, in contrast, matches cases to controls who have a different baseline risk factor exposure level. The counter-matched design is used specifically to assess the impact of this risk factor; it is especially good for assessing the potential interaction (effect modification) between the secondary risk factor and the primary risk factor. Counter-matched controls are randomly selected from different strata of risk factor exposure in order to maximize variation in exposure among the controls. For example, in a study of the risk of bladder cancer from alcohol consumption, you might match cases to controls who smoke different amounts to see if the effect of smoking is only evident at a minimum level of exposure.
Example of a Nested Case-Control Study: Familial, psychiatric, and socioeconomic risk factors for suicide in young people: a nested case-control study. In a cohort study of risk factors for suicide, Agerbo et al. (2002) identified 496 young people who had committed suicide during 1981-97 in Denmark and matched them for sex, age, and time to 24,800 controls. Read how they matched each case to a representative random subsample of 50 people born the same year!
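Matching each case to a random subsample of non-cases with the same birth year can be sketched as follows. This is a simplified illustration, not the procedure used by Agerbo et al.: it matches on birth year only (the study also matched on sex and time), and the record layout, function name, and toy cohort are all hypothetical.

```python
import random
from collections import defaultdict


def sample_matched_controls(cohort, k, seed=0):
    """For each case, draw up to k non-cases with the same birth year."""
    rng = random.Random(seed)
    # Group the non-cases by the matching variable.
    pool = defaultdict(list)
    for person in cohort:
        if not person["case"]:
            pool[person["birth_year"]].append(person["id"])
    # Draw controls for each case from its own stratum, without replacement.
    matched = {}
    for person in cohort:
        if person["case"]:
            candidates = pool[person["birth_year"]]
            matched[person["id"]] = rng.sample(candidates, min(k, len(candidates)))
    return matched


# Toy cohort of 30 people (made-up records, not data from the study).
cohort = [{"id": i, "birth_year": 1970 + i % 3, "case": i < 3} for i in range(30)]
matched = sample_matched_controls(cohort, k=5)
for case_id, controls in matched.items():
    print(case_id, sorted(controls))
```

With 496 cases and k=50, the same idea gives the 24,800 controls quoted above.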
Nested case-control studies: advantages and disadvantages
Philip Sedgwick, reader in medical statistics and medical education
Centre for Medical and Healthcare Education, St George’s, University of London, London, UK
Researchers investigated whether antipsychotic drugs were associated with venous thromboembolism. A population based nested case-control study design was used. Data were taken from the UK QResearch primary care database consisting of 7 267 673 patients. Cases were adult patients with a first ever record of venous thromboembolism between 1 January 1996 and 1 July 2007. For each case, up to four controls were identified, matched by age, calendar time, sex, and practice. Exposure to antipsychotic drugs was assessed on the basis of prescriptions on, or during the 24 months before, the index date. 1
There were 25 532 eligible cases (15 975 with deep vein thrombosis and 9557 with pulmonary embolism) and 89 491 matched controls. The primary outcome was the odds ratios for venous thromboembolism associated with antipsychotic drugs adjusted for comorbidity and concomitant drug exposure. When adjusted using logistic regression to control for potential confounding, prescription of antipsychotic drugs in the previous 24 months was significantly associated with an increased occurrence of venous thromboembolism compared with non-use (odds ratio 1.32, 95% confidence interval 1.23 to 1.42). The researchers concluded that prescription of antipsychotic drugs was associated with venous thromboembolism in a large primary care population.
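The reported odds ratio and confidence interval can be related to each other on the log scale. The sketch below assumes the interval is a standard Wald interval (estimate plus or minus 1.96 standard errors of the log odds ratio), which the source does not state explicitly; the function name is illustrative.

```python
import math


def log_or_standard_error(lower: float, upper: float, z: float = 1.96) -> float:
    """Back out the standard error of log(OR) from a Wald confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


# Values reported in the text: odds ratio 1.32, 95% CI 1.23 to 1.42.
odds_ratio, lower, upper = 1.32, 1.23, 1.42
se = log_or_standard_error(lower, upper)
z_stat = math.log(odds_ratio) / se

print(f"SE of log(OR) ~ {se:.4f}")
print(f"z statistic   ~ {z_stat:.2f}")
# A Wald interval is symmetric on the log scale, so the geometric
# midpoint of the limits should sit close to the point estimate.
print(f"geometric midpoint of CI ~ {math.sqrt(lower * upper):.3f}")
```

Because the interval excludes 1, the z statistic is well above 1.96, consistent with the association being described as significant.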
Which of the following statements, if any, are true?
a) The nested case-control study is a retrospective design
b) The study design minimised selection bias compared with a case-control study
c) Recall bias was minimised compared with a case-control study
d) Causality could be inferred from the association between prescription of antipsychotic drugs and venous thromboembolism
Statements a, b, and c are true, whereas d is false.
The aim of the study was to investigate whether prescription of antipsychotic drugs was associated with venous thromboembolism. A nested case-control study design was used. The study design was an observational one that incorporated the concept of the traditional case-control study within an established cohort. This design overcomes some of the disadvantages associated with case-control studies, 2 while incorporating some of the advantages of cohort studies. 3 4
Data for the study above were extracted from the UK QResearch primary care database, a computerised register of anonymised longitudinal medical records for patients registered at more than 500 UK general practices. Patient data were recorded prospectively, the database having been updated regularly as patients visited their GP. Cases were all adult patients in the register with a first ever record of venous thromboembolism between 1 January 1996 and 1 July 2007. There were 25 532 cases in total. For each case, up to four controls were identified from the register, matched by age, calendar time, sex, and practice. In total, 89 491 matched controls were obtained. Data relating to prescriptions for antipsychotic drugs on, or during the 24 months before, the index date were extracted for the cases and controls. The index date was the date in the register when venous thromboembolism was recorded for the case. The cases and controls were compared to ascertain whether exposure to prescription of antipsychotic drugs was more common in one group than in the other. Despite the data for the cases and controls being collected prospectively, the nested case-control study is described as retrospective ( a is true) because it involved looking back at events that had already taken place and been recorded in the register.
Selection bias is of particular concern in the traditional case-control study. Described in a previous question, 5 selection bias is the systematic difference between the study participants and the population they are meant to represent with respect to their characteristics, including demographics and morbidity. Cases and controls are often selected through convenience sampling. Cases are typically recruited from hospitals or general practices because they are convenient and easily accessible to researchers. Controls are often recruited from the same hospital clinics or general practices as the cases. Therefore, the selected cases may not be representative of the population of all cases. Equally, the controls might not be representative of otherwise healthy members of the population. The above nested case-control study was population based, with the QResearch primary care database incorporating a large proportion of the UK population. The cases and controls were selected from the database and therefore should be more representative of the population than those in a traditional case-control study. Hence, selection bias was minimised by using the nested case-control study design ( b is true).
The traditional case-control study involves participants recalling information about past exposure to risk factors after identification as a case or control. The study design is prone to recall bias, as described in a previous question. 6 Recall bias is the systematic difference between cases and controls in the accuracy of information recalled. Recall bias will exist if participants have selective preconceptions about the association between the disease and past exposure to the risk factor(s). Cases may, for example, recall information more accurately than controls, possibly because of an association with the disease or outcome. Although in the study above the cases and controls were identified retrospectively, the data for the QResearch primary care database were collected prospectively. Therefore, there was no reason for any systematic differences between groups of study participants in the accuracy of the information collected. Therefore, recall bias was minimised compared with a traditional case-control study ( c is true).
Not all of the patient records in the UK QResearch primary care database were used to explore the association between prescription of antipsychotic drugs and development of venous thromboembolism. A nested case-control study was used instead, with cases and controls matched on age, calendar time, sex, and practice. This was because it was statistically more efficient to control for the effects of age, calendar time, sex, and practice by matching cases and controls on these variables at the design stage, rather than controlling for their potential confounding effects when the data were analysed. The matching variables were considered to be important factors that could potentially confound the association between prescription of antipsychotic drugs and venous thromboembolism, but they were not of interest as potential risk factors in themselves. Matching in case-control studies has been described in a previous question. 7
Unlike a traditional case-control study, the data in the example above were recorded prospectively. Therefore, it was possible to determine whether prescription of antipsychotic drugs preceded the occurrence of venous thromboembolism. Nonetheless, only association, and not causation, can be inferred from the results of the above nested case-control study ( d is false)—that is, those people who were exposed to prescribed antipsychotic drugs were more likely to have developed venous thromboembolism. This is because the observed association between prescribed antipsychotic drugs and occurrence of venous thromboembolism may have been due to confounding. In particular, it was not possible to measure and then control for, through statistical analysis, all factors that may have affected the occurrence of venous thromboembolism.
The example above is typical of a nested case-control study; the health records for a group of patients that have already been collected and stored in an electronic database are used to explore the association between one or more risk factors and a disease or condition. The management of such databases means it is possible for a variety of studies to be undertaken, each investigating the risk factors associated with different diseases or outcomes. Nested case-control studies are therefore relatively inexpensive to perform. However, the major disadvantage of nested case-control studies is that not all pertinent risk factors are likely to have been recorded. Furthermore, because many different healthcare professionals will be involved in patient care, risk factors and outcome(s) will probably not have been measured with the same accuracy and consistency throughout. It may also be problematic if the diagnosis of the disease or outcome changes with time.
Cite this as: BMJ 2014;348:g1532
Competing interests: None declared.
1. Parker C, Coupland C, Hippisley-Cox J. Antipsychotic drugs and risk of venous thromboembolism: nested case-control study. BMJ 2010;341:c4245.
2. Sedgwick P. Case-control studies: advantages and disadvantages. BMJ 2014;348:f7707.
3. Sedgwick P. Prospective cohort studies: advantages and disadvantages. BMJ 2013;347:f6726.
4. Sedgwick P. Retrospective cohort studies: advantages and disadvantages. BMJ 2014;348:g1072.
5. Sedgwick P. Selection bias versus allocation bias. BMJ 2013;346:f3345.
6. Sedgwick P. What is recall bias? BMJ 2012;344:e3519.
7. Sedgwick P. Why match in case-control studies? BMJ 2012;344:e691.
- Research article
- Open access
- Published: 21 July 2008
Advantages of the nested case-control design in diagnostic research
Cornelis J Biesheuvel, Yvonne Vergouwe, Ruud Oudega, Arno W Hoes, Diederick E Grobbee & Karel GM Moons
BMC Medical Research Methodology volume 8, Article number: 48 (2008)
Despite its benefits, it is uncommon to apply the nested case-control design in diagnostic research. We aim to show advantages of this design for diagnostic accuracy studies.
We used data from a full cross-sectional diagnostic study comprising a cohort of 1295 consecutive patients who were selected on their suspicion of having deep vein thrombosis (DVT). We drew nested case-control samples from the full study population with case:control ratios of 1:1, 1:2, 1:3 and 1:4 (100 samples were taken per ratio). We calculated diagnostic accuracy estimates for two tests that are used to detect DVT in clinical practice.
Estimates of diagnostic accuracy in the nested case-control samples were very similar to those in the full study population. For example, for each case:control ratio, the positive predictive value of the D-dimer test was 0.30 in the full study population and 0.30 in the nested case-control samples (median of the 100 samples). As expected, variability of the estimates decreased with increasing sample size.
Our findings support the view that the nested case-control study is a valid and efficient design for diagnostic studies and should also be (re)appraised in current guidelines on diagnostic accuracy research.
In diagnostic research it is essential to determine the accuracy of a test to evaluate its value for medical practice [ 1 ]. Diagnostic test accuracy is assessed by comparing the results of the index test with the results of the reference standard in the same patients. Given the cross-sectional nature of a diagnostic accuracy question, the design may be referred to as a cross-sectional cohort design. The (cohort) characteristic by which the study subjects (cohort members) are selected is 'the suspicion of the target disease', defined by the presence of particular symptoms or signs [ 2 ]. The collected study data allow for calculation of all diagnostic accuracy parameters of the index test, such as sensitivity, specificity, odds ratio, receiver operating characteristic (ROC) curve and predictive values, i.e. the probabilities of presence and absence of the disease given the index test result(s).
Subjects are not always selected on their initial suspicion of having the disease but often on the true presence or absence of the disease among those who underwent the reference test in routine care practice, which merely reflects a cross-sectional case-control design [ 3 , 4 ]. Appraisal of such conventional case-control design in diagnostic accuracy research has been limited due to its problems related to the incorrect sampling of cases and controls [ 3 – 7 ]. These problems may be overcome by applying a nested (cross-sectional) case-control study design, which may be advantageous over a full (cross-sectional) cohort design. The rationale, strengths and limitations of a nested case-control approach in epidemiology studies have widely been discussed in the literature [ 8 – 11 ], but not so much in the context of diagnostic accuracy research [ 6 ].
We therefore aim to show advantages of the nested case-control design for addressing diagnostic accuracy questions and discuss its pros and cons in relation to a conventional case-control design and to the full (cross sectional) cohort design in this domain. We will illustrate this with data from a recently conducted diagnostic accuracy study.
Case-control versus nested case-control design
The essence of a case-control study is that cases with the condition under study arise in a source population and controls are a representative sample of this same source population. The entire population is not studied, which would be a full cohort study or census approach; rather, a random sample is taken from the source population [ 12 ]. A major flaw inherent to case-control studies, described as early as 1959 [ 13 ], is the difficulty of ensuring that cases and controls are a representative sample of the same source population. In a nested case-control study the cases emerge from a well-defined source population and the controls are sampled from that same population. The main difference between a case-control and a nested case-control study is that in the former the cases and controls are sampled from a source population of unknown size, whereas the latter is 'nested' in an existing predefined source population of known size. This source population can be a group or cohort of subjects that is followed over time or not.
The term 'cohort' is commonly referred to a group of subjects followed over time in etiologic or prognostic research. But in essence, time is no prerequisite for the definition of a cohort. A cohort is a group of subjects that is defined by the same characteristic. This characteristic can be a particular birth year, a particular living area, and also the presence of a particular sign or symptom that makes them suspected of having a particular disease as in diagnostic research. Accordingly, a cross-sectional study can either be a cross-sectional case-control study or a cross-sectional cohort study.
Case-control and nested case-control design in diagnostic accuracy research
In diagnostic accuracy research the case-control design is incorrectly applied when subjects are selected from routine care databases. First, this design commonly leads to biased estimates of diagnostic accuracy of the index test due to referral or (partial) verification bias [ 4 , 14 – 18 ]. In routine care, physicians selectively refer patients for additional tests, including the reference test, based on previous test results. This is good clinical practice but a bad starting point for diagnostic research. As said, for diagnostic research purposes all subjects suspected of the target disease preferably undergo the index test(s) plus reference test irrespective of previous test results. Second, selection of patients with a negative reference test result as 'controls' may lead to inclusion of controls that correspond to a different clinical domain, i.e. patients who underwent the reference test but not necessarily because they were similarly suspected of the target condition [ 16 , 17 ]. A third disadvantage of such case-control design is that absolute probabilities of disease presence given the index test results, i.e. the predictive values or post-test probabilities, that are the desired parameters for patient care, cannot be obtained. Cases and controls are sampled from a source population of unknown size. The total number of patients that were initially suspected of the target disease based on the presence of symptoms or signs, i.e. the true source population, is commonly unknown as in routine care patients are hardly classified by their symptoms and signs at presentation [ 18 ]. Hence, the sampling fraction of cases and controls is unknown and valid estimates of the absolute probabilities of disease presence cannot be calculated [ 12 ].
A nested case-control study in diagnostic research includes the full population or cohort of patients suspected of the target disease. The 'true' disease status is obtained for all these patients with the reference standard. Hence, there is no referral or partial verification bias. The results of the index tests can then be obtained for all subjects with the target condition but only for a sample of the subjects without the target condition. Usually all patients with the target disease are included, but this could as well be a sample of the cases. Besides the absence of bias, all measures of diagnostic accuracy, including the positive and negative predictive values, can simply be obtained by weighing the controls with the case-control sampling fraction, as explained in Figure 1 .
Figure 1. Theoretical example of a full study population and a nested case-control sample. The index test result and the outcome are obtained for all patients of the study population. The case-control ratio was 1:4 (sampling fraction (SF) = 160/400 = 0.40). Valid diagnostic accuracy measures can be obtained from the nested case-control sample by multiplying the controls by 1/SF. For example, the positive predictive value (PPV) in the full study population is calculated as a/(a + b), in this example 30/(30 + 100) = 0.23. In a nested case-control sample the PPV is calculated as a/(a + (1/SF)*b), in this example 30/(30 + 2.5*40) = 0.23. In a conventional case-control sample, however, the controls are sampled from a source population of unknown size. The sampling fraction is therefore unknown and a valid estimate of the PPV cannot be calculated.
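The arithmetic in the Figure 1 example can be checked with a few lines of code. This is a minimal sketch using only the numbers quoted in the caption; the function names are illustrative and not taken from the paper.

```python
def ppv_full(true_pos: float, false_pos: float) -> float:
    """PPV in the full study population: a / (a + b)."""
    return true_pos / (true_pos + false_pos)


def ppv_nested(true_pos: float, sampled_false_pos: float, sampling_fraction: float) -> float:
    """PPV in a nested case-control sample: a / (a + (1/SF) * b)."""
    weight = 1 / sampling_fraction  # each sampled control stands for 1/SF controls
    return true_pos / (true_pos + weight * sampled_false_pos)


full = ppv_full(30, 100)              # full population: a = 30, b = 100
nested = ppv_nested(30, 40, 0.40)     # nested sample: a = 30, b = 40, SF = 0.40
unweighted = ppv_full(30, 40)         # what happens if the weighting is skipped

print(f"full cohort PPV      : {full:.2f}")
print(f"nested, weighted PPV : {nested:.2f}")
print(f"nested, unweighted   : {unweighted:.2f}")
```

The unweighted figure is included only to show why the sampling fraction must be known: without it the PPV is overstated because controls are under-represented in the sample.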
Potential advantages of a nested case-control design in diagnostic research
The nested case-control study design can be advantageous over a full cross-sectional cohort design when actual disease prevalence in subjects suspected of a target condition is low, the index test is costly to perform, or if the index test is invasive and may lead to side effects. Under these conditions, one limits patient burden and saves time and money as the index test is performed in only a sample of the control subjects.
Furthermore, the nested case-control design is of particular value when stored data (serum, images etc.) of an existing study population are re-analysed for diagnostic research purposes. Using a nested case-control design, only data of a sample of the full study population need to be retrieved and analysed without having to perform a new diagnostic study from the start. This may for example apply to evaluation of tumour markers to detect cancer, but also for imaging or electrophysiology tests.
Diagnostic accuracy estimates derived from a nested case-control study, should be virtually identical to a full cohort analysis. However, the variability of the accuracy estimates will increase with decreasing sample size. We illustrate this with data of a diagnostic study on a cohort of patients who were suspected of DVT.
A cross-sectional study was performed among a cohort of adult patients suspected of deep vein thrombosis (DVT) in primary care. This suspicion was primarily defined by the presence of a painful and swollen or red leg that existed no longer than 30 days. Details on the setting, data collection and main results have been described previously. [ 19 , 20 ] In brief, the full study population included 1295 consecutive patients who visited one of the participating primary care physicians with above symptoms and signs of DVT. Patients were excluded if pulmonary embolism was suspected. The general practitioner systematically documented information on patient history and physical examination. Patient history included information such as age, gender, history of malignancy, and recent surgery. Physical examination included swelling of the affected limb and difference in circumference of the calves calculated as the circumference (in centimetres) of affected limb minus circumference of unaffected limb, further referred to as calf difference test. Subsequently, all patients were referred to undergo D-dimer testing. In line with available guidelines and previous studies, the D-dimer test result was considered abnormal if the test yielded a D-dimer level ≥ 500 ng/ml. [ 21 , 22 ] Finally, they all underwent the reference test, i.e. repeated compression ultrasonography (CUS) of the lower extremities. In patients with a normal first CUS measurement, the CUS was repeated after seven days. DVT was considered present if one CUS measurement was abnormal. The echographist was blinded to the results of patient history, physical examination, and the D-dimer assay.
Nested case-control samples
Nested case-control samples were drawn from the full study population (n = 1295). In all samples, we always included all 289 cases with DVT. Controls were randomly sampled from the 1006 subjects without DVT. We applied four different and frequently used case-control ratios, i.e. one control for each case (1:1), two controls for each case (1:2), three controls for each case (1:3) and four controls for each case (1:4). For example, a sample with a case-control ratio of 1:1 contained 289 cases and 289 random subjects out of 1006 controls (sampling fraction 289/1006 = 0.287). In the 1:4 approach, we sampled with replacement. For each case-control ratio, 100 nested case-control samples were drawn.
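The sampling scheme described above can be sketched in a few lines. The code below is an illustration under stated assumptions, not the authors' procedure (they used SPSS and S-plus): controls are represented by integer identifiers, sampling is without replacement except where the requested number exceeds the 1006 available controls (the 1:4 case), and the seed and names are arbitrary.

```python
import random

N_CASES, N_CONTROLS = 289, 1006  # counts reported in the text
N_SAMPLES = 100                  # samples drawn per case-control ratio


def draw_control_sample(ratio: int, rng: random.Random) -> list:
    """Draw ratio * N_CASES control ids; all cases are always included."""
    n_wanted = ratio * N_CASES
    control_ids = range(N_CONTROLS)
    if n_wanted > N_CONTROLS:
        # Only possible with replacement (1:4 needs 1156 of 1006 controls).
        return rng.choices(control_ids, k=n_wanted)
    return rng.sample(control_ids, n_wanted)


rng = random.Random(2008)
samples = {
    ratio: [draw_control_sample(ratio, rng) for _ in range(N_SAMPLES)]
    for ratio in (1, 2, 3, 4)
}

for ratio, draws in samples.items():
    n_controls = len(draws[0])
    prevalence = N_CASES / (N_CASES + n_controls)
    print(f"1:{ratio}  controls per sample = {n_controls}, prevalence in sample = {prevalence:.2f}")
```

The printed prevalences (0.50, 0.33, 0.25, 0.20) are fixed by the sampling ratio and differ from the 22% in the full population, which is why the controls have to be re-weighted in the analysis.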
We focussed on two important diagnostic tests for DVT, i.e. the dichotomous D-dimer test and the continuous calf difference test. The latter was specifically chosen as it allowed for the estimation and thus comparison of the area under the ROC curve (ROC area). Diagnostic accuracy measures of both tests were estimated for the four case-control ratios and compared with those obtained from the full study population. Measures of diagnostic accuracy included sensitivity and specificity, positive and negative predictive values and the odds ratio (OR) for the D-dimer test, and the OR and the ROC area for the calf difference test.
In the analysis of the nested case-control samples, we multiplied control samples by [1/sample fraction] corresponding to the case-control ratio (1:1 = 3.48; 1:2 = 1.74; 1:3 = 1.16; 1:4 = 0.87). For each case-control ratio, the point estimates and variability were determined. The median estimate of the 100 samples was considered as the point estimate. Analyses were performed using SPSS version 12.0 and S-plus version 6.0.
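The four weights quoted above follow directly from the number of cases and available controls. A minimal check, with illustrative names:

```python
N_CASES, N_CONTROLS = 289, 1006  # counts reported in the text


def control_weight(ratio: int) -> float:
    """Weight 1/SF applied to each sampled control for a 1:ratio design."""
    sampling_fraction = ratio * N_CASES / N_CONTROLS
    return 1 / sampling_fraction


weights = {f"1:{ratio}": round(control_weight(ratio), 2) for ratio in (1, 2, 3, 4)}
print(weights)
```

A weight below 1 for the 1:4 ratio reflects that 1156 controls were drawn from only 1006 subjects, so each draw represents slightly less than one control.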
In the full study population, the prevalence of DVT was 22% (n = 289), the D-dimer test was abnormal in 69% of the patients (n = 892) and the mean difference in calf circumference was 2.3 cm (Table 1 ). The prevalence of DVT was 50%, 33%, 25% and 20% in the nested case-control samples as a result of the sampling ratios (1:1, 1:2, 1:3 and 1:4, respectively). The distributions of the test characteristics in the control samples were similar to those for the patients without DVT in the full study population (Table 1 ).
In the full study population the sensitivity and negative predictive value were high for the D-dimer test, 0.94 and 0.96, respectively (Table 2 ), whereas the specificity and positive predictive value were relatively low. The OR for the calf difference test was 1.44 and the ROC area was 0.69.
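To see how these four measures relate, the sketch below computes them from a 2x2 table. The cell counts are an assumption: they are back-calculated from the rounded figures quoted in the text (289 cases among 1295 patients, 892 abnormal D-dimer results, sensitivity 0.94) and are not taken from the paper's Table 2.

```python
def accuracy_measures(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard diagnostic accuracy measures from a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Back-calculated, approximate counts (assumption, see lead-in):
# 271 + 18 = 289 cases, 271 + 621 = 892 abnormal tests, total 1295.
measures = accuracy_measures(tp=271, fp=621, fn=18, tn=385)
for name, value in measures.items():
    print(f"{name:12s} {value:.3f}")
```

With these counts the sensitivity and negative predictive value are high while the specificity and positive predictive value are low, matching the pattern described in the text.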
The average estimates of diagnostic accuracy for each of the four case-control ratios were similar to the corresponding estimates of the full study population (Figure 2 ). For example, the negative predictive value of the D-dimer test was 0.955 in both the full study population and for the four case-control ratios. The OR of the calf difference test was 1.44 in the full study population and the OR derived from the nested case-control samples were on average also 1.44.
Figure 2. Estimates of diagnostic accuracy of the D-dimer test and calf difference test for the 100 nested case-control samples with case-control ratios ranging from 1:1 to 1:4. The boxes indicate mean values and corresponding interquartile ranges (25th and 75th percentiles). Whiskers indicate the 2.5th and 97.5th percentiles. The dotted lines represent the values estimated in the full study population.
The use of (conventional) case-control studies in diagnostic research has often been associated with biased estimates of diagnostic accuracy, due to the incorrect sampling of subjects [ 3 – 6 , 18 ]. Moreover, this study design does not allow for the estimation of the desired absolute disease probabilities. We discussed and showed that a case-control study nested within a well defined cohort of subjects suspected of a particular target disease with known sample size can yield valid estimates of diagnostic accuracy of an index test, including the absolute probabilities of disease presence or absence. Diagnostic accuracy parameters derived from a full (cross-sectional) cohort of patients suspected of DVT were similar to the estimates derived from various nested case-control samples averaged over 100 simulations. Expectedly, the variability decreased with increasing number of controls, making the measures estimated in the larger case-control samples more precise.
As discussed, the number of subjects for whom the index test results need to be retrieved can be reduced substantially with a nested case-control design. Hence, the nested case-control design is particularly advantageous when the prevalence of the target condition in the cohort of patients suspected of the target disease is low, when the index test results are costly or difficult to collect, and when stored images or specimens are re-analysed. However, the precision of the diagnostic accuracy measures will be hampered by increased variability when too few control patients are included.
Rutjes et al. nicely discussed the limitations of different study designs in diagnostic research [6]. They proposed the 'two-gate design with representative sampling' (which resembles the nested case-control design in this paper) as a valid design. We confirmed their proposition with a quantitative analysis of a diagnostic study. Rutjes et al. suggested not using the term 'nested case-control', to prevent confusion with etiologic studies, where this design is commonly applied. Indeed, diagnostic and etiologic research differ fundamentally, first and foremost in the concept of time. Diagnostic accuracy studies are, in contrast to etiologic studies, typically cross-sectional in nature. Furthermore, diagnostic associations between index and reference tests are purely descriptive, whereas etiologic studies involve causal associations and potential confounding. Despite these major differences, we believe there is no reason not to use the term nested case-control study in diagnostic research as well. The term inherently refers to the method of sampling study subjects, which can be the same in a diagnostic or etiologic setting, and has no direct bearing on the other issues typically related to etiologic case-control studies.
Our findings support the view that the nested case-control study is a valid and efficient design for diagnostic studies. We believe that the nested case-control approach should be applied more often in diagnostic research, and also be (re)appraised in current guidelines on diagnostic methodology.
1. Knottnerus JA, van Weel C, Muris JW: Evaluation of diagnostic procedures. BMJ. 2002, 324 (7335): 477-480. 10.1136/bmj.324.7335.477.
2. Knottnerus JA, Muris JW: Assessment of the accuracy of diagnostic tests: the cross-sectional study. J Clin Epidemiol. 2003, 56 (11): 1118-1128. 10.1016/S0895-4356(03)00206-3.
3. Lijmer JG, Mol BW, Heisterkamp S, Bonsel GJ, Prins MH, van der Meulen JHP, Bossuyt PMM: Empirical evidence of design-related bias in studies of diagnostic tests. JAMA. 1999, 282: 1061-1066. 10.1001/jama.282.11.1061.
4. Rutjes AW, Reitsma JB, Di Nisio M, Smidt N, van Rijn JC, Bossuyt PM: Evidence of bias and variation in diagnostic accuracy studies. CMAJ. 2006, 174 (4): 469-476.
5. Whiting P, Rutjes AW, Reitsma JB, Glas AS, Bossuyt PM, Kleijnen J: Sources of variation and bias in studies of diagnostic accuracy: a systematic review. Ann Intern Med. 2004, 140 (3): 189-202.
6. Rutjes AW, Reitsma JB, Vandenbroucke JP, Glas AS, Bossuyt PM: Case-control and two-gate designs in diagnostic accuracy studies. Clin Chem. 2005, 51 (8): 1335-1341. 10.1373/clinchem.2005.048595.
7. Kraemer H: Evaluating Medical Tests. 1992, London, UK, Sage Publications.
8. Mantel N: Synthetic retrospective studies and related topics. Biometrics. 1973, 29 (3): 479-486. 10.2307/2529171.
9. Essebag V, Genest J, Suissa S, Pilote L: The nested case-control study in cardiology. Am Heart J. 2003, 146 (4): 581-590. 10.1016/S0002-8703(03)00512-X.
10. Ernster VL: Nested case-control studies. Prev Med. 1994, 23 (5): 587-590. 10.1006/pmed.1994.1093.
11. Langholz B: Case-Control Study, Nested. Encyclopedia of Biostatistics. Edited by: Armitage P, Colton T. 2nd edition. 2005, New York, John Wiley & Sons, 646-665.
12. Rothman KJ, Greenland S: Modern epidemiology. 2nd edition. 1998, Philadelphia, Lippincott-Raven Publishers.
13. Mantel N, Haenszel W: Statistical aspects of the analysis of data from retrospective studies of disease. J Natl Cancer Inst. 1959, 22 (4): 719-748.
14. Ransohoff DF, Feinstein AR: Problems of spectrum and bias in evaluating the efficacy of diagnostic tests. N Engl J Med. 1978, 299 (17): 926-930.
15. Begg CB, Greenes RA: Assessment of diagnostic tests when disease verification is subject to selection bias. Biometrics. 1983, 39: 207-215. 10.2307/2530820.
16. Knottnerus JA, Leffers JP: The influence of referral patterns on the characteristics of diagnostic tests. J Clin Epidemiol. 1992, 45: 1143-1154. 10.1016/0895-4356(92)90155-G.
17. van der Schouw YT, van Dijk R, Verbeek ALM: Problems in selecting the adequate patient population from existing data files for assessment studies of new diagnostic tests. J Clin Epidemiol. 1995, 48: 417-422. 10.1016/0895-4356(94)00144-F.
18. Oostenbrink R, Moons KG, Bleeker SE, Moll HA, Grobbee DE: Diagnostic research on routine care data: prospects and problems. J Clin Epidemiol. 2003, 56 (6): 501-506. 10.1016/S0895-4356(03)00080-5.
19. Oudega R, Hoes AW, Moons KG: The Wells rule does not adequately rule out deep venous thrombosis in primary care patients. Ann Intern Med. 2005, 143 (2): 100-107.
20. Oudega R, Moons KG, Hoes AW: Limited value of patient history and physical examination in diagnosing deep vein thrombosis in primary care. Fam Pract. 2005, 22 (1): 86-91. 10.1093/fampra/cmh718.
21. Perrier A, Desmarais S, Miron M, de Moerloose P, Lepage R, Slosman D, Didier D, Unger P, Patenaude J, Bounameaux H: Non-invasive diagnosis of venous thromboembolism in outpatients. Lancet. 1999, 353: 190-195. 10.1016/S0140-6736(98)05248-9.
22. Schutgens RE, Ackermark P, Haas FJ, Nieuwenhuis HK, Peltenburg HG, Pijlman AH, Pruijm M, Oltmans R, Kelder JC, Biesma DH: Combination of a normal D-dimer concentration and a non-high pretest clinical probability score is a safe strategy to exclude deep venous thrombosis. Circulation. 2003, 107 (4): 593-597. 10.1161/01.CIR.0000045670.12988.1E.
The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2288/8/48/prepub
For this research project we received financial support from the Netherlands Organization for Scientific Research, grant number: ZON-MW904-66-112. The funding source had no influence on the design, data analysis and report of this study.
Authors and affiliations.
Julius Center for Health Sciences and Primary Care, University Medical Center, Utrecht, The Netherlands
Cornelis J Biesheuvel, Yvonne Vergouwe, Ruud Oudega, Arno W Hoes, Diederick E Grobbee & Karel GM Moons
The Children's Hospital at Westmead, Sydney, Australia
Cornelis J Biesheuvel
Correspondence to Karel GM Moons.
The authors declare that they have no competing interests.
All authors commented on the draft and the interpretation of the findings, and read and approved the final manuscript. CJB was responsible for the design and statistical analysis and wrote the original manuscript. YV was responsible for the design and statistical analysis. RO was responsible for the data collection. AWH contributed expertise in case-control design. DEG and KGMM were responsible for the conception and design of the study and for its coordination.
Rights and permissions
Open Access. This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article.
Biesheuvel, C.J., Vergouwe, Y., Oudega, R. et al. Advantages of the nested case-control design in diagnostic research. BMC Med Res Methodol 8, 48 (2008). https://doi.org/10.1186/1471-2288-8-48
Received: 07 March 2008
Accepted: 21 July 2008
Published: 21 July 2008
DOI: https://doi.org/10.1186/1471-2288-8-48
- Diagnostic Accuracy
- Deep Vein Thrombosis
- Target Disease
- Diagnostic Accuracy Study
BMC Medical Research Methodology
Nested case-control studies
- 1 Department of Epidemiology and Biostatistics, School of Medicine, University of California, San Francisco 94143-0560.
- PMID: 7845919
- DOI: 10.1006/pmed.1994.1093
The nested case-control study design (or the case-control in a cohort study) is described here and compared with other designs, including the classic case-control and cohort studies and the case-cohort study. In the nested case-control study, cases of a disease that occur in a defined cohort are identified and, for each, a specified number of matched controls is selected from among those in the cohort who have not developed the disease by the time of disease occurrence in the case. For many research questions, the nested case-control design potentially offers impressive reductions in costs and efforts of data collection and analysis compared with the full cohort approach, with relatively minor loss in statistical efficiency. The nested case-control design is particularly advantageous for studies of biologic precursors of disease. To advance its prevention research agenda, NIH might be encouraged to maintain a registry of new and existing cohorts, with an inventory of data collected for each; to foster the development of specimen banks; and to serve as a clearinghouse for information about optimal storage conditions for various types of specimens.
- Case-Control Studies*
- Cohort Studies
- Preventive Medicine
Medicina Intensiva is the journal of the Spanish Society of Intensive and Critical Care Medicine and Coronary Units (SEMICIUC), and has become the reference publication in Spanish in its field. Medicina Intensiva mainly publishes Original Articles, Reviews, Clinical Notes, Images in Intensive Medicine, and Information relevant to the specialty. All works go through a rigorous selection process.
- Keywords
- Cohort and case-control studies. Hybrid studies
- Cohort studies
- Case-control studies
- Case-control studies nested in a cohort
- Selection of controls
- Measures of association in nested case-control studies
- Practical application to research in intensive care
- Introduction to competitive risk
- Calculation of the cumulative incidence function
- Modeling and effect of the covariables
- Key aspects
- Construction of a classification tree
- Number of nodes
- Advantages and disadvantages with respect to other multivariate models
- Conflicts of interest
In nested case-control studies, controls are usually sampled by density of incidence and pairing. Compared with classic case-control studies, nested studies are more efficient, allow the incidence of the disease to be calculated, and have greater internal validity because they are less prone to bias. Competing risks techniques can be used if we study different types of events and focus on the time and type of the first event. Recursive partitioning is a type of multivariate analysis whose purpose is the construction of classification algorithms, and it is especially useful when there are a large number of predictive variables with complex relationships with the event.
Research in health inevitably begins with the definition of the clinical problem we are dealing with and which we seek to resolve. While this may seem obvious, the need to ask ourselves what we want to do, what the reasons are, and whether someone else has already asked the same questions might not be so obvious.
It is necessary to contrast the information we intend to generate about the clinical problem of our patients in the Intensive Care Unit (ICU) with the data found in the literature. We need to be consistent with the available evidence and with our objectives. For example, it currently does not seem pertinent to conduct an observational study on the effect of adequate antibiotic treatment upon mortality among critical patients with septic shock. We must always seek to carry out quality studies with an impact. This does not mean that we are always obliged to conduct randomized experimental studies, though it is also meaningless to carry out just one more of a long series of descriptive cohort studies of limited local value. Another important issue is ethics. Before conducting a study, we must take ethical particulars into account, since we must always remember that the ultimate aim of research is to improve patient quality of life. This means that it would be clearly unacceptable to carry out a clinical trial in which we adequately treat a group of patients and then decide to suspend such treatment to see if mortality increases as a result.
Will we be carrying out some kind of intervention?
Yes → clinical trial (experimental design)
No → observational study
Do the data we are going to use correspond to individuals or to groups of individuals?
Individuals → individual studies
Do we have a causal hypothesis or is a description first needed to establish the hypothesis?
Descriptive → cross-sectional studies
How are we going to measure the causal relationship?
Forwards → (exposure → effect). Cohort study
Backwards → (effect → exposure). Case-control study
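The sequence of questions above can be read as a small decision procedure. The sketch below encodes only the branches that appear in the list; the function name, argument names and the wording of the fallback messages are hypothetical, and branches the text does not spell out (for example, studies of groups of individuals) are flagged rather than filled in.

```python
def classify_design(intervention, unit="individuals",
                    hypothesis="causal", direction=None):
    """Walk through the design questions in the order they are asked.

    intervention: True if the investigators carry out an intervention.
    unit: "individuals" or "groups" (only the former is detailed in the text).
    hypothesis: "causal" or "descriptive".
    direction: "forwards" (exposure -> effect) or "backwards" (effect -> exposure).
    """
    if intervention:
        return "clinical trial (experimental design)"
    if unit != "individuals":
        # Assumption: the text does not name the group-level design.
        return "observational study (group-level branch not given in the text)"
    if hypothesis == "descriptive":
        return "cross-sectional study"
    if direction == "forwards":
        return "cohort study"
    if direction == "backwards":
        return "case-control study"
    raise ValueError("direction must be 'forwards' or 'backwards'")


print(classify_design(intervention=False, direction="backwards"))
```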
In this chapter we will focus on some advanced designs that are still little used in clinical research in general and in the ICU setting in particular.
In recent times some studies have made use of designs that are somewhat different from what we commonly see in scientific publications. This may be because the authors seek to overcome as far as possible the limitations of the so-called classical methods, or because of the rising interest and advances of these more modern and robust methods. Both explanations are probably involved, however. This chapter will deal with one such design, included among the so-called hybrid studies, specifically the nested case-control study design.
Within research methodology, the most important area may be study design: if we want to answer a research question arising from the observation of our patients, we need to know how to design a study adequately so that the conclusions drawn have the required validity. It is not enough to simply have a question requiring an answer; we also need to know what design or type of study is adequate for the purpose.
Since case-control studies nested in a cohort are a kind of blend between cohort studies and case-control studies, we feel that both types of study should be contextualized here.
These two types of studies are longitudinal and analytical observational studies of individual data. In other words, they are studies in which we do not intervene but only observe what happens; each subject is a unit of the study we carry out over time to verify a cause-effect hypothesis. In practice, these are the most numerous studies, since they afford a good level of evidence without the need for great resources. A good summary of the differences between both types of studies can be found in Fig. 1 .
Differences between cohort studies and case-control studies.
A cohort is a group of patients that have at least one characteristic in common and are observed over a period of time, e.g., patients with ventilator-associated pneumonia, patients with ischemic stroke subjected to anticoagulation therapy, or septic shock patients with hypoxemia. This type of design is used to observe patients that are or have been exposed to a certain factor or circumstance, establishing comparisons of the prevalence or incidence of a certain event with respect to another group that is not or has not been exposed to the same factor. Therefore, the most logical chronology for the conduction of studies of this kind would involve observation from a given point in time onwards. The most advantageous consequence of this approach is the possibility of calculating the incidence of the event and therefore the relative risk (RR) of its occurrence between exposed and non-exposed individuals. 1
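As a minimal sketch of the calculation described above, the cumulative incidence in each group and the resulting relative risk can be computed from four counts. The function name and the counts are hypothetical and serve only as an illustration.

```python
def relative_risk(exposed_events, exposed_total,
                  unexposed_events, unexposed_total):
    """Return (incidence in exposed, incidence in unexposed, RR)."""
    incidence_exposed = exposed_events / exposed_total
    incidence_unexposed = unexposed_events / unexposed_total
    return (incidence_exposed, incidence_unexposed,
            incidence_exposed / incidence_unexposed)


# Hypothetical cohort: 30 events among 100 exposed patients,
# 15 events among 100 non-exposed patients.
inc_exp, inc_unexp, rr = relative_risk(30, 100, 15, 100)
print(f"Incidence exposed = {inc_exp:.2f}, "
      f"non-exposed = {inc_unexp:.2f}, RR = {rr:.2f}")
```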
The usefulness of cohort studies is that they allow us to verify causal hypotheses. In other words, they allow us to accept or reject an alternative hypothesis in place of a null hypothesis that had been accepted until then for lack of better evidence. In fact, cohort studies are the best design for identifying causal associations between a risk factor and a disease (where experimental studies cannot be performed). However, their main limitation concerns the comparability of the groups under study, i.e., determination of whether the two groups being compared (exposed versus non-exposed) are interchangeable.
We can illustrate this with an example. Suppose that our hypothesis is that the administration of adequate antibiotic treatment prior to admission to the ICU of septic shock patients reduces in-hospital mortality. Our sampling population would be the patients admitted under conditions of septic shock, separating them into individuals that receive adequate antibiotic treatment prior to admission to the ICU and individuals that receive such treatment once already admitted to the ICU. Follow-up is carried out for some months and we finally compare the in-hospital mortality in the two groups. The main defect of the study is that we do not know whether the group of patients treated before admission to the ICU is identical to the group treated after admission to the ICU. In other words, is the fact of administering adequate antibiotic treatment prior to admission to the ICU influenced by some other variable we have not considered?
Among cohort studies, depending on the timing of inclusion, we can find fixed or dynamic cohorts. Depending on how the cohorts are selected, they can involve internal or external comparisons and, according to the start of the study, they may be prospective or retrospective. A retrospective cohort does not mean that the chronological orientation is from the time of appearance of the Event to the study of the Factor (E > F), but that the information is retrieved from the past and not from the present time. 1
Case-control studies involve a non-experimental analytical epidemiological design, i.e., they are based on observation, and a priori are more efficient in verifying or contrasting hypotheses. In studies of this kind we start with the effect or event, and we seek to study its antecedents. Two groups of patients are selected for this purpose, called cases and controls, according to whether the effect (disease, death or other) appears or not. The groups are compared for previous exposures or characteristics to determine whether these are associated with the study effect. Therefore, the most common chronology of the observation starts from the effect and looks back at the previous exposures or characteristics. For this reason the case-control design does not allow us to calculate the incidence or RR, except in infrequent situations.
Instead, the measure of association used in these studies is the odds ratio (OR). This measure can be understood as the ratio between the odds of previous exposure to the factor under study among the cases and the odds of previous exposure among the controls. If there is no association between exposure and effect, there is no reason to expect exposure to differ between cases and controls, and the OR will therefore be equal to 1.
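A minimal sketch of the OR calculation from a 2x2 table, computed as the odds of exposure among cases divided by the odds of exposure among controls. The function name and the counts are hypothetical.

```python
def odds_ratio(exposed_cases, unexposed_cases,
               exposed_controls, unexposed_controls):
    """Odds of exposure in cases divided by odds of exposure in controls."""
    odds_cases = exposed_cases / unexposed_cases
    odds_controls = exposed_controls / unexposed_controls
    return odds_cases / odds_controls


# Hypothetical study with 100 cases and 100 controls.
or_associated = odds_ratio(40, 60, 20, 80)   # exposure more common in cases
or_null = odds_ratio(20, 80, 20, 80)         # same exposure in both groups
print(f"OR with association = {or_associated:.2f}, OR without = {or_null:.2f}")
```

When exposure is equally frequent in cases and controls, the two odds coincide and the function returns 1, as stated above.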
The main disadvantages of this type of design are its increased vulnerability to the presence of certain systematic errors or biases; the inability to detect weak associations between exposure and response; and the fact that it may prove difficult (and sometimes almost impossible) to validate the information obtained regarding exposure.
In practice, case-control studies are made because we have a series of cases and wish to analyze the predisposing factors that have generated those cases, based on comparison against a control population. What we must remember is that both the cases and the controls must come from one same original cohort; if this is not the case, i.e., if the cases and controls represent different populations, we run into what is known as Berkson's bias. 2
The term “control” is used in experimental epidemiology in reference to the group that receives the conventional treatment or placebo, though it must be remembered that case-control studies are of an observational nature and should not be confused with clinical trials or interventional studies ( Fig. 2 ).
Conditions of case-control studies.
Any patient in the ICU without nosocomial infection
Patients with nosocomial infection caused by other organisms
Patients of the same age range and gender
The correct answer would be any patient in the ICU, since any patient in the hospital would be susceptible to nosocomial infection but has not developed such an infection. Both the cases and controls come from the same original cohort, i.e., patients admitted to the ICU.
After our brief description of cohort studies and case-control studies, it should be mentioned that case-control studies nested in a cohort (nested case-control studies) belong to what is commonly referred to as “hybrid” studies, since they possess features of both cohort studies and case-control studies, though obviating some of their limitations.
The first known hybrid study was published in 1962 and analyzed the relationship between in utero exposure to X-rays and the subsequent risk of cancer. 3 Nested studies analyze all the cases appearing in a stable cohort followed up over time, and the controls consist of a sample of subjects from that same cohort. Investigators commonly have a cohort they have been studying and following up for a certain period of time, compiling different types of data and storing imaging studies and/or samples, with the purpose of conducting a future study when the patients are seen to produce unexpected responses. In other words, we have information on possible exposure, and when the response occurs we already have data with which to work and explore possible causal relationships.
This means that we are monitoring a dynamic population (one in which stability of the entry and exit of individuals is assumed) to detect all the cases of the target disease. These cases in turn are compared with a reference group (not necessarily controls as understood up to now) that has been selected on a random basis or by pairing from the same population from which the cases originate. 4
In general terms, we can distinguish two types of nested studies: simple nested studies and those that use density of incidence. Both types may be either prospective or retrospective. In the first type the response is infrequent, and an initial measurement of exposure is sufficient. The investigator first identifies all the participants of the cohort that exhibit the response at the end of follow-up (cases), and then draws a random sample of those who have not exhibited the response (controls). The investigator then measures the predictive variables in both groups and compares the levels or categories of the risk factor in the cases against those in the controls. In studies involving density of incidence, follow-up may be variable, or exposure may vary over time. These are therefore dynamic cohorts, and sampling of the controls is made by density of incidence and pairing; we therefore need to wait for all the cases to have been generated in order to select the reference population. Here measurement at a single point is not enough, and the controls must be selected as individuals belonging to the same cohort and exposed in the same way as the cases, i.e., individuals at risk who have not yet shown the response. In this design, since the controls are only a sample of the initial cohort, we lose statistical precision, though this loss is partly compensated by the decrease in the number of subjects studied, by the lower cost of data compilation, and by a usually shorter duration of follow-up (Fig. 3).
Simple nested case-control study.
Adapted from Hulley, 5 2014.
In nested case-control studies the information on the risk factors of interest and the principal variables has been compiled prospectively at the start of follow-up, before the disease develops; as a result, there is a lower risk of incurring the classical information bias of case-control studies, which are of a retrospective nature.
We start from a large initial cohort which – as has been commented – is often available from previous studies. This cohort is used to generate a case-control design in order to reduce the number of subjects in which independent variables or covariables need to be managed (instead of having to consider those of the entire cohort for the statistical analysis). Case selection is immediate, since these are our patients. We first need to identify them, assuming a case definition as homogeneous as possible. The only particularity here is that we collect all the cases during a given period and in a defined population. Furthermore, since the incidence of most diseases studied is relatively low, it is of interest to select all the cases appearing in the cohort – though any other sampling fraction could be used.
In fact, depending on the sampling method applied to the individual patients of the initial cohort to yield two groups, we will have different types of nested designs: case-control studies nested in a cohort, and case-cohort studies. In nested case-control studies, we use a sampling scheme known as risk set sampling, since the selection of an individual as a control depends on this individual being at risk, i.e., he or she must be a member of the cohort at the time when the case is selected or identified. The case and all the individuals at risk who have not developed the event at that time constitute the risk set.
With regard to the selection of controls in these nested studies, the method described for classical case-control studies proves acceptable. It is advisable to pair them on confounding and time-dependent variables, such as the number of years the cases have been included in the cohort. In this way the same individual can serve as a control several times and become a case at some later time, a fact that must be considered in the statistical analysis of the study. Furthermore, pairing for time-dependent variables limits the analysis of such variables in the hybrid nested designs, though if exposure is time-dependent, these studies do not have to compile information beyond the time of case selection.
Although one control is usually selected per case, if the study sample size is limited we can select more than one control per case with a view to boosting the statistical power of the study, provided the proportion of 4:1 (four controls for every case) is not exceeded.
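The diminishing return from additional controls can be illustrated with a commonly used approximation, which is an assumption introduced here and is not stated in the text: under the null hypothesis, a design with m controls per case has a statistical efficiency of roughly m / (m + 1) relative to a design with an unlimited number of controls. The function name is hypothetical.

```python
def relative_efficiency(controls_per_case):
    """Approximate efficiency of 1:m sampling relative to unlimited controls.

    Uses the rule of thumb m / (m + 1), valid near the null hypothesis.
    """
    if controls_per_case < 1:
        raise ValueError("controls_per_case must be at least 1")
    return controls_per_case / (controls_per_case + 1)


for m in range(1, 7):
    print(f"{m} control(s) per case: efficiency ~ {relative_efficiency(m):.2f}")
```

Under this approximation, going from one to four controls per case raises the efficiency from 0.50 to 0.80, whereas a fifth control adds only about 0.03.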
Pairing is a method that is relatively simple to understand and offers some important advantages, including the capacity to balance cases and controls within each stratum of the variable on which they are paired. In this way, if pairing is perfect (as with dichotomous variables, or quantitative variables where the same threshold is used for pairing), the control of confounding influences is almost total. Furthermore, pairing allows us to detect interactions between exposure and the factor used for pairing. However, pairing also has some limitations: it is time-consuming, and it requires specific statistical tests for paired data. The complexity of the analysis increases as a result, and this is almost never accompanied by a parallel increase in the precision with which the parameters are estimated. Moreover, if the variable used for pairing is not a confounding variable, the final estimation will be imprecise. In addition to these drawbacks, the development of multivariate regression models has relegated pairing as a system for the control of confounders.
In contrast to the classical case-control studies, in nested studies, since the cases are identified a priori and are recorded as the study response or disease manifests, the incidence measured as density can be calculated without problems, and this will allow us to estimate relative risks. This is an important difference with respect to the conventional case-control studies in which the OR is usually calculated as measure of association, since the OR can only be similar to RR when the prevalence of the effect is very low. Accordingly, the difference between OR and RR increases as the incidence of the disease under study increases. 6,7
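The growing gap between OR and RR can be checked numerically. The sketch below fixes a relative risk and computes the OR that corresponds to it at different baseline incidences; the function name and the chosen values are hypothetical.

```python
def odds_ratio_from_risks(baseline_risk, relative_risk):
    """OR implied by a given RR at a given incidence in the non-exposed."""
    exposed_risk = baseline_risk * relative_risk
    if not (0 < baseline_risk < 1 and 0 < exposed_risk < 1):
        raise ValueError("risks must lie strictly between 0 and 1")
    odds_exposed = exposed_risk / (1 - exposed_risk)
    odds_unexposed = baseline_risk / (1 - baseline_risk)
    return odds_exposed / odds_unexposed


# Same RR of 2 at increasing incidence in the non-exposed group.
for risk in (0.01, 0.05, 0.20):
    print(f"baseline incidence {risk:.2f}: RR = 2.00, "
          f"OR = {odds_ratio_from_risks(risk, 2.0):.2f}")
```

With a relative risk of 2, the OR is about 2.02 when the baseline incidence is 1% but about 2.67 when it is 20%.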
As an example, if we wish to study nosocomial infection in the ICU – a frequent problem with a prevalence according to the local epidemiology of over 20% – the nested design would not be the most appropriate strategy for studying the risk factors underlying such infection, due to the strong distortion between OR and RR – though it could be used to study the prolongation of stay attributable to nosocomial infection.
These characteristics must be taken into account in the analysis, which proves somewhat more complex, though with the advantage that the OR is always a statistically unbiased estimator of the risk ratio. Furthermore, these studies are very efficient in analyzing a risk factor or in controlling a confounding factor if the necessary information for the entire cohort is not available or, if such information is available, obtaining it proves very expensive, as when having to perform measurements in biological samples.
In order to carry out a nested case-control study, we first define the initial cohort of patients to be studied and establish the risk period. This is followed by identification of the cases, including the appearance dates, and we then obtain a sample of controls paired to each of the cases. Lastly, we define and quantify the predictive variables. With this type of selection it is clear that a subject initially identified as a control could develop the event of interest during follow-up and subsequently be selected as a case. In the presence of any selection bias, the fact that controls are subsequently selected as cases compensates for such bias to a degree. In any case, this situation is not a source of error or bias, since in cohort studies the same individual can contribute both to the numerator and to the denominator, and this same situation is maintained in this type of strategy.
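The steps above can be sketched as a simple sampling routine. This is an illustration under stated assumptions, not a definitive implementation: each cohort member is a dictionary with hypothetical keys `id`, `entry`, `exit` and `event`; controls are drawn at random from those still under follow-up at the time of each case; pairing on confounders is omitted; and subjects who leave follow-up at exactly the case time are treated as no longer at risk.

```python
import random


def sample_risk_sets(cohort, controls_per_case=1, seed=0):
    """For each case, draw controls from subjects at risk at the case time."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    cases = sorted((s for s in cohort if s["event"]), key=lambda s: s["exit"])
    matched_sets = []
    for case in cases:
        t = case["exit"]  # time at which the case occurs
        # At risk: already in the cohort and still followed after time t.
        at_risk = [s for s in cohort
                   if s["id"] != case["id"] and s["entry"] <= t < s["exit"]]
        k = min(controls_per_case, len(at_risk))
        controls = rng.sample(at_risk, k)
        matched_sets.append({"case": case["id"], "time": t,
                             "controls": [c["id"] for c in controls]})
    return matched_sets


# Hypothetical cohort; times are in arbitrary units of follow-up.
cohort = [
    {"id": 1, "entry": 0, "exit": 5, "event": True},
    {"id": 2, "entry": 0, "exit": 9, "event": True},
    {"id": 3, "entry": 0, "exit": 10, "event": False},
    {"id": 4, "entry": 6, "exit": 12, "event": False},
    {"id": 5, "entry": 0, "exit": 3, "event": False},
]
matched = sample_risk_sets(cohort, controls_per_case=2)
for risk_set in matched:
    print(risk_set)
```

In this hypothetical cohort, subject 2 is eligible as a control for the case occurring at time 5 and later becomes a case, which is the situation described in the paragraph above.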
This type of design is recommended for studying infrequent diseases in dynamic cohorts in which the determination of exposure and its changes over time, in all the cohort members, would prove very costly.
Another situation in which this design is recommended is when costly determinations are required. An example could be its use in a key line of research in recent years, focused on the construction of predictive models allowing us to determine as early as possible the probability of developing certain syndromes or disorders directly related to a poor clinical outcome. This comprises the study of different diagnostic or prognostic biomarkers as risk indicators. Their use is becoming particularly relevant in the critical care setting, fundamentally due to the fact that they represent a scantly invasive way of determining patient susceptibility to certain events such as sepsis, or of knowing how their measurement at certain timepoints is correlated to clinical outcomes of great relevance, such as mortality in the ICU. 8–10
A practical case of the application of this type of nested design in an ICU is the study of risk factors for readmission to the ICU following an initial stay among liver transplant patients, recently carried out by a Canadian group. 11
In this study the authors used a case-control design nested in a cohort of liver transplant patients, in which each case (i.e., each transplant patient requiring readmission to the ICU) was randomly assigned a control from the same cohort. The cohort comprised all the transplant patients in the study period (7 years). As mentioned, this type of design is used for the study of scantly prevalent events.
Following analysis of the data, with statistical comparison of the cases (patients readmitted to the ICU) and controls (patients without the need for readmission to the ICU), the authors concluded that readmission to the ICU has a negative impact upon the clinical outcome of these patients, and they moreover specified which factors are related to such need for readmission. 12
Another example of the use of this type of hybrid design could be the study of the consequences of mesenteric embolization following aortoiliac endovascular surgery. In this example, the authors selected the controls on a random basis but additionally matched them for age and gender. 12
As a final comment, nested case-control studies are more similar to classical case-control studies than to cohort studies. The fundamental difference between them is that in the nested design the controls are usually selected by incidence density sampling with matching. These studies are more efficient, allow us to calculate the incidence of the disease, and have greater internal validity as a consequence of the lesser presence of bias.
In prospective evaluation studies, the outcome is obtained from the longitudinal evaluation of a cohort of subjects in a period of time until the phenomenon of interest occurs (referred to as the event). As an example, the event may be death, myocardial infarction or the recurrence of disease. The statistical analysis used for estimating these outcomes is known as analysis of time to event or – more commonly – analysis of survival. The most frequent method for estimating the probability of an event is the nonparametric approach, generally referred to as the Kaplan–Meier (KM) method.
The KM method analyzes the subjects who experience the event within a given period of time. Subjects in whom the event is not observed, whether because the study ends or because they do not complete follow-up, are referred to as censored cases. It is not infrequent for a participant in a study to experience more than one type of event during follow-up. A situation of competitive risk (CR) is observed when the appearance of one type of event modifies the capacity to observe the event of interest of the study.
A clear example of this is when the event to be studied is patient survival after heart valve replacement surgery as treatment for infectious endocarditis. In this case the CR is the suffering of stroke during admission, since such patients cannot be subjected to surgery.
There are many examples in the literature of the use of these CR techniques, 8,13–15 though the main issue for the investigator is to decide whether or not to take CR into account. If CR is not taken into account, the analysis is limited to the usual time to event analysis. However, this approach overestimates the true probability. 16–20 The magnitude of the overestimation is what should cause us to decide whether to take CR into account or not.
Returning to the previous example, the mortality rate after heart surgery for the treatment of endocarditis is 40%, but 25% of the patients have neurological complications before surgery – thus indicating that the estimate may be very different from what is actually observed.
When encountering data with CR, it is essential to estimate the absolute risk of occurrence of an event of interest to a timepoint t over follow-up. This risk is calculated by the cumulative incidence function (CIF), which is defined for each type of event separately and increases over time. The CIF of an event at timepoint t is defined as the probability that an event of this type will occur at any time between baseline and timepoint t. If the data do not include censored individuals, the CIF at timepoint t can be estimated as the proportion of subjects that experience this type of event until timepoint t divided by the total number of subjects in the global body of data. As time progresses, the CIF increases from zero to the total proportion of events of this type in the data.
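For data without censored individuals, the definition above reduces to a simple proportion. The sketch below assumes hypothetical records of the form (time, event type); the labels `"event"` and `"competing"` and the follow-up times are invented for illustration.

```python
def cumulative_incidence(records, event_type, t):
    """CIF at time t for data without censoring: subjects with an event of
    the given type up to t, divided by all subjects in the data."""
    n_total = len(records)
    n_events = sum(
        1 for time, kind in records if kind == event_type and time <= t
    )
    return n_events / n_total


# Hypothetical follow-up of 10 subjects, each with exactly one observed outcome.
records = [
    (1, "event"), (2, "competing"), (3, "event"), (4, "competing"),
    (5, "event"), (6, "event"), (7, "competing"), (8, "event"),
    (9, "competing"), (10, "event"),
]

for t in (5, 10):
    cif_event = cumulative_incidence(records, "event", t)
    cif_competing = cumulative_incidence(records, "competing", t)
    print(f"t = {t}: CIF(event) = {cif_event:.2f}, CIF(competing) = {cif_competing:.2f}")
```

At the end of follow-up each CIF equals the total proportion of events of that type, so in uncensored data the two functions add up to 1.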
Cox proportional hazard models are used to assess the effect of the covariables on an event of interest in the absence of CR. 21 Such models are difficult to interpret in the presence of CR, however.
A number of regression models have been proposed for the CIF, the most popular being the model of Fine and Gray, 22 which has also been incorporated into the main statistical packages, including R, STATA and SAS. 23,24 The resulting effect measure for each covariable is called the subdistribution hazard ratio (sHR). While the numerical interpretation of sHR is not direct, sHR = 1 means that there is no association between the covariable and the corresponding CIF; sHR > 1 means that an increase in the value of the covariable is associated with increased risk; and sHR < 1 implies the opposite. Moreover, the further sHR is from 1, the greater the estimated effect size in CIF. The assumption of risk proportionality over follow-up remains a requirement.
Competitive risk occurs when during the observation period for a specific event of interest other events may occur that can modify the occurrence of that event. In a more general sense, CR methods can be used if different types of events are studied and we focus on the time and type of the first event.
The basic descriptive statistics of the CR data comprise the CIF, which describes the absolute risk of an event of interest over time. The KM method should not be used in the presence of competing events, since it overestimates the true absolute risk.
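The overestimation produced by the KM method can be illustrated with a small sketch. It compares the complement of the KM estimator, computed by treating competing events as censored observations, with a nonparametric CIF estimate of the Aalen-Johansen type. The data and the labels `"event"` and `"competing"` are hypothetical, and a `None` event type is assumed to mark a censored subject.

```python
def km_complement(records, event_type, t):
    """1 - KM at time t, treating every other outcome as censoring."""
    event_times = sorted(
        {time for time, kind in records if kind == event_type and time <= t}
    )
    survival = 1.0
    for u in event_times:
        at_risk = sum(1 for time, _ in records if time >= u)
        events = sum(
            1 for time, kind in records if time == u and kind == event_type
        )
        survival *= 1 - events / at_risk
    return 1 - survival


def cif_estimate(records, event_type, t):
    """Nonparametric CIF: sum over event times of the overall event-free
    survival just before u times the type-specific hazard at u."""
    times = sorted(
        {time for time, kind in records if kind is not None and time <= t}
    )
    survival = 1.0  # probability of being free of any event so far
    cif = 0.0
    for u in times:
        at_risk = sum(1 for time, _ in records if time >= u)
        events_of_type = sum(
            1 for time, kind in records if time == u and kind == event_type
        )
        events_any = sum(
            1 for time, kind in records if time == u and kind is not None
        )
        cif += survival * events_of_type / at_risk
        survival *= 1 - events_any / at_risk
    return cif


# Hypothetical data: (time, event type), no censored subjects.
records = [
    (1, "event"), (2, "competing"), (3, "event"), (4, "competing"),
    (5, "event"), (6, "event"), (7, "competing"), (8, "event"),
    (9, "competing"), (10, "event"),
]

naive = km_complement(records, "event", 8)
cif = cif_estimate(records, "event", 8)
print(f"1 - KM = {naive:.2f}, CIF = {cif:.2f}")
```

In this example five of the ten subjects have had the event of interest by time 8, which is what the CIF returns, whereas the KM complement is larger because subjects removed by the competing event are treated as if they could still experience the event of interest.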
A complication of CR is that the covariables can affect the absolute risk and the event rate differently. Regression models based on CR (e.g., Fine–Gray models) explore the association between the covariables and absolute risk, and are therefore essential for medical decision making and for prognostic research questions. On the other hand, specific event rate models (e.g., specific Cox proportional hazard models) are to be preferred for answering etiological research questions.
A full description of the CR data should include modeling of all the types of events and not only the main event of interest.
The CR models can evaluate the effect of an intervention upon the individual components of a composite assessment criterion.
Recursive partitioning is a type of multivariate analysis used to produce classification algorithms. These algorithms were first published in 1963, 25 and in turn gave rise to other algorithms over the years. 26 The most widely used in the field of health was introduced by Breiman et al. in 1984. 27 With these tools we can classify observations and develop prediction systems based on a series of decision rules.
These algorithms are useful when the studied event has numerous predictor variables with complex relationships among them, and are widely used in bioinformatics and in genetic studies. 28
Classification and regression trees are a nonparametric procedure for the prediction of a dependent variable or response on the basis of a series of independent variables or predictors. The response may be of a categorical nature.
The tree is constructed through the recurrent division of data. This division of the population seeks to produce subpopulations that are homogeneous with respect to the dependent variable. These partitions are successively repeated until the degree of homogeneity cannot be further incremented through another partition. 29 The choice of variable for performing the partition is always based on a criterion of homogeneity of the subpopulations resulting from the partition. Complete homogeneity of the nodes is rarely achieved, but there are functions that determine the degree of impurity as a measure of the degree of homogeneity of the nodes.
If we are talking about total mortality in a series of patients with infectious endocarditis, 30 we have three variables that can serve to classify patient mortality, namely age, gender and the type of affected valve (native/prosthetic). The total mortality rate of the cohort is 29.8% (401/1345). The mortality rate as per male gender is 29%, that of patients over 70 years of age is 41%, and that of patients with endocarditis over a prosthetic valve is 40%.
If we divide the initial node as represented in Fig. 4A, the impurity of each node is:
I (initial) = 0.70 × (1 − 0.70) + 0.30 × (1 − 0.30) = 0.42
I (masculine) = 0.71 × (1 − 0.71) + 0.29 × (1 − 0.29) = 0.412
I (feminine) = 0.68 × (1 − 0.68) + 0.32 × (1 − 0.32) = 0.435
The decrease in impurity of this partition is given by:
Δ I = 1345 × I (initial) − (911 × I [masculine] + 434 × I [feminine]) = 0.87
Continuing with the other two examples (Fig. 4B and C), the reduction in impurity is:
Δ I (age) = 1.98
Δ I (valve) = 14.55
Fig. 4. (A) Partition of the initial mortality node according to gender. (B) Partition of the initial node according to age. (C) Partition of the initial node according to the type of valve.
We can see that with similar percentage mortality rates, dividing the tree by the variable corresponding to the type of valve affected results in far greater reduction of impurity, with an increase in classification capacity.
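The calculation for the partition by gender can be reproduced with a short sketch. It uses the node sizes and rounded mortality proportions given in the text (1345 patients at 30%, 911 men at 29%, 434 women at 32%); the function names are introduced only for this illustration. The age and valve partitions are not recomputed, since the sizes of their child nodes are not given in the text.

```python
def node_impurity(p_event):
    """Impurity of a two-class node: p(1 - p) + (1 - p)p."""
    return p_event * (1 - p_event) + (1 - p_event) * p_event


def impurity_decrease(n_parent, p_parent, children):
    """Size-weighted impurity of the parent minus that of the child nodes.
    children is a list of (node size, proportion with the event)."""
    parent_term = n_parent * node_impurity(p_parent)
    children_term = sum(n * node_impurity(p) for n, p in children)
    return parent_term - children_term


# Values from the endocarditis example in the text (rounded proportions).
delta_gender = impurity_decrease(1345, 0.30, [(911, 0.29), (434, 0.32)])
print(f"Decrease in impurity for the partition by gender: {delta_gender:.2f}")
```

With unrounded node impurities the result agrees with the value of 0.87 reported for this partition.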
One of the most important issues is determining the final number of partitions of a tree or, in other words, determining the size of the tree. If the division process ends too soon, we will not have obtained the full classification capacity of the tree, i.e., underfitting occurs. In contrast, if we perform too many divisions, we run the risk of classifying random particularities of the data, a situation known as overfitting.
In order to secure the correct size of the tree (what is known as an honest tree), we must model the sample in several attempts to reach this optimum point.
Clinically more intuitive models are generated. 31
The order of the classification can be varied to create decision rules of greater sensitivity and specificity, 32 since we can identify nonlinear relationships with the dependent variables.
Precision may be incremented, and the approach is particularly useful in identifying interactions that can be entered in multivariate models. 33
This approach is not directly applicable to continuous variables, which have to be dichotomized. Nevertheless, the most adequate cut-off point can be selected as an alternative to receiver operating characteristic (ROC) curves. 34
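One way to choose such a cut-off point is to try every candidate threshold of the continuous variable and keep the one that most reduces the impurity. The sketch below assumes this criterion; the ages and outcomes are hypothetical and the function names are introduced only for the example.

```python
def weighted_impurity(outcomes):
    """Node size times the two-class impurity 2p(1 - p); 0 for an empty node."""
    n = len(outcomes)
    if n == 0:
        return 0.0
    p = sum(outcomes) / n
    return n * 2 * p * (1 - p)


def best_cutoff(values, outcomes):
    """Return (cut-off, impurity decrease) for the best binary split of a
    continuous variable, testing midpoints between consecutive values."""
    parent = weighted_impurity(outcomes)
    distinct = sorted(set(values))
    best_cut, best_decrease = None, 0.0
    for low, high in zip(distinct, distinct[1:]):
        cut = (low + high) / 2
        left = [o for v, o in zip(values, outcomes) if v <= cut]
        right = [o for v, o in zip(values, outcomes) if v > cut]
        decrease = parent - (weighted_impurity(left) + weighted_impurity(right))
        if decrease > best_decrease:
            best_cut, best_decrease = cut, decrease
    return best_cut, best_decrease


# Hypothetical data: patient age and death (1) or survival (0).
ages = [45, 52, 58, 63, 67, 71, 74, 78, 82, 85]
died = [0, 0, 0, 1, 0, 1, 1, 0, 1, 1]

cut, decrease = best_cutoff(ages, died)
print(f"Best cut-off: {cut} years (impurity decrease {decrease:.2f})")
```

The selected threshold dichotomizes the variable for the corresponding node; in a full tree the search is repeated within each new node.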
The authors declare that they have no conflicts of interest.
Please cite this article as: Gutiérrez-Pizarraya A, García-Cabrera E, Álvarez-Márquez E. Métodos estadísticos alternativos y su aplicación a la investigación en Cuidados Intensivos. Med Intensiva. 2018;42:490–499.