Walter A. Kukull, PhD, and Mary Ganguli, MD, MPH
From the Department of Epidemiology (W.A.K.), University of Washington School of Public Health, Seattle; and Departments of Psychiatry, Neurology, and Epidemiology (M.G.), University of Pittsburgh School of Medicine and Graduate School of Public Health, Pittsburgh, PA.


Clinical and epidemiologic investigations are paying increasing attention to the critical constructs of “representativeness” of study samples and “generalizability” of study results. This is a laudable trend, yet these key concepts are often misconstrued and conflated, masking the central issues of internal and external validity. The authors define these issues and demonstrate how they are related to one another and to generalizability. Providing examples, they identify threats to validity from different forms of bias and confounding. They also lay out relevant practical issues in study design, from sample selection to assessment of exposures, in both clinic-based and population-based settings.

Only to the extent that we are able to explain empirical facts can we attain the major objective of scientific research, namely not merely to record the phenomena of our experience, but to learn from them, by basing upon them theoretical generalizations which enable us to anticipate new occurrences and to control, at least to some extent, the changes in our environment.1(p12)

“This study sample is not representative of the population!” “Our results are not generalizable …” Such comments are increasingly familiar, but what exactly do they mean? How do study design, subject ascertainment, and “representativeness” of a sample affect “generalizability” of results? Do study results generalize only from statistically drawn samples of a common underlying population? Has “lack of generalizability” become the low-hanging fruit, ripe for plucking by the casual critic?


Confusion about generalizability has arisen from the conflation of 2 fundamental questions. First, are the results of the study true, or are they an artifact of the way the study was designed or conducted; i.e., is the study internally valid? Second, are the study results likely to apply, generally or specifically, in other study settings or samples; i.e., are the study results externally valid?

Thoughtful study design, careful data collection, and appropriate statistical analysis are at the core of any study's internal validity. Whether or not those internally valid results will then broadly “generalize” to other study settings, samples, or populations is as much a matter of judgment as of statistical inference. The generalizability of a study's results depends on the researcher's ability to separate the “relevant” from the “irrelevant” facts of the study, and then carry forward a judgment about the relevant facts,2 which would be straightforward if we always knew what might eventually turn out to be relevant. After all, we generalize results from animal studies to humans if the shared biologic process or disease mechanism is “relevant” and species is relatively “irrelevant.” We also draw broad inferences from randomized controlled trials, even though these studies often have specific inclusion and exclusion criteria rather than being population probability samples. In other words, generalization is the “big picture” interpretation of a study's results once they are determined to be internally valid.


The statistical concepts of sampling theory and hypothesis testing have become intermingled with the notion of generalizability. Whether to estimate quantities from a probability sample of a “population,” rather than assess all members of that population, remained a matter of considerable debate among statisticians until the early 20th century.3 Sampling was adopted of necessity because studying the entire population was not feasible. Such samples must provide valid estimates of the population characteristics being studied. This quite reasonable concept evolved in common usage so that “population” became synonymous with “all persons or all cases.” It followed that to achieve representative and generalizable sample estimates, a probability sample of “all” must be drawn. Logically, then, “all” must somehow be enumerated before representative samples can be drawn. The bite of the vicious circle becomes apparent when “all” literally means all in a country or continent. However, enumeration may be achievable when care is taken to establish more finite population boundaries.

Statisticians Kruskal and Mosteller3–6 conducted a detailed examination of nonscientific, “extrastatistical scientific,” and statistical literature to classify uses of the term representative sample or sampling. Those meanings are 1) “general, unjustified acclaim for the data”; 2) “absence (or presence) of selective forces”; 3) “mirror or miniature of the population”; 4) “typical or ideal case … that represents it (the population) on average”; 5) “coverage of the population … (sample) containing at least one item from each stratum …”; 6) “a vague term to be made precise” by specification of a particular statistical sampling scheme, e.g., simple random sampling. In statistical literature, meanings of representative sampling include a) “a specific sampling method”; b) “permitting good estimation”; and c) “good enough for a particular purpose.”4 The conflicts and ambiguities among the above uses are apparent, but how do we seek clarity in our research discourse?
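Meanings 5 and 6 can be made concrete as sampling schemes. The sketch below, assuming a hypothetical enumerated frame of 1,000 older residents split into three age bands (the band sizes and the function names are illustrative, not from any study), contrasts simple random sampling with proportionate stratified sampling that guarantees at least one member from each stratum.

```python
import random
from collections import Counter

def simple_random_sample(frame, n, rng):
    """Simple random sampling: every subset of size n is equally likely."""
    return rng.sample(frame, n)

def stratified_sample(frame, n, stratum_of, rng):
    """Proportionate stratified sampling: split n across strata in proportion
    to stratum size, forcing at least one unit per stratum ("coverage")."""
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for members in strata.values():
        k = max(1, round(n * len(members) / len(frame)))
        sample.extend(rng.sample(members, min(k, len(members))))
    return sample

# Hypothetical sampling frame: 1,000 enumerated residents in three age bands.
bands = {"65-74": 600, "75-84": 350, "85+": 50}
labels = [band for band, size in bands.items() for _ in range(size)]
frame = list(enumerate(labels))  # (person_id, age_band)

rng = random.Random(2012)
srs = simple_random_sample(frame, 20, rng)
strat = stratified_sample(frame, 20, lambda unit: unit[1], rng)

print(Counter(band for _, band in srs))    # varies; "85+" may be absent (about 1 draw in 3)
print(Counter(band for _, band in strat))  # always 12, 7 and 1
```

Neither scheme makes a sample “representative” in the broader senses listed above; each only fixes how units are drawn from a frame that must already be enumerated.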


So is there in fact any value to population-based studies (indeed there is!), and if so, how should we define a “population”? We first define it by establishing its boundaries (e.g., counties, insurance plan memberships, schools, voter registration lists). The population is composed entirely of members with disease (cases) and members without disease (noncases), leaving no one out. Ideally, we would capture and study all cases as they occur. As a comparison group, we would also include either all noncases or a probability sample of noncases.7 The choice of “boundaries” for a study population influences internal and external validity. If we deliberately or inadvertently “gerrymander” our boundaries, so that the factor of interest is more (or less) common among cases than among noncases, the study base will be biased and our results will be spurious or misleading.
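To see why a probability sample of noncases suffices, consider the odds ratio computed from a hypothetical bounded population. In the sketch below (all counts are invented for illustration), sampling noncases with a fraction that does not depend on exposure leaves the odds ratio unchanged, whereas a comparison group drawn across a “gerrymandered” boundary, where exposure is more common, distorts it.

```python
from fractions import Fraction

def odds_ratio(a, b, c, d):
    """a, b = exposed/unexposed cases; c, d = exposed/unexposed noncases."""
    return Fraction(a) * Fraction(d) / (Fraction(b) * Fraction(c))

# Hypothetical population inside fixed boundaries (every member counted).
cases = {"exposed": 60, "unexposed": 40}
noncases = {"exposed": 3000, "unexposed": 7000}

population_or = odds_ratio(cases["exposed"], cases["unexposed"],
                           noncases["exposed"], noncases["unexposed"])

# All cases + a 5% probability sample of noncases. The sampling fraction
# does not depend on exposure, so it cancels out of the odds ratio.
f = Fraction(1, 20)
sampled_or = odds_ratio(cases["exposed"], cases["unexposed"],
                        noncases["exposed"] * f, noncases["unexposed"] * f)

# "Gerrymandered" comparison group: noncases drawn from outside the case
# boundary, where exposure is more common (50% instead of 30%).
gerrymandered_or = odds_ratio(cases["exposed"], cases["unexposed"], 500, 500)

print(population_or, sampled_or, gerrymandered_or)  # 7/2 7/2 3/2
```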

Adequately designed population-based studies minimize the likelihood that selection factors will have unintended adverse consequences on the study results. Further, because any effect we can measure depends as much on the comparison group as on the case group, appropriate selection is no less important for the noncases than it is for the cases. This is true whether the study is clinic-based or population-based. Population-based research anchors the comparison group to the cases by drawing both from the same defined population base.

Clinic-based investigations are exemplified by those conducted at Alzheimer's Disease Research Centers (ADRCs). They typically study high-risk, family-based, clinic-based, or hospital-based groups to examine associations with treatment or disease. This is an efficient way to facilitate in-depth study of “clean” diagnostic subgroups. The external validity of these studies rests on the judgment of whether the subject selection process itself might have spuriously influenced the results. This determination is often harder in clinic-based studies than in population-based studies. Replication in an independent sample is therefore key, but replication is more elusive and challenging with clinic-based studies, as we discuss later.

Regardless of whether the study sample is clinic-based or population-based, how well and how completely we identify “disease” (including preclinical or asymptomatic disease), not only in our case group but also in our comparison group, can adversely affect results. For example, consider a study of Alzheimer disease (AD) in which, unbeknownst to the subjects and the investigators, the cognitively normal control group includes a large proportion of persons with underlying AD pathology. The resulting diagnostic misclassification, caused by including true “cases” among the noncases, would spuriously distort and weaken the observed results. This distortion can occur in clinic-based or population-based studies; it is a matter of internal validity tied to diagnostic accuracy, rather than an issue of representativeness or generalizability.
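A minimal worked example of this attenuation, assuming (hypothetically) that exposure prevalence is 40% among true cases and 20% among true noncases, and that the unrecognized cases hidden among the controls share the exposure distribution of the recognized cases:

```python
from fractions import Fraction

def odds(p):
    return p / (1 - p)

def observed_or(p_cases, p_noncases, hidden_fraction):
    """OR when a fraction of the 'normal' controls are actually unrecognized
    cases whose exposure prevalence equals that of the recognized cases."""
    p_controls = hidden_fraction * p_cases + (1 - hidden_fraction) * p_noncases
    return odds(p_cases) / odds(p_controls)

p_cases, p_noncases = Fraction(2, 5), Fraction(1, 5)  # hypothetical prevalences
for m in (Fraction(0), Fraction(3, 10), Fraction(1)):
    print(m, observed_or(p_cases, p_noncases, m))
# 0 8/3
# 3/10 74/39
# 1 1
```

With 30% of controls carrying unrecognized disease, the observed odds ratio falls from about 2.67 to about 1.90; if every control were an unrecognized case, the association would vanish entirely.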


Bias causes observed measurements or results to differ from their true values because of systematic, but unintended, “errors,” for example, in the way we ascertain and enroll study subjects (selection bias) or the way we collect data from them (information bias). Statistical significance of study results, regardless of p value, is completely irrelevant as a means of assessing results when bias is operating.
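The irrelevance of statistical significance under bias can be shown with a short sketch. Assuming a hypothetical study in which the true odds ratio is 1 but exposed cases are more willing to enroll (probability 0.9) than unexposed cases (0.6), enlarging the study only narrows the Wald confidence interval around the wrong value:

```python
import math

def wald_or(a, b, c, d, z_crit=1.96):
    """Odds ratio, Wald 95% CI on the log scale, and two-sided p-value."""
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    p = math.erfc(abs(log_or / se) / math.sqrt(2))
    ci = (math.exp(log_or - z_crit * se), math.exp(log_or + z_crit * se))
    return math.exp(log_or), ci, p

# Hypothetical: 30% of eligible cases and of noncases are exposed (true OR = 1).
for k in (1, 10, 100):
    a, b = 27 * k, 42 * k  # enrolled cases out of 100*k eligible (0.9 and 0.6 enroll)
    c, d = 30 * k, 70 * k  # noncases, enrolled without regard to exposure
    or_, (lo, hi), p = wald_or(a, b, c, d)
    # OR stays at 1.50 while the CI narrows and p shrinks toward 0
    print(f"k={k:>3}  OR={or_:.2f}  95% CI {lo:.2f}-{hi:.2f}  p={p:.2g}")
```

At the smallest size the interval includes 1; at the largest it excludes 1 decisively and p is vanishingly small, yet the estimate of 1.5 is no closer to the truth. More data reduce random error, not systematic error.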

Selection bias.

Selection bias is often subtle and requires careful thought to discern its potential impact on the hypotheses being tested. For example, would selection bias render clinic-based ADRC study results suspect, if not invalid? Unfortunately, the answer is not simple; it depends on what is being studied and whether “selection” into the ADRC study distorts the true association. There are numerous advantages to recruiting study participants from dedicated memory disorder clinics, as in the typical ADRC. Both AD cases and healthy controls are selected (as volunteers or referrals) under very specific circumstances that ensure their contribution to AD research. They either have (cases) or do not have (controls) the clinical/pathologic features typical of AD. Cases fulfill the research diagnostic criteria for AD, and they have “reliable informants” who will accompany them to clinic visits; neither cases nor controls can have various exclusionary features (e.g., comorbid stroke or major psychiatric disorder); all are encouraged to come to the clinic and participate fully in the research, including neuroimaging and lumbar puncture; many are eager to enter clinical trials, and many consent to eventual autopsy. AD cases who fit the above profile are admirable for their enthusiasm and altruism, but may not be typical, nor a probability sample of all AD cases in the population base from which they came. The differential distribution of study factors between AD cases who did and did not enroll might indicate whether bias is attenuating or exaggerating the particular study results, if we were able to obtain that information. Therefore, the astute reader asks: “Can the underlying population base, from which the subjects came, be described? Might the population base's defined boundaries or inclusion characteristics have influenced the results? Was subject enrollment in any way influenced by the factors being studied?” In a clinic-based study it is rarely easy to describe the unenrolled cases (or unenrolled noncases) from the underlying population base in order to make such comparisons. It does little for internal validity to claim that the enrollees' age, race, and sex distributions are in similar proportions to those of the surrounding county if age, race, and sex have little to do with the variable being studied, and if enrollment is differentially associated with the factors being studied.

Note that population-based studies are not inherently protected from bias; individuals sampled from the community, who are not seeking services, may consent or refuse to participate in research, and their willingness to participate is unlikely to be random. If we were concerned about selection bias in a study examining pesticide exposure as a risk factor for Parkinson disease (PD), we might ask, “Were PD cases who had not been exposed to pesticides more (or less) likely to refuse enrollment in our study than PD cases who had been exposed?”
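Differential refusal of this kind can be quantified. If each cell of the 2 × 2 table has its own participation probability, the observed odds ratio equals the true odds ratio multiplied by a bias factor built from those probabilities. The sketch below assumes, purely for illustration, that pesticide exposure has no effect on PD, that 80% of exposed and 50% of unexposed PD cases agree to participate, and that controls participate at 60% regardless of exposure:

```python
from fractions import Fraction

def odds_ratio(a, b, c, d):
    return Fraction(a) * Fraction(d) / (Fraction(b) * Fraction(c))

# Hypothetical source population: pesticide exposure has NO effect on PD.
true = {"case_exp": 100, "case_unexp": 300, "ctrl_exp": 1000, "ctrl_unexp": 3000}

# Hypothetical participation probabilities: unexposed PD cases refuse more often.
participate = {"case_exp": Fraction(4, 5), "case_unexp": Fraction(1, 2),
               "ctrl_exp": Fraction(3, 5), "ctrl_unexp": Fraction(3, 5)}

enrolled = {cell: n * participate[cell] for cell, n in true.items()}

cells = ("case_exp", "case_unexp", "ctrl_exp", "ctrl_unexp")
true_or = odds_ratio(*(true[c] for c in cells))
observed = odds_ratio(*(enrolled[c] for c in cells))

# The distortion depends only on the participation probabilities.
bias_factor = (participate["case_exp"] * participate["ctrl_unexp"]) / (
    participate["case_unexp"] * participate["ctrl_exp"])

print(true_or, observed, bias_factor)  # 1 8/5 8/5
```

A spurious odds ratio of 1.6 appears solely because unexposed cases refused more often; any participation pattern whose bias factor equals 1 would leave the odds ratio unbiased.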

Selection bias may be not merely inadvertent but also unavoidable. Some years ago, a surprising finding8 was reported: AD cases who volunteered or were referred to an ADRC were significantly more likely to carry the APOE*4 genotype than were newly recognized AD cases identified through surveillance of a health maintenance organization population base within the same metropolitan area. The ADRC sample had yielded a biased overestimate of APOE*4 allele frequency, and of its estimated relative risk, because ADRC cases were inadvertently selected on the basis of age, and it went unnoticed that the likelihood of carrying an APOE*4 allele decreases with age. There is no way the ADRC investigators could have detected this inadvertent selection bias had they not also had access to a population sample from the same base. A later meta-analysis of APOE*4 allele results quantified the relationship between age and risk of AD associated with APOE alleles, and showed that AD risk due to APOE*4 genotype is lower in population samples than in specialty clinic samples.9 APOE allele frequency also could be affected by study recruitment. Family history of AD seems to promote participation in both clinical and population-based studies involving memory loss, and is also associated with APOE*4 frequency, thereby potentially biasing the magnitude of the APOE effect.
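The mechanism behind the APOE*4 overestimate can be illustrated with invented numbers. Assume (hypothetically) that the proportion of AD cases carrying at least one APOE*4 allele declines with age at onset, and that clinic cases are younger on average than cases identified in the population. The crude carrier proportion is then an age-weighted average, and the clinic's younger age mix inflates it even though the age-specific proportions are identical in both samples:

```python
from fractions import Fraction as F

# Hypothetical proportion of AD cases carrying >= 1 APOE*4 allele, by age at onset.
carrier_by_age = {"60-69": F("0.65"), "70-79": F("0.55"),
                  "80-89": F("0.40"), "90+": F("0.30")}

# Hypothetical age distributions of AD cases in each sample.
population_cases = {"60-69": F("0.10"), "70-79": F("0.30"),
                    "80-89": F("0.40"), "90+": F("0.20")}
clinic_cases = {"60-69": F("0.30"), "70-79": F("0.45"),
                "80-89": F("0.20"), "90+": F("0.05")}

def overall_carrier_proportion(age_dist):
    """Crude proportion = age-specific proportions weighted by the sample's age mix."""
    return sum(carrier_by_age[band] * w for band, w in age_dist.items())

print(float(overall_carrier_proportion(population_cases)))  # 0.45
print(float(overall_carrier_proportion(clinic_cases)))      # 0.5375
```

Nothing is wrong within any single age band of the clinic sample; the distortion comes entirely from its age mix, which is visible only when the population's age distribution is also known.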

Survival bias is a type of selection bias that is beyond the control of the selector. For example, some African populations have a high APOE*4 frequency but have not shown an elevated association between APOE*4 and AD.10,11 While there may be multiple reasons for this paradox, one possibility is that individuals with the APOE*4 genotype died of heart disease before growing old enough to develop dementia.

Prevalence bias (length bias) is similar to survival bias. In the 1990s, several case-control studies reported a protective effect of cigarette smoking on AD occurrence.12 Assume that both AD and smoking shorten life expectancy and that AD cases enrolled in those studies some time after symptom onset. If age alone were the basis for potential selection bias, smoking should cause premature mortality equally among those who are and those who are not destined to develop AD. However, there is another aspect of selection bias called prevalence or length bias: at any given time, prevalent (i.e., existing) cases are those whose survival with disease (disease duration) was of greater length. If smokers with AD die sooner after AD onset than nonsmokers with AD, the prevalent AD cases available for study would “selectively” be nonsmokers. A scenario known as “competing risks” occurs when smoking influences the risk both of death and of AD.13 This would increase the observed excess of smoking among “controls” and thereby inflate the apparent protective association between smoking and AD. Subsequently, longitudinal studies of smokers and nonsmokers showed an increased risk of incident AD associated with smoking,12 suggesting that selection bias might have explained the earlier cross-sectional study results.
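The length-bias argument can be checked with a simple steady-state calculation, in which the number of prevalent cases in each group is proportional to incidence multiplied by mean disease duration. The sketch assumes, hypothetically, that smoking has no effect on AD incidence, that 30% of both incident cases and controls smoke, and that smokers survive 4 years after AD onset versus 8 years for nonsmokers; it ignores the effect of smoking on mortality among controls:

```python
from fractions import Fraction as F

def exposure_odds(share_exposed):
    return share_exposed / (1 - share_exposed)

def prevalent_case_share(incident_share_exposed, duration_exposed, duration_unexposed):
    """Steady state: prevalent cases in each group ~ incident cases x mean duration."""
    exposed = incident_share_exposed * duration_exposed
    unexposed = (1 - incident_share_exposed) * duration_unexposed
    return exposed / (exposed + unexposed)

smokers_among_controls = F(3, 10)        # hypothetical
smokers_among_incident_cases = F(3, 10)  # smoking has no effect on AD incidence here
prev_share = prevalent_case_share(smokers_among_incident_cases,
                                  duration_exposed=4, duration_unexposed=8)

incident_or = exposure_odds(smokers_among_incident_cases) / exposure_odds(smokers_among_controls)
prevalent_or = exposure_odds(prev_share) / exposure_odds(smokers_among_controls)
print(incident_or, prevalent_or)  # 1 1/2
```

Sampling prevalent rather than incident cases halves the odds ratio, producing an apparent protective effect of smoking purely from the shorter survival of smokers with AD.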