Michael D. Green, D. Michal Freedman, and Leon Gordis
Michael D. Green, B.S., J.D., is Bess & Walter Williams Chair in Law, Wake Forest University School of Law, Winston-Salem, North Carolina.
D. Michal Freedman, J.D., Ph.D., M.P.H., is Epidemiologist, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland.
Leon Gordis, M.D., Dr.P.H., is Professor of Epidemiology, Johns Hopkins School of Public Health, and Professor of Pediatrics, Johns Hopkins School of Medicine, Baltimore, Maryland.
Contents
I. Introduction
II. What Different Kinds of Epidemiologic Studies Exist?
   A. Experimental and Observational Studies of Suspected Toxic Agents
   B. The Types of Observational Study Design
      1. Cohort studies
      2. Case-control studies
      3. Cross-sectional studies
      4. Ecological studies
   C. Epidemiologic and Toxicologic Studies
III. How Should Results of an Epidemiologic Study Be Interpreted?
   A. Relative Risk
   B. Odds Ratio
   C. Attributable Risk
   D. Adjustment for Study Groups That Are Not Comparable
IV. What Sources of Error Might Have Produced a False Result?
   A. What Statistical Methods Exist to Evaluate the Possibility of Sampling Error?
      1. False positive error and statistical significance
      2. False negative error
      3. Power
   B. What Biases May Have Contributed to an Erroneous Association?
      1. Selection bias
      2. Information bias
      3. Other conceptual problems
   C. Could a Confounding Factor Be Responsible for the Study Result?
      1. What techniques can be used to prevent or limit confounding?
      2. What techniques can be used to identify confounding factors?
      3. What techniques can be used to control for confounding factors?
Reference Manual on Scientific Evidence
V. General Causation: Is an Exposure a Cause of the Disease?
   A. Is There a Temporal Relationship?
   B. How Strong Is the Association Between the Exposure and Disease?
   C. Is There a Dose–Response Relationship?
   D. Have the Results Been Replicated?
   E. Is the Association Biologically Plausible (Consistent with Existing Knowledge)?
   F. Have Alternative Explanations Been Considered?
   G. What Is the Effect of Ceasing Exposure?
   H. Does the Association Exhibit Specificity?
   I. Are the Findings Consistent with Other Relevant Knowledge?
VI. What Methods Exist for Combining the Results of Multiple Studies?
VII. What Role Does Epidemiology Play in Proving Specific Causation?
Glossary of Terms
References on Epidemiology
References on Law and Epidemiology
Reference Guide on Epidemiology
I. Introduction
Epidemiology is the field of public health and medicine that studies the incidence, distribution, and etiology of disease in human populations. The purpose of epidemiology is to better understand disease causation and to prevent disease in groups of individuals. Epidemiology assumes that disease is not distributed randomly in a group of individuals and that identifiable subgroups, including those exposed to certain agents, are at increased risk of contracting particular diseases.1
Judges and juries increasingly are presented with epidemiologic evidence as the basis of an expert’s opinion on causation.2 In the courtroom, epidemiologic research findings3 are offered to establish or dispute whether exposure to an agent4 caused a harmful effect or disease.5 Epidemiologic evidence identifies agents that are associated with an increased risk of disease in groups of individuals, quantifies the amount of excess disease that is associated with an agent, and provides a profile of the type of individual who is likely to contract a disease after being exposed to an agent. Epidemiology focuses on the question of general causation (i.e., is the agent capable of causing disease?) rather than that of specific causation (i.e., did it cause disease in a particular individual?).6 For example, in the 1950s Doll and Hill and others published articles about the increased risk of lung cancer in cigarette smokers. Doll and Hill’s studies showed that smokers who smoked ten to twenty cigarettes a day had a lung cancer mortality rate that was about ten times higher than that for nonsmokers.7 These studies identified an association between smoking cigarettes and death from lung cancer, which contributed to the determination that smoking causes lung cancer.
However, it should be emphasized that an association is not equivalent to causation.8 An association identified in an epidemiologic study may or may not be causal.9 Assessing whether an association is causal requires an understanding of the strengths and weaknesses of the study’s design and implementation, as well as a judgment about how the study findings fit with other scientific knowledge. It is important to emphasize that most studies have flaws.10 Some flaws are inevitable given the limits of technology and resources. In evaluating epidemiologic evidence, the key questions, then, are the extent to which a study’s flaws compromise its findings and whether the effect of the flaws can be assessed and taken into account in making inferences.
1. Although epidemiologists may conduct studies of beneficial agents that prevent or cure disease or other medical conditions, this reference guide refers exclusively to outcomes as diseases, because they are the relevant outcomes in most judicial proceedings in which epidemiology is involved.
2. Epidemiologic studies have been well received by courts trying mass tort suits. Well-conducted studies are uniformly admitted. 2 Modern Scientific Evidence: The Law and Science of Expert Testimony §28-1.1, at 302–03 (David L. Faigman et al. eds., 1997) [hereinafter Modern Scientific Evidence]. It is important to note that often the expert testifying before the court is not the scientist who conducted the study or series of studies. See, e.g., DeLuca v. Merrell Dow Pharms., Inc., 911 F.2d 941, 953 (3d Cir. 1990) (pediatric pharmacologist expert’s credentials sufficient pursuant to Fed. R. Evid. 702 to interpret epidemiologic studies and render an opinion based thereon); cf. Landrigan v. Celotex Corp., 605 A.2d 1079, 1088 (N.J. 1992) (epidemiologist permitted to testify to both general causation and specific causation); Loudermill v. Dow Chem. Co., 863 F.2d 566, 569 (8th Cir. 1988) (toxicologist permitted to testify that chemical caused decedent’s death).
3. An epidemiologic study, which often is published in a medical journal or other scientific journal, is hearsay. An epidemiologic study that is performed by the government, such as one performed by the Centers for Disease Control (CDC), may be admissible based on the hearsay exception for government records contained in Fed. R. Evid. 803(8)(C). See Ellis v. International Playtex, Inc., 745 F.2d 292, 300–01 (4th Cir. 1984); Kehm v. Procter & Gamble Co., 580 F. Supp. 890, 899 (N.D. Iowa 1982), aff’d sub nom. Kehm v. Procter & Gamble Mfg. Co., 724 F.2d 613 (8th Cir. 1983). A study that is not conducted by the government might qualify for the learned treatise exception to the hearsay rule, Fed. R. Evid. 803(18), or possibly the catchall exceptions, Fed. R. Evid. 803(24) & 804(5). See Ellis, 745 F.2d at 305, 306 & n.18.
In any case, an epidemiologic study might be part of the basis of an expert’s opinion and need not be independently admissible pursuant to Fed. R. Evid. 703. See In re “Agent Orange” Prod. Liab. Litig., 611 F. Supp. 1223, 1240 (E.D.N.Y. 1985), aff’d, 818 F.2d 187 (2d Cir. 1987), cert. denied, 487 U.S. 1234 (1988); cf. Grassis v. Johns-Manville Corp., 591 A.2d 671, 676 (N.J. Super. Ct. App. Div. 1991) (epidemiologic study offered in evidence to support expert’s opinion under New Jersey evidentiary rule equivalent to Fed. R. Evid. 703).
4. We use agent to refer to any substance external to the human body that potentially causes disease or other health effects. Thus, drugs, devices, chemicals, radiation, and minerals (e.g., asbestos) are all agents whose toxicity an epidemiologist might explore. A single agent or a number of independent agents may cause disease, or the combined presence of two or more agents may be necessary for the development of the disease. Epidemiologists also conduct studies of individual characteristics, such as blood pressure and diet, which might pose risks, but those studies are rarely of interest in judicial proceedings. Epidemiologists may also conduct studies of drugs and other pharmaceutical products to assess their efficacy and safety.
5. DeLuca v. Merrell Dow Pharms., Inc., 911 F.2d 941, 945–48, 953–59 (3d Cir. 1990) (litigation over morning sickness drug, Bendectin); Cook v. United States, 545 F. Supp. 306, 307–16 (N.D. Cal. 1982) (swine flu vaccine alleged to have caused plaintiff’s Guillain-Barré disease); Allen v. United States, 588 F. Supp. 247, 416–25 (D. Utah 1984) (residents near atomic test site claimed exposure to radiation caused leukemia and other cancers), rev’d on other grounds, 816 F.2d 1417 (10th Cir. 1987), cert. denied, 484 U.S. 1004 (1988); In re “Agent Orange” Prod. Liab. Litig., 597 F. Supp. 740, 780–90 (E.D.N.Y. 1984) (Vietnam veterans exposed to Agent Orange and dioxin contaminant brought suit for various diseases and birth defects in their offspring), aff’d, 818 F.2d 145 (2d Cir. 1987); Christophersen v. Allied-Signal Corp., 939 F.2d 1106, 1115 (5th Cir. 1991) (cancer alleged to have resulted from exposure to nickel-cadmium fumes), cert. denied, 503 U.S. 912 (1992); Kehm v. Procter & Gamble Co., 580 F. Supp. 890, 898–902 (N.D. Iowa 1982) (toxic shock syndrome alleged to result from use of Rely tampons), aff’d sub nom. Kehm v. Procter & Gamble Mfg. Co., 724 F.2d 613 (8th Cir. 1983).
6. This terminology and the distinction between general causation and specific causation are widely recognized in court opinions. See, e.g., Kelley v. American Heyer-Schulte Corp., 957 F. Supp. 873, 875–76 (W.D. Tex. 1997) (recognizing the different concepts of general causation and specific causation), appeal dismissed, 139 F.3d 899 (5th Cir. 1998); Cavallo v. Star Enter., 892 F. Supp. 756, 771 n.34 (E.D. Va. 1995), aff’d in part and rev’d in part, 100 F.3d 1150 (4th Cir. 1996), cert. denied, 522 U.S. 1044 (1998); Casey v. Ohio Med. Prods., 877 F. Supp. 1380, 1382 (N.D. Cal. 1995). For a discussion of specific causation, see infra §VII.
7. Richard Doll & A. Bradford Hill, Lung Cancer and Other Causes of Death in Relation to Smoking, 2 Brit. Med. J. 1071 (1956).
8. See Kelley v. American Heyer-Schulte Corp., 957 F. Supp. 873, 878 (W.D. Tex. 1997), appeal dismissed, 139 F.3d 899 (5th Cir. 1998). Association is more fully discussed infra §III. The term is used to describe the relationship between two events (e.g., exposure to a chemical agent and development of disease) that occur more frequently together than one would expect by chance. Association does not necessarily imply a causal effect. Causation is used to describe the association between two events when one event is a necessary link in a chain of events that results in the effect. Of course, alternative causal chains may exist that do not include the agent but that result in the same effect. Epidemiologic methods cannot deductively prove causation; indeed, no empirically based science can affirmatively prove a causal relation. See, e.g., Stephan F. Lanes, The Logic of Causal Inference in Medicine, in Causal Inference 59 (Kenneth J. Rothman ed., 1988). However, epidemiologic evidence can justify an inference that an agent causes a disease. See infra §V.
9. See infra §IV.
A final caveat is that employing the results of group-based studies of risk to make a causal determination for an individual plaintiff is beyond the limits of epidemiology. Nevertheless, a substantial body of legal precedent has developed that addresses the use of epidemiologic evidence to prove causation for an individual litigant through probabilistic means, and these cases are discussed later in this reference guide.11
The following sections of this reference guide address a number of critical issues that arise in considering the admissibility of, and weight to be accorded to, epidemiologic research findings. Over the past couple of decades, courts frequently have confronted the use of epidemiologic studies as evidence and recognized their utility in proving causation. As the Third Circuit observed in DeLuca v. Merrell Dow Pharmaceuticals, Inc.: “The reliability of expert testimony founded on reasoning from epidemiological data is generally a fit subject for judicial notice; epidemiology is a well-established branch of science and medicine, and epidemiological evidence has been accepted in numerous cases.”12
Three basic issues arise when epidemiology is used in legal disputes and the methodological soundness of a study and its implications for resolution of the question of causation must be assessed:
1. Do the results of an epidemiologic study reveal an association between an agent and disease?
2. What sources of error in the study may have contributed to an inaccurate result?
3. If the agent is associated with disease, is the relationship causal?
Section II explains the different kinds of epidemiologic studies, and section III addresses the meaning of their outcomes. Section IV examines concerns about the methodological validity of a study, including the problem of sampling error.13 Section V discusses general causation, considering whether an agent is capable of causing disease. Section VI deals with methods for combining the results of multiple epidemiologic studies, and the difficulties entailed in extracting a single global measure of risk from multiple studies. Additional legal questions that arise in most toxic substances cases are whether population-based epidemiologic evidence can be used to infer specific causation, and if so, how. Section VII examines issues of specific causation, considering whether an agent caused an individual’s disease.
10. See In re Orthopedic Bone Screw Prods. Liab. Litig., MDL No. 1014, 1997 U.S. Dist. LEXIS 6441, at *26–*27 (E.D. Pa. May 5, 1997) (holding that despite potential for several biases in a study that “may ... render its conclusions inaccurate,” the study was sufficiently reliable to be admissible); Joseph L. Gastwirth, Reference Guide on Survey Research, 36 Jurimetrics J. 181, 185 (1996) (review essay) (“One can always point to a potential flaw in a statistical analysis.”).
11. See infra §VII.
12. 911 F.2d 941, 954 (3d Cir. 1990); see also Smith v. Ortho Pharm. Corp., 770 F. Supp. 1561, 1571 (N.D. Ga. 1991) (explaining increased reliance of courts on epidemiologic evidence in toxic substances litigation).
II. What Different Kinds of Epidemiologic Studies Exist?
A. Experimental and Observational Studies of Suspected Toxic Agents
To determine whether an agent is related to the risk of developing a certain disease or an adverse health outcome, we might ideally want to conduct an experimental study in which the subjects would be randomly assigned to one of two groups: one group exposed to the agent of interest and the other not exposed. After a period of time, the study participants in both groups would be evaluated for development of the disease. This type of study, called a randomized trial, clinical trial, or true experiment, is considered the gold standard for determining the relationship of an agent to a disease or health outcome. Such a study design is often used to evaluate new drugs or medical treatments and is the best way to ensure that any observed difference between the two groups in outcome is likely to be the result of exposure to the drug or medical treatment.
Randomization minimizes the likelihood that there are differences in relevant characteristics between those exposed to the agent and those not exposed. Researchers conducting clinical trials attempt to use study designs that are placebo controlled, which means that the group not receiving the agent or treatment is given a placebo, and that use double blinding, which means that neither the participants nor those conducting the study know which group is receiving the agent or treatment and which group is given the placebo. However, ethical and practical constraints limit the use of such experimental methodologies to assessing the value of agents that are thought to be beneficial to human beings.
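The random assignment described above can be illustrated with a minimal Python sketch. It is not drawn from any study cited in this guide: the function name `randomize`, the participant identifiers, and the fixed seed are all hypothetical, and a real trial would use a documented allocation procedure rather than this simple shuffle.

```python
import random


def randomize(participants, seed=None):
    """Randomly split participants into two groups of (nearly) equal size.

    Hypothetical helper for illustration only. Chance, not the
    investigator or the participant, decides who receives the agent.
    """
    rng = random.Random(seed)
    shuffled = list(participants)  # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Twenty made-up participant identifiers.
participant_ids = list(range(1, 21))
treatment_group, placebo_group = randomize(participant_ids, seed=0)

print("Treatment:", sorted(treatment_group))
print("Placebo:  ", sorted(placebo_group))
```

Because assignment depends only on the shuffle, characteristics such as age or diet should, on average, be balanced between the two groups, which is the property the text attributes to randomization.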
13. For a more in-depth discussion of the statistical basis of epidemiology, see David H. Kaye & David A. Freedman, Reference Guide on Statistics §II.A, in this manual, and two case studies: Joseph Sanders, The Bendectin Litigation: A Case Study in the Life Cycle of Mass Torts, 43 Hastings L.J. 301 (1992); Devra L. Davis et al., Assessing the Power and Quality of Epidemiologic Studies of Asbestos-Exposed Populations, 1 Toxicological & Indus. Health 93 (1985). See also References on Epidemiology and References on Law and Epidemiology at the end of this reference guide.
When an agent’s effects are suspected to be harmful, we cannot knowingly expose people to the agent.14 Instead of the investigator controlling who is exposed to the agent and who is not, most epidemiologic studies are observational; that is, they “observe” a group of individuals who have been exposed to an agent of interest, such as cigarette smoking or an industrial chemical, and compare them with another group of individuals who have not been so exposed. Thus, the investigator identifies a group of subjects who have been knowingly or unknowingly exposed and compares their rate of disease or death with that of an unexposed group. In contrast to clinical studies, in which potential risk factors can be controlled, epidemiologic investigations generally focus on individuals living in the community, for whom characteristics other than the one of interest, such as diet, exercise, exposure to other environmental agents, and genetic background, may contribute to the risk of developing the disease in question. Since these characteristics cannot be controlled directly by the investigator, the investigator addresses their possible role in the relationship being studied by considering them in the design of the study and in the analysis and interpretation of the study results (see infra section IV).
B. The Types of Observational Study Design
Several different types of observational epidemiologic studies can be conducted.15 Study designs may be chosen because of suitability for investigating the question of interest, timing constraints, resource limitations, or other considerations. An important question that might be asked initially about a given epidemiologic study is whether the study design used was appropriate to the research question. Most observational studies collect data about both exposure and health outcome in every individual in the study. The two main types of observational studies are cohort studies and case-control studies. A third type of observational study is a cross-sectional study, although cross-sectional studies are rarely useful in identifying toxic agents.16 A final type of observational study, one in which data about individuals are not gathered, but rather population data about exposure and disease are used, is an ecological study.
14. Experimental studies in which human beings are exposed to agents known or thought to be toxic are ethically proscribed. See Ethyl Corp. v. United States Envtl. Protection Agency, 541 F.2d 1, 26 (D.C. Cir.), cert. denied, 426 U.S. 941 (1976). Experimental studies can be used where the agent under investigation is believed to be beneficial, as is the case in the development and testing of new pharmaceutical drugs. See, e.g., E.R. Squibb & Sons, Inc. v. Stuart Pharms., No. 90-1178, 1990 U.S. Dist. LEXIS 15788 (D.N.J. Oct. 16, 1990); Gordon H. Guyatt, Using Randomized Trials in Pharmacoepidemiology, in Drug Epidemiology and Post-Marketing Surveillance 59 (Brian L. Strom & Giampaolo Velo eds., 1992). Experimental studies may also be conducted that entail discontinuation of exposure to a harmful agent, such as studies in which smokers are randomly assigned to a variety of smoking-cessation programs or no cessation.
15. Other epidemiologic studies collect data about the group as a whole, rather than about each individual in the group. These group studies are discussed infra §II.B.4.
16. See infra §II.B.3.
The difference between cohort studies and case-control studies is that cohort studies measure and compare the incidence of disease in the exposed and unexposed (“control”) groups, while case-control studies measure and compare the frequency of exposure in the group with the disease (the “cases”) and the group without the disease (the “controls”). Thus, a cohort study takes the exposed status of participants (the independent variable) and examines its effect on incidence of disease (the dependent variable). A case-control study takes the disease status as the independent variable and examines its relationship with exposure, which is the dependent variable. In a case-control study, the rates of exposure in the cases and the rates in the controls are compared, and the odds of having the disease when exposed to a suspected agent can be compared with the odds when not exposed. The critical difference between cohort studies and case-control studies is that cohort studies begin with exposed people and unexposed people, while case-control studies begin with individuals who are selected based on whether they have the disease or do not have the disease, after which their exposure to the agent in question is measured. The goal of both types of studies is to determine if there is an association between exposure to an agent and a disease, and the strength (magnitude) of that association.
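The case-control comparison described above can be sketched in a few lines of Python. The counts below are invented for illustration and do not come from any study cited in this guide, and the function names are hypothetical. The sketch compares the odds of past exposure among cases with the odds among controls; arithmetically, that ratio equals the ratio of the odds of disease among the exposed to the odds among the unexposed.

```python
def exposure_odds(exposed, unexposed):
    """Odds of past exposure within one group (cases or controls)."""
    return exposed / unexposed


def exposure_odds_ratio(cases_exposed, cases_unexposed,
                        controls_exposed, controls_unexposed):
    """Ratio of exposure odds in cases to exposure odds in controls."""
    return (exposure_odds(cases_exposed, cases_unexposed)
            / exposure_odds(controls_exposed, controls_unexposed))


# Hypothetical study: 40 of 100 cases and 20 of 100 controls were exposed.
ratio = exposure_odds_ratio(cases_exposed=40, cases_unexposed=60,
                            controls_exposed=20, controls_unexposed=80)
print(f"Exposure odds ratio: {ratio:.2f}")
```

A ratio above 1 indicates that exposure was more common among the cases than among the controls. This simple form is undefined when any group count is zero.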
1. Cohort studies
In cohort studies17 the researcher identifies two groups of individuals: (1) individuals who have been exposed to a substance that is considered a possible cause of a disease and (2) individuals who have not been exposed (see Figure 1).18 Both groups are followed for a specified length of time, and the proportions of individuals in each group who develop the disease are compared.19 Thus, as illustrated in Table 1, a researcher would compare the proportion of unexposed individuals (controls) with the disease (b/(a + b)) with the proportion of exposed individuals (cohort) with the disease (d/(c + d)). If the exposure causes the disease, the researcher would expect a greater proportion of the exposed individuals than of the unexposed individuals to develop the disease.20
17. Cohort studies also are referred to as prospective studies and follow-up studies.
18. In some studies, there may be several groups, each with a different magnitude of exposure to the agent being studied. Thus, a study of cigarette smokers might include heavy smokers (>3 packs a day), moderate smokers (1–2 packs a day), and light smokers (<1 pack a day). See, e.g., Robert A. Rinsky et al., Benzene and Leukemia: An Epidemiologic Risk Assessment, 316 New Eng. J. Med. 1044 (1987).
19. Sometimes retrospective cohort studies are conducted, in which the researcher gathers historical data about exposure and disease outcome of the exposed cohort. Harold A. Kahn, An Introduction to Epidemiologic Methods 39–41 (1983). Irving Selikoff, in his seminal study of asbestotic disease in insulation workers, included several hundred workers who had died before he began the study. Selikoff was able to obtain information about exposure from union records and information about disease from hospital and autopsy records. Irving J. Selikoff et al., The Occurrence of Asbestosis Among Insulation Workers in the United States, 132 Annals N.Y. Acad. Sci. 139, 143 (1965).
Figure 1. Design of a Cohort Study
[Figure: a defined population is divided into Exposed and Not Exposed groups; each group is then divided into those who Develop Disease and those who Do Not Develop Disease.]
Table 1. Cross-Tabulation of Exposure by Disease Status
               No Disease    Disease
Not Exposed        a            b
Exposed            c            d
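The comparison set out in Table 1 can be worked through with a short Python sketch. The cell labels a, b, c, and d follow the table; the counts themselves are hypothetical and the function name is an assumption made for illustration.

```python
def disease_proportions(a, b, c, d):
    """Proportion with disease in each cohort, using the cells of Table 1.

    a: not exposed, no disease    b: not exposed, disease
    c: exposed, no disease        d: exposed, disease
    """
    unexposed = b / (a + b)
    exposed = d / (c + d)
    return unexposed, exposed


# Hypothetical cohorts of 1,000 people each.
unexposed_prop, exposed_prop = disease_proportions(a=950, b=50, c=850, d=150)
print(f"Unexposed: {unexposed_prop:.2%}  Exposed: {exposed_prop:.2%}")
```

In this made-up example a larger proportion of the exposed group develops the disease, which is the pattern the text says a researcher would expect if the exposure causes the disease. Whether such a difference reflects causation is a separate question.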
One advantage of the cohort study design is that the temporal relationship between exposure and disease can often be established more readily. By tracking the exposed and unexposed groups over time, the researcher can determine the time of disease onset. This temporal relationship is critical to the question of causation, since exposure must precede disease onset if exposure caused the disease.
As an example, in 1950 a cohort study was begun to determine whether uranium miners exposed to radon were at increased risk for lung cancer as compared with nonminers. The study group (also referred to as the exposed cohort) consisted of 3,400 white, underground miners. The control group (which need not be the same size as the exposed cohort) comprised white nonminers from the same geographic area. Members of the exposed cohort were examined every three years, and the degree of this cohort’s exposure to radon was measured from samples taken in the mines. Ongoing testing for radioactivity and periodic medical monitoring of lungs permitted the researchers to examine whether disease was linked to prior work exposure to radiation and allowed them to discern the relationship between exposure to radiation and disease. Exposure to radiation was associated with the development of lung cancer in uranium miners.21
20. Researchers often examine the rate of disease or death in the exposed and control groups. The rate of disease or death entails consideration of the number within a time period. All smokers and nonsmokers will, if followed for 100 years, die. Smokers will die at a greater rate than nonsmokers.
The cohort design is often used in occupational studies such as the one just cited. Since the design is not experimental, and the investigator has no control over what other exposures a subject in the study may have had, an increased risk of disease among the exposed group may be caused by agents other than the exposure of interest. A cohort study of workers in a certain industry that pays below-average wages might find a higher risk of cancer in those workers. This may be because they work in that industry, or, among other reasons, it may be because low-wage groups are exposed to other harmful agents, such as environmental toxins present in higher concentrations in their neighborhoods. In the study design, the researcher must attempt to identify factors other than the exposure that may be responsible for the increased risk of disease. If data are gathered on other possible etiologic factors, the researcher generally uses statistical methods22 to assess whether a true association exists between working in the industry and cancer. Evaluating whether the association is causal involves additional analysis, as discussed in section V.
2. Case-control studies
In case-control studies,23 the researcher begins with a group of individuals who have a disease (cases) and then selects a group of individuals who do not have the disease (controls). The researcher then compares the groups in terms of past exposures. If a certain exposure is associated with or caused the disease, a higher proportion of past exposure among the cases than among the controls would be expected (see Figure 2).
Thus, for example, in the late 1960s, doctors in Boston were confronted with an unusual incidence of vaginal adenocarcinoma in young female patients. Those patients became the “cases” in a case-control study (because they had the disease in question) and were matched with “controls,” who did not have the disease. Controls were selected based on their being born in the same hospitals and at the same time as the cases. The cases and controls were compared for exposure to agents that might be responsible, and researchers found maternal ingestion of DES (diethylstilbestrol) in all but one of the cases but none of the controls.24
Figure 2. Design of a Case-Control Study
[Figure: individuals with the disease (cases) and individuals without the disease (controls) are each divided into those who were Exposed and those who were Not Exposed.]
An advantage of the case-control study is that it usually can be completed in less time and with less expense than a cohort study. Case-control studies are also particularly useful in the study of rare diseases, because if a cohort study were conducted, an extremely large group would have to be studied in order to observe the development of a sufficient number of cases for analysis.25 A number of potential problems with case-control studies are discussed in section IV.B.
21. This example is based on a study description in Abraham M. Lilienfeld & David E. Lilienfeld, Foundations of Epidemiology 237–39 (2d ed. 1980). The original study is Joseph K. Wagoner et al., Radiation as the Cause of Lung Cancer Among Uranium Miners, 273 New Eng. J. Med. 181 (1965).
22. See Daniel L. Rubinfeld, Reference Guide on Multiple Regression §II.B, in this manual.
23. Case-control studies are also referred to as retrospective studies, because researchers gather historical information about rates of exposure to an agent in the case and control groups.
3. Cross-sectional studies
A third type of observational study is a cross-sectional study. In this type of study, individuals are interviewed or examined, and the presence of both the exposure of interest and the disease of interest is determined in each individual at a single point in time. Cross-sectional studies determine the presence (prevalence) of both exposure and disease in the subjects and do not determine the development of disease or risk of disease (incidence). Moreover, since both exposure and disease are determined in an individual at the same point in time, it is not possible to establish the temporal relation between exposure and disease (that is, that the exposure preceded the disease), which would be necessary for drawing any causal inference. Thus, a researcher may use a cross-sectional study to determine the connection between a personal characteristic that does not change over time, such as blood type, and existence of a disease, such as aplastic anemia, by examining individuals and determining their blood types and whether they suffer from aplastic anemia. Cross-sectional studies are infrequently used when the exposure of interest is an environmental toxic agent (current smoking status is a poor measure of an individual’s history of smoking), but these studies can provide valuable leads to further directions for research.26
24. See Arthur L. Herbst et al., Adenocarcinoma of the Vagina: Association of Maternal Stilbestrol Therapy with Tumor Appearance, 284 New Eng. J. Med. 878 (1971).
25. Thus, for example, to detect a doubling of disease caused by exposure to an agent where the incidence of disease is 1 in 100 in the unexposed population would require sample sizes of 3,100 each for a cohort study, but only 177 each for a case-control study. Harold A. Kahn & Christopher T. Sempos, Statistical Methods in Epidemiology 66 (1989).
4. Ecological studies
Up to now, we have discussed studies in which data on both exposure and health outcome are obtained for each individual included in the study.27 In contrast, studies that collect data only about the group as a whole are called ecological studies.28 In ecological studies, information about individuals is generally not gathered; instead, overall rates of disease or death for different groups are obtained and compared. The objective is to identify some difference between the two groups, such as diet, genetic makeup, or alcohol consumption, that might explain differences in the risk of disease observed in the two groups.29 Such studies may be useful for identifying associations, but they rarely provide definitive causal answers. The difficulty is illustrated below with an ecological study of the relationship between dietary fat and cancer.
If a researcher were interested in determining whether a high dietary fatintake is associated with breast cancer, he or she could compare different coun-tries in terms of their average fat intakes and their average rates of breast cancer.If a country with a high average fat intake also tends to have a high rate of breastcancer, the finding would suggest an association between dietary fat and breastcancer. However, such a finding would be far from conclusive, because it lacksparticularized information about an individual’s exposure and disease status (i.e.,whether an individual with high fat intake is more likely to have breast can-cer).30 In addition to the lack of information about an individual’s intake of fat,the researcher does not know about the individual’s exposures to other agents(or other factors, such as a mother’s age at first birth) that may also be respon-sible for the increased risk of breast cancer. This lack of information about eachindividual’s exposure to an agent and disease status detracts from the usefulnessof the study and can lead to an erroneous inference about the relationship be-tween fat intake and breast cancer, a problem known as an ecological fallacy.The fallacy is assuming that, on average, the individuals in the study who have
26. For more information (and references) about cross-sectional studies, see Leon Gordis, Epidemiology 137–39 (1996).
27. Some individual studies may be conducted in which all members of a group or community are treated as exposed to an agent of interest (e.g., a contaminated water system) and disease status is determined individually. These studies should be distinguished from ecological studies.
28. In Renaud v. Martin Marietta Corp., 749 F. Supp. 1545, 1551 (D. Colo. 1990), aff’d, 972 F.2d 304 (10th Cir. 1992), the plaintiffs attempted to rely on an excess incidence of cancers in their neighborhood to prove causation. Unfortunately, the court confused the role of epidemiology in proving causation with the issue of the plaintiffs’ exposure to the alleged carcinogen and never addressed the evidentiary value of the plaintiffs’ evidence of a disease cluster (i.e., an unusually high incidence of a particular disease in a neighborhood or community). Id. at 1554.
29. David E. Lilienfeld & Paul D. Stolley, Foundations of Epidemiology 12 (3d ed. 1994).
30. For a discussion of the data on this question and what they might mean, see David Freedman et al., Statistics (3d ed. 1998).
suffered from breast cancer consumed more dietary fat than those who have not suffered from the disease. This assumption may not be true. Nevertheless, the study is useful in that it identifies an area for further research: the fat intake of individuals who have breast cancer as compared with the fat intake of those who do not. Researchers who identify a difference in disease or death in a demographic study may follow up with a study based on gathering data about individuals.
Another epidemiologic approach is to compare disease rates over time and focus on disease rates before and after a point in time when some event of interest took place.31 For example, thalidomide’s teratogenicity (capacity to cause birth defects) was discovered after Dr. Widukind Lenz found a dramatic increase in the incidence of limb reduction birth defects in Germany beginning in 1960. Yet other than with such powerful agents as thalidomide, which increased the incidence of limb reduction defects by several orders of magnitude, these secular-trend studies (also known as time-line studies) are less reliable and less able to detect modest causal effects than the observational studies described above. Other factors that affect the measurement or existence of the disease, such as improved diagnostic techniques and changes in lifestyle or age demographics, may change over time. If those factors can be identified and measured, it may be possible to control for them with statistical methods. Of course, unknown factors cannot be controlled for in these or any other kind of epidemiologic studies.
C. Epidemiologic and Toxicologic Studies
In addition to observational epidemiology, toxicology models based on animal studies (in vivo) may be used to determine toxicity in humans.32 Animal studies have a number of advantages. They can be conducted as true experiments, and researchers control all aspects of the animals’ lives. Thus, they can avoid the problem of confounding,33 which epidemiology often confronts. Exposure can be carefully controlled and measured. Refusals to participate in a study are not an issue, and loss to follow-up very often is minimal. Ethical limitations are diminished, and animals can be sacrificed and their tissues examined, which may improve the accuracy of disease assessment. Animal studies often provide useful
31. In Wilson v. Merrell Dow Pharmaceuticals, Inc., 893 F.2d 1149, 1152–53 (10th Cir. 1990), the defendant introduced evidence showing total sales of Bendectin and the incidence of birth defects during the 1970–1984 period. In 1983, Bendectin was removed from the market, but the rate of birth defects did not change. The Tenth Circuit affirmed the lower court’s ruling that the time-line data were admissible and that the defendant’s expert witnesses could rely on them in rendering their opinions.
32. For an in-depth discussion of toxicology, see Bernard D. Goldstein & Mary Sue Henifin, Reference Guide on Toxicology, in this manual.
33. See infra §IV.C.
information about pathological mechanisms and play a complementary role to epidemiology by assisting researchers in framing hypotheses and in developing study designs for epidemiologic studies.
Animal studies have two significant disadvantages, however. First, animal study results must be extrapolated to another species—human beings—and differences in absorption, metabolism, and other factors may result in interspecies variation in responses. For example, one powerful human teratogen, thalidomide, does not cause birth defects in most rodent species.34 Similarly, some known teratogens in animals are not believed to be human teratogens. In general, it is often difficult to confirm that an agent known to be toxic in animals is safe for human beings.35 The second difficulty with inferring human causation from animal studies is that the high doses customarily used in animal studies require consideration of the dose–response relationship and whether a threshold no-effect dose exists.36 Those matters are almost always fraught with considerable, and currently unresolvable, uncertainty.37
Toxicologists also use in vitro methods, in which human or animal tissue or cells are grown in laboratories and exposed to certain substances. The problem with this approach is also extrapolation—whether one can generalize the findings from the artificial setting of tissues in laboratories to whole human beings.38
Often toxicologic studies are the only or best available evidence of toxicity. Epidemiologic studies are difficult, time-consuming, and expensive, and consequently they do not exist for a large array of environmental agents. Where both animal toxicology and epidemiologic studies are available, no universal rules exist for how to interpret or reconcile them.39 Careful assessment of the methodological validity and power40 of the epidemiologic evidence must be undertaken, and the quality of the toxicologic studies and the questions of interspecies extrapolation and dose–response relationship must be considered.41
34. Phillip Knightley et al., Suffer the Children: The Story of Thalidomide 271–72 (1979).
35. See Ian C.T. Nesbit & Nathan J. Karch, Chemical Hazards to Human Reproduction 98–106 (1983); International Agency for Research on Cancer (IARC), Interpretation of Negative Epidemiological Evidence for Carcinogenicity (N.J. Wald & R. Doll eds., 1985).
36. See infra §V.C & note 119.
37. See General Elec. Co. v. Joiner, 522 U.S. 136, 143–45 (1997) (holding that the district court did not abuse its discretion in excluding expert testimony on causation based on expert’s failure to explain how animal studies supported expert’s opinion that agent caused disease in humans).
38. For a further discussion of these issues, see Bernard D. Goldstein & Mary Sue Henifin, Reference Guide on Toxicology §III.A, in this manual.
39. See IARC, supra note 35 (identifying a number of substances and comparing animal toxicology evidence with epidemiologic evidence).
A number of courts have grappled with the role of animal studies in proving causation in a toxic substance case. One line of cases takes a very dim view of their probative value. For example, in Brock v. Merrell Dow Pharmaceuticals, Inc., 874 F.2d 307, 313 (5th Cir. 1989), cert. denied, 494 U.S. 1046 (1990), the court noted the “very limited usefulness of animal studies when confronted with questions of toxicity.” A similar view is reflected in Richardson v. Richardson-Merrell, Inc., 857 F.2d 823, 830 (D.C. Cir. 1988), cert. denied, 493 U.S. 882 (1989); Bell v. Swift Adhesives, Inc., 804 F. Supp. 1577, 1579–80 (S.D. Ga. 1992); and Cadarian v. Merrell Dow Pharmaceuticals, Inc., 745 F. Supp. 409, 412 (E.D. Mich. 1989). Other courts have been more amenable to the use of animal toxicology in proving causation.
Thus, in Marder v. G.D. Searle & Co., 630 F. Supp. 1087, 1094 (D. Md. 1986), aff’d sub nom. Wheelahan v. G.D. Searle & Co., 814 F.2d 655 (4th Cir. 1987), the court observed: “There is a range of scientific methods for investigating questions of causation—for example, toxicology and animal studies, clinical research, and epidemiology—which all have distinct advantages and disadvantages.” See also Villari v. Terminix Int’l, Inc., 692 F. Supp. 568, 571 (E.D. Pa. 1988); Peterson v. Sealed Air Corp., Nos. 86-C3498, 88-C9859 Consol., 1991 U.S. Dist. LEXIS 5333, at *27–*29 (N.D. Ill. Apr. 23, 1991); cf. In re Paoli R.R. Yard PCB Litig., 916 F.2d 829, 853–54 (3d Cir. 1990) (questioning the exclusion of animal studies by the lower court), cert. denied, 499 U.S. 961 (1991). The Third Circuit in a subsequent opinion in Paoli observed:
[I]n order for animal studies to be admissible to prove causation in humans, there must be good grounds to extrapolate from animals to humans, just as the methodology of the studies must constitute good grounds to reach conclusions about the animals themselves. Thus, the requirement of reliability, or “good grounds,” extends to each step in an expert’s analysis all the way through the step that connects the work of the expert to the particular case.
In re Paoli R.R. Yard PCB Litig., 35 F.3d 717, 743 (3d Cir. 1994), cert. denied, 513 U.S. 1190 (1995); see also Cavallo v. Star Enter., 892 F. Supp. 756, 761–63 (E.D. Va. 1995) (courts must examine each of the steps that lead to an expert’s opinion), aff’d in part and rev’d in part, 100 F.3d 1150 (4th Cir. 1996), cert. denied, 522 U.S. 1044 (1998).
One explanation for these conflicting lines of cases may be that when there is a substantial body of epidemiologic evidence that addresses the causal issue, animal toxicology has much less probative value. That was the case, for example, in the Bendectin cases of Richardson, Brock, and Cadarian. Where epidemiologic evidence is not available, animal toxicology may be thought to play a more prominent role in resolving a causal dispute. See Michael D. Green, Expert Witnesses and Sufficiency of Evidence in Toxic Substances Litigation: The Legacy of Agent Orange and Bendectin Litigation, 86 Nw. U. L. Rev. 643, 680–82 (1992) (arguing that plaintiffs should be required to prove causation by a preponderance of the available evidence); Turpin v. Merrell Dow Pharms., Inc., 959 F.2d 1349, 1359 (6th Cir.), cert. denied, 506 U.S. 826 (1992); In re Paoli R.R. Yard PCB Litig., No. 86-2229, 1992 U.S. Dist. LEXIS 16287, at *16 (E.D. Pa. Oct. 21, 1992). For another explanation of these cases, see Gerald W. Boston, A Mass-Exposure Model of Toxic Causation: The Control of Scientific Proof and the Regulatory Experience, 18 Colum. J. Envtl. L. 181 (1993) (arguing that epidemiologic evidence should be required in mass-exposure cases but not in isolated-exposure cases). See also IARC, supra note 35; Bernard D. Goldstein & Mary Sue Henifin, Reference Guide on Toxicology §I.F, in this manual. The Supreme Court, in General Electric Co. v. Joiner, 522 U.S. 136, 144–45 (1997), suggested that there is not a categorical rule for toxicologic studies, observing, “[W]hether animal studies can ever be a proper foundation for an expert’s opinion [is] not the issue.... The [animal] studies were so dissimilar to the facts presented in this litigation that it was not an abuse of discretion for the District Court to have rejected the experts’ reliance on them.”
40. See infra §IV.A.3.
41. See Ellen F. Heineman & Shelia Hoar Zahm, The Role of Epidemiology in Hazard Evaluation, 9 Toxic Substances J. 255, 258–62 (1989).
III. How Should Results of an Epidemiologic Study Be Interpreted?
Epidemiologists are ultimately interested in whether a causal relationship exists between an agent and a disease. However, the first question an epidemiologist addresses is whether an association exists between exposure to the agent and disease. An association between exposure to an agent and disease exists when they occur together more frequently than one would expect by chance.42 Although a causal relationship is one possible explanation for an observed association between an exposure and a disease, an association does not necessarily mean that there is a cause–effect relationship. Interpreting the meaning of an observed association is discussed below.
This section begins by describing the ways of expressing the existence and strength of an association between exposure and disease. It reviews ways in which an incorrect result can be produced because of the sampling methods used in all observational epidemiologic studies and then examines statistical methods for evaluating whether an association is real or due to sampling error.
The strength of an association between exposure and disease can be stated as a relative risk, an odds ratio, or an attributable risk (often abbreviated as “RR,” “OR,” and “AR,” respectively). Each of these measurements of association examines the degree to which the risk of disease increases when individuals are exposed to an agent.
A. Relative Risk
A commonly used approach for expressing the association between an agent and disease is relative risk (RR). It is defined as the ratio of the incidence rate (often referred to as incidence) of disease in exposed individuals to the incidence rate in unexposed individuals:
$$\text{Relative Risk (RR)} = \frac{\text{Incidence rate in the exposed}}{\text{Incidence rate in the unexposed}}$$
The incidence rate of disease reflects the number of cases of disease that develop during a specified period of time divided by the number of persons in the cohort under study.43 Thus, the incidence rate expresses the risk that a
42. A negative association implies that the agent has a protective or curative effect. Because the concern in toxic substances litigation is whether an agent caused disease, this reference guide focuses on positive associations.
43. Epidemiologists also use the concept of prevalence, which measures the existence of disease in a population at a given point in time, regardless of when the disease developed. Prevalence is expressed as the proportion of the population with the disease at the chosen time. See Gordis, supra note 26, at 32–34.
member of the population will develop the disease within a specified period of time.
For example, a researcher studies 100 individuals who are exposed to an agent and 200 who are not exposed. After one year, 40 of the exposed individuals are diagnosed as having a disease, and 20 of the unexposed individuals also are diagnosed as having the disease. The relative risk of contracting the disease is calculated as follows:
• The incidence rate of disease in the exposed individuals is 40 cases per year per 100 persons (40/100), or 0.4.
• The incidence rate of disease in the unexposed individuals is 20 cases per year per 200 persons (20/200), or 0.1.
• The relative risk is calculated as the incidence rate in the exposed group (0.4) divided by the incidence rate in the unexposed group (0.1), or 4.0.
A relative risk of 4.0 indicates that the risk of disease in the exposed group is four times as high as the risk of disease in the unexposed group.44
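The arithmetic of this example can be sketched in a few lines of Python. This is only an illustration of the definition given in this section; the function names `incidence_rate` and `relative_risk` are hypothetical, chosen for this sketch and not drawn from any library.

```python
def incidence_rate(new_cases: int, persons: int) -> float:
    """Cases that develop during the study period divided by persons in the cohort."""
    if persons <= 0:
        raise ValueError("persons must be positive")
    return new_cases / persons


def relative_risk(exposed_cases: int, exposed_total: int,
                  unexposed_cases: int, unexposed_total: int) -> float:
    """Incidence rate in the exposed divided by incidence rate in the unexposed."""
    unexposed_rate = incidence_rate(unexposed_cases, unexposed_total)
    if unexposed_rate == 0:
        raise ValueError("relative risk is undefined when the unexposed rate is zero")
    return incidence_rate(exposed_cases, exposed_total) / unexposed_rate


# Figures from the example: 40 of 100 exposed and 20 of 200 unexposed
# individuals are diagnosed within one year.
rr = relative_risk(40, 100, 20, 200)
print(f"relative risk = {rr:.1f}")
```

With the figures from the example, the sketch reproduces the incidence rates of 0.4 and 0.1 and the relative risk of 4.0.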
In general, the relative risk can be interpreted as follows:
• If the relative risk equals 1.0, the risk in exposed individuals is the same as the risk in unexposed individuals. There is no association between exposure to the agent and disease.
• If the relative risk is greater than 1.0, the risk in exposed individuals is greater than the risk in unexposed individuals. There is a positive association between exposure to the agent and the disease, which could be causal.
• If the relative risk is less than 1.0, the risk in exposed individuals is less than the risk in unexposed individuals. There is a negative association, which could reflect a protective or curative effect of the agent on risk of disease. For example, immunizations lower the risk of disease. The results suggest that immunization is associated with a decrease in disease and may have a protective effect on the risk of disease.
Although relative risk is a straightforward concept, care must be taken in interpreting it. Researchers should scrutinize their results for error. Error in the design of a study could yield an incorrect relative risk. Sources of bias and confounding should be examined.45 Whenever an association is uncovered, further analysis should be conducted to determine if the association is real or due to an error or bias. Similarly, a study that does not find an association between an agent and disease may be erroneous because of bias or random error.
44. See DeLuca v. Merrell Dow Pharms., Inc., 911 F.2d 941, 947 (3d Cir. 1990); Gaul v. United States, 582 F. Supp. 1122, 1125 n.9 (D. Del. 1984).
45. See infra §IV.B–C.
B. Odds Ratio
The odds ratio (OR) is similar to a relative risk in that it expresses in quantitative terms the association between exposure to an agent and a disease.46 In a case-control study, the odds ratio is the ratio of the odds that a case (one with the disease) was exposed to the odds that a control (one without the disease) was exposed. In a cohort study, the odds ratio is the ratio of the odds of developing a disease when exposed to a suspected agent to the odds of developing the disease when not exposed. The odds ratio approximates the relative risk when the disease is rare.47
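A short Python sketch may help show why the approximation depends on the disease being rare. It computes both measures for a cohort. The function name `cohort_rr_and_or` is hypothetical; the first data set reuses the cohort figures from section III.A, and the second data set is invented purely for illustration.

```python
def cohort_rr_and_or(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Return (relative risk, odds ratio) for a cohort with the given counts."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    # Odds compare those who develop the disease with those who do not.
    odds_exposed = exposed_cases / (exposed_total - exposed_cases)
    odds_unexposed = unexposed_cases / (unexposed_total - unexposed_cases)
    return risk_exposed / risk_unexposed, odds_exposed / odds_unexposed


# Common disease (figures from section III.A): 40 of 100 vs. 20 of 200.
common_rr, common_or = cohort_rr_and_or(40, 100, 20, 200)

# Rare disease (hypothetical figures): 20 of 10,000 vs. 10 of 10,000.
rare_rr, rare_or = cohort_rr_and_or(20, 10_000, 10, 10_000)

print(f"common disease: RR = {common_rr:.2f}, OR = {common_or:.2f}")
print(f"rare disease:   RR = {rare_rr:.3f}, OR = {rare_or:.3f}")
```

For the common disease the odds ratio (6.0) overstates the relative risk (4.0) considerably, while for the rare disease the two values nearly coincide.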
Consider a case-control study, with results as shown schematically in a 2 x 2 table (Table 2):
Table 2. Cross-Tabulation of Cases and Controls by Exposure Status
              Cases   Controls
Exposed         a        b
Not Exposed     c        d
In a case-control study
$$\text{Odds Ratio (OR)} = \frac{\text{the odds that a case was exposed}}{\text{the odds that a control was exposed}}$$
Looking at the above 2 x 2 table, this ratio can be calculated as
$$\frac{a/c}{b/d}$$
This works out to ad/bc. Since we are multiplying two diagonal cells in the table and dividing by the product of the other two diagonal cells, the odds ratio is also called the cross-products ratio.
Consider the following hypothetical study: A researcher identifies 100 individuals with a disease who serve as “cases” and 100 people without the disease who serve as “controls” for her case-control study. Forty of the 100 cases were exposed to the agent and 60 were not. Among the control group, 20 people were exposed and 80 were not. The data can be presented in a 2 x 2 table (Table 3):
46. A relative risk cannot be calculated for a case-control study, because a case-control study begins by examining a group of persons who already have the disease. That aspect of the study design prevents a researcher from determining the rate at which individuals develop the disease. Without a rate or incidence of disease, a researcher cannot calculate a relative risk.
47. See Marcello Pagano & Kimberlee Gauvreau, Principles of Biostatistics 320–22 (1993). For further detail about the odds ratio and its calculation, see Kahn & Sempos, supra note 25, at 47–56.
Table 3. Case-Control Study Outcome
              Cases            Controls
              (with disease)   (no disease)
Exposed            40               20
Not Exposed        60               80
Total             100              100
The calculation of the odds ratio would be
$$\text{OR} = \frac{40/60}{20/80} = 2.67$$
If the disease is relatively rare in the general population (about 5% or less), the odds ratio is a good approximation of the relative risk, which means that there is almost a tripling of the disease in those exposed to the agent.48
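The cross-products calculation can be sketched in Python as follows. The function name `odds_ratio` and its argument order are assumptions made for this illustration; the cell labels a, b, c, and d follow Table 2, and the counts are those of Table 3.

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Cross-products ratio ad/bc for a 2 x 2 case-control table.

    a: exposed cases      b: exposed controls
    c: unexposed cases    d: unexposed controls
    """
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined when b or c is zero")
    return (a * d) / (b * c)


# Table 3: 40 exposed and 60 unexposed cases; 20 exposed and 80 unexposed controls.
or_table3 = odds_ratio(a=40, b=20, c=60, d=80)
print(f"odds ratio = {or_table3:.2f}")
```

The result, rounded to two decimal places, matches the 2.67 computed above as (40/60)/(20/80).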
C. Attributable Risk
A frequently used measurement of risk is the attributable risk (AR). The attributable risk represents the amount of disease among exposed individuals that can be attributed to the exposure. It can also be expressed as the proportion of the disease among exposed individuals that is associated with the exposure (also called the “attributable proportion of risk,” the “etiologic fraction” or “attributable risk percent”). The attributable risk reflects the maximum proportion of the disease that can be attributed to exposure to an agent and consequently the maximum proportion of disease that could be potentially prevented by blocking the effect of the exposure or by eliminating the exposure.49 In other words, if the association is causal, the attributable risk is the proportion of disease in an exposed population that might be caused by the agent and that might be prevented by eliminating exposure to that agent (see Figure 3).50
48. The odds ratio is usually marginally greater than the relative risk. As the disease in question becomes more common, the difference between the odds ratio and the relative risk grows.
49. Kenneth J. Rothman & Sander Greenland, Modern Epidemiology 53–55 (2d ed. 1998). See also Landrigan v. Celotex Corp., 605 A.2d 1079, 1086 (N.J. 1992) (illustrating that a relative risk of 1.55 conforms to an attributable risk of 35%, i.e., (1.55-1.0)/1.55=.35 or 35%).
50. Risk is not zero for the control group (those not exposed) when there are other causal chains that cause the disease which do not require exposure to the agent. For example, some birth defects are the result of genetic sources, which do not require the presence of any environmental agent. Also, some degree of risk in the control group may be the result of background exposure to the agent being studied. For example, nonsmokers in a control group may have been exposed to passive cigarette smoke, which is responsible for some cases of lung cancer and other diseases. See also Ethyl Corp. v. United States Envtl. Protection Agency, 541 F.2d 1, 25 (D.C. Cir.), cert. denied, 426 U.S. 941 (1976). There are some diseases that do not occur without exposure to an agent; these are known as signature diseases. See infra note 128.
Figure 3. Risks in Exposed and Unexposed Groups
[Figure 3 graphic not reproduced; its labels are “Incidence Due to Exposure,” “Incidence Not Due to Exposure,” “Exposed Group,” and “Unexposed Group.”]
To determine the proportion of a disease that is attributable to an exposure, a researcher would need to know the incidence of the disease in the exposed group and the incidence of disease in the unexposed group. The attributable risk is
$$\text{AR} = \frac{(\text{incidence in the exposed}) - (\text{incidence in the unexposed})}{\text{incidence in the exposed}}$$
The attributable risk can be calculated using the example described in section III.A. Suppose a researcher studies 100 individuals who are exposed to a substance and 200 who are not exposed. After one year, 40 of the exposed individuals are diagnosed as having a disease, and 20 of the unexposed individuals are also diagnosed as having the disease.
• The incidence of disease in the exposed group is 40 persons out of 100 who contract the disease in a year.
• The incidence of disease in the unexposed group is 20 persons out of 200 (or 10 out of 100) who contract the disease in a year.
• The proportion of disease that is attributable to the exposure is 30 persons out of 40, or 75%.
This means that 75% of the disease in the exposed group is attributable to the exposure. We should emphasize here that “attributable” does not necessarily mean “caused by.” Up to this point we have only addressed associations. Inferring causation from an association is addressed in section V.
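The attributable risk calculation can be sketched in Python. The function names are hypothetical. The second function uses the algebraically equivalent form (RR - 1)/RR, obtained by dividing the numerator and denominator of the AR formula by the incidence in the unexposed; this is the form applied in the Landrigan illustration cited in note 49.

```python
def attributable_risk(incidence_exposed: float, incidence_unexposed: float) -> float:
    """(incidence in the exposed - incidence in the unexposed) / incidence in the exposed."""
    if incidence_exposed <= 0:
        raise ValueError("incidence in the exposed must be positive")
    return (incidence_exposed - incidence_unexposed) / incidence_exposed


def attributable_risk_from_rr(rr: float) -> float:
    """Equivalent form of the same quantity, expressed through the relative risk."""
    if rr <= 0:
        raise ValueError("relative risk must be positive")
    return (rr - 1.0) / rr


# Section III.A example: 40 of 100 exposed, 20 of 200 unexposed.
ar = attributable_risk(40 / 100, 20 / 200)
ar_via_rr = attributable_risk_from_rr(4.0)

# Relative risk of 1.55, as in the illustration cited in note 49.
ar_landrigan = attributable_risk_from_rr(1.55)

print(f"AR = {ar:.0%}, via RR = {ar_via_rr:.0%}, RR of 1.55 gives {ar_landrigan:.0%}")
```

Both routes give 75% for the example in the text, and a relative risk of 1.55 gives roughly 35%.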
D. Adjustment for Study Groups That Are Not Comparable
Populations often differ in characteristics that relate to disease risk, such as age, sex, and race. Florida has a much higher death rate than Alaska.51 Is sunshine dangerous? Perhaps, but the Florida population is much older than the Alaska population, and some adjustment must be made for the different age demographics. The technique used to accomplish this is called adjustment, and two types of adjustment are used—direct and indirect.
51. See Lilienfeld & Stolley, supra note 29, at 68–70 (mortality rate in Florida approximately three times what it is in Alaska).
In direct age adjustment, a standard population is used in order to eliminate the effects of any age differences between two study populations. Thus, in comparing two populations, A and B, the age-specific mortality rates for Population A are applied to each age group of the standard reference population, and the numbers of deaths expected in each age group of the standard population are calculated. These expected numbers of deaths are then totaled to yield the number of deaths expected in the standard population if it experienced the mortality risk of Population A. The same procedure is then carried out for Population B. Using these expected numbers of deaths, mortality rates are calculated for the standard population on the basis of the number of deaths expected if it had the mortality experience of Population A and the number of deaths expected if it had the mortality experience of Population B. We can then compare these rates, called age-adjusted rates, knowing that any difference between these rates cannot be attributed to differences in age, since both age-adjusted rates were generated using the same standard population.
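The procedure described above can be sketched in Python. Every number below is hypothetical and chosen only to make the effect visible: Populations A and B are given identical age-specific death rates but very different age structures. The function name `rate` and the three age bands are likewise assumptions of this sketch.

```python
def rate(age_specific_rates: dict, population: dict) -> float:
    """Apply age-specific rates to a population, total the expected deaths,
    and divide by the size of that population."""
    expected_deaths = sum(
        age_specific_rates[group] * count for group, count in population.items()
    )
    return expected_deaths / sum(population.values())


# Hypothetical standard population and study populations (persons per age group).
standard = {"0-39": 60_000, "40-64": 30_000, "65+": 10_000}
population_a = {"0-39": 20_000, "40-64": 30_000, "65+": 50_000}  # older population
population_b = {"0-39": 70_000, "40-64": 25_000, "65+": 5_000}   # younger population

# Hypothetical age-specific annual death rates, identical in A and B.
rates_a = {"0-39": 0.001, "40-64": 0.005, "65+": 0.040}
rates_b = dict(rates_a)

# Crude rates use each population's own age structure.
crude_a = rate(rates_a, population_a)
crude_b = rate(rates_b, population_b)

# Age-adjusted rates apply each population's rates to the same standard population.
adjusted_a = rate(rates_a, standard)
adjusted_b = rate(rates_b, standard)

print(f"crude:    A = {crude_a:.5f}, B = {crude_b:.5f}")
print(f"adjusted: A = {adjusted_a:.5f}, B = {adjusted_b:.5f}")
```

In this invented example the crude rate of A is more than five times that of B, yet the age-adjusted rates are identical, so the whole crude difference reflects age structure. The same function serves for both steps because a crude rate is simply the age-specific rates weighted by the population's own age distribution.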
A second approach, indirect age adjustment, is often used, for example, in studying mortality in an occupationally exposed population, such as miners or construction workers. To answer the question whether a population of miners has a higher mortality rate than we would expect in a similar population not engaged in mining, we must apply the age-specific rates for a known population, such as all men of the same age, to each age group in the population of interest. This will yield the number of deaths expected in each age group in the population of interest if this population had had the mortality experience of the known population. The number of deaths expected is thus calculated for each age group and totaled; the numbers of deaths that were actually observed in that population are counted. The ratio of the total number of deaths actually observed to the total number of deaths that would be expected if the population of interest actually had the mortality experience of the known population is then calculated. This ratio is called the standardized mortality ratio (SMR). When the outcome of interest is disease rather than death, it is called the standardized morbidity ratio.52 If the ratio equals 1.0, the observed number of deaths equals the expected number of deaths, and the mortality experience of the population of interest is no different from that of the known population. If the SMR is greater than 1.0, the population of interest has a higher mortality risk than that of the known population, and if the SMR is less than 1.0, the population of interest has a lower mortality risk than that of the known population.
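The SMR calculation can be sketched in Python along the same lines. The miner counts, the reference rates, and the observed number of deaths are all hypothetical, and the function names are assumptions of this sketch.

```python
def expected_deaths(reference_rates: dict, study_population: dict) -> float:
    """Deaths expected if the study population had the known population's
    age-specific mortality experience."""
    return sum(reference_rates[group] * count for group, count in study_population.items())


def standardized_mortality_ratio(observed: int, reference_rates: dict,
                                 study_population: dict) -> float:
    """Observed deaths divided by expected deaths."""
    return observed / expected_deaths(reference_rates, study_population)


# Hypothetical population of interest: miners, by age group.
miners = {"20-39": 4_000, "40-59": 5_000, "60+": 1_000}

# Hypothetical age-specific annual death rates for the known population
# (for instance, all men of the same age).
reference_rates = {"20-39": 0.002, "40-59": 0.008, "60+": 0.030}

observed_deaths = 117  # hypothetical count of deaths actually observed among the miners

smr = standardized_mortality_ratio(observed_deaths, reference_rates, miners)
print(f"expected deaths = {expected_deaths(reference_rates, miners):.0f}, SMR = {smr:.2f}")
```

Here 78 deaths are expected and 117 are observed, so the SMR is 1.5, which under the interpretation above would indicate a higher mortality risk in the hypothetical miners than in the known population.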
52. See In re Joint E. & S. Dist. Asbestos Litig., 52 F.3d 1124, 1128 (2d Cir. 1995) (using SMR to describe relative risk of an agent in causing disease). For an example of adjustment used to calculate an SMR for workers exposed to benzene, see Robert A. Rinsky et al., Benzene and Leukemia: An Epidemiologic Risk Assessment, 316 New Eng. J. Med. 1044 (1987).
Thus, age adjustment provides a way to compare populations while in effect holding age constant. Adjustment is used not only for comparing mortality rates in different populations but also for comparing rates in different groups of subjects selected for study in epidemiologic investigations. Although this discussion has focused on adjusting for age, it is also possible to adjust for any number of other variables, such as gender, race, occupation, and socioeconomic status. It is also possible to adjust for several factors simultaneously.53
IV. What Sources of Error Might Have Produced a False Result?
Incorrect study results occur in a variety of ways. A study may find a positive association (relative risk greater than 1.0) when there is no association. Or a study may erroneously conclude that there is no association when in reality there is. A study may also find an association when one truly exists, but the association found may be greater or less than the real association.
There are three explanations why an association found in a study may be erroneous: chance, bias, and confounding. Before any inferences about causation are drawn from a study, the possibility of these phenomena must be examined.54
The findings of a study may be the result of chance (or sampling error) because virtually all epidemiologic studies are based on sampling a small proportion of the relevant population. During the design of a study, the size of the sample can be increased to reduce (but not eliminate) the likelihood of sampling error. Once a study has been completed, statistical methods (discussed in the next subsection) permit an assessment of whether the results of a study are likely to represent a true association or random error.
The two main techniques for assessing random error are statistical significance and confidence intervals. A study that is statistically significant has results that are unlikely to be the result of random error, although the level of significance used entails a somewhat arbitrary determination.55 A confidence interval
53. For further elaboration on adjustment, see Rothman & Greenland, supra note 49, at 234–35; Gordis, supra note 26, at 49–52; Philip Cole, Causality in Epidemiology, Health Policy, and Law, [1997] 27 Envtl. L. Rep. (Envtl. L. Inst.) 10279, 10281 (June 1997).
54. See Cole, supra note 53, at 10285. In DeLuca v. Merrell Dow Pharmaceuticals, Inc., 911 F.2d 941, 955 (3d Cir. 1990), the court recognized and discussed random sampling error. It then went on to refer to other errors (i.e., systematic bias) that create as much or more error in the outcome of a study. For a similar description of error in study procedure and random sampling, see David H. Kaye & David A. Freedman, Reference Guide on Statistics §IV, in this manual.
55. Describing a study result as “statistically significant” does not mean that the result—the relative risk—is of a significant or substantial magnitude. Statistical significance does not address the magnitude of the
provides both the relative risk found in the study and a range (interval) within which the true relative risk resides with some (arbitrarily chosen) level of confidence. Both of these techniques are explained in subsection IV.A.
Bias (or systematic error) also can produce error in the outcome of a study. Epidemiologists attempt to minimize the existence of bias through their study design, which is developed before they begin gathering data. However, even the best designed and conducted studies can have biases, which may be subtle. Consequently, after a study is completed it should be evaluated for potential sources of bias. Sometimes, after bias is identified, the epidemiologist can determine whether the bias would tend to inflate or dilute any association that may exist. Identification of the bias may permit the epidemiologist to make an assessment of whether the study’s conclusions are valid. Epidemiologists may reanalyze a study’s data to correct for a bias identified in a completed study or to validate the analytic methods used.56 Common biases and how they may produce invalid results are described in subsection IV.B.
Finally, a study may reach incorrect conclusions about causation because, although the agent and disease are associated, the agent is not a true causal factor. Rather, the agent may be associated with another agent that is the true causal factor, and this factor confounds the relationship being examined in the study. Confounding is explained in subsection IV.C.
A. What Statistical Methods Exist to Evaluate the Possibility of Sampling Error?57
Before detailing the statistical methods used to assess random error (which we use as synonymous with sampling error), we explain two concepts that are central to epidemiology and statistical analysis. Understanding these concepts should facilitate comprehension of the statistical methods.
Epidemiologists often refer to the true association (also called “real association”), which is the association that really exists between an agent and a disease and that might be found by a perfect (but nonexistent) study. The true association is a concept that is used in evaluating the results of a given study even though its value is unknown. By contrast, a study’s outcome will produce an observed association, which is known.
relative risk found in a study, only the likelihood that it would have resulted from random error if there is no real association between the agent and disease.
56. E.g., Richard A. Kronmal et al., The Intrauterine Device and Pelvic Inflammatory Disease: The Women’s Health Study Reanalyzed, 44 J. Clinical Epidemiology 109 (1991) (reanalysis of a study that found an association between use of IUDs and pelvic inflammatory disease concluded that IUDs do not increase the risk of pelvic inflammatory disease).
57. For a bibliography on the role of statistical significance in legal proceedings, see Sanders, supra note 13, at 329 n.138.
Scientists, including epidemiologists, generally begin an empirical study with a hypothesis that they seek to disprove,58 called the null hypothesis. The null hypothesis states that there is no true association between an agent and a disease. Thus, the epidemiologist begins by technically assuming that the relative risk is 1.0 and seeks to develop data that may disprove the hypothesis.59
1. False positive error and statistical significance
When a study results in a positive association (i.e., a relative risk greater than 1.0), epidemiologists try to determine whether that outcome represents a true association or is the result of random error.60 Random error is illustrated by a fair coin yielding five heads out of five tosses,61 an occurrence that would result, purely by chance, in about 3% of all series of five tosses. Thus, even though the true relative risk is 1.0, an epidemiologic study may find a relative risk greater than 1.0 because of random error. An erroneous conclusion that the null hypothesis is false (i.e., a conclusion that there is a difference in risk when no difference actually exists) owing to random error is called a false positive error or type I error or alpha error.
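The "about 3%" figure for five heads in five tosses can be checked with a short calculation. The following is only an illustrative sketch; the function name is our own and is not drawn from any cited source.

```python
from fractions import Fraction


def prob_all_heads(n_tosses: int) -> Fraction:
    """Probability that a fair coin lands heads on every one of n_tosses tosses."""
    return Fraction(1, 2) ** n_tosses


p = prob_all_heads(5)
print(p, float(p))  # 1/32 0.03125, i.e., about 3%
```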
58. See, e.g., Daubert v. Merrell Dow Pharms., Inc., 509 U.S. 579, 593 (1993) (scientific methodology involves generating and testing hypotheses). We should explain that this null-hypothesis testing model may be misleading. The reality is that the vast majority of epidemiologic studies are conducted because the researcher suspects that there is a causal effect and seeks to demonstrate that causal relationship. Nevertheless, epidemiologists prepare their study designs and test the plausibility that any association found in a study was the result of sampling error by using the null hypothesis.
59. See DeLuca v. Merrell Dow Pharms., Inc., 911 F.2d 941, 945 (3d Cir. 1990); Stephen E. Fienberg et al., Understanding and Evaluating Statistical Evidence in Litigation, 36 Jurimetrics J. 1, 21–24 (1995).
60. Hypothesis testing is one of the most counterintuitive techniques in statistics. Given a set of epidemiologic data, one wants to ask the straightforward, obvious question, What is the probability that the difference between two samples reflects a real difference between the populations from which they were taken? Unfortunately, there is no way to answer this question directly or to calculate the probability. Instead, statisticians (and epidemiologists) address a related but very different question: If there really is no difference between the populations, how probable is it that one would find a difference at least as large as the observed difference between the samples? See Expert Evidence: A Practitioner’s Guide to Law, Science, and the FJC Manual 91 (Bert Black & Patrick W. Lee eds., 1997).
61. DeLuca, 911 F.2d at 946–47.
Common sense leads one to believe that a large enough sample of individuals must be studied if the study is to identify a relationship between exposure to an agent and disease that truly exists. Common sense also suggests that by enlarging the sample size (the size of the study group), researchers can form a more accurate conclusion and reduce the chance of random error in their results. Both statements are correct and can be illustrated by a test to determine if a coin is fair. A test in which a coin is tossed 1,000 times is more helpful than a test in which the coin is tossed only 10 times. Common sense dictates that it is far more likely that a test of a fair coin with 10 tosses will come up, for example, with 80% heads than will a test with 1,000 tosses. For if the test is conducted with larger numbers (1,000 tosses), the stability of the outcome of the test is less likely to be influenced by random error, and the researcher would have greater confidence in the inferences drawn from the data.62
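The comparison between 10 tosses and 1,000 tosses can be made concrete with exact binomial arithmetic. The sketch below is illustrative only; it reads "80% heads" as "at least 80% heads," which is our assumption, and the function name is hypothetical.

```python
from math import comb


def prob_at_least(n_tosses: int, min_heads: int) -> float:
    """Exact probability that a fair coin shows at least min_heads heads in n_tosses tosses."""
    favorable = sum(comb(n_tosses, k) for k in range(min_heads, n_tosses + 1))
    return favorable / 2 ** n_tosses


p_small = prob_at_least(10, 8)      # at least 80% heads in 10 tosses
p_large = prob_at_least(1000, 800)  # at least 80% heads in 1,000 tosses

print(f"10 tosses:    {p_small:.4f}")
print(f"1,000 tosses: {p_large:.3e}")
```

With 10 tosses the outcome occurs in roughly 1 of every 18 trials of a fair coin; with 1,000 tosses it is so improbable that it would, for practical purposes, never be attributed to chance.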
One means for evaluating the possibility that an observed association could have occurred as a result of random error is by calculating a p-value.63 A p-value represents the probability that a positive association would result from random error if no association were in fact present.64 Thus, a p-value of .1 means that, if the true relative risk is 1.0, there is a 10% chance that random error alone would produce a relative risk at least as large as the one observed in the study.65
62. This explanation of numerical stability was drawn from Brief Amicus Curiae of Professor Alvan R. Feinstein in Support of Respondent at 12–13, Daubert v. Merrell Dow Pharms., Inc., 509 U.S. 579 (1993) (No. 92-102). See also Allen v. United States, 588 F. Supp. 247, 417–18 (D. Utah 1984), rev’d on other grounds, 816 F.2d 1417 (10th Cir. 1987), cert. denied, 484 U.S. 1004 (1988). The Allen court observed that although “[s]mall communities or groups of people are deemed ‘statistically unstable’” and “data from small populations must be handled with care [, it] does not mean that [the data] cannot provide substantial evidence in aid of our effort to describe and understand events.”
63. See also David H. Kaye & David A. Freedman, Reference Guide on Statistics §IV.B, in this manual (p-value reflects the implausibility of the null hypothesis).
64. Technically, a p-value represents the probability that the study’s association or a larger one would occur as a result of sampling error where no association (or, equivalently, the null hypothesis) is the true situation. This means that if one conducted an examination of 20 associations in which the true RR = 1, using a significance level of .05, on average one of those examinations would result in a statistically significant, yet spurious, association.
Unfortunately, some have failed to appreciate the difference between a statement of the probability that the study’s outcome would occur as a result of random error if the true association were RR equal to 1 (the correct understanding of a p-value) and a statement of the probability that the study’s outcome was due to random error (an incorrect understanding of a p-value). See, e.g., In re TMI Cases Consol. II, 922 F. Supp. 997, 1017 (M.D. Pa. 1996); Barnes v. Secretary of Dep’t of Health & Human Servs., No. 92-0032V, 1997 U.S. Claims LEXIS 212, at *22 (Fed. Cl. Sept. 15, 1997) (“The P value ... [measures] the probability that the results could have happened by chance alone.”). Conventional statistical methodology does not permit calculation of the latter probability. However, the p-value is used to assess the plausibility that a positive association should be taken to disprove the null hypothesis and permit an inference, after assessing the factors discussed in section V infra, that the agent causes disease.
65. Technically, a p-value of .1 means that if in fact there is no association, 10% of all similar studies would be expected to yield an association the same as, or greater than, the one found in the study due to random error.
66. Allen, 588 F. Supp. at 416–17 (discussing statistical significance and selection of a level of alpha); see also Sanders, supra note 13, at 343–44 (explaining alpha, beta, and their relationship to sample size); Developments in the Law—Confronting the New Challenges of Scientific Evidence, 108 Harv. L. Rev. 1481, 1535–36, 1540–46 (1995) [hereinafter Developments in the Law].
To minimize false positive error, epidemiologists use a convention that the p-value must fall below some selected level known as alpha or significance level for the results of the study to be statistically significant.66 Thus, an outcome is statistically significant when the observed p-value for the study falls below the preselected significance level. The most common significance level, or alpha, used in science is .05.67 A .05 value means that the probability is 5% of observing an association at least as large as that found in the study when in truth there is no association.68 Although .05 is often the significance level selected, other levels can be and have been used.69 For example, in its study of the effects of secondhand smoke, the Environmental Protection Agency (EPA) used a .10 standard for significance testing.70
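To illustrate how a p-value is compared with a preselected alpha, the following sketch computes a relative risk and an approximate p-value from hypothetical cohort counts. The counts, the function name, and the choice of a two-sided normal approximation on the logarithm of the relative risk are all our assumptions; a particular study may use a different (for example, one-tailed or exact) test.

```python
from math import erfc, log, sqrt


def relative_risk_p_value(cases_exp, n_exp, cases_unexp, n_unexp):
    """Return (RR, two-sided p-value) using a normal approximation on log(RR)."""
    rr = (cases_exp / n_exp) / (cases_unexp / n_unexp)
    # Approximate standard error of log(RR) for cohort (cumulative incidence) data.
    se = sqrt(1 / cases_exp - 1 / n_exp + 1 / cases_unexp - 1 / n_unexp)
    z = log(rr) / se
    # erfc(|z| / sqrt(2)) equals the two-sided tail area of the standard normal curve.
    return rr, erfc(abs(z) / sqrt(2))


# Hypothetical study: 45 cases among 1,000 exposed, 30 cases among 1,000 unexposed.
rr, p = relative_risk_p_value(45, 1000, 30, 1000)
for alpha in (0.05, 0.10):
    verdict = "significant" if p < alpha else "not significant"
    print(f"RR = {rr:.2f}, p = {p:.3f}, alpha = {alpha:.2f}: {verdict}")
```

For these hypothetical counts the p-value falls between .05 and .10, so the same result is statistically significant under a .10 standard but not under the more common .05 standard.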
67. A common error made by lawyers, judges, and academics is to equate the level of alpha with the legal burden of proof. Thus, one will often see a statement that using an alpha of .05 for statistical significance imposes a burden of proof on the plaintiff far higher than the civil burden of a preponderance of the evidence (i.e., greater than 50%). See, e.g., Ethyl Corp. v. United States Envtl. Protection Agency, 541 F.2d 1, 28 n.58 (D.C. Cir.), cert. denied, 426 U.S. 941 (1976); Hodges v. Secretary of Dep’t of Health & Human Servs., 9 F.3d 958, 967, 970 (Fed. Cir. 1993) (Newman, J., dissenting); Edward J. Imwinkelried, The Admissibility of Expert Testimony in Christophersen v. Allied-Signal Corp.: The Neglected Issue of the Validity of Nonscientific Reasoning by Scientific Witnesses, 70 Denv. U. L. Rev. 473, 478 (1993).
This claim is incorrect, although the reasons are a bit complex and a full explanation would require more space and detail than is feasible here. Nevertheless, we sketch out a brief explanation: First, alpha does not address the likelihood that a plaintiff’s disease was caused by exposure to the agent; the magnitude of the association bears on that question. See infra §VII. Second, significance testing only bears on whether the observed magnitude of association arose as a result of random chance, not on whether the null hypothesis is true. Third, using stringent significance testing to avoid false positive error comes at a complementary cost of inducing false negative error. See DeLuca v. Merrell Dow Pharms., Inc., 911 F.2d 941, 947 (3d Cir. 1990). Fourth, using an alpha of .5 would not be equivalent to saying that the probability the association found is real is 50%, and the probability that it is a result of random error is 50%. Statistical methodology does not permit assessments of those probabilities. See Green, supra note 39, at 686; Michael D. Green, Science Is to Law as the Burden of Proof Is to Significance Testing, 37 Jurimetrics J. 205 (1997) (book review); see also David H. Kaye, Apples and Oranges: Confidence Coefficients and the Burden of Persuasion, 73 Cornell L. Rev. 54, 66 (1987); David H. Kaye & David A. Freedman, Reference Guide on Statistics §IV.B.2, in this manual; Developments in the Law, supra note 66, at 1551–56; Allen v. United States, 588 F. Supp. 247, 417 (D. Utah 1984) (“Whether a correlation between a cause and a group of effects is more likely than not—particularly in a legal sense—is a different question from that answered by tests of statistical significance ....”), rev’d on other grounds, 816 F.2d 1417 (10th Cir. 1987), cert. denied, 484 U.S. 1004 (1988); Turpin v. Merrell Dow Pharms., Inc., 959 F.2d 1349, 1357 n.2 (6th Cir.), cert. denied, 506 U.S. 826 (1992); cf. DeLuca, 911 F.2d at 959 n.24 (“The relationship between confidence levels and the more likely than not standard of proof is a very complex one ... and in the absence of more education than can be found in this record, we decline to comment further on it.”).
68. This means that if one conducted an examination of a large number of associations in which the true RR equals 1, on average 1 in 20 of those associations would be found statistically significant at a .05 level, and each such finding would be spurious. When researchers examine many possible associations that might exist in their data, a practice known as data dredging, we should expect that even if there are no associations, those researchers will find statistically significant associations in 1 of every 20 associations examined. See Rachel Nowak, Problems in Clinical Trials Go Far Beyond Misconduct, 264 Science 1538, 1539 (1994).
69. A significance test can be either one-tailed or two-tailed, depending on the null hypothesis selected by the researcher. Since most investigators of toxic substances are only interested in whether the agent increases the incidence of disease (as distinguished from providing protection from the disease), a one-tailed test is often viewed as appropriate. For an explanation of the difference between one-tailed and two-tailed tests, see David H. Kaye & David A. Freedman, Reference Guide on Statistics §IV.C.2, in this manual.
70. U.S. Envtl. Protection Agency, Respiratory Health Effects of Passive Smoking: Lung Cancer and Other Disorders (1992); see also Turpin, 959 F.2d at 1353–54 n.1 (confidence level frequently set at 95%, though 90% (which corresponds to an alpha of .10) is also used; selection of the value is “somewhat arbitrary”).
Statistical significance is a term that speaks only to the question of sampling error; it does not address the magnitude of any association found in a study.71 A study may be statistically significant but may find only a very weak association; conversely, a study with small sample sizes may find a high relative risk but still not be statistically significant.72
There is some controversy among epidemiologists and biostatisticians about the appropriate role of significance testing.73 To the strictest significance testers, any study whose p-value is not less than the level chosen for statistical significance should be rejected as inadequate to disprove the null hypothesis. Others are critical of using strict significance testing, which rejects all studies with an observed p-value at or above that specified level. Epidemiologic studies have become increasingly sophisticated in addressing the issue of random error and examining the data from studies to ascertain what information they may provide about the relationship between an agent and a disease, without the rejection of all studies that are not statistically significant.74
71. Unfortunately, some courts have been confused about the relationship between statistical significance and the magnitude of the association. See In re Joint E. & S. Dist. Asbestos Litig., 827 F. Supp. 1014, 1041 (S.D.N.Y. 1993), rev’d on other grounds, 52 F.3d 1124 (2d Cir. 1995) (concluding that any relative risk less than 1.50 is statistically insignificant).
72. See Cole, supra note 53, at 10282. While statistical significance and association are two distinct concepts, whether a study’s results are statistically significant does depend, in part, on the incidence of disease and the magnitude of any association found in the study. In other words, the more common the disease and the greater the association between an agent and the disease, the more likely that a study’s outcome will be statistically significant, all other things being equal. Also critical to statistical significance is the number of persons participating in the study. As the disease becomes more infrequent, the sample sizes decrease, and the associations found are weaker, it is less likely that the results will be statistically significant.
73. Similar controversy exists among the courts that have confronted the issue of whether statistically significant studies are required to satisfy the burden of production. The leading case advocating statistically significant studies is Brock v. Merrell Dow Pharmaceuticals, Inc., 874 F.2d 307, 312 (5th Cir.), amended, 884 F.2d 167 (5th Cir. 1989), cert. denied, 494 U.S. 1046 (1990). Overturning a jury verdict for the plaintiff in a Bendectin case, the court observed that no statistically significant study had been published that found an increased relative risk for birth defects in children whose mothers had taken Bendectin. The court concluded: “[W]e do not wish this case to stand as a bar to future Bendectin cases in the event that new and statistically significant studies emerge which would give a jury a firmer basis on which to determine the issue of causation.” Brock v. Merrell Dow Pharms., Inc., 884 F.2d 167, 167 (5th Cir. 1989).
A number of courts have followed the Brock decision or have indicated strong support for significance testing as a screening device. See Kelley v. American Heyer-Schulte Corp., 957 F. Supp. 873, 878 (W.D. Tex. 1997) (lower end of confidence interval must be above 1.0, which is equivalent to requiring that a study be statistically significant, before a study may be relied upon by an expert), appeal dismissed, 139 F.3d 899 (5th Cir. 1998); Renaud v. Martin Marietta Corp., 749 F. Supp. 1545, 1555 (D. Colo. 1990) (quoting Brock approvingly), aff’d, 972 F.2d 304 (10th Cir. 1992); Thomas v. Hoffman-LaRoche, Inc., 731 F. Supp. 224, 228 (N.D. Miss. 1989) (granting judgment n.o.v. and observing that “there is a total absence of any statistically significant study to assist the jury in its determination of the issue of causation”), aff’d on other grounds, 949 F.2d 806 (5th Cir.), cert. denied, 504 U.S. 956 (1992); Daubert v. Merrell Dow Pharms., Inc., 727 F. Supp. 570, 575 (S.D. Cal. 1989), aff’d on other grounds, 951 F.2d 1128 (9th Cir. 1991), vacated, 509 U.S. 579 (1993); Wade-Greaux v. Whitehall Labs., Inc., 874 F. Supp. 1441 (D.V.I. 1994); Merrell Dow Pharms., Inc. v. Havner, 953 S.W.2d 706, 724 (Tex. 1997).
By contrast, a number of courts appear more cautious about using significance testing as a necessary condition, instead recognizing that assessing the likelihood of random error is important in determining the probative value of a study. In Allen v. United States, 588 F. Supp. 247, 417 (D. Utah 1984), the court stated, “The cold statement that a given relationship is not ‘statistically significant’ cannot be read to mean there is no probability of a relationship.” The Third Circuit described confidence intervals (i.e., the range of values within which the true value is thought to lie, with a specified level of confidence)
and their use as an alternative to statistical significance in DeLuca v. Merrell Dow Pharmaceuticals, Inc., 911 F.2d 941, 948–49 (3d Cir. 1990). See also Turpin v. Merrell Dow Pharms., Inc., 959 F.2d 1349, 1357 (6th Cir.) (“The defendant’s claim overstates the persuasive power of these statistical studies. An analysis of this evidence demonstrates that it is possible that Bendectin causes birth defects even though these studies do not detect a significant association.”), cert. denied, 506 U.S. 826 (1992); In re Bendectin Prod. Liab. Litig., 732 F. Supp. 744, 748–49 (E.D. Mich. 1990) (rejecting defendant’s claim that plaintiff could not prevail without statistically significant epidemiologic evidence); Berry v. CSX Transp., Inc., 709 So. 2d 552, 570 (Fla. Dist. Ct. App. 1998) (refusing to hold studies that were not statistically significant inadmissible).
Although the trial court had relied in part on the absence of statistically significant epidemiologic studies, the Supreme Court in Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579 (1993), did not explicitly address the matter. The Court did, however, refer to “the known or potential rate of error” in identifying factors relevant to the scientific validity of an expert’s methodology. Id. at 594. The Court did not address any specific rate of error, although two cases that it cited affirmed the admissibility of voice spectrograph results that the courts reported were subject to a 2%–6% chance of error owing to either false matches or false eliminations. One commentator has concluded, “Daubert did not set a threshold level of statistical significance either for admissibility or for sufficiency of scientific evidence.” Developments in the Law, supra note 66, at 1535–36, 1540–46. The Supreme Court in General Electric Co. v. Joiner, 522 U.S. 136, 145–47 (1997), adverted to the lack of statistical significance in one study relied on by an expert as a ground for ruling that the district court had not abused its discretion in excluding the expert’s testimony.
74. See Sanders, supra note 13, at 342 (describing the improved handling and reporting of statistical analysis in studies of Bendectin after 1980).
75. Kenneth Rothman, Professor of Public Health at Boston University and Adjunct Professor of Epidemiology at the Harvard School of Public Health, is one of the leaders in advocating the use of confidence intervals and rejecting strict significance testing. In DeLuca, 911 F.2d at 947, the Third Circuit discussed Rothman’s views on the appropriate level of alpha and the use of confidence intervals. In Turpin, 959 F.2d at 1353–54 n.1, the court discussed the relationship among confidence intervals, alpha, and power. The use of confidence intervals in evaluating sampling error more generally than in the epidemiologic context is discussed in David H. Kaye & David A. Freedman, Reference Guide on Statistics §IV.A, in this manual.
Calculation of a confidence interval permits a more refined assessment of appropriate inferences about the association found in an epidemiologic study.75 A confidence interval is a range of values calculated from the results of a study, within which the true value is likely to fall; the width of the interval reflects random error. The advantage of a confidence interval is that it displays more information than significance testing. A statement about whether a result is statistically significant does not provide the magnitude of the association found in the study or an indication of how statistically stable that association is. A confidence interval for any study shows the relative risk determined in the study as a point on a numerical axis. It also displays the boundaries of relative risk consistent with the data found in the study based on one or several selected levels of alpha or statistical significance. An example of two confidence intervals that might be calculated for a study is displayed in Figure 4.
Figure 4. Confidence Intervals
[Figure: a relative risk (RR) axis marked at 0.8, 1.1, 1.5, 2.2, and 3.4. The outer bracket, from 0.8 to 3.4, is labeled p < .05; the inner bracket, from 1.1 to 2.2, is labeled p < .10.]
The confidence interval shown in Figure 4 represents a study that found a relative risk of 1.5, with boundaries of 0.8 to 3.4 for alpha equal to .05 (equivalently, a confidence level of .95) and boundaries of 1.1 to 2.2 for alpha equal to .10 (equivalently, a confidence level of .90). Because the boundaries of the confidence interval with alpha set at .05 encompass a relative risk of 1.0, the study is not statistically significant at that level. By contrast, since the confidence boundaries for alpha equal to .10 do not include a relative risk of 1.0, the study does have a positive finding that is statistically significant at that level of alpha. The larger the sample size in a study (all other things being equal), the narrower the confidence boundaries will be (indicating greater statistical stability), thereby reflecting the decreased likelihood that the association found in the study would occur if the true association is 1.0.76
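The relationship among alpha, interval width, and sample size can be sketched numerically. The example below is illustrative only: the counts are hypothetical and are not the data behind Figure 4, and the interval is computed with a common normal approximation on the logarithm of the relative risk, which is our choice of method rather than one prescribed by the text.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def relative_risk_ci(cases_exp, n_exp, cases_unexp, n_unexp, alpha):
    """Return (RR, lower, upper) for a (1 - alpha) confidence interval on the relative risk."""
    rr = (cases_exp / n_exp) / (cases_unexp / n_unexp)
    # Approximate standard error of log(RR) for cohort (cumulative incidence) data.
    se = sqrt(1 / cases_exp - 1 / n_exp + 1 / cases_unexp - 1 / n_unexp)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return rr, exp(log(rr) - z * se), exp(log(rr) + z * se)


# Hypothetical study: 45 cases among 1,000 exposed, 30 cases among 1,000 unexposed.
rr, lo95, hi95 = relative_risk_ci(45, 1000, 30, 1000, alpha=0.05)
_, lo90, hi90 = relative_risk_ci(45, 1000, 30, 1000, alpha=0.10)
# Same risks, but four times as many subjects in each group.
_, lo95_big, hi95_big = relative_risk_ci(180, 4000, 120, 4000, alpha=0.05)

print(f"RR = {rr:.2f}")
print(f"95% interval: {lo95:.2f} to {hi95:.2f}")
print(f"90% interval: {lo90:.2f} to {hi90:.2f}")
print(f"95% interval, larger study: {lo95_big:.2f} to {hi95_big:.2f}")
```

As in the pattern described for Figure 4, the hypothetical 95% interval includes 1.0 while the narrower 90% interval does not, and quadrupling the sample size narrows the 95% interval enough to exclude 1.0.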
76. Where multiple epidemiologic studies are available, a technique known as meta-analysis (see infra §VI) may be used to combine the results of the studies to reduce the numerical instability of all the studies. See generally Diana B. Petitti, Meta-analysis, Decision Analysis, and Cost-Effectiveness Analysis: Methods for Quantitative Synthesis in Medicine (2d ed. 2000). Meta-analysis is better suited to pooling results from randomized controlled experimental studies, but if carefully performed it may also be helpful for observational studies, such as those in the epidemiologic field. See Zachary B. Gerbarg & Ralph I. Horwitz, Resolving Conflicting Clinical Trials: Guidelines for Meta-Analysis, 41 J. Clinical Epidemiology 503 (1988).
In In re Paoli Railroad Yard PCB Litigation, 916 F.2d 829, 856–57 (3d Cir. 1990), cert. denied, 499 U.S. 461 (1991), the court discussed the use and admissibility of meta-analysis as a scientific technique. Overturning the district court’s exclusion of a report using meta-analysis, the Third Circuit observed that meta-analysis is a regularly used scientific technique. The court recognized that the technique might be poorly performed, and it required the district court to reconsider the validity of the expert’s work in performing the meta-analysis. See also E.R. Squibb & Sons, Inc. v. Stuart Pharms., No. 90-1178, 1990 U.S. Dist. LEXIS 15788, at *41 (D.N.J. Oct. 16, 1990) (acknowledging the utility of meta-analysis but rejecting its use in that case because one of the two studies included was poorly performed); Tobin v. Astra Pharm. Prods., Inc., 993 F.2d 528, 538–39 (6th Cir. 1992) (identifying an error in the performance of a meta-analysis, in which the Food and Drug Administration (FDA) pooled data from control groups in different studies in which some gave the controls a placebo and others gave the controls an alternative treatment), cert. denied, 510 U.S. 914 (1993).
2. False negative error
False positives can be reduced by adopting more stringent values for alpha. Using a level of .01 or .001 will result in fewer false positives than using an alpha of .05. The trade-off for reducing false positives is an increase in false negative errors (also called beta errors or type II errors). This concept reflects the possibility that a study will be interpreted not to disprove the null hypothesis when in fact there is a true association of a specified magnitude.77 The beta for any study can be calculated only based on a specific alternative hypothesis about a given positive relative risk and a specific level of alpha selected;78 that is, beta, or the likelihood of erroneously failing to reject the null hypothesis, depends on the selection of an alternative hypothesis about the magnitude of association and the level of alpha chosen.
3. Power
When a study fails to find a statistically significant association, an important question is whether the result tends to exonerate the agent or is essentially inconclusive with regard to toxicity. The concept of power can be helpful in evaluating whether a study’s outcome is exonerative or inconclusive.79
The power of a study expresses the probability of finding a statistically significant association of a given magnitude (if it exists) in light of the sample sizes used in the study. The power of a study depends on several factors: the sample size; the level of alpha, or statistical significance, specified; the background incidence of disease; and the specified relative risk that the researcher would like to detect.80 Power curves can be constructed that show the likelihood of finding any given relative risk in light of these factors. Often power curves are used in the design of a study to determine what size the study populations should be.81
77. See also DeLuca v. Merrell Dow Pharms., Inc., 911 F.2d 941, 947 (3d Cir. 1990).
78. See Green, supra note 39, at 684–89.
79. See Fienberg et al., supra note 59, at 22–23.
80. See Malcolm Gladwell, How Safe Are Your Breasts?, New Republic, Oct. 24, 1994, at 22, 26.
81. For examples of power curves, see Kenneth J. Rothman, Modern Epidemiology 80 (1986); Pagano & Gauvreau, supra note 47, at 223.
82. We use a relative risk of 2.0 for illustrative purposes because of the legal significance some courts have attributed to this magnitude of association. See infra §VII.
The power of a study is the complement of beta (1 – β). Thus, a study with a likelihood of .25 of failing to detect a true relative risk of 2.0 or greater (see note 82) has a power of .75. This means the study has a 75% chance of detecting a true relative risk of 2.0. If the power of a negative study to find a relative risk of 2.0 or greater is low, it has significantly less probative value than a study with similar results but a higher power.83
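The dependence of power on sample size, alpha, background incidence, and the relative risk to be detected can be illustrated with a rough calculation. The sketch below uses a standard normal approximation for comparing two proportions with equal group sizes; the formula choice, the function name, and all input values are our assumptions for illustration, not figures from any study discussed here.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(baseline_risk, relative_risk, n_per_group, alpha=0.05):
    """Approximate power of a two-sided test comparing disease risk in two equal-sized groups."""
    nd = NormalDist()
    p0 = baseline_risk                  # background incidence among the unexposed
    p1 = baseline_risk * relative_risk  # incidence among the exposed under the alternative
    p_bar = (p0 + p1) / 2
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_group)
    se_alt = sqrt((p0 * (1 - p0) + p1 * (1 - p1)) / n_per_group)
    z_crit = nd.inv_cdf(1 - alpha / 2)
    return nd.cdf((abs(p1 - p0) - z_crit * se_null) / se_alt)


# Hypothetical design: 2% background incidence, true relative risk of 2.0.
small = approx_power(0.02, 2.0, n_per_group=500)
large = approx_power(0.02, 2.0, n_per_group=2000)
strict = approx_power(0.02, 2.0, n_per_group=500, alpha=0.01)

print(f"n = 500,  alpha = .05: power = {small:.2f}, beta = {1 - small:.2f}")
print(f"n = 2000, alpha = .05: power = {large:.2f}, beta = {1 - large:.2f}")
print(f"n = 500,  alpha = .01: power = {strict:.2f}, beta = {1 - strict:.2f}")
```

Under these assumptions, the smaller study would miss a true doubling of risk more often than not, the larger study would detect it in the great majority of cases, and tightening alpha from .05 to .01 lowers power (raises beta) for the same sample size.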
B. What Biases May Have Contributed to an Erroneous Association?
Systematic error or bias can produce an erroneous association in an epidemiologic study. Bias may arise in the design or conduct of a study, data collection, or data analysis. When scientists use the term bias, it does not necessarily carry an imputation of prejudice or other subjective factors, such as the researcher’s desire for a particular outcome. The meaning of scientific bias differs from conventional (and legal) usage, in which bias refers to a partisan point of view.84 Bias refers to anything (other than random sampling error) that results in error in a study and thereby compromises its validity. The two main classes of bias are selection bias (inappropriate selection of study subjects) and information bias (a flaw in measuring exposure or disease in the study groups).
Most epidemiologic studies have some degree of bias that may affect the outcome. If major bias is present it may invalidate the study results. Finding the bias, however, can be difficult if not impossible. In reviewing the validity of an epidemiologic study, the epidemiologist must identify potential biases and analyze the amount or kind of error that might have been induced by the bias. Often the direction of error can be determined; depending on the specific type of bias, it may exaggerate the real association, dilute it, or even completely mask it.
1. Selection bias
83. See also David H. Kaye & David A. Freedman, Reference Guide on Statistics §IV.C.1, in this manual.
84. A Dictionary of Epidemiology 15 (John M. Last ed., 3d ed. 1995); Edmond A. Murphy, The Logic of Medicine 239–62 (1976).
85. Selection bias is defined as “[e]rror due to systematic differences in characteristics between those who are selected for study and those who are not.” A Dictionary of Epidemiology, supra note 84, at 153.
In In re “Agent Orange” Product Liability Litigation, 597 F. Supp. 740, 783 (E.D.N.Y. 1985), aff’d, 818 F.2d 145 (2d Cir. 1987), cert. denied, 484 U.S. 1004 (1988), the court expressed concern about selection bias. The exposed cohort consisted of young, healthy men who served in Vietnam. Comparing the mortality rate of the exposed cohort and that of a control group made up of civilians might have resulted in error that was due to selection bias. Failing to account for health status as an independent variable tends to understate any association between exposure and disease in studies in which the exposed cohort is healthier.
Selection bias refers to the error in an observed association that is due to the method of selection of cases and controls (in a case-control study) or exposed and unexposed individuals (in a cohort study).85 The selection of an appropriate control group has been described as the Achilles’ heel of a case-control study.86 Selecting members of the control group (those without disease) is problematic in case-control studies if the control participants were selected for reasons that are related to their having the exposure or potential risk factor being studied.
Hospital-based studies, which are relatively common among researchers located in medical centers, illustrate the problem. Suppose an association is found between coffee drinking and coronary heart disease in a study using hospital patients as controls. The problem is that the hospitalized control group may include individuals who had been advised against drinking coffee for medical reasons, such as to prevent aggravation of a peptic ulcer. In other words, the controls may become eligible for the study because of their medical condition, which is in turn related to their exposure status (their likelihood of avoiding coffee). If this is true, the amount of coffee drinking in the control group would understate the extent of coffee drinking expected in people who do not have the disease, and thus bias upwardly (i.e., exaggerate) any odds ratio observed.87 Bias in hospital studies may also understate the true odds ratio when the exposures at issue led to the cases’ hospitalizations and also contributed to the controls’ chances of hospitalization.
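The upward bias in the coffee example can be shown with simple arithmetic on the odds ratio. The counts below are invented for illustration and do not come from the studies cited; the function name is likewise hypothetical.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds ratio from a 2 x 2 case-control table."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


# Hypothetical cases: 60 coffee drinkers and 40 non-drinkers.
# Controls who reflect the general population drink coffee at the same rate as the cases.
representative = odds_ratio(60, 40, 60, 40)
# Hospital controls, some of whom were told to avoid coffee: only 45 of 100 drink it.
hospital_based = odds_ratio(60, 40, 45, 55)

print(f"representative controls: OR = {representative:.2f}")
print(f"hospital controls:       OR = {hospital_based:.2f}")
```

In this sketch the true odds ratio is 1.0, yet the hospital-based comparison yields an odds ratio of about 1.8 solely because coffee drinking is underrepresented among the controls.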
Just as case-control study controls should be selected independently of their exposure status, in cohort studies, unexposed controls should be selected independently of their disease risk. For example, in a cohort study of cervical cancer, those who are not at risk for the disease (women who have had their cervices removed, and men) should be excluded from the study population. Inclusion of such individuals as controls in a cohort study could result in erroneous findings by overstating the association between the agent and the disease.
A further source of selection bias occurs when those selected to participate refuse to participate or drop out before the study is completed. Many studies have shown that individuals who participate in studies differ significantly from those who do not. If a significant portion of either study group refuses to participate in the study, the researcher should investigate reasons for refusal and whether those who refused are different from those who agreed. The researcher can show that those in the study are not a biased sample by comparing relevant characteristics of individuals who refused to participate with those of individuals who participated to show the similarity of the groups or the degree of differences. Similarly, if a significant number of subjects drop out of a study before completion, there may be a problem in determining whether the remaining subjects are representative of the original study populations. The researcher should
86. William B. Kannel & Thomas R. Dawber, Coffee and Coronary Disease, 289 New Eng. J. Med. 100 (1973) (editorial).
87. Hershel Jick et al., Coffee and Myocardial Infarction, 289 New Eng. J. Med. 63 (1973).
Reference Guide on Epidemiology
examine whether the study groups are still representative of the original study populations.
The fact that a study may suffer from selection bias does not in itself invalidate its results. A number of factors may suggest that a bias, if present, had only limited effect. If the association is particularly strong, for example, bias is less likely to account for all of it. In addition, in studies with multiple control groups, the consistent finding of an association when cases are compared with different control groups suggests that possible biases applicable to a particular control group are not invalidating.
2. Information bias
Information bias refers to the bias resulting from inaccurate information about the study participants regarding either their disease or exposure status. In a case-control study, potential information bias is an important consideration because the researcher depends on information from the past to determine exposure and disease and their temporal relationship. In some situations the researcher is required to interview the subjects about past exposures, thus relying on the subjects’ memories. Research has shown that individuals with disease (cases) may more readily recall past exposures than individuals with no disease (controls);88 this creates a potential for bias called recall bias.
For example, consider a case-control study conducted to examine the cause of congenital malformations. The epidemiologist is interested in whether the malformations were caused by an infection during the mother’s pregnancy.89 A group of mothers of malformed infants (cases) and a group of mothers of infants with no malformation (controls) are interviewed regarding infections during pregnancy. Mothers of children with malformations may recall an inconsequential fever or runny nose during pregnancy that readily would be forgotten by a mother who had a normal infant. Even if in reality the infection rate in mothers of malformed children is no different from the rate in mothers of normal children, the result in this study would be an apparently higher rate of infection in the mothers of the children with the malformations solely on the basis of recall differences between the two groups. The issue of recall bias can sometimes be evaluated by finding a second source of data to validate the subject’s response
88. Steven S. Coughlin, Recall Bias in Epidemiologic Studies, 43 J. Clinical Epidemiology 87 (1990).
89. See Brock v. Merrell Dow Pharms., Inc., 874 F.2d 307, 311–12 (5th Cir. 1989) (discussion of recall bias among women who bear children with birth defects), cert. denied, 494 U.S. 1046 (1990). We note that the court was mistaken in its assertion that a confidence interval could correct for recall bias, or for any bias for that matter. Confidence intervals are a statistical device for analyzing error that may result from random sampling. Systematic errors (bias) in the design or data collection are not addressed by statistical methods, such as confidence intervals or statistical significance. See Green, supra note 39, at 667–68; Vincent M. Brannigan et al., Risk, Statistical Inference, and the Law of Evidence: The Use of Epidemiological Data in Toxic Tort Cases, 12 Risk Analysis 343, 344–45 (1992).
(e.g., blood test results from prenatal visits or medical records that document symptoms of infection).90 Alternatively, the mothers’ responses to questions about other exposures may shed light on the presence of a bias affecting the recall of the relevant exposures. Thus, if mothers of cases do not recall greater exposure than controls’ mothers to pesticides, children with German measles, and so forth, then one can have greater confidence in their recall of illnesses.
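The recall-bias scenario in the congenital malformation example can be worked through numerically. The following is a sketch under stated assumptions: the group sizes, the true infection count, and the two recall fractions are all hypothetical values chosen for illustration. Both groups of mothers have the same true infection rate, yet unequal recall produces an apparent association.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds ratio from the four cells of a case-control table."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


# Hypothetical figures (assumptions for illustration only).
MOTHERS_PER_GROUP = 1000
TRULY_INFECTED = 200    # identical in both groups, so the true odds ratio is 1.0
RECALL_CASES = 0.90     # fraction of real infections recalled by mothers of cases
RECALL_CONTROLS = 0.60  # fraction of real infections recalled by mothers of controls

true_or = odds_ratio(
    TRULY_INFECTED, MOTHERS_PER_GROUP - TRULY_INFECTED,
    TRULY_INFECTED, MOTHERS_PER_GROUP - TRULY_INFECTED,
)

reported_cases = TRULY_INFECTED * RECALL_CASES        # infections reported by case mothers
reported_controls = TRULY_INFECTED * RECALL_CONTROLS  # infections reported by control mothers

observed_or = odds_ratio(
    reported_cases, MOTHERS_PER_GROUP - reported_cases,
    reported_controls, MOTHERS_PER_GROUP - reported_controls,
)

print(f"True odds ratio:     {true_or:.2f}")
print(f"Observed odds ratio: {observed_or:.2f}")
```

With these assumed recall fractions the observed odds ratio rises well above 1.0 even though infection is unrelated to malformation in the underlying data.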
Bias may also result from reliance on interviews with surrogates, individuals other than the study subjects. This is often necessary when, for example, a subject (in a case-control study) has died of the disease under investigation.
There are many sources of information bias that affect the measure of exposure, including its intensity and duration. Exposure to the agent can be measured directly or indirectly.91 Sometimes researchers use a biological marker as a direct measure of exposure to an agent: an alteration in tissue or body fluids that occurs as a result of an exposure and that can be detected in the laboratory. Biological markers are only available for a small number of toxins and only reveal whether a person was exposed. Biological markers rarely help determine the intensity or duration of exposure.92
Monitoring devices also can be used to measure exposure directly but often are not available for exposures that occurred in the past. For past exposures, epidemiologists often use indirect means of measuring exposure, such as interviewing workers and reviewing employment records. Thus, all those employed to install asbestos insulation may be treated as having been exposed to asbestos during the period that they were employed. However, there may be a wide variation of exposure within any job, and these measures may have limited applicability to a given individual. If the agent of interest is a drug, medical or hospital records can be used to determine past exposure. Thus, retrospective
90. Two researchers who used a case-control study to examine the association between congenital heart disease and the mother’s use of drugs during pregnancy corroborated interview data with the mother’s medical records. See Sally Zierler & Kenneth J. Rothman, Congenital Heart Disease in Relation to Maternal Use of Bendectin and Other Drugs in Early Pregnancy, 313 New Eng. J. Med. 347, 347–48 (1985).
91. See In re Paoli R.R. Yard PCB Litig., No. 86-2229, 1992 U.S. Dist. LEXIS 18430, at *9–*11 (E.D. Pa. Oct. 21, 1992) (discussing valid methods of determining exposure to chemicals).
occupational or environmental measurements of exposure are usually less accurate than prospective studies or follow-up studies, especially ones in which a drug or medical intervention is the independent variable being measured.
The route (e.g., inhalation or absorption), duration, and intensity of exposure are important factors in assessing disease causation. Even with environmental monitoring, the dose measured in the environment generally is not the same as the dose that reaches internal target organs. If the researcher has calculated the internal dose of exposure, the scientific basis for this calculation should be examined for soundness.93
In assessing whether the data may reflect inaccurate information, one must assess whether the data were collected from objective and reliable sources. Medical records, government documents, employment records, death certificates, and interviews are examples of data sources that are used by epidemiologists to measure both exposure and disease status.94 The accuracy of a particular source may affect the validity of a research finding. If different data sources are used to collect information about a study group, differences in the accuracy of those sources may affect the validity of the findings. For example, using employment records to gather information about exposure to narcotics probably would lead to inaccurate results, since employees tend to keep such information private. If the researcher uses an unreliable source of data, the study may not be useful to the court.
The kinds of quality-control procedures used may affect the accuracy of the data. For data collected by interview, quality-control procedures should probe the reliability of the individual and whether the information is verified by other sources. For data collected and analyzed in the laboratory, quality-control procedures should probe the validity and reliability of the laboratory test.
Information bias may also result from inaccurate measurement of disease status. The quality and sophistication of the diagnostic methods used to detect a
92. Dose generally refers to the intensity or magnitude of exposure multiplied by the time exposed. See Sparks v. Owens-Illinois, Inc., 38 Cal. Rptr. 2d 739, 742 (Ct. App. 1995). For a discussion of the difficulties of determining dose from atomic fallout, see Allen v. United States, 588 F. Supp. 247, 425–26 (D. Utah 1984), rev’d on other grounds, 816 F.2d 1417 (10th Cir. 1987), cert. denied, 484 U.S. 1004 (1988). The timing of exposure may also be critical, especially if the disease of interest is a birth defect. In Smith v. Ortho Pharmaceutical Corp., 770 F. Supp. 1561, 1577 (N.D. Ga. 1991), the court criticized a study for its inadequate measure of exposure to spermicides. The researchers had defined exposure as receipt of a prescription for spermicide within 600 days of delivery, but this definition of exposure is too broad because environmental agents are only likely to cause birth defects during a narrow band of time.
A different, but related, problem often arises in court. Determining the plaintiff’s exposure to the alleged toxic substance always involves a retrospective determination and may involve difficulties similar to those faced by an epidemiologist planning a study. Thus, in Christophersen v. Allied-Signal Corp., 939 F.2d 1106, 1113 (5th Cir. 1991), cert. denied, 503 U.S. 912 (1992), the court criticized the plaintiff’s expert, who relied on an affidavit of a co-worker to determine the dose of nickel and cadmium to which the decedent had been exposed.
In asbestos litigation, a number of courts have adopted a requirement that the plaintiff demonstrate (1) regular use by an employer of the defendant’s asbestos-containing product; (2) the plaintiff’s proximity to that product; and (3) exposure over an extended period of time. See, e.g., Lohrmann v. Pittsburgh Corning Corp., 782 F.2d 1156, 1162–64 (4th Cir. 1986).
93. See also Bernard D. Goldstein & Mary Sue Henifin, Reference Guide on Toxicology § I.D, in this manual.
94. Even these sources may produce unanticipated error. Identifying the causal connection between asbestos and mesothelioma, a rare form of cancer, was complicated and delayed because doctors who were unfamiliar with mesothelioma erroneously identified other causes of death in death certificates. See David E. Lilienfeld & Paul D. Gunderson, The “Missing Cases” of Pleural Malignant Mesothelioma in Minnesota, 1979–81: Preliminary Report, 101 Pub. Health Rep. 395, 397–98 (1986).
disease should be assessed. The proportion of subjects who were examined also should be questioned. If, for example, many of the subjects refused to be tested, the fact that the test used was of high quality would be of relatively little value.
The scientific validity of the research findings is influenced by the reliability of the diagnosis of disease or health status.95 For example, a researcher interested in studying spontaneous abortion in the first trimester needs to test women for pregnancy. Diagnostic criteria that are accepted by the medical community should be used to make the diagnosis. If a diagnosis is made using an unreliable home pregnancy kit known to have a high rate of false positive results (indicating pregnancy when the woman is not pregnant), the study will overestimate the number of spontaneous abortions.
Misclassification bias is a form of information bias in which, because of problems with the information available, individuals in the study may be misclassified with regard to exposure status or disease status. Misclassification bias has been subdivided into differential misclassification and nondifferential misclassification.
Nondifferential misclassification occurs when inaccuracies in determining exposure are independent of disease status or when inaccuracies in diagnoses are independent of exposure status. This is a common problem resulting from the limitations of data collection. Generally, nondifferential misclassification bias leads to a shift in the odds ratio toward one, or, in other words, toward a finding of no effect. Thus, if the errors are nondifferential, it is generally misguided to criticize an apparent association between an exposure and disease on the grounds that data were inaccurately classified. Instead, nondifferential misclassification generally serves to reduce the observed association below its true magnitude.
Differential misclassification refers to the differential error in determining exposure in cases as compared with controls, or disease status in unexposed cohorts relative to exposed cohorts. In a case-control study this would occur, for example, if, in the process of anguishing over the possible causes of the disease, parents of ill children recalled more exposures to a particular agent than actually occurred, or if parents of the controls, for whom the issue was less emotionally charged, recalled fewer. This can also occur in a cohort study in which, for example, birth control users, the exposed cohort, are monitored more closely for potential side effects, leading to a higher rate of disease identification in that cohort than in the unexposed cohort. Depending on how the misclassification occurs, a differential bias can produce an error in either direction: the exaggeration or understatement of an association.
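The contrast between the two kinds of misclassification can be sketched with expected counts. Everything in the example below is an illustrative assumption: the true cell counts and the sensitivity and specificity values are hypothetical, and `classify` is a helper written for this sketch rather than a routine from any cited study. Applying the same error rates to cases and controls pulls the odds ratio toward one, while giving cases a higher false-positive rate pushes it above its true value.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds ratio from the four cells of a case-control table."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


def classify(exposed, unexposed, sensitivity, specificity):
    """Expected (recorded exposed, recorded unexposed) counts under imperfect measurement."""
    recorded_exposed = exposed * sensitivity + unexposed * (1 - specificity)
    recorded_unexposed = exposed + unexposed - recorded_exposed
    return recorded_exposed, recorded_unexposed


# Hypothetical true counts: (truly exposed, truly unexposed).
CASES = (400, 600)
CONTROLS = (200, 800)

true_or = odds_ratio(*CASES, *CONTROLS)

# Nondifferential: identical error rates for cases and controls.
nondifferential_or = odds_ratio(
    *classify(*CASES, sensitivity=0.8, specificity=0.9),
    *classify(*CONTROLS, sensitivity=0.8, specificity=0.9),
)

# Differential: unexposed cases are more often recorded as exposed than unexposed controls.
differential_or = odds_ratio(
    *classify(*CASES, sensitivity=0.8, specificity=0.7),
    *classify(*CONTROLS, sensitivity=0.8, specificity=0.9),
)

print(f"True odds ratio:                  {true_or:.2f}")
print(f"Nondifferential misclassification: {nondifferential_or:.2f}")
print(f"Differential misclassification:    {differential_or:.2f}")
```

Other choices of error rates for the differential case could instead understate the association, consistent with the point that differential error can run in either direction.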
95. In In re Swine Flu Immunization Products Liability Litigation, 508 F. Supp. 897, 903 (D. Colo. 1981), aff’d sub nom. Lima v. United States, 708 F.2d 502 (10th Cir. 1983), the court critically evaluated a study relied on by an expert whose testimony was stricken. In that study, determination of whether a patient had Guillain-Barré syndrome was made by medical clerks, not physicians who were familiar with diagnostic criteria.
3. Other conceptual problems
Sometimes studies are flawed because of flawed definitions or premises that do not fall under the rubric of selection bias or information bias. For example, if the researcher defines the disease of interest as all birth defects, rather than a specific birth defect, he or she must have a scientific basis to hypothesize that the effects of the agent being investigated could be so varied. If the effect is in fact more limited, the result of this conceptualization error could be to dilute or mask any real effect that the agent might have on a specific type of birth defect.96
Examining a study for potential sources of bias is an important task that helps determine the accuracy of a study’s conclusions. In addition, when a source of bias is identified, it may be possible to determine whether the error tended to exaggerate or understate the true association. Thus, bias may exist in a study that nevertheless has probative value.
Even if one concludes that the findings of a study are statistically stable and that biases have not created significant error, additional considerations remain. As repeatedly noted, an association does not necessarily mean a causal relationship exists. To make a judgment about causation, a knowledgeable expert must consider the possibility of confounding factors. The expert must also evaluate several criteria to determine whether an inference of causation is appropriate. These matters are discussed below.
C. Could a Confounding Factor Be Responsible for the Study Result?97
Even when an association exists, researchers must determine whether the exposure causes the disease or whether the exposure and disease are caused by some other confounding factor. A confounding factor is both a risk factor for the disease and a factor associated with the exposure of interest. For example, researchers may conduct a study that finds individuals with gray hair have a higher rate of death than those with hair of another color. Instead of hair color having an impact on death, the results might be explained by the confounding factor of age. If old age is associated differentially with the gray-haired group (those with gray hair tend to be older), old age may be responsible for the association found between hair color and death.98 Researchers must separate the relationship between gray hair and risk of death from that of old age and risk of death. When researchers find an association between an agent and a disease, it is critical to determine whether the association is causal or the result of confounding.99 Some epidemiologists classify confounding as a form of bias. However, confounding is a reality; that is, the observed association of a factor and a disease is actually the result of an association with a third, confounding factor. Failure to recognize confounding can introduce a bias (error) into the findings of the study.
96. In Brock v. Merrell Dow Pharmaceuticals, Inc., 874 F.2d 307, 312 (5th Cir. 1989), cert. denied, 494 U.S. 1046 (1990), the court discussed a reanalysis of a study in which the effect was narrowed from all congenital malformations to limb reduction defects. The magnitude of the association changed by 50% when the effect was defined in this narrower fashion. See Rothman & Greenland, supra note 49, at 132 (“Unwarranted assurances of a lack of any effect can easily emerge from studies in which a wide range of etiologically unrelated outcomes are grouped.”).
97. See Grassis v. Johns-Manville Corp., 591 A.2d 671, 675 (N.J. Super. Ct. App. Div. 1991) (discussing the possibility that confounders may lead to an erroneous inference of a causal relationship).
98. This example is drawn from Kahn & Sempos, supra note 25, at 63.
In 1981, Dr. Brian MacMahon, Professor and Chairman of the Department of Epidemiology at the Harvard School of Public Health, reported an association between coffee drinking and cancer of the pancreas in the New England Journal of Medicine.100 This observation caused a great stir, and in fact, one coffee distributor ran a large advertisement in the New York Times refuting the findings of the study. What could MacMahon’s findings mean? One possibility is that the association is causal and that drinking coffee causes an increased risk of cancer of the pancreas. However, there is also another possibility. We know that smoking is an important risk factor for cancer of the pancreas. We also know that it is difficult to find a smoker who does not drink coffee. Thus, drinking coffee and smoking are associated. An observed association between coffee consumption and an increased risk of cancer of the pancreas could reflect the fact that smoking causes cancer of the pancreas and that smoking also is associated closely with coffee consumption. The association MacMahon found between drinking coffee and pancreatic cancer could be due to the confounding factor of smoking. To be fair to MacMahon, we must note that he was aware of the possibility of confounding and took it into account in his study design by gathering and analyzing data separately for smokers and nonsmokers. The association between coffee and pancreatic cancer remained even when smoking was taken into account.
The main problem in many observational studies such as MacMahon’s is that the individuals are not assigned randomly to the groups being compared.101 As discussed above, randomization maximizes the possibility that exposures other
99. Confounding can bias a study result by either exaggerating or diluting any true association. One example of a confounding factor that may result in a study’s outcome understating an association is vaccination. Thus, if a group exposed to an agent has a higher rate of vaccination for the disease under study than the unexposed group, the vaccination may reduce the rate of disease in the exposed group, thereby producing an association that is less than the true association without the confounding of vaccination.
100. Brian MacMahon et al., Coffee and Cancer of the Pancreas, 304 New Eng. J. Med. 630 (1981).
101. Randomization attempts to ensure that the presence of a characteristic, such as coffee drinking, is governed by chance, as opposed to being determined by the presence of an underlying medical condition. For additional comments on randomization and confounding, see the Glossary of Terms.
than the one under study are evenly distributed between the exposed and the control cohorts.102 In observational studies, by contrast, other forces, including self-selection, determine who is exposed to other (possibly causal) factors. The lack of randomization leads to the potential problem of confounding. Thus, for example, the exposed cohort might consist of those who are exposed at work to an agent suspected of being an industrial toxin. The members of this cohort may, however, differ from controls by residence, socioeconomic status, age, or other extraneous factors.103 These other factors may be causing the disease, but because of potential confounding, an apparent (yet false) association of the disease with exposure to the agent may appear. Confounders, like smoking in the MacMahon study, do not reflect an error made by the investigators; rather, they reflect the inherently “uncontrolled” nature of observational studies. When they can be identified, confounders should be taken into account. Confounding factors that are suspected or known in advance can be controlled during the study design through study-group selection. Unanticipated confounding factors that are suspected after data collection can sometimes be controlled during data analysis, if data have been gathered about them.
MacMahon’s study found that coffee drinkers had a higher rate of pancreatic cancer than those who did not drink coffee. To evaluate whether smoking is a confounding factor, the researcher would divide each of the exposed and control groups into smoking and nonsmoking subgroups to examine whether subjects’ smoking status affects the study results. If the outcome in the smoking subgroups is the same as that in the nonsmoking subgroups, smoking is not a confounding factor. If the subjects’ smoking status affects the outcome, then smoking is a confounder, for which adjustment is required. If the association between coffee drinking and pancreatic cancer completely disappears when the subjects’ smoking status is considered, then smoking is a confounder that fully accounts for the association with coffee observed. Table 4 reveals a hypothetical study’s results, with smoking being a weak confounding factor, which, when accounted for, does not eliminate the association between coffee drinking and cancer.
102. See Rothman & Greenland, supra note 49, at 124; see also supra § II.A.
103. See, e.g., In re “Agent Orange” Prod. Liab. Litig., 597 F. Supp. 740, 783 (E.D.N.Y. 1984) (discussing the problem of confounding that might result in a study of the effect of exposure to Agent Orange on Vietnam servicemen), aff’d, 818 F.2d 145 (2d Cir. 1987), cert. denied, 484 U.S. 1004 (1988).
Table 4. Pancreatic Cancer Study Data

                        All Subjects         Smokers >1 Pack per Day        Nonsmokers
Pancreatic                       Coffee                   Coffee                   Coffee
Cancer Status     Controls     Drinkers     Controls     Drinkers     Controls     Drinkers
Cancer                  14           17            8           11            6            6
No Cancer            1,393          476          733          263          660          213
RR                     1.1          3.9          1.2          4.6          1.0          3.1

Note: RR = relative risk.
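The figures in Table 4 can be checked with a short calculation. The sketch below rests on an inference from the arithmetic, not on anything the text states: the RR row appears to equal each column's ratio of cancer to no-cancer counts, divided by the same ratio for nonsmoking controls. The dictionary keys and helper names are labels chosen for this sketch.

```python
# Counts read from Table 4: (cancer, no cancer) for each column.
TABLE_4 = {
    ("all subjects", "controls"): (14, 1393),
    ("all subjects", "coffee drinkers"): (17, 476),
    ("smokers", "controls"): (8, 733),
    ("smokers", "coffee drinkers"): (11, 263),
    ("nonsmokers", "controls"): (6, 660),
    ("nonsmokers", "coffee drinkers"): (6, 213),
}


def cancer_ratio(cancer, no_cancer):
    """Ratio of subjects with cancer to subjects without cancer."""
    return cancer / no_cancer


# Assumption: nonsmoking controls serve as the reference group (RR of 1.0 in the table).
reference = cancer_ratio(*TABLE_4[("nonsmokers", "controls")])
rr_row = {
    column: round(cancer_ratio(*counts) / reference, 1)
    for column, counts in TABLE_4.items()
}


def coffee_versus_controls(group):
    """Coffee drinkers compared with controls inside one group of the table."""
    drinkers = cancer_ratio(*TABLE_4[(group, "coffee drinkers")])
    controls = cancer_ratio(*TABLE_4[(group, "controls")])
    return drinkers / controls


print(rr_row)
for group in ("all subjects", "smokers", "nonsmokers"):
    print(f"{group}: coffee drinkers vs. controls = {coffee_versus_controls(group):.2f}")
```

Under that reading, the coffee-versus-control comparison stays well above one among both smokers and nonsmokers, which matches the description of smoking as only a weak confounder in this hypothetical study.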
There is always a real risk that an undiscovered or unrecognized confounding factor may contribute to a study’s findings, by either magnifying or reducing the observed association.104 It is, however, necessary to keep that risk in perspective. Often the mere possibility of uncontrolled confounding is used to call into question the results of a study. This was certainly the strategy of those seeking, or unwittingly helping, to undermine the implications of the studies persuasively linking cigarette smoking to lung cancer. The critical question is whether it is plausible that the findings of a given study could indeed be due to unrecognized confounders.
1. What techniques can be used to prevent or limit confounding?
Choices in the design of a research project (e.g., methods for selecting the subjects) can prevent or limit confounding. When a factor or factors, such as age, sex, or even smoking status, are considered potential confounders in a study, investigators can limit the differential distribution of these factors in the study groups by selecting controls to “match” cases (or the exposed group) in terms of these variables. If the two groups are matched, for example, by age, then any association observed in the study cannot be due to age, the matched variable.105
Restricting the persons who are permitted as subjects in a study is another method to control for confounders. If age or sex is suspected as a confounder, then the subjects enrolled in a study can be limited to those of one sex and those who are within a specified age range. When there is no variance among subjects in a study with regard to a potential confounder, confounding as a result of that variable is eliminated.
104. Rothman & Greenland, supra note 49, at 120; see also supra § II.A.
105. Selecting a control population based on matched variables necessarily affects the representativeness of the selected controls and may affect how generalizable the study results are to the population at large. However, for a study to have merit, it must first be internally valid, that is, it must not be subject to unreasonable sources of bias or confounding. Only after a study has been shown to meet this standard does its universal applicability or generalizability to the population at large become an issue. When a study population is not representative of the general or target population, existing scientific knowledge may permit reasonable inferences about the study’s broader applicability, or additional confirmatory studies of other populations may be necessary.
2. What techniques can be used to identify confounding factors?
Once the study data are ready to be analyzed, the researcher must assess a range of factors that could influence risk. In the case of MacMahon’s study, the researcher would evaluate whether smoking is a confounding factor by comparing the risk of pancreatic cancer in all coffee drinkers (including smokers) with the risk in nonsmoking coffee drinkers. If the risk is substantially the same, smoking is not a confounding factor (i.e., smoking does not distort the relationship between coffee drinking and the development of pancreatic cancer), which is what MacMahon found. If the risk is substantially different, but still exists in the nonsmoking group, then smoking is a confounder, but does not wholly account for the association with coffee. If the association disappears, then smoking is a confounder that fully accounts for the association with coffee observed.
3. What techniques can be used to control for confounding factors?
To control for confounding factors during data analysis, researchers can use one of two techniques: stratification or multivariate analysis.
Stratification reduces or eliminates confounding by evaluating the effect of an exposure at different levels (strata) of exposure to the confounding variable. Statistical methods then can be applied to combine the results of exposure at each stratum into an overall single estimate of risk. For example, in MacMahon’s study of coffee and pancreatic cancer, if smoking had been a confounding factor, the researchers could have stratified the data by creating subgroups based on how many cigarettes each subject smoked a day (e.g., a nonsmoking group, a light smoking group, a medium smoking group, and a heavy smoking group). When different rates of pancreatic cancer for people in each group who drink the same amount of coffee are compared, the effect of smoking on pancreatic cancer is revealed. The effect of the confounding factor can then be removed from the study results.
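Combining stratum-specific results into one adjusted estimate can be sketched as follows. The text does not name a pooling method; the Mantel-Haenszel odds ratio used here is one common choice and is an assumption of this example. The two strata use the smoker and nonsmoker counts from Table 4, arranged as (exposed with disease, exposed without disease, unexposed with disease, unexposed without disease), with coffee drinking as the exposure.

```python
def mantel_haenszel_or(strata):
    """Pooled odds ratio across strata, each given as (a, b, c, d):
    a = exposed with disease, b = exposed without disease,
    c = unexposed with disease, d = unexposed without disease."""
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


# Strata taken from Table 4 (coffee drinkers are the exposed group).
STRATA = [
    (11, 263, 8, 733),  # smokers of more than one pack per day
    (6, 213, 6, 660),   # nonsmokers
]

# Crude odds ratio from the "All Subjects" columns, ignoring smoking.
crude_or = (17 * 1393) / (476 * 14)
adjusted_or = mantel_haenszel_or(STRATA)

print(f"Crude odds ratio:            {crude_or:.2f}")
print(f"Smoking-adjusted odds ratio: {adjusted_or:.2f}")
```

The adjusted estimate is only slightly smaller than the crude one, in line with the table's depiction of smoking as a weak confounder that does not eliminate the association.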
Multivariate analysis controls for the confounding factor through mathematical modeling. Models are developed to describe the simultaneous effect of exposure and confounding factors on the increase in risk.106
Both of these methods allow for “adjustment” of the effect of confounders. They both modify an observed association to take into account the effect of risk factors that are not the subject of the study and that may distort the association between the exposure being studied and the disease outcomes.
If the association between exposure and disease remains after the researcher completes the assessment and adjustment for confounding factors, the researcher then applies the guidelines described in section V to determine whether an inference of causation is warranted.
106. For a more complete discussion of multivariate analysis, see Daniel L. Rubinfeld, Reference Guide on Multiple Regression, in this manual.
V. General Causation: Is an Exposure a Cause of the Disease?
Once an association has been found between exposure to an agent and development of a disease, researchers consider whether the association reflects a true cause–effect relationship. When epidemiologists evaluate whether a cause–effect relationship exists between an agent and disease, they are using the term causation in a way similar to, but not identical with, the way the familiar “but for,” or sine qua non, test is used in law for cause in fact. “An act or an omission is not regarded as a cause of an event if the particular event would have occurred without it.”107 This is equivalent to describing the act or occurrence as a necessary link in a chain of events that results in the particular event.108 Epidemiologists use causation to mean that an increase in the incidence of disease among the exposed subjects would not have occurred had they not been exposed to the agent. Thus, exposure is a necessary condition for the increase in the incidence of disease among those exposed.109 The relationship between the epidemiologic concept of cause and the legal question of whether exposure to an agent caused an individual’s disease is addressed in section VII.
As mentioned in section I, epidemiology cannot objectively prove causation; rather, causation is a judgment for epidemiologists and others interpreting the epidemiologic data. Moreover, scientific determinations of causation are inherently tentative. The scientific enterprise must always remain open to reassessing the validity of past judgments as new evidence develops.
In assessing causation, researchers first look for alternative explanations for the association, such as bias or confounding factors, which were discussed in section IV. Once this process is completed, researchers consider how guidelines
107. W. Page Keeton et al., Prosser and Keeton on the Law of Torts 265 (5th ed. 1984); see also Restatement (Second) of Torts §432(1) (1965).
When multiple causes are each operating and capable of causing an event, the but-for, or necessary-condition, concept for causation is problematic. This is the familiar “two-fires” scenario in which two independent fires simultaneously burn down a house and is sometimes referred to as overdetermined cause. Neither fire is a but-for, or necessary condition, for the destruction of the house, because either fire would have destroyed the house. See id. §432(2). This two-fires situation is analogous to an individual being exposed to two agents, each of which is capable of causing the disease contracted by the individual. A difference between the disease scenario and the fire scenario is that, in the former, one will have no more than a probabilistic assessment of whether each of the exposures would have caused the disease in the individual.
108. See supra note 8.
109. See Rothman & Greenland, supra note 49, at 8 (“We can define a cause of a specific disease event as an antecedent event, condition, or characteristic that was necessary for the occurrence of the disease at the moment it occurred, given that other conditions are fixed.”); Allen v. United States, 588 F. Supp. 247, 405 (D. Utah 1984) (quoting a physician on the meaning of the statement that radiation causes cancer), rev’d on other grounds, 816 F.2d 1417 (10th Cir. 1987), cert. denied, 484 U.S. 1004 (1988).
for inferring causation from an association apply to the available evidence. These guidelines consist of several key inquiries that assist researchers in making a judgment about causation.110 Most researchers are conservative when it comes to assessing causal relationships, often calling for stronger evidence and more research before a conclusion of causation is drawn.111
The factors that guide epidemiologists in making judgments about causation are
1. temporal relationship;
2. strength of the association;
3. dose–response relationship;
4. replication of the findings;
5. biological plausibility (coherence with existing knowledge);
6. consideration of alternative explanations;
7. cessation of exposure;
8. specificity of the association; and
9. consistency with other knowledge.
There is no formula or algorithm that can be used to assess whether a causal inference is appropriate based on these guidelines. One or more factors may be absent even when a true causal relationship exists. Similarly, the existence of some factors does not ensure that a causal relationship exists. Drawing causal inferences after finding an association and considering these factors requires judgment and searching analysis, based on biology, of why a factor or factors may be absent despite a causal relationship, and vice versa. While the drawing of causal inferences is informed by scientific expertise, it is not a determination that is made by using scientific methodology.
110. See Mervyn Susser, Causal Thinking in the Health Sciences: Concepts and Strategies in Epidemiology (1973); In re Joint E. & S. Dist. Asbestos Litig., 52 F.3d 1124, 1128–30 (2d Cir. 1995) (discussing lower courts’ use of factors to decide whether an inference of causation is justified when an association exists).
111. Berry v. CSX Transp., Inc., 709 So. 2d 552, 568 n.12 (Fla. Dist. Ct. App. 1998) (“Almost all genres of research articles in the medical and behavioral sciences conclude their discussion with qualifying statements such as ‘there is still much to be learned.’ This is not, as might be assumed, an expression of ignorance, but rather an expression that all scientific fields are open-ended and can progress from their present state ....”); Hall v. Baxter Healthcare Corp., 947 F. Supp. 1387 App. B. at 1446–51 (D. Or. 1996) (report of Merwyn R. Greenlick, court-appointed epidemiologist). In Cadarian v. Merrell Dow Pharmaceuticals, Inc., 745 F. Supp. 409 (E.D. Mich. 1989), the court refused to permit an expert to rely on a study that the authors had concluded should not be used to support an inference of causation in the absence of independent confirmatory studies. The court did not address the question whether the degree of certainty used by epidemiologists before making a conclusion of cause was consistent with the legal standard. See DeLuca v. Merrell Dow Pharms., Inc., 911 F.2d 941, 957 (3d Cir. 1990) (standard of proof for scientific community is not necessarily appropriate standard for expert opinion in civil litigation); Wells v. Ortho Pharm. Corp., 788 F.2d 741, 745 (11th Cir.), cert. denied, 479 U.S. 950 (1986).
These guidelines reflect criteria proposed in 1964 by the U.S. Surgeon General112 in assessing the relationship between smoking and lung cancer and expanded upon in 1965 by A. Bradford Hill.113
A. Is There a Temporal Relationship?
A temporal, or chronological, relationship must exist for causation. If an exposure causes disease, the exposure must occur before the disease develops.114 If the exposure occurs after the disease develops, it cannot cause the disease. Although temporal relationship is often listed as one of many factors in assessing whether an inference of causation is justified, it is a necessary factor: Without exposure before disease, causation cannot exist.
B. How Strong Is the Association Between the Exposure and Disease?115
The relative risk is one of the cornerstones for causal inferences.116 Relative risk measures the strength of the association. The higher the relative risk, the greater the likelihood that the relationship is causal.117 For cigarette smoking, for example, the estimated relative risk for lung cancer is about 10, which is very high.118 That is, the risk of lung cancer in smokers is approximately ten times the risk in nonsmokers.
A relative risk of 10, as seen with smoking and lung cancer, is so high that it is extremely difficult to imagine any bias or confounding factor that might account for it. The higher the relative risk, the stronger the association and the lower the chance that the effect is spurious. Although lower relative risks can
112. U.S. Dep’t of Health, Educ., and Welfare, Public Health Serv., Smoking and Health: Report of the Advisory Committee to the Surgeon General (1964).
113. A. Bradford Hill, The Environment and Disease: Association or Causation?, 58 Proc. Royal Soc’y Med. 295 (1965) (Hill acknowledged that his factors could only serve to assist in the inferential process: “None of my nine viewpoints can bring indisputable evidence for or against the cause-and-effect hypothesis and none can be required as a sine qua non.”).
114. See Carroll v. Litton Sys., Inc., No. B-C-88-253, 1990 U.S. Dist. LEXIS 16833, at *29 (W.D.N.C. Oct. 29, 1990) (“[I]t is essential for... [the plaintiffs’ medical experts opining on causation] to know that exposure preceded plaintiffs’ alleged symptoms in order for the exposure to be considered as a possible cause of those symptoms....”).
115. Assuming that an association is determined to be causal, the strength of the association plays an important role legally in determining the specific causation question: whether the agent caused an individual plaintiff’s injury. See infra §VII.
116. See supra §III.A.
117. See Cook v. United States, 545 F. Supp. 306, 316 n.4 (N.D. Cal. 1982); Landrigan v. Celotex Corp., 605 A.2d 1079, 1085 (N.J. 1992). The use of the strength of the association as a factor does not reflect a belief that weaker effects occur less frequently than stronger effects. See Green, supra note 39, at 652–53 n.39. Indeed, the apparent strength of a given agent is dependent on the prevalence of the other necessary elements that must occur with the agent to produce the disease, rather than on some inherent characteristic of the agent itself. See Rothman & Greenland, supra note 49, at 9–11.
118. See Doll & Hill, supra note 7.
reflect causality, the epidemiologist will scrutinize such associations more closely because there is a greater chance that they are the result of uncontrolled confounding or biases.
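As a hedged illustration of the arithmetic behind the smoking example, the following Python sketch computes a relative risk from cohort counts. The function name and the counts are hypothetical, chosen only so that the result is about 10; they are not data from any study cited in this guide.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Ratio of disease incidence among the exposed to incidence among the unexposed."""
    if exposed_total <= 0 or unexposed_total <= 0:
        raise ValueError("group sizes must be positive")
    if unexposed_cases <= 0:
        raise ValueError("relative risk is undefined when the unexposed group has no cases")
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed


# Hypothetical cohort: 100 cases among 10,000 exposed persons,
# 10 cases among 10,000 unexposed persons.
rr = relative_risk(100, 10_000, 10, 10_000)
print(f"relative risk is about {rr:.1f}")
```

The sketch reports only the point estimate. It says nothing about bias, confounding, or sampling error, which must be evaluated separately before the strength of an association is given weight.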
C. Is There a Dose–Response Relationship?
A dose–response relationship means that the more intense the exposure, the greater the risk of disease. Generally, higher exposures should increase the incidence (or severity) of disease. However, some causal agents do not exhibit a dose–response relationship when, for example, there is a threshold phenomenon (i.e., an exposure may not cause disease until the exposure exceeds a certain dose).119 Thus, a dose–response relationship is strong, but not essential, evidence that the relationship between an agent and disease is causal.
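The contrast between a graded dose–response pattern and a threshold pattern can be sketched in Python. The function names, exposure categories, and counts below are hypothetical and serve only to illustrate the idea; the check is a crude comparison of point estimates and ignores sampling error.

```python
def risk_ratios_by_dose(reference, groups):
    """Risk ratio of each exposure level relative to the unexposed group.

    reference: (cases, persons) for the unexposed group.
    groups: list of (label, cases, persons), ordered from lowest to highest dose.
    """
    ref_cases, ref_persons = reference
    if ref_cases <= 0 or ref_persons <= 0:
        raise ValueError("reference group needs positive cases and persons")
    ratios = []
    for label, cases, persons in groups:
        if persons <= 0:
            raise ValueError("group sizes must be positive")
        # (cases / persons) / (ref_cases / ref_persons), written as one division
        ratios.append((label, (cases * ref_persons) / (persons * ref_cases)))
    return ratios


def rises_with_dose(ratios):
    """True if every higher dose level has a strictly higher risk ratio."""
    values = [ratio for _, ratio in ratios]
    return all(later > earlier for earlier, later in zip(values, values[1:]))


unexposed = (20, 10_000)  # hypothetical: 20 cases among 10,000 unexposed persons

# Hypothetical graded pattern: risk climbs at every step.
graded = [("low", 30, 10_000), ("medium", 60, 10_000), ("high", 120, 10_000)]

# Hypothetical threshold pattern: no excess risk until the highest dose.
threshold = [("low", 20, 10_000), ("medium", 20, 10_000), ("high", 80, 10_000)]

for name, data in (("graded", graded), ("threshold", threshold)):
    ratios = risk_ratios_by_dose(unexposed, data)
    print(name, ratios, "rises with dose:", rises_with_dose(ratios))
```

In the threshold data the agent still raises risk at the highest dose even though the lower categories show no gradient, which is why the absence of a graded pattern does not by itself rule out causation.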
D. Have the Results Been Replicated?
Rarely, if ever, does a single study conclusively demonstrate a cause–effect relationship.120 It is important that a study be replicated in different populations and by different investigators before a causal relationship is accepted by epidemiologists and other scientists.
The need to replicate research findings permeates most fields of science. In epidemiology, research findings often are replicated in different populations.121 Consistency in these findings is an important factor in making a judgment about causation. Different studies that examine the same exposure–disease relationship
119. The question whether there is a no-effect threshold dose is a controversial one in a variety of toxic substances areas. See, e.g., Irving J. Selikoff, Disability Compensation for Asbestos-Associated Disease in the United States: Report to the U.S. Department of Labor 181–220 (1981); Paul Kotin, Dose–Response Relationships and Threshold Concepts, 271 Annals N.Y. Acad. Sci. 22 (1976); K. Robock, Based on Available Data, Can We Project an Acceptable Standard for Industrial Use of Asbestos? Absolutely, 330 Annals N.Y. Acad. Sci. 205 (1979); Ferebee v. Chevron Chem. Co., 736 F.2d 1529, 1536 (D.C. Cir.) (dose–response relationship for low doses is “one of the most sharply contested questions currently being debated in the medical community”), cert. denied, 469 U.S. 1062 (1984); In re TMI Litig. Consol. Proc., 927 F. Supp. 834, 844–45 (M.D. Pa. 1996) (discussing low-dose extrapolation and no-dose effects for radiation exposure).
Moreover, good evidence to support or refute the threshold-dose hypothesis is exceedingly unlikely because of the inability of epidemiology or animal toxicology to ascertain very small effects. Cf. Arnold L. Brown, The Meaning of Risk Assessment, 37 Oncology 302, 303 (1980). Even the shape of the dose–response curve (whether linear or curvilinear, and if the latter, the shape of the curve) is a matter of hypothesis and speculation. See Allen v. United States, 588 F. Supp. 247, 419–24 (D. Utah 1984), rev’d on other grounds, 816 F.2d 1417 (10th Cir. 1987), cert. denied, 484 U.S. 1004 (1988); Troyen A. Brennan & Robert F. Carter, Legal and Scientific Probability of Causation for Cancer and Other Environmental Disease in Individuals, 10 J. Health Pol’y & L. 33, 43–44 (1985).
120. In Kehm v. Procter & Gamble Co., 580 F. Supp. 890, 901 (N.D. Iowa 1982), aff’d sub nom. Kehm v. Procter & Gamble Mfg. Co., 724 F.2d 613 (8th Cir. 1983), the court remarked on the persuasive power of multiple independent studies, each of which reached the same finding of an association between toxic shock syndrome and tampon use.
121. See Cadarian v. Merrell Dow Pharms., Inc., 745 F. Supp. 409, 412 (E.D. Mich. 1989) (holding a study on Bendectin insufficient to support an expert’s opinion, because “the study’s authors themselves concluded that the results could not be interpreted without independent confirmatory evidence”).
generally should yield similar results. While inconsistent results do not rule out a causal nexus, any inconsistencies signal a need to explore whether different results can be reconciled with causality.
E. Is the Association Biologically Plausible (Consistent with Existing Knowledge)?122
Biological plausibility is not an easy criterion to use and depends upon existing knowledge about the mechanisms by which the disease develops. When biological plausibility exists, it lends credence to an inference of causality. For example, the conclusion that high cholesterol is a cause of coronary heart disease is plausible because cholesterol is found in atherosclerotic plaques. However, observations have been made in epidemiologic studies that were not biologically plausible at the time but subsequently were shown to be correct. When an observation is inconsistent with current biological knowledge, it should not be discarded, but the observation should be confirmed before significance is attached to it. The saliency of this factor varies depending on the extent of scientific knowledge about the cellular and subcellular mechanisms through which the disease process works. The mechanisms of some diseases are understood better than the mechanisms of others.
F. Have Alternative Explanations Been Considered?
The importance of considering the possibility of bias and confounding and ruling out the possibilities was discussed above.123
G. What Is the Effect of Ceasing Exposure?
If an agent is a cause of a disease one would expect that cessation of exposure to that agent ordinarily would reduce the risk of the disease. This has been the case, for example, with cigarette smoking and lung cancer. In many situations, however, relevant data are simply not available regarding the possible effects of ending the exposure. But when such data are available and eliminating exposure reduces the incidence of disease, this factor strongly supports a causal relationship.
122. A number of courts have adverted to this criterion in the course of their discussions of causation in toxic substances cases. E.g., Cook v. United States, 545 F. Supp. 306, 314–15 (N.D. Cal. 1982) (discussing biological implausibility of a two-peak increase of disease when plotted against time); Landrigan v. Celotex Corp., 605 A.2d 1079, 1085–86 (N.J. 1992) (discussing the existence vel non of biological plausibility). See also Bernard D. Goldstein & Mary Sue Henifin, Reference Guide on Toxicology, §III.E, in this manual.
123. See supra §IV.B–C.
H. Does the Association Exhibit Specificity?
An association exhibits specificity if the exposure is associated only with a single disease or type of disease.124 The vast majority of agents do not cause a wide variety of effects. For example, asbestos causes mesothelioma and lung cancer and may cause one or two other cancers, but there is no evidence that it causes any other types of cancers. Thus, a study that finds that an agent is associated with many different diseases should be examined skeptically. Nevertheless, there may be causal relationships in which this guideline is not satisfied. Cigarette manufacturers have long claimed that because cigarettes have been linked to lung cancer, emphysema, bladder cancer, heart disease, pancreatic cancer, and other conditions, there is no specificity and the relationships are not causal. There is, however, at least one good reason why inferences about the health consequences of tobacco do not require specificity: because tobacco and cigarette smoke are not in fact single agents but consist of numerous harmful agents, smoking represents exposure to multiple agents, with multiple possible effects. Thus, while evidence of specificity may strengthen the case for causation, lack of specificity does not necessarily undermine it where there is a plausible biological explanation for its absence.
I. Are the Findings Consistent with Other Relevant Knowledge?
In addressing the causal relationship of lung cancer to cigarette smoking, researchers examined trends over time for lung cancer and for cigarette sales in the United States. A marked increase in lung cancer death rates in men was observed, which appeared to follow the increase in sales of cigarettes. Had the increase in lung cancer deaths followed a decrease in cigarette sales, it might have given researchers pause. It would not have precluded a causal inference, but the inconsistency of the trends in cigarette sales and lung cancer mortality would have had to be explained.
124. This criterion reflects the fact that although an agent causes one disease, it does not necessarily cause other diseases. See, e.g., Nelson v. American Sterilizer Co., 566 N.W.2d 671, 676–77 (Mich. Ct. App. 1997) (affirming dismissal of plaintiff’s claims that chemical exposure caused her liver disorder, but recognizing that evidence supported claims for neuropathy and other illnesses); Sanderson v. International Flavors & Fragrances, Inc., 950 F. Supp. 981, 996–98 (C.D. Cal. 1996).
VI. What Methods Exist for Combining the Results of Multiple Studies?
Not infrequently, the court may be faced with a number of epidemiologic studies whose findings differ. These may be studies in which one shows an association and the other does not, or studies that report associations, but of different magnitude. Because epidemiologic studies may disagree, and because many of the studies are often small and lack the statistical power needed for definitive conclusions, the technique of meta-analysis was developed.125 Meta-analysis is a method of pooling study results to arrive at a single figure to represent the totality of the studies reviewed. It is a way of systematizing the time-honored approach of reviewing the literature, which is characteristic of science, and placing it in a standardized framework with quantitative methods for estimating risk. In a meta-analysis, studies are given different weights in proportion to the sizes of their study populations and other characteristics.126
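One widely used weighting scheme, offered here only as an illustration and not as the method of any study cited in this guide, is fixed-effect inverse-variance pooling: each study's log relative risk is weighted by the inverse of its variance, so that larger and more precise studies count for more. The Python sketch below assumes that each study reports a relative risk and the standard error of its logarithm; the function name and the input values are hypothetical.

```python
import math


def pooled_relative_risk(studies):
    """Fixed-effect, inverse-variance pooled relative risk.

    studies: list of (relative_risk, standard_error_of_log_relative_risk).
    Returns (pooled_rr, (ci_low, ci_high)) with an approximate 95% interval.
    """
    if not studies:
        raise ValueError("at least one study is required")
    weights = [1.0 / (se ** 2) for _, se in studies]
    log_rrs = [math.log(rr) for rr, _ in studies]
    total_weight = sum(weights)
    pooled_log = sum(w * x for w, x in zip(weights, log_rrs)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    ci_low = math.exp(pooled_log - 1.96 * pooled_se)
    ci_high = math.exp(pooled_log + 1.96 * pooled_se)
    return math.exp(pooled_log), (ci_low, ci_high)


# Hypothetical inputs: three small studies with wide individual intervals.
studies = [(1.8, 0.40), (1.5, 0.35), (2.1, 0.50)]
pooled, (low, high) = pooled_relative_risk(studies)
print(f"pooled RR {pooled:.2f}, approximate 95% CI {low:.2f} to {high:.2f}")
```

The pooled standard error is smaller than that of any single study, which is the sense in which pooling reduces sampling error. The sketch does nothing to address differences in study design or quality, which are the problems discussed in the remainder of this section.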
Meta-analysis is most appropriate when used in pooling randomized experimental trials, because the studies included in the meta-analysis share the most significant methodological characteristics, in particular, use of randomized assignment of subjects to different exposure groups. However, often one is confronted with non-randomized observational studies of the effects of possible toxic substances or agents. A method for summarizing such studies is greatly needed, but when meta-analysis is applied to observational studies (either case-control or cohort), it becomes more problematic. The reason for this is that often methodological differences among studies are much more pronounced than they are in randomized trials. Hence, the justification for pooling the results and deriving a single estimate of risk, for example, is not always apparent.
A number of problems and issues arise in meta-analysis. Should only published papers be included in the meta-analysis, or should any available studies be used, even if they have not been peer reviewed? How can the problem of differences in the quality of the studies reviewed be taken into account? Can the results of the meta-analysis itself be reproduced by other analysts? When there
125. See In re Paoli R.R. Yard PCB Litig., 916 F.2d 829, 856 (3d Cir. 1990), cert. denied, 499 U.S. 961 (1991); Hines v. Consolidated Rail Corp., 926 F.2d 262, 273 (3d Cir. 1991); Allen v. International Bus. Mach. Corp., No. 94-264-LON, 1997 U.S. Dist. LEXIS 8016, at *71–*74 (meta-analysis of observational studies is a controversial subject among epidemiologists). Thus, contrary to the suggestion by at least one court, multiple studies with small numbers of subjects may be pooled to reduce the possibility that sampling error is biasing the outcome. See In re Joint E. & S. Dist. Asbestos Litig., 827 F. Supp. 1014, 1042 (S.D.N.Y. 1993) (“[N]o matter how many studies yield a positive but statistically insignificant SMR for colorectal cancer, the results remain statistically insignificant. Just as adding a series of zeros together yields yet another zero as the product, adding a series of positive but statistically insignificant SMRs together does not produce a statistically significant pattern.”), rev’d, 52 F.3d 1124 (2d Cir. 1995); see also supra note 76.
126. Petitti, supra note 76.
are several meta-analyses of a given relationship, why do the results of different meta-analyses often disagree? Another consideration is that often the differences among the individual studies included in a meta-analysis and the reasons for the differences are important in themselves and need to be understood; however, they may be masked in a meta-analysis. A final problem with meta-analyses is that they generate a single estimate of risk and may lead to a false sense of security regarding the certainty of the estimate. People often tend to have an inordinate belief in the validity of the findings when a single number is attached to them, and many of the difficulties that may arise in conducting a meta-analysis, especially of observational studies like epidemiologic ones, may consequently be overlooked.127
VII. What Role Does Epidemiology Play in Proving Specific Causation?
Epidemiology is concerned with the incidence of disease in populations and does not address the question of the cause of an individual’s disease.128 This question, sometimes referred to as specific causation, is beyond the domain of the science of epidemiology. Epidemiology has its limits at the point where an
127. Much has been written about meta-analysis recently, and some experts consider the problems of meta-analysis to outweigh the benefits at the present time. For example, Bailar has written the following:
[P]roblems have been so frequent and so deep, and overstatements of the strength of conclusions so extreme, that one might well conclude there is something seriously and fundamentally wrong with the method. For the present ... I still prefer the thoughtful, old-fashioned review of the literature by a knowledgeable expert who explains and defends the judgments that are presented. We have not yet reached a stage where these judgments can be passed on, even in part, to a formalized process such as meta-analysis.
John C. Bailar III, Assessing Assessments, 277 Science 528, 529 (1997) (reviewing Morton Hunt, How Science Takes Stock (1997)); see also Point/Counterpoint: Meta-analysis of Observational Studies, 140 Am. J. Epidemiology 770 (1994).
128. See DeLuca v. Merrell Dow Pharms., Inc., 911 F.2d 941, 945 & n.6 (3d Cir. 1990) (“Epidemiological studies do not provide direct evidence that a particular plaintiff was injured by exposure to a substance.”); Smith v. Ortho Pharm. Corp., 770 F. Supp. 1561, 1577 (N.D. Ga. 1991); Grassis v. Johns-Manville Corp., 591 A.2d 671, 675 (N.J. Super. Ct. App. Div. 1991); Michael Dore, A Commentary on the Use of Epidemiological Evidence in Demonstrating Cause-in-Fact, 7 Harv. Envtl. L. Rev. 429, 436 (1983).
There are some diseases that do not occur without exposure to a given toxic agent. This is the same as saying that the toxic agent is a necessary cause for the disease, and the disease is sometimes referred to as a signature disease (also, the agent is pathognomonic), because the existence of the disease necessarily implies the causal role of the agent. See Kenneth S. Abraham & Richard A. Merrill, Scientific Uncertainty in the Courts, Issues Sci. & Tech., Winter 1986, at 93, 101. Asbestosis is a signature disease for asbestos, and adenocarcinoma (in young adult women) is a signature disease for in utero DES exposure. See In re “Agent Orange” Prod. Liab. Litig., 597 F. Supp. 740, 834 (E.D.N.Y. 1984) (Agent Orange allegedly caused a wide variety of diseases in Vietnam veterans and their offspring), aff’d, 818 F.2d 145 (2d Cir. 1987), cert. denied, 484 U.S. 1004 (1988).
inference is made that the relationship between an agent and a disease is causal (general causation) and where the magnitude of excess risk attributed to the agent has been determined; that is, epidemiology addresses whether an agent can cause a disease, not whether an agent did cause a specific plaintiff’s disease.129
Nevertheless, the specific causation issue is a necessary legal element in a toxic substance case. The plaintiff must establish not only that the defendant’s agent is capable of causing disease but also that it did cause the plaintiff’s disease. Thus, a number of courts have confronted the legal question of what is acceptable proof of specific causation and the role that epidemiologic evidence plays in answering that question.130 This is not a question that epidemiology addresses.131 Rather, it is a legal question with which a number of courts have grappled. An explanation of how these courts have resolved this question follows. The remainder of this section should be understood as an explanation of judicial opinions, not as epidemiology.
Before proceeding, one last caveat is in order. This section assumes that epidemiologic evidence has been used as proof of causation for a given plaintiff. The discussion does not address whether a plaintiff must use epidemiologic evidence to prove causation.132
Two legal issues arise with regard to the role of epidemiology in proving individual causation: admissibility and sufficiency of evidence to meet the burden of production. The first issue tends to receive less attention from the courts but nevertheless deserves mention. An epidemiologic study that is sufficiently rigorous to justify a conclusion that it is scientifically valid should be admissible,133 as it tends to make an issue in dispute more or less likely.134
129. Cf. “Agent Orange,” 597 F. Supp. at 780.
130. In many instances causation can be established without epidemiologic evidence. When the mechanism of causation is well understood, the causal relationship is well established, or the timing between cause and effect is close, scientific evidence of causation may not be required. This is frequently the situation when the plaintiff suffers traumatic injury rather than disease. This section addresses only those situations in which causation is not evident and scientific evidence is required.
131. Nevertheless, an epidemiologist may be helpful to the fact finder in answering this question. Some courts have permitted epidemiologists (or those who use epidemiologic methods) to testify about specific causation. See Ambrosini v. Labarraque, 101 F.3d 129, 137–41 (D.C. Cir. 1996), cert. dismissed, 520 U.S. 1205 (1997); Zuchowicz v. United States, 870 F. Supp. 15 (D. Conn. 1994); Landrigan v. Celotex Corp., 605 A.2d 1079, 1088–89 (N.J. 1992). In general, courts seem more concerned with the basis of an expert’s opinion than with whether the expert is an epidemiologist or clinical physician. See Porter v. Whitehall, 9 F.3d 607, 614 (7th Cir. 1992) (“curb side” opinion from clinician not admissible); Wade-Greaux v. Whitehall Labs., 874 F. Supp. 1441, 1469–72 (D.V.I.) (clinician’s multiple bases for opinion inadequate to support causation opinion), aff’d, 46 F.3d 1120 (3d Cir. 1994); Landrigan, 605 A.2d at 1083–89 (permitting both clinicians and epidemiologists to testify to specific causation provided the methodology used is sound).
132. See Green, supra note 39, at 672–73; 2 Modern Scientific Evidence, supra note 2, § 28-1.3.2 to -1.3.3, at 306–11.
133.See DeLuca, 911 F.2d at 958; cf. Kehm v. Procter & Gamble Co., 580 F. Supp. 890, 902 (N.D.Iowa 1982) (“These [epidemiologic] studies were highly probative on the issue of causation—they all
Far more courts have confronted the role that epidemiology plays with regard to the sufficiency of the evidence and the burden of production. The civil burden of proof is described most often as requiring the fact finder to “believe that what is sought to be proved . . . is more likely true than not true.”135 The relative risk from epidemiologic studies can be adapted to this 50% plus standard to yield a probability or likelihood that an agent caused an individual’s disease.136 An important caveat is necessary, however. The discussion below speaks in terms of the magnitude of the relative risk or association found in a study. However, before an association or relative risk is used to make a statement about the probability of individual causation, the inferential judgment, described in section V, that the association is truly causal rather than spurious is required: “[A]n agent cannot be considered to cause the illness of a specific person unless
concluded that an association between tampon use and menstrually related TSS [toxic shock syndrome] cases exists.”), aff’d sub nom. Kehm v. Procter & Gamble Mfg. Co., 724 F.2d 613 (8th Cir. 1984).
Hearsay concerns may limit the independent admissibility of the study (see supra note 3), but the study could be relied on by an expert in forming an opinion and may be admissible pursuant to Fed. R. Evid. 703 as part of the underlying facts or data relied on by the expert.
In Ellis v. International Playtex, Inc., 745 F.2d 292, 303 (4th Cir. 1984), the court concluded that certain epidemiologic studies were admissible despite criticism of the methodology used in the studies. The court held that the claims of bias went to the studies’ weight rather than their admissibility. Cf. Christophersen v. Allied-Signal Corp., 939 F.2d 1106, 1109 (5th Cir. 1991) (“As a general rule, questions relating to the bases and sources of an expert’s opinion affect the weight to be assigned that opinion rather than its admissibility .... ”), cert. denied, 503 U.S. 912 (1992).
134. Even if evidence is relevant, it may be excluded if its probative value is substantially outweighed by prejudice, confusion, or inefficiency. Fed. R. Evid. 403. However, exclusion of an otherwise relevant epidemiologic study on Rule 403 grounds is unlikely.
In Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579, 591 (1993), the Court invoked the concept of “fit,” which addresses the relationship of an expert’s scientific opinion to the facts of the case and the issues in dispute. In a toxic substance case in which cause in fact is disputed, an epidemiologic study of the same agent to which the plaintiff was exposed that examined the association with the same disease from which the plaintiff suffers would undoubtedly have sufficient “fit” to be a part of the basis of an expert’s opinion. The Court’s concept of “fit,” borrowed from United States v. Downing, 753 F.2d 1224, 1242 (3d Cir. 1985), appears equivalent to the more familiar evidentiary concept of probative value, albeit one requiring assessment of the scientific reasoning the expert used in drawing inferences from methodology or data to opinion.
135. 2 Edward J. Devitt & Charles B. Blackmar, Federal Jury Practice and Instruction §71.13 (3d ed. 1977); see also United States v. Fatico, 458 F. Supp. 388, 403 (E.D.N.Y. 1978) (“Quantified, the preponderance standard would be 50%+ probable.”), aff’d, 603 F.2d 1053 (2d Cir. 1979), cert. denied, 444 U.S. 1073 (1980).
136. An adherent of the frequentist school of statistics would resist this adaptation, which may explain why so many epidemiologists and toxicologists also resist it. To take the step identified in the text of using an epidemiologic study outcome to determine the probability of specific causation requires a shift from a frequentist approach, which involves sampling or frequency data from an empirical test, to a subjective probability about a discrete event. Thus, a frequentist might assert, after conducting a sampling test, that 60% of the balls in an opaque container are blue. The same frequentist would resist the statement, “The probability that a single ball removed from the box and hidden behind a screen is blue is 60%.” The ball is either blue or not, and no frequentist data would permit the latter statement. “[T]here is no logically rigorous definition of what a statement of probability means with reference to an individual instance ....” Lee Loevinger, On Logic and Sociology, 32 Jurimetrics J. 527, 530 (1992); see
it is recognized as a cause of that disease in general.”137 The following discussion should be read with this caveat in mind.138
The threshold for concluding that an agent was more likely than not thecause of an individual’s disease is a relative risk greater than 2.0. Recall that arelative risk of 1.0 means that the agent has no effect on the incidence of disease.When the relative risk reaches 2.0, the agent is responsible for an equal numberof cases of disease as all other background causes. Thus, a relative risk of 2.0(with certain qualifications noted below) implies a 50% likelihood that an ex-posed individual’s disease was caused by the agent. A relative risk greater than2.0 would permit an inference that an individual plaintiff’s disease was morelikely than not caused by the implicated agent.139 A substantial number of courtsin a variety of toxic substances cases have accepted this reasoning.140
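The arithmetic behind the 2.0 threshold can be sketched in a few lines. This is a minimal illustration rather than a method prescribed by any court: it assumes the relative risk is an unbiased, causal estimate that applies equally to every exposed individual, and the function name `probability_of_causation` is our own. Under those assumptions, the share of disease among the exposed that exceeds the background rate is (RR - 1) / RR.

```python
def probability_of_causation(relative_risk):
    """Share of disease among the exposed that is in excess of background.

    Assumes (for illustration only) that the relative risk is an unbiased,
    causal estimate that applies equally to each exposed individual.
    """
    if relative_risk <= 0:
        raise ValueError("relative risk must be positive")
    # At or below 1.0 the agent adds no excess cases, so the share is zero.
    return max(0.0, (relative_risk - 1.0) / relative_risk)


# Hypothetical relative risks chosen to bracket the 2.0 threshold.
for rr in (1.0, 1.5, 2.0, 3.0):
    print(f"RR = {rr:.1f} -> {probability_of_causation(rr):.0%}")
```

A relative risk of exactly 2.0 yields 50%, which is why only values above 2.0 cross the more-likely-than-not line in this simplified model.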
also Steve Gold, Note, Causation in Toxic Torts: Burdens of Proof, Standards of Persuasion and Statistical Evidence, 96 Yale L.J. 376, 382–92 (1986). Subjective probabilities about discrete events are the product of adherents to Bayes Theorem. See Kaye, supra note 67, at 54–62; David H. Kaye & David A. Freedman, Reference Guide on Statistics §IV.D, in this manual.
137. Cole, supra note 53, at 10284.
138. We emphasize this caveat, both because it is not intuitive and because some courts have failed to appreciate the difference between an association and a causal relationship. See, e.g., Forsyth v. Eli Lilly & Co., Civ. No. 95-00185 ACK, 1998 U.S. Dist. LEXIS 541, at *26–*31 (D. Haw. Jan. 5, 1998). But see Berry v. CSX Transp., Inc., 709 So. 2d 552, 568 (Fla. Dist. Ct. App. 1998) (“From epidemiological studies demonstrating an association, an epidemiologist may or may not infer that a causal relationship exists.”).
139. See Davies v. Datapoint Corp., No. 94-56-P-DMC, 1995 U.S. Dist. LEXIS 21739, at *32–*35 (D. Me. Oct. 31, 1995) (holding that epidemiologist could testify about specific causation, basing such testimony on the probabilities derived from epidemiologic evidence).
140. See DeLuca v. Merrell Dow Pharms., Inc., 911 F.2d 941, 958–59 (3d Cir. 1990) (Bendectin allegedly caused limb reduction birth defects); In re Joint E. & S. Dist. Asbestos Litig., 964 F.2d 92 (2d Cir. 1992) (relative risk less than 2.0 may still be sufficient to prove causation); Daubert v. Merrell Dow Pharms., Inc., 43 F.3d 1311, 1320 (9th Cir.) (requiring that plaintiff demonstrate a relative risk of 2), cert. denied, 516 U.S. 869 (1995); Pick v. American Med. Sys., Inc., 958 F. Supp. 1151, 1160 (E.D. La. 1997) (recognizing that a relative risk of 2 implies a 50% probability of specific causation, but recognizing that a study with a lower relative risk is admissible, although ultimately it may be insufficient to support a verdict on causation); Sanderson v. International Flavors & Fragrances, Inc., 950 F. Supp. 981, 1000 (C.D. Cal. 1996) (acknowledging a relative risk of 2 as a threshold for plaintiff to prove specific causation); Manko v. United States, 636 F. Supp. 1419, 1434 (W.D. Mo. 1986) (swine flu vaccine allegedly caused Guillain-Barré syndrome), aff’d in part, 830 F.2d 831 (8th Cir. 1987); Marder v. G.D. Searle & Co., 630 F. Supp. 1087, 1092 (D. Md. 1986) (pelvic inflammatory disease allegedly caused by Copper 7 IUD), aff’d without op. sub nom. Wheelahan v. G.D. Searle & Co., 814 F.2d 655 (4th Cir. 1987); In re “Agent Orange” Prod. Liab. Litig., 597 F. Supp. 740, 835–37 (E.D.N.Y. 1984) (Agent Orange allegedly caused a wide variety of diseases in Vietnam veterans and their offspring), aff’d, 818 F.2d 145 (2d Cir. 1987), cert. denied, 484 U.S. 1004 (1988); Cook v. United States, 545 F. Supp. 306, 308 (N.D. Cal. 1982) (swine flu vaccine allegedly caused Guillain-Barré syndrome); Landrigan v. Celotex Corp., 605 A.2d 1079, 1087 (N.J. 1992) (relative risk greater than 2.0 “support[s] an inference that the exposure was the probable cause of the disease in a specific member of the exposed population”); Merrell Dow Pharms., Inc. v. Havner, 953 S.W.2d 706, 718 (Tex. 1997) (“The use of scientifically reliable epidemiological studies and the requirement of more than a doubling of the risk strikes a balance between the needs of our legal system and the limits of science.”). But cf. In re Fibreboard Corp., 893 F.2d 706, 711–12 (5th Cir. 1990) (The court disapproved a trial in which several representative
An alternative, yet similar, means to address probabilities in individual cases is use of the attributable risk parameter.141 The attributable risk is a measurement of the excess risk that can be attributed to an agent, above and beyond the background risk that is due to other causes.142 When the attributable risk exceeds 50% (equivalent to a relative risk greater than 2.0), this logically might lead one to believe that the agent was more likely than not the cause of the plaintiff’s disease.
The discussion above contains a number of assumptions: that the study was unbiased, sampling error and confounding were judged unlikely or minimal, the causal factors discussed in section V point toward causation, and the relative risk found in the study is a reasonably accurate measure of the extent of disease caused by the agent. It also assumes that the plaintiff in a given case is comparable to the subjects who made up the exposed cohort in the epidemiologic study and that there are no interactions with other causal agents.143
Evidence in a given case may challenge one or more of those assumptions. Bias in a study may suggest that the outcome found is inaccurate and should be estimated to be higher or lower than the actual result. A plaintiff may have been exposed to a dose of the agent in question that is greater or lower than that to which those in the study were exposed.144 A plaintiff may have individual factors, such as higher age than those in the study, that make it less likely that
cases would be tried and the results extrapolated to a class of some 3,000 asbestos victims, without consideration of any evidence about the individual victims. The court remarked that under Texas law, general causation, which ignores any proof particularistic to the individual plaintiff, could not be substituted for cause in fact.).
141. See supra §III.C.
142. Because cohort epidemiologic studies compare the incidences (rates) of disease, measures like the relative risk and attributable risk are dependent on the time period during which disease is measured in the study groups. Exposure to the agent may either accelerate the onset of the disease in a subject who would have contracted the disease at some later time (all wrongful death cases entail acceleration of death) or be the cause of disease that otherwise would never have occurred in the subject. This creates some uncertainty (when pathological information does not permit determining which of the foregoing alternatives is the case) and ambiguity about the proper calculation of the attributable risk, that is, whether both alternatives should be included in the excess risk or just the latter. See Sander Greenland & James M. Robins, Conceptual Problems in the Definition and Interpretation of Attributable Fractions, 128 Am. J. Epidemiology 1185 (1988). If information were available, the legal issue with regard to acceleration would be the characterization of the harm and the appropriate amount of damages when a defendant’s tortious conduct accelerates development of the disease. See Restatement (Second) of Torts §924 cmt. e (1977); Keeton et al., supra note 107, § 52, at 353–54; Robert J. Peaslee, Multiple Causation and Damages, 47 Harv. L. Rev. 1127 (1934).
143. See Greenland & Robins, supra note 142, at 1193.
144. See supra §V.C; see also Ferebee v. Chevron Chem. Co., 736 F.2d 1529, 1536 (D.C. Cir.) (“The dose–response relationship at low levels of exposure for admittedly toxic chemicals like paraquat is one of the most sharply contested questions currently being debated in the medical community.”), cert. denied, 469 U.S. 1062 (1984); In re Joint E. & S. Dist. Asbestos Litig., 774 F. Supp. 113, 115 (S.D.N.Y. 1991) (discussing different relative risks associated with different doses), rev’d on other grounds, 964 F.2d 92 (2d Cir. 1992).
exposure to the agent caused the plaintiff’s disease. Similarly, an individual plaintiff may be able to rule out other known (background) causes of the disease, such as genetics, which increases the likelihood that the agent was responsible for that plaintiff’s disease. Pathological-mechanism evidence may be available for the plaintiff that is relevant to the cause of the plaintiff’s disease.145 Before any causal relative risk from an epidemiologic study can be used to estimate the probability that the agent in question caused an individual plaintiff’s disease, consideration of these (and similar) factors is required.146
Having additional evidence that bears on individual causation has led a few courts to conclude that a plaintiff may satisfy his or her burden of production even if a relative risk less than 2.0 emerges from the epidemiologic evidence.147 For example, genetics might be known to be responsible for 50% of the incidence of a disease independent of exposure to the agent.148 If genetics can be ruled out in an individual’s case, then a relative risk greater than 1.5 might be sufficient to support an inference that the agent was more likely than not responsible for the plaintiff’s disease.149
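The 1.5 figure in the genetics example follows from simple arithmetic, sketched below. This is an illustrative model of our own, not a formula drawn from the cases: it normalizes the background incidence to 1, assumes the agent's excess cases (RR - 1) simply add to that background with no interaction, and treats the ruled-out cause as accounting for a fixed share of background cases. The function and parameter names are hypothetical.

```python
def probability_after_ruling_out(relative_risk, ruled_out_share):
    """Probability the agent caused the disease once part of the
    background incidence has been excluded for this individual.

    relative_risk   -- relative risk from the study (1.0 or greater here)
    ruled_out_share -- fraction of background incidence due to causes
                       (e.g., genetics) excluded for this plaintiff
    """
    if relative_risk < 1.0:
        raise ValueError("this sketch only covers relative risks of 1.0 or more")
    if not 0.0 <= ruled_out_share < 1.0:
        raise ValueError("ruled_out_share must be in [0, 1)")
    excess = relative_risk - 1.0                  # cases added by the agent
    remaining_background = 1.0 - ruled_out_share  # background causes still possible
    return excess / (excess + remaining_background)


# Genetics accounts for half of background incidence and has been ruled out.
print(probability_after_ruling_out(1.5, 0.5))
# With nothing ruled out, the usual 2.0 threshold reappears.
print(probability_after_ruling_out(2.0, 0.0))
```

Both calls land exactly on the 50% line, so under these assumptions a relative risk above 1.5 plays the same role, once genetics is excluded, that a relative risk above 2.0 plays otherwise.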
145. See Tobin v. Astra Pharm. Prods., Inc., 993 F.2d 528 (6th Cir.) (plaintiff’s expert relied predominantly on pathogenic evidence), cert. denied, 510 U.S. 914 (1993).
146. See Merrell Dow Pharms., Inc. v. Havner, 953 S.W.2d 706, 720 (Tex. 1997); Mary Carter Andrues, Note, Proof of Cancer Causation in Toxic Waste Litigation, 61 S. Cal. L. Rev. 2075, 2100–04 (1988). An example of a judge sitting as fact finder and considering individual factors for a number of plaintiffs in deciding cause in fact is contained in Allen v. United States, 588 F. Supp. 247, 429–43 (D. Utah 1984), rev’d on other grounds, 816 F.2d 1417 (10th Cir. 1987), cert. denied, 484 U.S. 1004 (1988); see also Manko v. United States, 636 F. Supp. 1419, 1437 (W.D. Mo. 1986), aff’d, 830 F.2d 831 (8th Cir. 1987).
147. See, e.g., Grassis v. Johns-Manville Corp., 591 A.2d 671, 675 (N.J. Super. Ct. App. Div. 1991): “The physician or other qualified expert may view the epidemiological studies and factor out other known risk factors such as family history, diet, alcohol consumption, smoking ... or other factors which might enhance the remaining risks, even though the risk in the study fell short of the 2.0 correlation.” See also In re Joint E. & S. Dist. Asbestos Litig., 52 F.3d 1124 (2d Cir. 1995) (holding that plaintiff could provide sufficient evidence of causation without proving a relative risk greater than 2); In re Joint E. & S. Dist. Asbestos Litig., 964 F.2d 92, 97 (2d Cir. 1992), rev’g 758 F. Supp. 199, 202–03 (S.D.N.Y. 1991) (requiring relative risk in excess of 2.0 for plaintiff to meet burden of production); Jones v. Owens-Corning Fiberglas Corp., 672 A.2d 230 (N.J. Super. Ct. App. Div. 1996).
148. See In re Paoli R.R. Yard PCB Litig., 35 F.3d 717, 758–59 (3d Cir. 1994) (discussing the technique of differential diagnosis to rule out other known causes of a disease for a specific individual).
149. The use of probabilities in excess of .50 to support a verdict results in an all-or-nothing approach to damages that some commentators have criticized. The criticism reflects the fact that defendants responsible for toxic agents with a relative risk just above 2.0 may be required to pay damages not only for the disease that their agents caused, but also for all instances of the disease. Similarly, those defendants whose agents increase the risk of disease by less than a doubling may not be required to pay damages for any of the disease that their agents caused. See, e.g., 2 American Law Inst., Reporter’s Study on Enterprise Responsibility for Personal Injury: Approaches to Legal and Institutional Change 369–75 (1991). To date, courts have not adopted a rule that would apportion damages based on the probability of cause in fact in toxic substances cases.
Glossary of Terms
The following terms and definitions were adapted from a variety of sources, including A Dictionary of Epidemiology (John M. Last et al. eds. 3d ed. 1995); 1 Joseph L. Gastwirth, Statistical Reasoning in Law and Public Policy (1988); James K. Brewer, Everything You Always Wanted To Know About Statistics, But Didn’t Know How To Ask (1978); and R.A. Fisher, Statistical Methods for Research Workers (1973).
adjustment. Methods of modifying an observed association to take into account the effect of risk factors that are not the focus of the study and that distort the observed association between the exposure being studied and the disease outcome. See also direct adjustment, indirect adjustment.
agent. Also, risk factor. A factor, such as a drug, microorganism, chemical substance, or form of radiation, whose presence or absence can result in the occurrence of a disease. A disease may be caused by a single agent or a number of independent alternative agents, or the combined presence of a complex of two or more factors may be necessary for the development of the disease.
alpha. The level of statistical significance chosen by a researcher to determine if any association found in a study is sufficiently unlikely to have occurred by chance (as a result of random sampling error) if the null hypothesis (no association) is true. Researchers commonly adopt an alpha of .05, but the choice is arbitrary and other values can be justified.
alpha error. Also called type I error and false positive error, alpha error occurs when a researcher rejects a null hypothesis when it is actually true (i.e., when there is no association). This can occur when an apparent difference is observed between the control group and the exposed group, but the difference is not real (i.e., it occurred by chance). A common error made by lawyers, judges, and academics is to equate the level of alpha with the legal burden of proof.
association. The degree of statistical relationship between two or more events or variables. Events are said to be associated when they occur more or less frequently together than one would expect by chance. Association does not necessarily imply a causal relationship. Events are said not to have an association when the agent (or independent variable) has no apparent effect on the incidence of a disease (the dependent variable). This corresponds to a relative risk of 1.0. A negative association means that the events occur less frequently together than one would expect by chance, thereby implying a preventive or protective role for the agent (e.g., a vaccine).
attributable proportion of risk (PAR). This term has been used to denote the fraction of risk that is attributable to exposure to a substance (e.g., X% of
lung cancer is attributable to cigarettes). Synonymous terms include attributable fraction, attributable risk, and etiologic fraction. See attributable risk.
attributable risk. The proportion of disease in exposed individuals that can be attributed to exposure to an agent, as distinguished from the proportion of disease attributed to all other causes.
background risk of disease. Background risk of disease (or background rate of disease) is the rate of disease in a population that has no known exposures to an alleged risk factor for the disease. For example, the background risk for all birth defects is 3%–5% of live births.
beta error. Also called type II error and false negative error, beta error occurs when a researcher fails to reject a null hypothesis when it is incorrect (i.e., when there is an association). This can occur when no statistically significant difference is detected between the control group and the exposed group, but a difference does exist.
bias. Any effect at any stage of investigation or inference tending to produce results that depart systematically from the true values. In epidemiology, the term bias does not necessarily carry an imputation of prejudice or other subjective factor, such as the experimenter’s desire for a particular outcome. This differs from conventional usage, in which bias refers to a partisan point of view.
biological marker. A physiological change in tissue or body fluids that occurs as a result of an exposure to an agent and that can be detected in the laboratory. Biological markers are only available for a small number of chemicals.
biological plausibility. Consideration of existing knowledge about human biology and disease pathology to provide a judgment about the plausibility that an agent causes a disease.
case-comparison study. See case-control study.
case-control study. Also, case-comparison study, case history study, case referent study, retrospective study. A study that starts with the identification of persons with a disease (or other outcome variable) and a suitable control (comparison, reference) group of persons without the disease. Such a study is often referred to as retrospective because it starts after the onset of disease and looks back to the postulated causal factors.
case group. A group of individuals who have been exposed to the disease, intervention, procedure, or other variable whose influence is being studied.
causation. Causation, as we use the term, denotes an event, condition, characteristic, or agent’s being a necessary element of a set of other events that can produce an outcome, such as a disease. Other sets of events may also cause the disease. For example, smoking is a necessary element of a set of events
that result in lung cancer, yet there are other sets of events (without smoking) that cause lung cancer. Thus, a cause may be thought of as a necessary link in at least one causal chain that results in an outcome of interest. Epidemiologists generally speak of causation in a group context; hence, they will inquire whether an increased incidence of a disease in a cohort was “caused” by exposure to an agent.
clinical trial. An experimental study that is performed to assess the efficacy and safety of a drug or other beneficial treatment. Unlike observational studies, clinical trials can be conducted as experiments and use randomization, because the agent being studied is thought to be beneficial.
cohort. Any designated group of persons followed or traced over a period of time to examine health or mortality experience.
cohort study. The method of epidemiologic study in which groups of individuals can be identified who are, have been, or in the future may be differentially exposed to an agent or agents hypothesized to influence the probability of occurrence of a disease or other outcome. The groups are observed to find out if the exposed group is more likely to develop disease. The alternative terms for a cohort study (concurrent study, follow-up study, incidence study, longitudinal study, prospective study) describe an essential feature of the method, which is observation of the population for a sufficient number of person-years to generate reliable incidence or mortality rates in the population subsets. This generally implies study of a large population, study for a prolonged period (years), or both.
confidence interval. A range of values calculated from the results of a study within which the true value is likely to fall; the width of the interval reflects random error. Thus, if a confidence level of .95 is selected for a study, 95% of similar studies would result in the true relative risk falling within the confidence interval. The width of the confidence interval provides an indication of the precision of the point estimate or relative risk found in the study; the narrower the confidence interval, the greater the confidence in the relative risk estimate found in the study. Where the confidence interval contains a relative risk of 1.0, the results of the study are not statistically significant.
confounding factor. Also, confounder. A factor that is both a risk factor for the disease and a factor associated with the exposure of interest. Confounding refers to a situation in which the effects of two processes are not separated. The distortion can lead to an erroneous result.
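As a rough illustration of the confidence interval entry above, the sketch below computes a relative risk and an approximate 95% confidence interval from cohort counts. The log-based interval used here is a common textbook approximation that we have chosen for the example; the glossary does not prescribe a particular method, and the function name and all counts are hypothetical.

```python
import math
from statistics import NormalDist


def relative_risk_with_ci(exposed_cases, exposed_total,
                          unexposed_cases, unexposed_total, level=0.95):
    """Return (relative risk, lower bound, upper bound)."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    rr = risk_exposed / risk_unexposed
    # Approximate standard error of ln(RR) for cumulative-incidence data.
    se_log_rr = math.sqrt(
        1 / exposed_cases - 1 / exposed_total
        + 1 / unexposed_cases - 1 / unexposed_total
    )
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level is 0.95
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical cohort: 30 of 1,000 exposed and 20 of 1,000 unexposed fall ill.
rr, lower, upper = relative_risk_with_ci(30, 1000, 20, 1000)
significant = not (lower <= 1.0 <= upper)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}, significant: {significant}")
```

In this hypothetical cohort the interval spans 1.0, so the elevated point estimate would not be statistically significant at the .05 level.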
control group. A comparison group comprising individuals who have not been exposed to the disease, intervention, procedure, or other variable whose influence is being studied.
cross-sectional study. A study that examines the relationship between disease and variables of interest as they exist in a population at a given time. A cross-sectional study measures the presence or absence of disease and other variables in each member of the study population. The data are analyzed to determine if there is a relationship between the existence of the variables and disease. Because cross-sectional studies examine only a particular moment in time, they reflect the prevalence (existence) rather than the incidence (rate) of disease and can offer only a limited view of the causal association between the variables and disease. Because exposures to toxic agents often change over time, cross-sectional studies are rarely used to assess the toxicity of exogenous agents.
data dredging. Jargon that refers to results identified by researchers who, after completing a study, pore through their data seeking to find any associations that may exist. In general, good research practice is to identify the hypotheses to be investigated in advance of the study; hence, data dredging is generally frowned on. In some cases, however, researchers conduct exploratory studies designed to generate hypotheses for further study.
demographic study. See ecological study.
dependent variable. The outcome that is being assessed in a study based on the effect of another characteristic, the independent variable. Epidemiologic studies attempt to determine whether there is an association between the independent variable (exposure) and the dependent variable (incidence of disease).
differential misclassification. A form of bias that is due to the misclassification of individuals or a variable of interest when the misclassification varies among study groups. This type of bias occurs when, for example, individuals in a study are incorrectly determined to be unexposed to the agent being studied when in fact they are exposed. See nondifferential misclassification.
direct adjustment. A technique used to eliminate any difference between two study populations based on age, sex, or some other parameter that might result in confounding. Direct adjustment entails comparison of the study group with a large reference population to determine the expected rates based on the characteristic, such as age, for which adjustment is being performed.
dose. Dose generally refers to the intensity or magnitude of exposure to an agent multiplied by the duration of exposure. Dose may be used to refer only to the intensity of exposure.
dose–response relationship. A relationship in which a change in amount, intensity, or duration of exposure to an agent is associated with a change (either an increase or a decrease) in risk of disease.
double-blinding. A characteristic used in experimental studies in which neither the individuals being studied nor the researchers know during the study whether any individual has been assigned to the exposed or control group. Double-blinding is designed to prevent knowledge of the group to which the individual was assigned from biasing the outcome of the study.
ecological fallacy. An error that occurs when a correlation between an agent and disease in a group (ecological) is not reproduced when individuals are studied. For example, at the ecological (group) level, a correlation has been found in several studies between the quality of drinking water and mortality rates from heart disease; it would be an ecological fallacy to infer from this alone that exposure to water of a particular level of hardness necessarily influences the individual’s chances of contracting or dying of heart disease.
ecological study. Also, demographic study. A study of the occurrence of disease based on data from populations, rather than from individuals. An ecological study searches for associations between the incidence of disease and suspected disease-causing agents in the studied populations. Researchers often conduct ecological studies by examining easily available health statistics, making these studies relatively inexpensive in comparison with studies that measure disease and exposure to agents on an individual basis.
epidemiology. The study of the distribution and determinants of disease or other health-related states and events in populations and the application of this study to control of health problems.
error. Random error (sampling error) is the error that is due to chance when the result obtained for a sample differs from the result that would be obtained if the entire population (universe) were studied.
etiologic factor. An agent that plays a role in causing a disease.
etiology. The cause of disease or other outcome of interest.
experimental study. A study in which the researcher directly controls the conditions. Experimental epidemiology studies (also clinical studies) entail random assignment of participants to the exposed and control groups (or some other method of assignment designed to minimize differences between the groups).
exposed, exposure. In epidemiology, the exposed group (or the exposed) is used to describe a group whose members have been exposed to an agent that may be a cause of a disease or health effect of interest, or possess a characteristic that is a determinant of a health outcome.
false negative error. See beta error.
false positive error. See alpha error.
follow-up study. See cohort study.
general causation. General causation is concerned with whether an agent increases the incidence of disease in a group and not whether the agent caused any given individual’s disease. Because of individual variation, a toxic agent generally will not cause disease in every exposed individual.
generalizable. A study is generalizable when the results are applicable to populations other than the study population, such as the general population.
in vitro. Within an artificial environment, such as a test tube (e.g., the cultivation of tissue in vitro).
in vivo. Within a living organism (e.g., the cultivation of tissue in vivo).
incidence rate. The number of people in a specified population falling ill from a particular disease during a given period. More generally, the number of new events (e.g., new cases of a disease in a defined population) within a specified period of time.
incidence study. See cohort study.
independent variable. A characteristic that is measured in a study and that is suspected to have an effect on the outcome of interest (the dependent variable). Thus, exposure to an agent is measured in a cohort study to determine whether that independent variable has an effect on the incidence of disease, which is the dependent variable.
indirect adjustment. A technique employed to minimize error that might result when comparing two populations because of differences in age, sex, or another parameter that may affect the rate of disease in the populations. The rate of disease in a large reference population, such as all residents of a country, is calculated and adjusted for any differences in age between the reference population and the study population. This adjusted rate is compared with the rate of disease in the study population and provides a standardized mortality (or morbidity) ratio, which is often referred to as SMR.
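One common way to carry out the indirect adjustment described above is to apply the reference population's age-specific rates to the study population's age structure, which gives the number of deaths that would be expected if the study group experienced the reference rates. The sketch below follows that approach; the function name, age bands, rates, and counts are all hypothetical and chosen only to make the arithmetic easy to follow.

```python
def standardized_mortality_ratio(observed_deaths, study_population, reference_rates):
    """Observed deaths divided by the deaths expected at reference rates.

    study_population -- persons in the study group, keyed by age band
    reference_rates  -- death rate in the reference population, same keys
    """
    expected_deaths = sum(
        persons * reference_rates[age_band]
        for age_band, persons in study_population.items()
    )
    return observed_deaths / expected_deaths


# Hypothetical study group and reference rates (deaths per person per year).
study_population = {"40-49": 2000, "50-59": 1500, "60-69": 500}
reference_rates = {"40-49": 0.002, "50-59": 0.005, "60-69": 0.012}

smr = standardized_mortality_ratio(21, study_population, reference_rates)
print(f"SMR = {smr:.2f}")
```

An SMR above 1.0 indicates more deaths in the study group than its age structure alone would predict; an SMR below 1.0 indicates fewer.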
inference. The intellectual process of making generalizations from observations. In statistics, the development of generalizations from sample data, usually with calculated degrees of uncertainty.
information bias. Also, observational bias. Systematic error in measuring data that results in differential accuracy of information (such as exposure status) for comparison groups.
interaction. Risk factors interact, or there is interaction among risk factors, when the magnitude or direction (positive or negative) of the effect of one risk factor differs depending on the presence or level of the other. In interaction, the effect of two risk factors together is different (greater or less) than their individual effects.
meta-analysis. A technique used to combine the results of several studies to enhance the precision of the estimate of the effect size and reduce the plausibility that the association found is due to random sampling error. Meta-analysis is best suited to pooling results from randomly controlled experimental studies, but if carefully performed, it also may be useful for observational studies.
misclassification bias. The erroneous classification of an individual in a study as exposed to the agent when the individual was not, or incorrectly classifying a study individual with regard to disease. Misclassification bias may exist in all study groups (nondifferential misclassification) or may vary among groups (differential misclassification).
morbidity rate. Morbidity is the state of illness or disease. Morbidity rate may refer to the incidence rate or prevalence rate of disease.
mortality rate. Mortality refers to death. The mortality rate expresses the proportion of a population that dies of a disease or of all causes. The numerator is the number of individuals dying; the denominator is the total population in which the deaths occurred. The unit of time is usually a calendar year.
model. A representation or simulation of an actual situation. This may be either (1) a mathematical representation of characteristics of a situation that can be manipulated to examine consequences of various actions; (2) a representation of a country’s situation through an “average region” with characteristics resembling those of the whole country; or (3) the use of animals as a substitute for humans in an experimental system to ascertain an outcome of interest.
multivariate analysis. A set of techniques used when the variation in several variables has to be studied simultaneously. In statistics, any analytic method that allows the simultaneous study of two or more independent factors or variables.
nondifferential misclassification. A form of bias that is due to misclassification of individuals or a variable of interest into the wrong category when the misclassification occurs to a similar degree in all study groups. This bias may result from limitations in data collection and will often produce an underestimate of the true association. See differential misclassification.
null hypothesis. A hypothesis that states that there is no true association between a variable and an outcome. At the outset of any observational or experimental study, the researcher must state a proposition that will be tested in the study. In epidemiology, this proposition typically addresses the existence of an association between an agent and a disease. Most often, the null hypothesis is a statement that exposure to Agent A does not increase the occurrence of Disease D. The results of the study may justify a conclusion that the null hypothesis (no association) has been disproved (e.g., a study that finds a
strong association between smoking and lung cancer). A study may fail to disprove the null hypothesis, but that alone does not justify a conclusion that the null hypothesis has been proved.
observational study. An epidemiologic study in situations in which nature is allowed to take its course, without intervention from the investigator. For example, in an observational study the subjects of the study are permitted to determine their level of exposure to an agent.
odds ratio (OR). Also, cross-product ratio, relative odds. The ratio of the odds that a case (one with the disease) was exposed to the odds that a control (one without the disease) was exposed. For most purposes the odds ratio from a case-control study is quite similar to a risk ratio from a cohort study.
pathognomonic. An agent is pathognomonic when it must be present for a disease to occur. Thus, asbestos is a pathognomonic agent for asbestosis. See signature disease.
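To make the odds ratio entry above concrete, the sketch below computes an odds ratio from a two-by-two table and sets it beside the risk ratio for the same counts. The counts are hypothetical and describe a disease that is uncommon in both groups, which is the setting in which the two measures tend to be close; the function name is our own.

```python
def odds_ratio(exposed_cases, exposed_controls, unexposed_cases, unexposed_controls):
    """Odds that a case was exposed divided by odds that a control was exposed."""
    odds_case_exposed = exposed_cases / unexposed_cases
    odds_control_exposed = exposed_controls / unexposed_controls
    return odds_case_exposed / odds_control_exposed


# Hypothetical population: 30 of 1,000 exposed and 10 of 1,000 unexposed
# develop the disease, so 970 exposed and 990 unexposed remain disease-free.
or_estimate = odds_ratio(30, 970, 10, 990)
risk_ratio = (30 / 1000) / (10 / 1000)

print(f"odds ratio = {or_estimate:.2f}, risk ratio = {risk_ratio:.2f}")
```

The same value results from the cross-product of the table, (30 × 990) / (970 × 10), which is the source of the alternative name cross-product ratio.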
placebo controlled. In an experimental study, providing an inert substance to the control group, so as to keep the control and exposed groups ignorant of their status.
p (probability), p-value. The p-value is the probability of getting a value of the test outcome equal to or more extreme than the result observed, given that the null hypothesis is true. The letter p, followed by the abbreviation “n.s.” (not significant) means that p > .05 and that the association was not statistically significant at the .05 level of significance. The statement “p < .05” means that p is less than 5%, and, by convention, the result is deemed statistically significant. Other significance levels can be adopted, such as .01 or .1. The lower the p-value, the less likely that random error would have produced the observed relative risk if the true relative risk is 1.
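As a rough illustration of the p-value definition, the sketch below computes an exact one-sided binomial p-value: the probability, under the null hypothesis, of a count at least as extreme as the one observed. The function name, the null proportion of 0.5, and the counts are hypothetical choices made only for illustration; they are not taken from any study.

```python
from math import comb

def one_sided_binomial_p_value(observed, n, null_proportion):
    """P(X >= observed) when X ~ Binomial(n, null_proportion).

    This is the chance of a result equal to or more extreme than the one
    observed, given that the null hypothesis is true.
    """
    return sum(
        comb(n, k) * null_proportion**k * (1 - null_proportion) ** (n - k)
        for k in range(observed, n + 1)
    )

# Hypothetical data: 9 of 10 cases were exposed, and the null hypothesis
# says each case is equally likely to be exposed or unexposed.
p = one_sided_binomial_p_value(9, 10, 0.5)
print(f"p = {p:.4f}")
print("statistically significant at .05" if p < 0.05 else "n.s.")
```

A two-sided test, or a different test statistic, would give a different p-value for the same data; the choice here is only one simple possibility.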
power. The probability that a difference of a specified amount will be detected by the statistical hypothesis test, given that a difference exists. In less formal terms, power is like the strength of a magnifying lens in its capability to identify an association that truly exists. Power is equal to one minus the probability of a type II error (β). This is sometimes stated as Power = 1 - β.
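To make the relation Power = 1 - β more concrete, the following sketch estimates power for a two-sided comparison of two proportions using a common normal approximation. It is an assumption-laden illustration, not a method prescribed by this guide: the function name, the equal group sizes, the disease proportions, and the neglect of the far tail of the test are all simplifying choices.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(p_unexposed, p_exposed, n_per_group, alpha=0.05):
    """Approximate power of a two-sided test comparing two proportions.

    Uses the normal approximation with equal group sizes and ignores the
    small probability of rejecting in the wrong direction.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    p_bar = (p_unexposed + p_exposed) / 2
    # Standard error of the difference if the null hypothesis were true.
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_group)
    # Standard error of the difference under the assumed true proportions.
    se_alt = sqrt(
        p_unexposed * (1 - p_unexposed) / n_per_group
        + p_exposed * (1 - p_exposed) / n_per_group
    )
    diff = abs(p_exposed - p_unexposed)
    return z.cdf((diff - z_crit * se_null) / se_alt)

# Hypothetical scenario: true risks of 5% (unexposed) and 10% (exposed).
for n in (100, 1000):
    power = approx_power(0.05, 0.10, n)
    print(f"n per group = {n}: power = {power:.2f}, beta = {1 - power:.2f}")
```

Under these assumptions the larger study has far more power, which is the sense in which a small study may fail to detect an association that truly exists.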
prevalence. The percentage of persons with a disease in a population at a specific point in time.
prospective study. In a prospective study, two groups of individuals are identified: (1) individuals who have been exposed to a risk factor and (2) individuals who have not been exposed. Both groups are followed for a specified length of time, and the proportion that develops disease in the first group is compared with the proportion that develops disease in the second group. See cohort study.
random. The term implies that an event is governed by chance. See randomization.
randomization. Assignment of individuals to groups (e.g., for experimental and control regimens) by chance. Within the limits of chance variation, randomization should make the control group and experimental group similar at the start of an investigation and ensure that personal judgment and prejudices of the investigator do not influence assignment. Randomization should not be confused with haphazard assignment. Random assignment follows a predetermined plan that usually is devised with the aid of a table of random numbers. Randomization cannot ethically be used where the exposure is known to cause harm (e.g., cigarette smoking).
randomized trial. See clinical trial.
recall bias. Systematic error resulting from differences between two groups in a study in accuracy of memory. For example, subjects who have a disease may recall exposure to an agent more frequently than subjects who do not have the disease.
relative risk (RR). The ratio of the risk of disease or death among people exposed to an agent to the risk among the unexposed. For instance, if 10% of all people exposed to a chemical develop a disease, compared with 5% of people who are not exposed, the disease occurs twice as frequently among the exposed people. The relative risk is 10%/5% = 2. A relative risk of 1 indicates no association between exposure and disease.
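The arithmetic in the relative risk entry, and the cross-product form of the odds ratio defined earlier in this glossary, can be sketched as follows. The group sizes of 1,000 are hypothetical; only the 10% and 5% risks come from the entry, and the function names are illustrative.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk among the exposed divided by risk among the unexposed."""
    return (exposed_cases / exposed_total) / (unexposed_cases / unexposed_total)

def odds_ratio(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Cross-product ratio (a * d) / (b * c) of the 2 x 2 table."""
    a = exposed_cases                      # exposed, diseased
    b = exposed_total - exposed_cases      # exposed, not diseased
    c = unexposed_cases                    # unexposed, diseased
    d = unexposed_total - unexposed_cases  # unexposed, not diseased
    return (a * d) / (b * c)

# Hypothetical cohort: 1,000 exposed (10% diseased), 1,000 unexposed (5%).
rr = relative_risk(100, 1000, 50, 1000)
oddsr = odds_ratio(100, 1000, 50, 1000)
print(f"relative risk = {rr:.2f}, odds ratio = {oddsr:.2f}")
```

With these numbers the odds ratio is slightly larger than the relative risk; the two measures move closer together as the disease becomes rarer in both groups.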
research design. The procedures and methods, predetermined by an investigator, to be adhered to in conducting a research project.
risk. A probability that an event will occur (e.g., that an individual will become ill or die within a stated period of time or by a certain age).
sample. A selected subset of a population. A sample may be random or nonrandom.
sample size. The number of subjects who participate in a study.
secular-trend study. Also, time-line study. A study that examines changes over a period of time, generally years or decades. Examples include the decline of tuberculosis mortality and the rise, followed by a decline, in coronary heart disease mortality in the United States in the past fifty years.
selection bias. Systematic error that results from individuals being selected for the different groups in an observational study who have differences other than the ones that are being examined in the study.
sensitivity, specificity. Sensitivity measures the accuracy of a diagnostic or screening test or device in identifying disease (or some other outcome) when it truly exists. For example, assume that we know that 20 women in a group of 1,000 women have cervical cancer. If the entire group of 1,000 women is tested for cervical cancer and the screening test only identifies 15 (of the known 20) cases of cervical cancer, the screening test has a sensitivity of 15/20, or 75%. Specificity measures the accuracy of a diagnostic or screening test in identifying those who are disease free. Once again, assume that 980 women out of a group of 1,000 women do not have cervical cancer. If the entire group of 1,000 women is screened for cervical cancer and the screening test only identifies 900 women as without cervical cancer, the screening test has a specificity of 900/980, or 92%.
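The cervical cancer screening example above reduces to two simple proportions. The sketch below reproduces that arithmetic; the function and parameter names are illustrative only.

```python
def sensitivity(correctly_identified_diseased, diseased_total):
    """Share of truly diseased persons whom the test identifies."""
    return correctly_identified_diseased / diseased_total

def specificity(correctly_identified_disease_free, disease_free_total):
    """Share of truly disease-free persons whom the test identifies."""
    return correctly_identified_disease_free / disease_free_total

# Figures from the screening example: 15 of 20 cases detected, and
# 900 of 980 disease-free women identified as disease free.
print(f"sensitivity = {sensitivity(15, 20):.0%}")
print(f"specificity = {specificity(900, 980):.0%}")
```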
signature disease. A disease that is associated uniquely with exposure to an agent (e.g., asbestosis and exposure to asbestos). See also pathognomonic.
significance level. A somewhat arbitrary level selected to minimize the risk that an erroneous positive study outcome that is due to random error will be accepted as a true association. The lower the significance level selected, the less likely that false positive error will occur.
specific causation. Whether exposure to an agent was responsible for a given individual’s disease.
standardized morbidity ratio (SMR). The ratio of the incidence of disease observed in the study population to the incidence of disease that would be expected if the study population had the same incidence of disease as some selected standard or known population.
standardized mortality ratio (SMR). The ratio of the incidence of death observed in the study population to the incidence of death that would be expected if the study population had the same incidence of death as some selected standard or known population.
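One common way to obtain the expected count in a standardized ratio is to apply the standard population's rates to the study population's person-years within each age group and add up the results. The sketch below assumes that approach; the age groups, person-years, rates, and observed count are all hypothetical.

```python
def standardized_ratio(observed_events, person_years, standard_rates):
    """Observed events divided by the events expected at standard rates.

    person_years and standard_rates are parallel lists, one entry per
    stratum (for example, per age group).
    """
    expected = sum(py * rate for py, rate in zip(person_years, standard_rates))
    return observed_events / expected

# Hypothetical study population with two age groups.
person_years = [1000, 2000]      # follow-up time in each age group
standard_rates = [0.001, 0.005]  # deaths per person-year in the standard population
observed_deaths = 22

smr = standardized_ratio(observed_deaths, person_years, standard_rates)
print(f"SMR = {smr:.2f}")
```

A ratio above 1 in this sketch would mean more deaths were observed than the standard rates predict; a ratio of 1 would mean the study population matched the standard.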
statistical significance. A term used to describe a study result or difference whose p-value is lower than the type I error rate (significance level) that was selected by the researcher at the outset of the study. In formal significance testing, a statistically significant result is unlikely to be the result of random sampling error and justifies rejection of the null hypothesis. Some epidemiologists believe that formal significance testing is inferior to using a confidence interval to express the results of a study. Statistical significance, which addresses the role of random sampling error in producing the results found in the study, should not be confused with the importance (for public health or public policy) of a research finding.
stratification. The process of or result of separating a sample into several subsamples according to specified criteria, such as age or socioeconomic status. Researchers may control the effect of confounding variables by stratifying the analysis of results. For example, lung cancer is known to be associated with smoking. To examine the possible association between urban atmospheric pollution and lung cancer, the researcher may divide the population into strata according to smoking status, thus controlling for smoking. The association between air pollution and cancer then can be appraised separately within each stratum.
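The smoking and air pollution example can be sketched with invented numbers. In the hypothetical data below, smokers are both more likely to live in polluted areas and more likely to develop lung cancer, so the crude relative risk suggests an association that disappears within each smoking stratum. The Mantel-Haenszel summary used here is one standard way to pool stratum results; it is a swapped-in illustration, not a technique named in this entry, and all counts and names are assumptions.

```python
# Each stratum: (exposed cases, exposed total, unexposed cases, unexposed total),
# where "exposed" means living in a high-pollution area. Hypothetical counts.
strata = {
    "smokers": (80, 800, 20, 200),
    "nonsmokers": (2, 200, 8, 800),
}

def risk_ratio(a, n1, c, n0):
    """Risk in the exposed (a / n1) divided by risk in the unexposed (c / n0)."""
    return (a / n1) / (c / n0)

def crude_risk_ratio(strata):
    """Relative risk after collapsing all strata into a single table."""
    a, n1, c, n0 = (sum(values) for values in zip(*strata.values()))
    return risk_ratio(a, n1, c, n0)

def mantel_haenszel_risk_ratio(strata):
    """Summary relative risk that weights each stratum by its size."""
    numerator = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in strata.values())
    denominator = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in strata.values())
    return numerator / denominator

print(f"crude RR = {crude_risk_ratio(strata):.2f}")
for name, counts in strata.items():
    print(f"RR among {name} = {risk_ratio(*counts):.2f}")
print(f"Mantel-Haenszel RR = {mantel_haenszel_risk_ratio(strata):.2f}")
```

In this constructed example the apparent association in the crude table is entirely attributable to smoking; real data rarely separate so cleanly.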
study design. See research design.
systematic error. See bias.
teratogen. An agent that produces abnormalities in the embryo or fetus by disturbing maternal health or by acting directly on the fetus in utero.
teratogenicity. The capacity for an agent to produce abnormalities in the embryo or fetus.
threshold phenomenon. A certain level of exposure to an agent below which disease does not occur and above which disease does occur.
time-line study. See secular-trend study.
toxicology. The science of the nature and effects of poisons. Toxicologists study adverse health effects of agents on biological organisms.
toxic substance. A substance that is poisonous.
true association. Also, real association. The association that really exists between exposure to an agent and a disease and that might be found by a perfect (but nonetheless nonexistent) study.
Type I error. Rejecting the null hypothesis when it is true. See alpha error.
Type II error. Failing to reject the null hypothesis when it is false. See beta error.
validity. The degree to which a measurement measures what it purports to measure; the accuracy of a measurement.
variable. Any attribute, condition, or other item in a study that can have different numerical characteristics. In a study of the causes of heart disease, blood pressure and dietary fat intake are variables that might be measured.