NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Kufe DW, Pollock RE, Weichselbaum RR, et al., editors. Holland-Frei Cancer Medicine. 6th edition. Hamilton (ON): BC Decker; 2003.

  • By agreement with the publisher, this book is accessible by the search feature, but cannot be browsed.

Methodologic Issues in the Evaluation of Early Detection Programs

, PhD, , PhD, and , MD.

The issue of whether a screening intervention is effective may appear on its face to be a simple matter. Theoretically, one need only observe whether persons live longer or have a lower risk of dying from the disease in question as a result of the application of a screening test. Case reports or anecdotal evidence of good outcomes following cancer detection in asymptomatic persons should not, however, be trusted as evidence of screening effectiveness. Evaluations of screening tests done outside the context of a rigorous research design are subject to many biases that may (and usually do) invalidate the conclusions being drawn. Included among these complicating factors are lead-time and length biases, subject self-selection, and overdiagnosis.

Known Biases in the Evaluation of Screening Programs

Lead-Time Bias

As described earlier, the lead time is the interval between the moment an occult condition can be detected and the moment that condition would have become known through patient awareness of signs or symptoms. Unless lead time is accounted for, comparisons of survival rates in screened and unscreened populations will be misleading. There is always a bias toward better survival rates in the screened group because the lead time moves the point at which survival begins to be measured to an earlier date. Thus, it is possible that earlier detection only advances the time of a patient's diagnosis without postponing the time of death. If lead-time bias is present, screen-detected cancers appear to have better survival, but in fact death occurs at the same point it would have without screening.
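The arithmetic behind lead-time bias can be sketched in a few lines of Python. The ages and the 3-year lead time below are hypothetical values chosen for illustration and are not drawn from any study; the function name is likewise an assumption of this sketch.

```python
# Hypothetical patient; all ages are illustrative assumptions, not source data.
AGE_AT_DEATH = 70.0          # assumed to be unchanged by screening
AGE_AT_CLINICAL_DX = 67.0    # diagnosis prompted by symptoms
LEAD_TIME_YEARS = 3.0        # how much earlier screening finds the cancer


def survival_after_diagnosis(age_at_diagnosis: float, age_at_death: float) -> float:
    """Return years lived between diagnosis and death."""
    return age_at_death - age_at_diagnosis


unscreened_survival = survival_after_diagnosis(AGE_AT_CLINICAL_DX, AGE_AT_DEATH)
screened_survival = survival_after_diagnosis(
    AGE_AT_CLINICAL_DX - LEAD_TIME_YEARS, AGE_AT_DEATH
)

# The apparent gain equals the lead time, although the age at death is identical.
apparent_gain = screened_survival - unscreened_survival

# A 5-year survival statistic changes from "died" to "survived" with no real benefit.
counts_as_5yr_survivor_unscreened = unscreened_survival >= 5
counts_as_5yr_survivor_screened = screened_survival >= 5

print(unscreened_survival, screened_survival, apparent_gain)
```

In this scenario the screened patient is counted as a 5-year survivor and the unscreened patient is not, even though both die at the same age.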

Length Bias

Because of variations in tumor growth rates and other biologic characteristics, more of the cancers with long preclinical phases will be detected when a population is screened. A tumor with a longer preclinical phase may also be a more indolent and less threatening lesion. This bias toward detection of less threatening cancers is length bias. This form of potential bias complicates the interpretation of outcome differences between cancers detected by screening and those found outside the screening program because the cancers most likely to escape detection may be the very cancers that have the greatest likelihood of causing death.
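A small simulation may help make length bias concrete. The sketch below assumes two hypothetical tumor types that arise equally often but have preclinical phases of 1 year ("fast") and 4 years ("slow"), with a single screen applied at one point in time; these durations, the time window, and the function name are illustrative assumptions, not estimates from the literature.

```python
import random


def slow_share_among_screen_detected(n_tumors=20000, screen_time=10.0,
                                     window=20.0, seed=0):
    """Fraction of screen-detected tumors that are slow-growing.

    A tumor is detected only if the screen falls inside its preclinical phase.
    """
    rng = random.Random(seed)
    preclinical_years = {"fast": 1.0, "slow": 4.0}  # assumed durations
    detected = {"fast": 0, "slow": 0}
    for _ in range(n_tumors):
        kind = rng.choice(["fast", "slow"])      # equal incidence of both types
        onset = rng.uniform(0.0, window)         # start of the preclinical phase
        if onset <= screen_time < onset + preclinical_years[kind]:
            detected[kind] += 1
    return detected["slow"] / (detected["fast"] + detected["slow"])


slow_share = slow_share_among_screen_detected()
print(f"Slow tumors: 50% of all tumors, {slow_share:.0%} of screen-detected tumors")
```

Under these assumptions the chance of detection is proportional to the length of the preclinical phase, so slow tumors make up roughly four fifths of screen-detected cases despite being half of all tumors.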

Overdiagnosis

The purpose of screening is to find cancers at an early stage. It is possible, however, to detect some tumors at so early a stage that the biologic propensity to progress and cause death is uncertain. Overdiagnosis is an extreme example of length bias. Because screening is more likely than symptom recognition to yield lesions that might never become clinically significant cancers, survival statistics for screen-detected cancers may be inflated. Overdiagnosis may be suspected if, after an extended period of follow-up, the incidence rate in a screened cohort remains higher than the incidence rate expected in the absence of screening.
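The incidence comparison described above can be expressed as a simple excess-incidence calculation. This is only one of several possible formulations (the choice of denominator varies between analyses), and the case counts are hypothetical numbers used for illustration.

```python
def excess_incidence_fraction(observed_cases: int, expected_cases: int) -> float:
    """Share of observed cases in a screened cohort that exceed expectation.

    observed_cases: cumulative cases in the screened cohort after long follow-up
    expected_cases: cumulative cases expected had there been no screening
    """
    if observed_cases <= 0:
        raise ValueError("observed_cases must be positive")
    return (observed_cases - expected_cases) / observed_cases


# Hypothetical cohort: 1,100 cases observed, 1,000 expected without screening.
excess = excess_incidence_fraction(observed_cases=1100, expected_cases=1000)
print(f"Excess incidence: {excess:.1%} of observed cases")
```

A persistent excess of this kind is a signal to investigate, not proof of overdiagnosis, since the expected incidence is itself an estimate.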

Patient Self-Selection

Individuals who elect to receive early detection tests may be different from those who do not in ways that could affect their survival or recovery from disease. For example, users of early-detection services may be more health conscious, more likely to control risk factors such as smoking or diet, more alert to the signs and symptoms of disease, more adherent to treatment, or generally healthier.

Research Designs for Screening Evaluation

Researchers use several different approaches to study cancer-screening effectiveness, including descriptive studies, case-control studies, and randomized controlled trials. Each of these strategies has certain strengths and weaknesses. Some methods are more powerful than others, but no single approach can provide all the answers needed for the evaluation of screening efficacy. Assessing the effectiveness of a screening intervention almost always requires combining evidence from multiple sources based on different research methodologies.

Descriptive Studies

Uncontrolled studies based on the experience of individual physicians, hospitals, and non-population-based registries can yield important information about screening. Indeed, the first evidence that screening may contribute to disease control often is reported from descriptive studies, as is evidence about the performance parameters of detection tests, such as sensitivity, specificity, and positive predictive values. Descriptive studies, however, do not establish efficacy, because of the absence of an appropriate control group and the influence of the potential biases described previously.
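The performance parameters mentioned above follow from the standard two-by-two table of test result against disease status. The sketch below uses hypothetical counts (100 cancers among 10,000 persons screened) and a helper name chosen for this example.

```python
def screening_test_metrics(true_pos: int, false_pos: int,
                           false_neg: int, true_neg: int) -> dict:
    """Compute sensitivity, specificity, and positive predictive value."""
    return {
        # Proportion of persons with disease who test positive.
        "sensitivity": true_pos / (true_pos + false_neg),
        # Proportion of persons without disease who test negative.
        "specificity": true_neg / (true_neg + false_pos),
        # Proportion of positive tests that are true cancers.
        "ppv": true_pos / (true_pos + false_pos),
    }


# Hypothetical screen of 10,000 persons, 100 of whom have cancer.
metrics = screening_test_metrics(true_pos=80, false_pos=990,
                                 false_neg=20, true_neg=8910)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these illustrative counts, a test that is 80% sensitive and 90% specific still has a positive predictive value below 10%, because the disease is rare in the screened population.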

Case-Control Studies

Retrospective case-control studies can provide additional evidence on screening effectiveness. The advantage of this approach is that it is a low-cost strategy that may provide evidence more quickly than prospective studies when the screening procedure is already in clinical use.27 Although mortality reduction can be an end point measured in these studies, case-control studies are subject to bias and confounding from uncontrolled factors.
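In a case-control study of screening, the usual summary measure is the odds ratio relating a history of screening to the outcome. A minimal sketch follows, assuming cases are persons who died of the cancer and controls are comparable persons who did not; the counts are hypothetical.

```python
def odds_ratio(cases_screened: int, cases_unscreened: int,
               controls_screened: int, controls_unscreened: int) -> float:
    """Odds of prior screening among cases relative to controls."""
    if cases_unscreened <= 0 or controls_screened <= 0:
        raise ValueError("denominator cells must be positive")
    return (cases_screened * controls_unscreened) / (
        cases_unscreened * controls_screened
    )


# Hypothetical study: 100 cancer deaths (cases) and 100 controls.
screening_or = odds_ratio(cases_screened=30, cases_unscreened=70,
                          controls_screened=50, controls_unscreened=50)
print(f"Odds ratio for death given prior screening: {screening_or:.2f}")
```

An odds ratio below 1 is compatible with a protective effect, but it cannot by itself separate that effect from self-selection and other uncontrolled factors.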

Randomized Clinical Trials

The most rigorous assessment of screening is by randomized clinical trials that measure cancer-specific mortality reduction as the primary end point. In a randomized clinical trial (RCT), the distorting effects of self-selection are bypassed through random assignment to either an experimental group invited to receive screening or an uninvited group. The mortality end point is not subject to the effects of lead-time bias, length bias, or overdiagnosis. A randomized clinical trial of screening evaluates the effect of an invitation to screening rather than screening per se; that is, end results are based on comparisons between invited and uninvited groups rather than screened and unscreened groups. The distinction is important because noncompliance with the invitation to screening in the experimental group and contamination in the control group (ie, participation in screening) both affect the magnitude of the observed outcome. Although randomized clinical trials are the most desirable study design from a methodological perspective, the large sample sizes required, their expense, and their long duration have tended to limit the number that have been conducted.
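The effect of noncompliance and contamination on the observed result can be illustrated with a simplified calculation. The sketch assumes that baseline mortality is the same in everyone and that the benefit applies only to persons actually screened, which ignores any self-selection among compliers; the 30% true reduction, 70% compliance, and 20% contamination are hypothetical values.

```python
def observed_relative_risk(true_rr: float, compliance: float,
                           contamination: float) -> float:
    """Relative risk seen when comparing invited with uninvited arms.

    true_rr:        relative risk of death among persons actually screened
    compliance:     fraction of the invited arm that attends screening
    contamination:  fraction of the control arm that obtains screening
    """
    for name, value in (("compliance", compliance),
                        ("contamination", contamination)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    # Each arm's mortality is a mix of screened and unscreened persons.
    invited_arm = compliance * true_rr + (1.0 - compliance)
    control_arm = contamination * true_rr + (1.0 - contamination)
    return invited_arm / control_arm


ideal_rr = observed_relative_risk(0.70, compliance=1.0, contamination=0.0)
diluted_rr = observed_relative_risk(0.70, compliance=0.70, contamination=0.20)
print(f"Ideal trial RR: {ideal_rr:.2f}; diluted RR: {diluted_rr:.2f}")
```

Under these assumptions a true 30% reduction among screened persons appears as a reduction of roughly 16% between the invited and uninvited groups.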

All-cause mortality rather than disease-specific mortality has been proposed as a preferable end point in RCTs on the basis of possible biases in assignment of cause of death in the experimental and control groups in a trial.28 These potential biases can take two forms. Sticky-diagnosis bias occurs when deaths from other causes in the experimental group are incorrectly attributed to the disease that is the target of screening, thus falsely elevating disease-specific mortality. Slippery-linkage bias, on the other hand, occurs when deaths from the disease, treatment of the disease, or the screening process itself are incorrectly attributed to other causes, thus falsely lowering disease-specific mortality. Cause-of-death committees need to, and usually do, build safeguards into the death ascertainment process so that these biases are avoided or, at the least, minimized. These safeguards include blind review and a consensus process that referees disputes between multiple reviewers. Thus, although some level of misallocation may occur, there is little evidence that the rate of error approaches a level that would measurably bias end results.29

Several other points are worth making about the use of all-cause mortality as an end point. First, disease-specific interventions cannot reasonably be expected to influence other causes of death; that is, screening for breast cancer should not be expected to contribute to reduced mortality from cardiovascular disease, stroke, hip fractures, or diabetes. Second, on the assumption that a comparison of all-cause mortality in the experimental and control groups is worthwhile (perhaps treatment-related mortality due to another cause is elevated), the only sensible comparison is between the disease-specific cases in the experimental group and those in the control group. Third, a cancer screening RCT with an all-cause mortality end point would require study sizes in excess of one million individuals in each arm, which clearly would be not only inefficient but also prohibitive. Finally, the argument that the ultimate value of an intervention should be measured, in practical terms, on the basis of its ability to reduce the risk of dying altogether, wrongly implies that all-cause mortality is a useful surrogate for disease-specific mortality. Disease-specific interventions are not intended to reduce the risk of dying from any cause, but rather to avoid a premature death from a specific cause. As Tabar and colleagues have shown, when comparing breast cancer mortality and all-cause mortality among the breast cancer cases in the invited and control groups in the Swedish Two-County Trial, not only is a statistically significant reduction in breast cancer mortality observed, but also a statistically significant reduction in all-cause mortality.30 One would expect to see this effect from a successful screening program since death from breast cancer is a greater proportion of all deaths among women ages 40 to 70 compared with women in all age groups combined.
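The sample-size point made above can be illustrated with the standard normal-approximation formula for comparing two proportions. This is not the authors' calculation: the death rates below (0.5% versus 0.4% for the cancer, 10.0% versus 9.9% for all causes), the two-sided 5% significance level, and the 80% power are hypothetical assumptions chosen so that the same absolute number of deaths is prevented in both comparisons.

```python
import math
from statistics import NormalDist


def n_per_arm(p_control: float, p_invited: float,
              alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate subjects per arm needed to detect p_control vs p_invited."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1.0 - p_control) + p_invited * (1.0 - p_invited)
    return math.ceil((z_alpha + z_beta) ** 2 * variance
                     / (p_control - p_invited) ** 2)


# Hypothetical cumulative death rates over the follow-up period.
n_disease_specific = n_per_arm(p_control=0.005, p_invited=0.004)
n_all_cause = n_per_arm(p_control=0.100, p_invited=0.099)

print(f"Disease-specific end point: {n_disease_specific:,} per arm")
print(f"All-cause end point:        {n_all_cause:,} per arm")
```

Because deaths from other causes add variance without adding signal, the all-cause comparison under these assumptions needs roughly 20 times as many participants and exceeds one million per arm.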

Copyright © 2003, BC Decker Inc.
Bookshelf ID: NBK13473
