NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.
Institute of Medicine (US); National Research Council (US); Pignone M, Russell L, Wagner J, editors. Economic Models of Colorectal Cancer Screening in Average-Risk Adults: Workshop Summary. Washington (DC): National Academies Press (US); 2005.
Economic Models of Colorectal Cancer Screening in Average-Risk Adults: Workshop Summary.
Origin of the Exercise
The idea for collaboration among research teams that maintain published models of CRC screening grew out of a recent review by Michael Pignone and colleagues for the U.S. Preventive Services Task Force (Pignone et al., 2002). They systematically reviewed seven published CEAs of periodic CRC screening in average-risk adults. That review identified several aspects of model structure and underlying assumptions which, taken together, might account for most of the differences in cost-effectiveness rankings of CRC screening strategies. However, each model involves dozens of assumptions, and the reviewers concluded that the published reports provided insufficient information to determine which assumptions or aspects of model design were most important in explaining differences in conclusions across models.
The goal of the collaborative pre-workshop exercise was to shed light on the degree to which differences across models could be reduced by standardizing the values of key input parameters, or assumptions, across models. Any residual variation in model outcomes would be the result of differences either in parameters that remained unstandardized or in the structure of the models themselves. Secondary objectives were to demonstrate the benefit of collaborative interactions among modelers and to ascertain the research resources (time and money) required to mount such exercises.
General Approach
Five research teams with published CEAs of colorectal cancer screening agreed to participate in a comparative modeling exercise to further explore the reasons for disparate cost-effectiveness findings. Each of the models can track (via computer) a hypothetical cohort of average-risk Americans, beginning at age 50, over their remaining lifetimes and can estimate the number of years of life lived and the medical costs incurred by the members of that cohort.1 The participating research teams were:
- The Harvard Model (Frazier et al., 2000), led by Karen Kuntz, Ph.D.;
- The Ladabaum Model (Ladabaum et al., 2004a; Song et al., 2004), led by Uri Ladabaum, M.D.;
- The Miscan Model (Loeve et al., 1999, 2000), led by Marjolein van Ballegooijen, M.D.;
- The Vanderbilt Model (Ness et al., 2000) led by Reid Ness, M.D.; and
- The Vijan Model (Vijan et al., 2001), led by Sandeep Vijan, M.D.
At the workshop, each team leader described essential features of the model's structure and assumptions. (See the appendixes with speakers' presentations.) The teams further agreed to provide cost-effectiveness results for a set of five specific screening strategies across 10 different combinations of assumptions, starting with the assumptions in their original models.
The Screening Strategies
All the strategies included in the pre-workshop exercise envisioned periodic screening of all average-risk Americans beginning at age 50 and ending at age 80. The five selected strategies were:
- F/S: Annual fecal occult blood testing in combination with a flexible sigmoidoscopy every five years;
- S: Sigmoidoscopy every five years;
- R: A prototype radiology procedure every five years, with specific test characteristics and costs;
- C: Colonoscopy every 10 years; and
- F: Annual fecal occult blood testing.
These strategies were selected not for any posited superiority over other CRC screening approaches, but for the frequency with which they are advocated by practitioners today. Some of them represent strategies that have been recommended by professional groups (Smith et al., 2004; U.S. Preventive Services Task Force, 2002; Winawer et al., 2003). They also represent a wide range of procedure cost and test accuracy.
The prototype radiology strategy differed from the others by virtue of being defined by specific assumptions about costs and test performance. That route was necessary because some research teams had not investigated CRC screening with radiological technologies and therefore had no original assumptions at the ready. Moreover, an emerging imaging technique, virtual colonoscopy, may eventually join a much older radiology procedure, double-contrast barium enema (DCBE), as an entry in the mix of available screening technologies (Cotton et al., 2004; Pickhardt et al., 2003; Ransohoff, 2004). The assumptions specified for the prototype strategy represent an optimistic mix of cost and test performance characteristics based on the old and new radiology procedures.
The Standard Assumptions
The pre-workshop exercise specified standard assumptions in each of four groups listed below:
- follow-up and periodic surveillance regimens—the assumptions that modelers make about how the health care system responds to a positive screening test, both in the short term (diagnostic follow-up) and after removal of a pre-cancerous adenoma (surveillance);
- test performance characteristics—the sensitivity, specificity, and medical risk of tests for screening, follow-up, and periodic surveillance after treatment;
- medical costs—the costs of screening, follow-up, and surveillance, as well as the costs of treating colorectal cancer at various stages; and
- compliance—expected levels of adherence to the screening, follow-up, and surveillance strategies under evaluation.
The standardized assumptions in each of these groups are shown in Table 1.
A small number of basic assumptions, such as the discount rate, were also specified to remove possible sources of variation among models deriving from technical details (see Table 2).
Each research team first produced results with its own original assumptions (Table 3).2 Then they produced results in successive runs when assumptions in one group at a time were assigned standardized values, leaving the rest at their original values. They generated a third set of results for a series of runs when one group of assumptions was left at its original values while the rest of the groups were standardized. A final run produced estimates when all assumptions in the exercise were standardized.
The standardized assumptions were not selected with the goal of specifying “correct” values. For the most part they were selected to strike a compromise among the five research teams' original assumptions. However, some values were set to accommodate the least specific model in order to avoid the need for extensive reprogramming. For example, standardized compliance was set at 100 percent. Although an abundance of evidence suggests compliance is far less than perfect, it would have been time-consuming or impossible for all of the research teams to reconfigure their models to accommodate more realistic assumptions. This somewhat opportunistic standardization process underscores the danger of interpreting the standardized results as endorsing any specific colorectal cancer screening strategy, especially because the effectiveness of some strategies is bound to be more heavily dependent on high rates of compliance than others.
Note also that assumptions about the natural history of colorectal cancer differ across models, but standardizing those assumptions is especially difficult to do and was not attempted. Natural history assumptions (the prevalence and incidence of adenomas and other benign polyps, how fast adenomas progress to cancer, what proportion of cancers are preceded by benign adenomas, how fast cancers progress from early to late stages, and the life expectancy of the population with and without colorectal cancer) are interrelated with one another. They can be specified at various levels of detail, by age, sex and race, or other risk factors, as well as by location of the lesion in the colon and by the existence of past or concurrent adenomas. Some models can incorporate very detailed natural history assumptions, whereas others cannot. Additionally, model structures vary in the kind of natural history inputs required. For example, some models require data on the monthly or annual probability that an adenoma will progress to early cancer, whereas others require estimates of the number of years of growth required before an adenoma makes the transition to colorectal cancer. Because of these difficulties, the research teams agreed that the comparative modeling exercise should not attempt to standardize assumptions regarding natural history. Instead, they agreed to provide some intermediate results: the number of adenomas or polyps detected, deaths from CRC, and total mortality at each age between 50 and 85 in the absence of screening. Those results would allow an indirect comparison of natural history assumptions across the models.3
Specification of Model Outputs
For every model run, the research teams provided the coordinators of the exercise4 with estimates of the total number of years of life lived and total medical costs incurred by a population of 100,000 average-risk 50-year-old adults from age 50 until death or age 85, whichever comes first.5 These outputs were reported both as simple totals and in terms of their net present value (NPV) at the starting age (age 50).6
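The relationship between a simple total and its net present value can be illustrated with a minimal sketch. It assumes the 3 percent annual discount rate cited in the footnotes and treats the first year (age 50) as undiscounted; that timing convention, the function name, and the flat stream of one life-year per year are illustrative assumptions, not details reported by the research teams.

```python
def net_present_value(annual_values, rate=0.03):
    """Discount a stream of yearly values back to the first year (age 50 here).

    Assumption: the value in year t is divided by (1 + rate) ** t, so the
    first year is not discounted. The models may use a different convention.
    """
    return sum(v / (1.0 + rate) ** t for t, v in enumerate(annual_values))


# Hypothetical stream: one year of life lived in each of 35 years (ages 50 to 84).
life_years = [1.0] * 35

simple_total = sum(life_years)
discounted_total = net_present_value(life_years)

print(f"simple total: {simple_total:.2f}")
print(f"NPV at age 50: {discounted_total:.2f}")
```

Because later years count for less, the discounted total is always smaller than the simple total whenever the rate is positive, which is why the two were reported separately.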
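The cost-effectiveness of any screening strategy, compared with any other strategy or with no screening, may be calculated from those outputs. For example, the cost-effectiveness of a strategy compared with no screening at all is as follows:
C/E ratio = (lifetime costs with the strategy − lifetime costs with no screening) / (years of life lived with the strategy − years of life lived with no screening)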
If both numerator and denominator are positive, then the C/E ratio represents the extra costs required to achieve each extra year of life. If the numerator of the ratio is negative, while the denominator is positive, then the strategy saves both costs and lives and is unequivocally superior to doing nothing.
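The reading of the ratio described above can be sketched in a few lines. The function name and the cohort totals below are hypothetical, chosen only to show the two cases (extra cost per year gained, and cost-saving); they are not results from any of the five models.

```python
def cost_effectiveness_ratio(cost, years, cost_none, years_none):
    """Return (ratio, reading) for a strategy compared with no screening."""
    extra_cost = cost - cost_none        # numerator: added lifetime cost
    extra_years = years - years_none     # denominator: added years of life
    if extra_years <= 0:
        raise ValueError("no gain in years of life; the ratio is not meaningful")
    ratio = extra_cost / extra_years
    reading = "cost-saving" if extra_cost < 0 else "extra cost per year of life gained"
    return ratio, reading


# Hypothetical totals for a cohort of 100,000 (illustrative values only).
COST_NONE, YEARS_NONE = 120_000_000, 1_800_000

costly = cost_effectiveness_ratio(150_000_000, 1_803_000, COST_NONE, YEARS_NONE)
saving = cost_effectiveness_ratio(110_000_000, 1_803_000, COST_NONE, YEARS_NONE)

print(costly)
print(saving)
```

The sketch raises an error when a strategy gains no years of life, since the text interprets the ratio only for a positive denominator.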
The Comparisons
The five research teams were asked to report results for the baseline—no screening—as well as for 10 runs for each of the five screening strategies, 50 runs in all, as noted in Table 4. Each team ran its model 52 times (twice for the no-screening strategy,7 and 10 times for each of the 5 screening strategies). Thus, the research teams submitted a total of 260 separate computer runs for analysis by the coordinators.
Two runs represent the extremes of the standardization spectrum. Run number 1 produced results for the model's original assumptions in all four areas—follow-up, test performance, cost, and compliance. Run number 6 showed the results when all assumptions were set to their standardized values. All other model runs involved combinations of original and standardized assumptions.
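The run design can be enumerated as a check on the counts given above. This is a sketch under one assumption: that the 10 runs per strategy are the all-original run, four runs with one assumption group standardized, four runs with one group left at original values, and the fully standardized run. The ordering below is arbitrary and does not reproduce the run numbers in Table 4.

```python
GROUPS = ("follow-up", "test performance", "cost", "compliance")
STRATEGIES = ("F/S", "S", "R", "C", "F")
TEAMS = 5


def assumption_combinations(groups=GROUPS):
    """List, for each run, the set of assumption groups that are standardized."""
    everything = frozenset(groups)
    combos = [frozenset()]                               # all original
    combos += [frozenset([g]) for g in groups]           # one group standardized
    combos += [everything - {g} for g in groups]         # one group left original
    combos.append(everything)                            # all standardized
    return combos


combos = assumption_combinations()
screening_runs = len(combos) * len(STRATEGIES)   # runs across the five strategies
runs_per_team = screening_runs + 2               # plus two no-screening runs
total_runs = runs_per_team * TEAMS

print(len(combos), screening_runs, runs_per_team, total_runs)
```

Under that assumption the counts agree with the text: 10 combinations per strategy, 50 screening runs, 52 runs per team, and 260 runs in all.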
Results
Baseline Estimates (No Screening)
The research teams estimated the number of years of life lived (life expectancy) by an average 50-year-old and lifetime CRC-related costs per person, when no screening program was in effect and all assumptions were set to each team's original values (Table 5). Any differences among models in those estimates would reflect variations either in model structure or in assumptions about age-specific mortality in the U.S. population, age- and stage-specific incidence of colorectal cancer, and costs of treating colorectal cancer by age and stage.
The research teams reported a range of estimates of years of life lived. The average life expectancy in the model with the highest predicted value was about 2.25 years longer than in the model with the lowest value, a ratio of about 1.1. Two models predicted almost identical life expectancies of 25 years; three predicted identical life expectancies of 27 years. In reviewing these results at the workshop, several researchers suggested that the differences were due to the use of mortality statistics from different years. Sandeep Vijan and Karen Kuntz remarked that assumptions about life expectancy in their models were based on older life tables. Mortality rates have decreased substantially in the last decade, especially in older age groups.
The variation among models reported by the research teams in estimated lifetime costs was larger than the variation in effects, with the highest estimate about 1.8 times the lowest. Those disparities reflect the models' very different assumptions about and approaches to estimating the cost of treating colorectal cancer. When treatment costs were standardized to the values shown in Table 1, the range of estimated costs diminished substantially to a ratio of 1.2 between the highest and lowest values. Some participants posited that differences in assumptions about cancer incidence probably account for the remaining variation in colorectal cancer costs.
Screening Estimates Under Original Assumptions
Differences among the five models in estimates of the effect of screening under each team's original assumptions were presented by Michael Pignone and discussed by the research teams and other participants.
Comparing screening with no screening. Figure 1 shows the net increases in years of life lived and lifetime costs (discounted to their NPV), compared with no screening under the full set of original assumptions adopted by each research team. The research teams reported wide variation in ratio terms for each of these two components of cost-effectiveness. For example, the NPV of lifetime cost reported for a screening program of flexible sigmoidoscopy every 5 years ranged from $224 per person (Miscan) to $1,159 per person (Vanderbilt), a five-fold difference between the two. The predicted gains in life expectancy from screening vary less than the costs do, but the variation is still substantial. For example, the net present value of life-years gained from flexible sigmoidoscopy ranged from 2,723 per 100,000 50-year-olds (Miscan) to 4,265 (Vanderbilt), a ratio between the highest and lowest of about 1.6.
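The arithmetic behind these comparisons can be retraced from the figures quoted above for flexible sigmoidoscopy. The sketch assumes that both the cost and the life-year figures are net of no screening and discounted, as Figure 1 is described; the implied cost per year of life gained is derived here for illustration and is not a value reported in the summary.

```python
COHORT = 100_000

# Figures quoted in the text for flexible sigmoidoscopy every 5 years.
reported = {
    "Miscan": {"cost_per_person": 224, "life_years_per_cohort": 2_723},
    "Vanderbilt": {"cost_per_person": 1_159, "life_years_per_cohort": 4_265},
}


def implied_ratio(model):
    """Net cost per person divided by net years of life gained per person."""
    r = reported[model]
    return r["cost_per_person"] / (r["life_years_per_cohort"] / COHORT)


cost_spread = reported["Vanderbilt"]["cost_per_person"] / reported["Miscan"]["cost_per_person"]
effect_spread = (reported["Vanderbilt"]["life_years_per_cohort"]
                 / reported["Miscan"]["life_years_per_cohort"])

print(f"cost spread: {cost_spread:.2f}, effect spread: {effect_spread:.2f}")
for model in reported:
    print(f"{model}: about ${implied_ratio(model):,.0f} per year of life gained")
```

The spreads reproduce the five-fold and 1.6 ratios cited in the text.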
The research teams reported that the most effective strategy differed across the models. Two models predicted that F/S gains the most years of life for the population, two models predicted that R would be most effective, and one model predicted C is the most effective. The least costly strategy also differed across models. Two models predicted that S is the least costly strategy, two that F is least costly, and one that R is least costly.8
As a result, estimates of the cost-effectiveness ratio also varied across the five models, in some cases by a five-fold difference between the highest cost-effectiveness ratio and the lowest (Figure 2). Despite that variation, Michael Pignone pointed out, all the models show that all of the strategies meet common benchmarks of cost-effectiveness. Every research team estimated that, when compared with no screening, colorectal cancer screening could deliver an additional year of life for a cost of less than $40,000, regardless of which strategy is adopted.
Comparing strategies with one another. The goal of cost-effectiveness analysis is to compare alternative strategies with one another (Gold et al., 1996). Disparities among CRC models in such comparisons were what prompted the workshop in the first place, so participants reviewed how the strategies performed relative to one another, as reported by the research teams.
The first step in making such comparisons is to rule out any screening strategy that is both less effective and more costly than at least one other. Strategies ruled out at this stage are referred to as “strongly dominated.” The second step is more subtle. It requires ruling out any strategy whose gains in life expectancy, compared with the next less effective strategy, come at an incremental cost that is higher than the incremental cost of achieving gains at least as great through still another strategy. Strategies ruled out at this stage are referred to as “weakly dominated.” Any strategies surviving this two-step elimination process present a true trade-off between successively higher costs and greater health benefits. Louise Russell reminded participants, however, that the process is based on point estimates, which are subject to uncertainty. All research groups have routinely assessed the effect of uncertainty on those estimates. Had the exercise included such analyses, it might have found that some strategies that were ruled out were essentially equivalent to those ruled in.
Once the strategies surviving the two rule-out tests are identified, their incremental cost-effectiveness ratios can be calculated by sorting them into ascending order of effectiveness, measuring the differences in both cost and years of life gained compared with the next less effective strategy (or with no screening for the least effective strategy), and calculating the cost-effectiveness ratio. Michael Pignone summarized the results. Across the five models, the surviving strategies differed substantially and the incremental cost-effectiveness ratios of those strategies also differed (Table 6). Thus, according to Pignone, under their original assumptions, the five research teams would present very different options to policy makers.
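The two rule-out steps and the incremental ratios can be sketched as follows. This is one common way to implement the procedure, not necessarily the one used by the workshop coordinators. The strategy names and numbers are hypothetical, and ties are handled by one extra assumption: a strategy that is no cheaper and no more effective than another (and differs from it in cost or effect) is treated as strongly dominated.

```python
def incremental_analysis(strategies):
    """strategies: list of (name, cost, life_years), no two with identical values.

    Returns [(name, incremental_ratio)] in ascending order of effectiveness;
    the least effective surviving strategy is the reference (ratio None).
    """
    # Step 1: drop strongly dominated strategies.
    kept = [s for s in strategies if not any(
        o is not s and o[1] <= s[1] and o[2] >= s[2]
        and (o[1] < s[1] or o[2] > s[2])
        for o in strategies)]
    frontier = sorted(kept, key=lambda s: s[2])

    def ratios(path):
        return [(b[1] - a[1]) / (b[2] - a[2]) for a, b in zip(path, path[1:])]

    # Step 2: drop weakly dominated strategies, one at a time, until the
    # incremental ratios rise steadily with effectiveness.
    while True:
        r = ratios(frontier)
        drop = next((i + 1 for i in range(len(r) - 1) if r[i] > r[i + 1]), None)
        if drop is None:
            break
        del frontier[drop]

    return [(frontier[0][0], None)] + [
        (s[0], x) for s, x in zip(frontier[1:], ratios(frontier))]


# Hypothetical costs (dollars) and years of life for a cohort, net of no screening.
example = [
    ("No screening", 0, 0),
    ("Strategy 1", 30_000_000, 3_000),
    ("Strategy 2", 50_000_000, 3_400),   # weakly dominated
    ("Strategy 3", 60_000_000, 5_000),
    ("Strategy 4", 70_000_000, 4_500),   # strongly dominated by Strategy 3
]
result = incremental_analysis(example)
print(result)
```

In this hypothetical example Strategy 4 falls out at the first step and Strategy 2 at the second, leaving Strategy 1 at $10,000 and Strategy 3 at $15,000 per additional year of life.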
Estimates of Screening Under Standardized Assumptions
Michael Pignone presented the effect of standardizing all of the assumptions in the four groups together on differences among models.
Comparing screening with no screening. Under the full set of standardized assumptions, the two components of the cost-effectiveness ratio still varied across models, sometimes, but not always, by less in ratio terms than when the models used their original assumptions (Figure 3). Differences across models in predicted per capita lifetime costs were greatest for strategy S, where they ranged from $718 per person (Miscan) to $1,436 per person (Vanderbilt), a two-fold difference between the two. (Recall that the difference was five-fold under the original assumptions.)
Differences across models in years of life gained from screening did not change in a systematic way after standardization. The range of variation grew modestly for two strategies and declined for the other three. The NPV of life-years gained from strategy S ranged from 3,470 per 100,000 50-year-olds (Vijan) to 6,954 (Vanderbilt), a ratio of 2.0 between the highest and lowest, compared with a ratio of 1.6 under the teams' original assumptions. The two strategies involving sigmoidoscopy seemed to resist convergence in predicted years of life gained more than other strategies.
Standardization of assumptions did result in agreement across models on the most effective and least costly strategies. All of the research teams estimated that F/S gains the most years of life and all found F to be the least costly strategy.
The cost-effectiveness ratio for each strategy continued to vary across the five models, but the range of difference as measured by the ratio of the highest to lowest narrowed with full standardization (Figure 4). With all tested assumptions standardized, the cost-effectiveness ratio varied across models by a factor of 1.5 to 2.0 for every strategy.
Comparing strategies with one another. Under standardized assumptions, all models agreed about which strategies survived the dominance test (Table 7), but the incremental cost per year of life gained for F/S versus S still varied widely.
Effect of Specific Assumption Groups on Variations across Models
The research teams examined the separate effect of each group of assumptions on the estimates for four strategies (versus no screening).9 They compared model results when each of the four assumption groups was standardized while the rest were set to their original values. Estimated years of life gained did not show any general pattern of convergence (Table 8). The ratio between the highest and lowest estimate of years of life gained actually increased for some strategies when some assumption groups were standardized. The range of estimates for lifetime costs associated with a particular strategy declined substantially for two strategies but increased slightly for two others.10
The cost-effectiveness ratios for each strategy did converge across models as a result of standardizing costs (Table 9). That result led Michael Pignone to conclude that standardizing cost assumptions seemed to have the biggest effect on convergence among models. However, he also warned that standardizing other groups of assumptions individually did not lead to systematic convergence across models in the estimated cost-effectiveness of any strategy. Because the cost-effectiveness ratios converged when all four assumption groups were standardized, Pignone observed, it is probable that the assumption groups interact in their effects on model outcomes.
Lessons Learned from the Exercise
The results of the pre-workshop exercise prompted substantial discussion among the workshop participants. Comments focused both on the strengths and limitations of the exercise itself and on the implications of the collaborative exercise for further model development.
The Impact of Subtle Differences in Model Structure
Workshop participants identified some subtle differences in structure across models that affected the results of the exercise itself. One is how the different models account for polyps that are not adenomas. As described by T.R. Levin, most experts believe that the vast majority of colorectal cancers arise from pre-cancerous adenomas. These lesions come in a variety of morphologic and histological forms and they grow and progress to cancer at varying speeds. They are not, however, the only polyps that appear in the colon or rectum; other kinds of benign lesions, notably hyperplastic polyps, are quite common in older people (Lieberman et al., 2003). Although hyperplastic lesions are thought to present a low risk for progression to cancer (Imperiale et al., 2003; Lieberman et al., 2004), some screening technologies may detect them with higher frequency than others. In particular, endoscopy and radiology would be more likely to detect non-adenomatous polyps than would fecal occult blood testing, because non-adenomatous polyps rarely bleed.11 Once detected, however, such lesions are typically removed and sent for biopsy because they cannot be differentiated from adenomas by any other method. Martin Brown observed that the cost of follow-up procedures triggered by detection of a non-adenomatous lesion may have a major effect on the incremental cost of screening.
The research teams reported that not all of the models account for the implications of detecting non-adenomatous lesions. Karen Kuntz noted that the Harvard model did not include such lesions at all. Some models recognize them implicitly rather than explicitly by making a downward adjustment in the assumed specificity (i.e., increasing the false positive rate) of the screening test, or an upward adjustment in the average cost of diagnostic follow-up of adenomas detected through screening. Reid Ness observed that standardizing assumptions in the groups involving test performance (i.e., test specificity) and costs (i.e., follow-up costs) masked these subtle differences in model structure.
The Vanderbilt team was the first to recognize the impact of non-adenomas on the standardized results of the pre-workshop exercise. Vanderbilt's estimates of the lifetime costs of all screening strategies were much higher than those reported to the workshop by the other research teams (see Figure 3B). The Vanderbilt model explicitly recognizes the prevalence of non-adenomatous polyps and independently records the costs of diagnostic follow-up of those lesions. Because other models either excluded those costs or considered them implicitly through adjustments in other assumptions, they effectively ignored them when test specificity and unit costs were standardized. The Vanderbilt team assessed the importance of this difference in model structure by reanalyzing the five strategies under fully standardized assumptions after setting the prevalence of non-adenomas to zero in their model. Reid Ness reported that the lifetime costs of all screening strategies declined (Table 10). Those with the highest relative decline were the screening strategies most likely to detect non-adenomas, namely those that involve direct visualization of the colon and diagnostic follow-up of all polyps with colonoscopy.12 Ignoring non-adenomas also had a small negative impact on life years gained, because doing so would imply fewer referrals to colonoscopy. Such referrals generated by a screening test that was positive because of a non-adenomatous polyp would sometimes result in serendipitous discovery on follow-up of an adenoma or cancer, with consequent life-extending benefits.
The Vanderbilt team reported that their reanalysis had a limited effect on incremental cost-effectiveness ratios under standardized assumptions. Their results were in closer agreement with those of the other models, but their estimate of the incremental cost-effectiveness of moving from F to F/S was still much higher (Table 11). Ness concluded that different approaches to non-adenomas may have been responsible for some of the variation among models. However, other factors recognized but not fully understood by participants continued to support a high level of variation in the incremental cost-effectiveness estimates for alternative screening strategies.
Other Limitations of the Exercise
Several participants noted that standardizing to a single set of values in each assumption group is insufficient if the goal is to determine unequivocally the extent to which variation across models can be explained by different values in the four groups of assumptions.13 Other standardized values for the same group of assumptions might have generated more, or less, agreement among models than did the values chosen for the pre-workshop exercise. In the extreme, it might be possible to force a measure of agreement among models by selecting standardized assumptions that strongly favor certain strategies. Judith Wagner noted that the standardized assumptions selected in the pre-workshop exercise may have differentially favored the two strategies involving fecal occult blood testing—F and F/S—since the effectiveness of fecal occult blood testing is especially sensitive to assumptions about compliance. A more robust exercise would have tested multiple values for standardized assumptions, perhaps selected probabilistically from a range of possible values. Such an exercise—involving hundreds or thousands of model runs—would have required time and resources that none of the research teams could afford without external funding.
Several participants noted that convergence among models does not necessarily imply that the models are valid representations of the true cost and effectiveness of any given CRC screening strategy. To paraphrase Marjolein van Ballegooijen, if the models merge when we standardize, should we believe the merged results? The ultimate test of any model is how well it predicts what occurs in the real world. If all models share flawed designs or assumptions, agreement does not constitute validity.
Michael Pignone observed that a key structural aspect shared by all the models is that the sensitivity of each subsequent screening test performed in a periodic screening program is independent of the results of earlier tests. That assumption may be questioned most strongly in the case of annual fecal occult blood testing. Researchers have posited that some adenomas may never bleed, while others may bleed regularly. If more were known about whether such patterns actually exist, and the frequency with which they do, models could be constructed that would adjust the assumed probability that people with adenomas receive positive fecal occult blood testing results in the second and subsequent years of a screening program, based on their test results in previous years. In Pignone's view, adjustments such as these could have profound effects on the estimated effectiveness of periodic fecal occult blood testing. At present, however, data simply do not exist to provide reasonable estimates of such contingent probabilities, and modeling them would be a complex undertaking.
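The consequence of the independence assumption can be illustrated with a small sketch. It compares the cumulative chance of detecting an adenoma over repeated annual tests under independence with a simple alternative in which a fixed fraction of adenomas never bleed. The per-test sensitivity, the never-bleed fraction, and the alternative model itself are hypothetical; as noted above, data to estimate such contingent probabilities do not exist.

```python
def detection_if_independent(sensitivity, rounds):
    """Chance of at least one positive test when every round is independent."""
    return 1.0 - (1.0 - sensitivity) ** rounds


def detection_with_nonbleeders(sensitivity, rounds, never_bleed_fraction):
    """Chance of at least one positive test when some adenomas never bleed.

    Assumption: non-bleeders are never detected, and the remaining adenomas
    have a higher per-round sensitivity so that the average sensitivity of a
    single test equals `sensitivity`.
    """
    bleeder_sensitivity = sensitivity / (1.0 - never_bleed_fraction)
    if bleeder_sensitivity > 1.0:
        raise ValueError("never_bleed_fraction is too large for this sensitivity")
    bleeders = 1.0 - never_bleed_fraction
    return bleeders * (1.0 - (1.0 - bleeder_sensitivity) ** rounds)


# Hypothetical inputs: 10 percent per-test sensitivity, half of adenomas never bleed.
SENSITIVITY, NEVER_BLEED = 0.10, 0.5

for rounds in (1, 5, 10):
    independent = detection_if_independent(SENSITIVITY, rounds)
    correlated = detection_with_nonbleeders(SENSITIVITY, rounds, NEVER_BLEED)
    print(f"{rounds:2d} rounds: independent {independent:.3f}, with non-bleeders {correlated:.3f}")
```

Both versions agree for a single test but diverge as rounds accumulate, with the independence assumption giving the higher cumulative detection in this example.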
Footnotes
- 1
Some of the models can track all age cohorts of adults over a long period of time as well as specific age cohorts.
- 2
Researchers were given the choice of using the same assumptions as those made in their published papers or using other assumptions if more recent work had led them to new assumptions in the current versions of their models.
- 3
The intermediate results were not presented at the workshop and are therefore not discussed in this summary.
- 4
Four workshop participants, Martin Brown, Louise Russell, Michael Pignone, and Judith Wagner, led the development of the pre-workshop exercise and coordinated the analysis of its results. Michael Pignone presented the analysis at the workshop. (See his presentation in Appendix I.)
- 5
Screening programs lasted 30 years, but the reporting period continued for 35 years.
- 6
All comparisons using NPV applied an annual discount rate of 3 percent (Gold et al., 1996).
- 7
The no-screening strategy required Runs #1 and #2, because standardizing costs of treating colorectal cancer (as in Run #2) would change model outcomes even without screening. All other runs involve changes in assumptions that would occur only under a screening regimen.
- 8
In discussion following the presentation of the results, Louise Russell and others pointed out that the analysis focused on differences across models in their single best estimates of effects and costs, whereas the research teams have acknowledged and reported on the range of uncertainty surrounding their estimates in their research papers. Had uncertainty been modeled in this exercise, the range of reported results might have overlapped.
- 9
Recall that strategy R was conceived with standard test performance and test costs at the outset because at the time not all of the research teams had studied a radiologic technology for screening. Therefore, the analysis of the impact of individual assumption groups did not include R.
- 10
One model estimated negative net lifetime costs for strategy F under standardized cost assumptions. The absolute difference in costs between the highest-cost and lowest-cost estimates increased compared with the difference among the models when original assumptions were used.
- 11
Even when a non-adenomatous polyp is not detectable by the screening test, it could be found serendipitously through diagnostic follow up of a test result that was positive for reasons unrelated to its presence.
- 12
Note that under the standardized assumptions, all polyps found on sigmoidoscopy are referred to full colonoscopy for removal and biopsy.
- 13
Recall that natural history assumptions were not standardized in the exercise.