NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Henderson JT, Webber EM, Weyrich M, et al. Screening for Breast Cancer: A Comparative Effectiveness Review for the U.S. Preventive Services Task Force [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2024 Apr. (Evidence Synthesis, No. 231.)

Chapter 3. Results

Literature Search

We reviewed 10,379 abstracts and assessed 420 full-text articles for inclusion (Appendix B Figure 1). Overall, we identified 20 studies (reported in 45 articles79,80,126–168) that met our inclusion criteria. Lists of included and excluded studies (with reasons for exclusion) are available in Appendix C and Appendix D. No RCTs were excluded from this review due to poor quality; however, 15 nonrandomized studies of interventions (NRSIs) were excluded for poor quality, primarily due to confounding based on imbalances in baseline characteristics (without proper statistical adjustment), selection into study groups, and the absence of information on participant characteristics by study arm.

The numbers of studies and outcomes reported for each KQ are described in Table 3. Details on study design, population, and methodologies are provided in Table 4, Table 5, and Appendix E Table 1. The age to begin screening was not addressed in any of the included studies; the age to stop screening was addressed in one NRSI.136 The effects of different screening intervals (i.e., annual, biennial, triennial) were addressed in five studies: one RCT126 and four NRSIs.140,153,154,159 Eleven of the included studies (four RCTs129,139,143,160 and seven NRSIs79,132,140,144,147,162,168) compared outcomes for screening with DBT versus DM. One RCT164 and one NRSI135 evaluated the effects of an invitation to supplemental screening with MRI for participants with dense breasts after receiving a negative screening mammography. One RCT158 and one NRSI152 addressed the use of supplemental ultrasound screening. No studies of interventions involving personalized screening based on predicted risk met the eligibility criteria for this review.

Table 3

Health Outcomes and Harms Reported by Included Trials and Nonrandomized Studies, by Intervention Category (k = 19).

Table 4

Study Characteristics of Included Trials and Nonrandomized Studies of Screening Approaches and Modalities.

Table 5

Population Characteristics of Included Trials and Nonrandomized Studies, by Intervention Category.

Health outcomes (KQ1) associated with different screening programs were reported in only two fair-quality NRSIs that addressed the age to stop screening136 or screening interval159 (Table 4). For invasive cancer detection (KQ2), two studies addressed the effect of screening frequency on the characteristics of detected cancers, including one fair-quality RCT of multiple rounds of screening126 and one fair-quality cases-only analysis from the BCSC.154 Five studies of DBT compared with DM, three RCTs129,143,160 (two good- and one fair-quality), and two NRSIs144,168 reported screening outcomes from more than one round of screening and were included for KQ2. These studies reported characteristics of cancers detected at each round, necessary to assess whether screening resulted in stage shift toward less advanced cases with better prognosis. All 20 studies were included to examine potential harms of different screening approaches (KQ3).

Overall, the demographic characteristics of study participants were minimally described (Table 5). Most studies included participants in their 40s to 60s, with one study focusing on screening after age 70 years.136 Only seven of the 20 studies reported racial and/or ethnic characteristics. For six studies, participants were primarily White (73% to 92%), with <1 to 11 percent identified as Black, 2 to 11 percent as Asian, and 5 to 7 percent as Hispanic.132,136,147,152,154,168 One electronic health record–based study included primarily Hispanic/Latina participants (76%) along with 10 percent Black and 10 percent White participants.153

KQ1. What Is the Comparative Effectiveness of Different Mammography-Based Screening Strategies on Breast Cancer Morbidity and Mortality?

Summary of Results

We did not identify any RCTs designed to test the comparative effectiveness of ages to start or stop screening, screening interval, or screening modality that reported morbidity, mortality, or quality of life outcomes. Two NRSIs reported mortality outcomes (breast cancer mortality, all-cause mortality): one comparing mortality based on different ages to stop screening and another comparing annual to triennial screening intervals. One fair-quality observational study (N = 1,058,013) was conducted in the United States using a random sample from Medicare claims data to estimate the effect of women stopping screening at age 70 compared with those who continued annual screening after age 70. Individuals included in the study had a high probability of living for 10 more years at the start of the study. The data were analyzed using statistical methods that have been developed to emulate per-protocol trials of screening. Continued screening between the ages of 70 and 74 was associated with a 22 percent decrease in the risk of breast cancer mortality compared with a cessation of screening after age 70. The difference in absolute rates was small (1 fewer death per 1,000 women screened) and the confidence interval for the rate difference included the null. The analysis found no difference in the hazard ratio or absolute rates of breast cancer mortality with continued versus discontinued screening from ages 75 to 84. The second NRSI was a fair-quality study (N = 14,765) conducted in Finland during the years 1985 to 1995 that assigned participants ages 40 to 49 years to annual or triennial screening invitations using birth year. The study reported similar mortality from incident breast cancer and similar all-cause mortality between the two study groups.

Detailed Results by Screening Intervention

Age to Start or Stop Screening

Study and Population Characteristics

One fair-quality NRSI by García-Albéniz et al. used U.S. Medicare data from 1999–2008 and National Death Index data to conduct an emulated trial evaluating the effect of stopping annual mammogram screening at the age of 70 versus continuing annual screening beyond this age (Table 4).136 Annual screening was the most frequent pattern in the data for this time frame. An emulated trial uses statistical techniques to structure and adjust observational data in a way that can approximate a target (per-protocol) randomized trial.169 The study was conducted using a 20 percent random sample of enrollees ages 70 to 84 years in Medicare parts A and B (N = 1,058,013) between 1999 and 2008 (Medicare Advantage enrollees, who comprised 13%–21% of Medicare beneficiaries, were not included). Data on demographic characteristics, chronic conditions, preventive care, screening mammograms, breast cancer symptoms and signs, and breast cancer incidence and treatments were analyzed along with cause of death information obtained from the National Death Index from the National Center for Health Statistics.

A Medicare-specific comorbidity score was computed to exclude individuals that did not have a high probability of living an additional 10 years. Participants could not have a prior breast cancer diagnosis or have breast symptoms or a mammogram in the previous nine months. The trial was emulated for two age groups, those ages 70 to 74 (N = 1,235,459) and those ages 75 to 84 (N = 1,403,735). At each year of age, individuals were randomly assigned to the stop screening or continue screening strategy and the data were analyzed according to whether they had adhered to their assignment, resulting in 15 per protocol trial emulations (for each year of ages 70 to 84). Participants were followed until death, Medicare disenrollment, or the year 2008, whichever came first. A discrete hazard model was approximated using a pooled logistic regression model, and observations were cloned for analytic reasons and censored when they deviated from the randomly assigned screening strategy. Sensitivity analyses were used to evaluate the robustness of findings for a range of assumptions. The baseline characteristics for the sample were described for these two groups and showed that a majority of the sampled eligible participants in these Medicare plans were White (>90%), with 5 percent of participants reported as Black, and 3 percent as “other” (no additional information provided) (Table 5). The older age group (75 to 84) had more frequent visits to the emergency room and more chronic conditions. These factors and other baseline characteristics were adjusted for in all analyses to account for possible differences that could affect assignment and adherence to the screening strategy (stop at age 70 or continue).
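The cloning and censoring step described above can be illustrated with a minimal sketch. It assumes a hypothetical yearly screening history for one person and leaves out everything else the published analysis would have needed (eligibility rules, weighting for censoring, the pooled logistic model). The names `Clone` and `clone_and_censor` are illustrative and are not taken from the study.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Clone:
    person_id: str
    strategy: str        # "continue" or "stop"
    followed_years: int  # years of follow-up consistent with the strategy
    censored: bool       # True if the person deviated from the strategy


def clone_and_censor(person_id: str, screened_by_year: List[bool]) -> List[Clone]:
    """Create one copy of the person per strategy and censor each copy
    at the first year in which observed behavior deviates from it.

    screened_by_year[i] is True if a screening mammogram was observed
    in follow-up year i (hypothetical input format).
    """
    clones = []
    for strategy, expected in (("continue", True), ("stop", False)):
        followed = len(screened_by_year)
        censored = False
        for year, screened in enumerate(screened_by_year):
            if screened != expected:
                followed = year   # follow-up ends when the deviation occurs
                censored = True
                break
        clones.append(Clone(person_id, strategy, followed, censored))
    return clones


# Screened in years 0 and 1, then stopped.
for clone in clone_and_censor("p1", [True, True, False]):
    print(clone)
```

In this toy case the "continue" copy contributes two years before being censored, while the "stop" copy is censored immediately. A full analysis would also reweight the uncensored person-time so that censoring does not bias the comparison.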

Outcomes

In the García-Albéniz NRSI,136 1,058,013 individuals contributed data to an average of 2.5 age-specific emulated trials. Therefore, after pooling all ages (70 to 84 years), 2,639,194 individuals contributed 4,656,465 person-years to the continued screening strategy and 7,170,142 person-years to the stop screening strategy. In the continued screening group, there were 1,533 breast cancer deaths and in the stop screening strategy there were 1,304 breast cancer deaths.
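The pooled count is larger than the number of unique individuals because one person can be eligible for several age-specific trials. The short check below reproduces that arithmetic from the figures reported in this section, treating the pooled total as a count of trial entries rather than distinct people (an interpretation, not wording from the study).

```python
# Figures reported in this section.
unique_individuals = 1_058_013
entries_ages_70_74 = 1_235_459
entries_ages_75_84 = 1_403_735

# Pooled entries across all age-specific emulated trials.
pooled_entries = entries_ages_70_74 + entries_ages_75_84

# Average number of emulated trials each individual contributed to.
avg_trials_per_person = pooled_entries / unique_individuals

print(pooled_entries)
print(round(avg_trials_per_person, 1))
```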

For women ages 70 to 74, the estimated 8-year risk of breast cancer mortality with continued annual screening was 2.7 per 1,000 women (95% CI, 1.8 to 3.7); it was 3.7 per 1,000 women (95% CI, 2.7 to 5.0) with discontinuation after age 70 (risk difference, −1.0 [95% CI, −2.3 to 0.1]). Despite this small, statistically nonsignificant difference in absolute mortality risk for the age group, the adjusted hazard ratio (aHR) suggested a 22 percent lower hazard of 8-year breast cancer mortality with continued screening among those ages 70 to 74 (aHR, 0.78 [95% CI, 0.63 to 0.95]). For women ages 75 to 84, the 8-year estimated risk of breast cancer mortality was 3.8 per 1,000 women (95% CI, 2.7 to 5.1) with continued screening and 3.7 per 1,000 women (95% CI, 3.0 to 4.6) with discontinuation (risk difference, 0.07 [95% CI, −0.93 to 1.3]), with an estimated hazard ratio of 1.00 (95% CI, 0.83 to 1.19). These study results are the fully adjusted effect estimates that account for baseline demographics, chronic conditions, and health care use, as well as time-varying factors including screening history, use of health care resources, and comorbidities. Without adjustment for factors that would contribute to adherence to the continue or stop screening strategy, the risk differences are larger and more favorable for those who continued annual screening, especially in the 70 to 74 years age group. Overall, the adjusted findings did not show a statistical difference in the 8-year risk of breast cancer mortality for women screened beyond age 75 compared with women who discontinued screening.
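The contrast between the absolute and relative estimates for ages 70 to 74 can be made concrete with a small sketch. It only re-uses the rounded numbers reported above; the helper names are illustrative, and the check of whether an interval contains the null value is a simple reading of the reported confidence limits, not a re-analysis.

```python
def risk_difference(risk_a: float, risk_b: float) -> float:
    """Difference in risk (a minus b), in the same units as the inputs."""
    return risk_a - risk_b


def includes_null(lower: float, upper: float, null_value: float) -> bool:
    """True if the confidence interval contains the null value."""
    return lower <= null_value <= upper


# Ages 70 to 74: 8-year breast cancer mortality per 1,000 women.
rd_70_74 = risk_difference(2.7, 3.7)          # continued minus stopped

# Null is 0 for a difference and 1 for a ratio.
rd_ci_includes_null = includes_null(-2.3, 0.1, 0.0)
hr_ci_includes_null = includes_null(0.63, 0.95, 1.0)

print(round(rd_70_74, 1), rd_ci_includes_null, hr_ci_includes_null)
```

The difference works out to about 1 fewer death per 1,000 women, with an interval that reaches zero, while the hazard ratio interval stays below 1. Both statements describe the same data on different scales.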

KQ1a. Does Comparative Effectiveness Differ by Population Characteristics and Risk Markers (e.g., Age, Breast Density, Race and Ethnicity, Family History)?

The included study for KQ1 on age to stop screening did not present comparisons that tested or stratified mortality by participant characteristics or risk markers.

Screening Interval

Study and Population Characteristics

One fair-quality NRSI159 conducted in Turku, Finland from 1987 to 2007 compared rates of mortality associated with annual or triennial screening from ages 40 to 49 (Table 4). Over 10 years (1985–1995) the study sent invitations to all female residents of the screening catchment area starting at age 40 as part of the national screening program (N = 14,765). Study group assignment was determined based on birth year (even birth year, annual; odd birth year, triennial). All individuals invited to screening were followed for 10 years to assess incident cancer and an additional three years (to age 52) to assess mortality from breast cancers presenting from ages 40 to 49 as well as all-cause mortality. No data were reported on the demographics of participants, such as race, breast density, or presence of underlying risk factors (Table 5). Two-view, double-read mammography was conducted by eight radiologists at a single screening center serving the city of Turku. The attendance rate for those invited to screening was 85 percent (not reported by study arm).
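The allocation rule described above is deterministic rather than randomized, which is why the study is classified as an NRSI. A minimal sketch of the rule follows; the function name is illustrative.

```python
def invitation_arm(birth_year: int) -> str:
    """Assign the screening invitation schedule from birth year:
    even years to annual invitations, odd years to triennial invitations."""
    return "annual" if birth_year % 2 == 0 else "triennial"


print(invitation_arm(1946))
print(invitation_arm(1947))
```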

The intention-to-treat analysis was designed to test the effect of invitations to less or more frequent screening (2.8 versus 9.2 on average per person over the 10-year period). Data for the study outcomes were obtained through linkage with the Finnish Cancer Registry, the national Statistics Finland mortality registry, and the Turku clinical breast cancer database. All diagnoses and outcomes were cross-checked across the data sources, and medical chart review was conducted to resolve discrepancies. The analysis used person-years calculated from ages 40 to 49 for breast cancer incidence outcomes and from ages 40 to 52 for mortality outcomes to compute rates per 100,000 person-years. During the study, breast cancer incidence between ages 40 and 49 was similar for those invited to annual screening (141.1 per 100,000 person-years) and those invited to triennial screening (144.0 per 100,000 person-years). Unadjusted Poisson regression was used to estimate the relative rate of incidence and mortality.

Outcomes

The 14,765 people invited to screening for this study contributed 100,738 person-years to the triennial screening invitation group and 88,780 person-years to the annual screening invitation group for estimation of mortality outcomes.159 Mortality from incident breast cancer diagnoses occurring from ages 40 to 49 (with followup to age 52) was similar between groups, with 20.3 deaths per 100,000 person-years with annual screening invitations and 17.9 deaths per 100,000 person-years with triennial screening invitations (relative risk [RR], 1.14 [95% CI, 0.59 to 1.27]).

All-cause mortality (including mortality from prevalent and incident breast cancer diagnoses) was higher in the intention-to-treat analysis for invitation to annual screening (230.9 per 100,000 person-years) compared with invitation to triennial screening (192.6 per 100,000 person-years). The relative risk suggested an estimated 20 percent increase, with a confidence interval at the margin of the null (RR, 1.20 [95% CI, 0.99 to 1.46]). An explanation or mechanism for the higher mortality rate related to more frequent screening could not be identified by the study authors. Deaths from other cancers and deaths from “other natural causes” (not defined) were higher in the annual screening invitation group, whereas deaths from violent causes (accidents, intoxication, murder, suicide) were higher in the triennial invitation group.
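The relative risks for this study can be approximated directly from the reported rates. The sketch below divides the annual-arm rate by the triennial-arm rate and also back-calculates the approximate number of breast cancer deaths implied by each rate and its person-years. Those implied counts are derived here for illustration only and are not reported in this section; small differences from the published values are expected because the inputs are rounded.

```python
def rate_ratio(rate_a: float, rate_b: float) -> float:
    """Ratio of two rates expressed in the same units."""
    return rate_a / rate_b


def implied_events(rate_per_100k: float, person_years: float) -> float:
    """Approximate event count implied by a rate per 100,000 person-years."""
    return rate_per_100k * person_years / 100_000


# Rates per 100,000 person-years: annual arm versus triennial arm.
breast_cancer_rr = rate_ratio(20.3, 17.9)
all_cause_rr = rate_ratio(230.9, 192.6)

# Back-calculated breast cancer deaths (approximate, not source data).
deaths_annual = implied_events(20.3, 88_780)
deaths_triennial = implied_events(17.9, 100_738)

print(round(breast_cancer_rr, 2), round(all_cause_rr, 2))
print(round(deaths_annual), round(deaths_triennial))
```

The implied death counts are nearly equal in the two arms, so the ratio above 1 for breast cancer mortality mostly reflects the smaller person-time in the annual arm.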

KQ1a. Does Comparative Effectiveness Differ by Population Characteristics and Risk Markers (e.g., Age, Breast Density, Race and Ethnicity, Family History)?

The included study for KQ1 on screening intervals did not present comparisons that tested or stratified mortality by participant characteristics or risk markers.

Digital Breast Tomosynthesis

No comparative studies reporting morbidity or mortality outcomes for screening with DBT compared with DM were identified.

Magnetic Resonance Imaging

No eligible comparative studies of MRI screening that reported mortality or morbidity health outcomes were identified.

Ultrasound

No eligible comparative studies of ultrasound screening that reported mortality or morbidity health outcomes were identified.

Personalized Screening Programs Using Risk Assessment

No eligible comparative studies of personalized screening that reported mortality or morbidity health outcomes were identified.

KQ2. What Is the Comparative Effectiveness of Different Mammography-Based Screening Strategies on the Incidence and Progression to Advanced Breast Cancer?

Summary of Results

There were no eligible comparative effectiveness studies of the age to start or stop screening that reported the outcome of cancer incidence and progression to advanced cancer across multiple screening rounds.

One older fair-quality RCT (n = 76,022) conducted between 1989 and 1996 randomized individuals to annual or triennial screening.126 The number of screen-detected cancers was higher in the annual screening study arm (RR, 1.64 [95% CI, 1.28 to 2.09]). The total number of cancers diagnosed either clinically or with screening was similar after three years of screening (one triennial incidence screen, three annual incidence screens). Cancers occurring in the annual screening group (including clinically diagnosed cancers) did not differ by prognostic features such as tumor size, node positivity status, or histologic grade compared with those in the triennial screening group. The study did not report mortality outcomes, so it was not possible to ascertain whether the increase in the proportion of cancers detected by screening would influence health outcomes. Estimated effects based on prognostic indices did not predict statistically significant differences in mortality based on the tumor characteristics. Given the timing of the study, applicability is limited due to developments in screening technology, prognostics, and treatment effectiveness.

A fair-quality NRSI used BCSC data (N = 15,440) to compare the tumor characteristics of cancers detected following annual versus biennial screening intervals.154 The reported tumor characteristics were presented in adjusted analyses stratified by age and menopausal status categories. The detection of stage IIB or higher cancers and cancers with less favorable characteristics did not differ by age when comparing annual to biennial screening intervals. For premenopausal individuals, however, a biennial interval preceding diagnosis was associated with having a higher stage tumor (IIB or higher) (RR, 1.28 [95% CI, 1.01 to 1.63]; p=0.04) and tumors with less favorable prognostic characteristics (RR, 1.11 [95% CI, 1.00 to 1.22]; p=0.047). For postmenopausal individuals with and without use of hormone therapy, there was no difference between cancers that were preceded by annual or biennial screening. The study did not conduct formal tests for interaction in the subgroup comparisons.

Results from three RCTs (N = 130,196)129,143,160 and two NRSIs (N = 597,267)144,168 comparing DBT with DM screening reported invasive cancer detection and the characteristics of detected cancers from at least two rounds of screening (study participants were screened with a common modality at the second round). While cancer mortality results are not yet available from the trials, stage shift in the tumor characteristics across screening rounds could offer indirect evidence of potential screening benefit. Two RCTs129,160 and one NRSI144 used DM for all participants at the second screening round and one RCT143 used DBT for all participants at the second screening round. An additional NRSI reported DBT screening outcomes over multiple rounds compared with individuals receiving only DM.168 The three trials showed higher invasive cancer detection at the first round of screening in the DBT arm (pooled RR, 1.41 [95% CI, 1.20 to 1.64]; I² = 7.6%; 3 trials; n = 129,492). Results from one NRSI were consistent with these findings; results from the other were not. At the second screening round in the RCTs (where all study participants were screened with a common modality), invasive cancer detection was similar for the group assigned to DBT at round one compared with the group assigned DM at round one (pooled RR, 0.87 [95% CI, 0.73 to 1.05]; I² = 0%; 3 trials; n = 105,244). Results in the NRSIs were not entirely consistent with the trial results; one reported fewer cancers at the second round for those originally screened with DBT144 and another reported no differences in detection by modality at round one or two, but a small difference at round three.168
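For readers unfamiliar with the pooled RR and I² values quoted above, the sketch below shows one common way such quantities are computed: inverse-variance pooling of log relative risks with Cochran's Q and I². This is an assumption for illustration; the pooling model actually used by the review is not stated in this section, and the inputs in the example are hypothetical values, not trial data.

```python
import math
from typing import List, Tuple


def pool_fixed_effect(log_rrs: List[float], ses: List[float]) -> Tuple[float, float]:
    """Inverse-variance (fixed-effect) pooled RR and I-squared (percent).

    log_rrs: per-study log relative risks
    ses:     standard errors of those log relative risks
    """
    weights = [1.0 / se ** 2 for se in ses]
    pooled_log = sum(w * y for w, y in zip(weights, log_rrs)) / sum(weights)

    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled_log) ** 2 for w, y in zip(weights, log_rrs))
    df = len(log_rrs) - 1

    # I-squared: share of variability beyond chance, floored at zero.
    i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return math.exp(pooled_log), i_squared


# Hypothetical inputs: three studies with identical estimates.
pooled_rr, i2 = pool_fixed_effect([0.3, 0.3, 0.3], [0.1, 0.1, 0.1])
print(round(pooled_rr, 2), i2)
```

An I² near zero, as in the second-round result, indicates that the trial estimates differ from one another by about as much as chance alone would produce.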

The three trials and two NRSIs reported tumor characteristics that inform staging such as tumor diameter, histologic grade, or node status. No statistically significant differences in these or other individual tumor prognostic characteristics that were reported at followup rounds of screening were found, but statistical power was limited for comparisons of less common tumor types. None of the three RCTs reported statistically significant differences in cancer stage (stage II or higher) at the second screening round. Other outcomes reflecting potential tumor progression, such as tumor size, nodal status, and grade, also were not statistically different at round two in the trials, including when quantitative pooling was supported. The NRSI reporting findings from two or more rounds of screening with DBT versus DM found no statistically significant difference in screen-detected advanced cancer (−0.06 per 1,000 [95% CI, −0.14 to 0.03]).168 Taken together, these results do not provide evidence that DBT screening generates stage shift or prevents progression to advanced cancer. Limited results stratified by age and breast density reported in the RETomo and To-Be RCTs did not suggest differences in invasive cancer detection at a second round of screening for people who had been screened with DBT at the first screening round, but tests for interaction were not conducted and estimates were imprecise.

We did not identify any studies that reported data from more than a single screening round that could be used to compare shifts in cancer stage to assess the effectiveness of age to start or stop screening, the use of supplemental screening modalities, or personalized screening programs using risk assessment.

Detailed Results by Screening Strategy

Age to Start or Stop Screening

No eligible studies were identified that reported the cancer stage at detection across multiple screening rounds to provide evidence of a beneficial stage shift with screening when commenced earlier or continued to later ages.

Screening Interval

Study and Population Characteristics

Two studies addressed the effect of screening frequency on the characteristics of detected cancers. The fair-quality UKCCCR RCT, conducted as part of the U.K. National Breast Screening Program during the years 1989 to 1996, randomized people ages 50 to 62 to annual (N = 37,530) or triennial (N = 38,492) breast cancer screening (Table 4).126 No characteristics other than participant age were reported (Table 5). The cumulative incidence of invasive cancer (including screen-detected and interval cancers) was reported for all participants who attended a prevalence screening visit. The study was designed to compare the incidence of cancer in an annually screened group (three screens after the prevalence screen) and in a triennially screened group (one screen after the prevalence screen). Randomization was by month of birth for the first two years of the trial; thereafter, a computerized randomization scheme implemented through the national screening program was used. Cancer outcomes were obtained through searches of the pathology reports and databases maintained by hospitals involved in the U.K. National Breast Screening Program. Reports from pathologists on the prognostic factors for each cancer were obtained and reviewed by two consultants. Size, node status, and histological grade were also used to code prognostic indices for the cancers (e.g., Nottingham Prognostic Index). The analysis of cancer prognostic characteristics included both screen-detected and interval cancers. The study also reported the tumor diameter, lymph node positivity, and histologic grade for all of the cancers diagnosed during the study, including interval and screen-detected cancers.

A fair-quality BCSC NRSI by Miglioretti et al. used data on cancers detected in the BCSC registries from 1996 to 2012 (Table 4).154 The study compared the interval of screening relative to the characteristics of screen-detected and interval cancers. Individuals were included in the analysis if their cancer was preceded by at least two screening mammograms either 11 to 14 months apart (annual interval) or 23 to 26 months apart (biennial interval). The characteristics of women with cancers preceded by an annual screening interval (n = 12,070) and those preceded by a biennial interval (n = 3,370) differed on some reported factors; those with an annual interval preceding a cancer diagnosis were less likely to be ages 40 to 49 (14% versus 18%) or 70 to 85 (29% versus 27%), and more likely to have a first-degree family history of breast cancer (23% versus 18%). The groups did not differ in race and ethnicity composition, and over three-quarters of the study population was non-Hispanic White (78%), with the remaining participants reported as Black (5%), Asian (5%), Hispanic (5%), AI/AN (<1%), and 7% reported as “other” or unknown (Table 5). This study did not report overall effects of the screening interval on cancer detection by stage, but provided detailed results on the stage at detection stratified by age and menopausal status that are reported as KQ2a results below.
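The interval definition above amounts to a simple classification rule, sketched below. The function name and the choice to return `None` for gaps outside both windows are illustrative assumptions; the section does not say how the study handled other gaps beyond excluding them from the analysis.

```python
from typing import Optional


def screening_interval(months_between_screens: int) -> Optional[str]:
    """Classify the gap between the two screening mammograms that
    preceded a cancer diagnosis, using the windows given in the text."""
    if 11 <= months_between_screens <= 14:
        return "annual"
    if 23 <= months_between_screens <= 26:
        return "biennial"
    return None  # outside both windows: not included in the comparison


for gap in (12, 18, 24):
    print(gap, screening_interval(gap))
```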

Outcomes

Screen-Detected Invasive Cancer and Prognostic Characteristics by Round

In the UKCCCR, there were more invasive screen-detected cancers in the annual screening arm (4.42 per 1,000 people screened, representing 71% of overall cancers) compared with the triennial screening arm (2.70 per 1,000 people screened, representing 50% of overall cancers) (RR, 1.64 [95% CI, 1.28 to 2.09]) (Table 6). After three years of screening (three incidence screens in the annual arm, one incidence screen in the triennial arm) a similar number of cancers (screen-detected and interval cancers) had been diagnosed in the annual screening arm (6.26 per 1,000 screened) and the triennial screening arm (5.40 per 1,000 screened) (RR, 1.16 [95% CI, 0.96 to 1.40]). In comparisons of all cancers that occurred over the course of the study, including interval and screen-detected, there were no statistically significant differences in tumor size, nodal status, histological grade, or the prognosis (Table 6). Mortality data from the study have not been reported, but based on estimates from the prognostic indices, the authors concluded that annual screening confers lead time bias (estimated to be ~6 months) but did not result in downstaging of screen-detected cancers that would influence breast cancer survival or risk of death.
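The relative risk for screen-detected cancers and its confidence interval can be approximated with a standard Wald interval on the log scale. The event counts used below (166 and 104) are back-calculated from the reported rates per 1,000 and the arm sizes; they are assumptions made for this sketch and are not reported in this section, and the trial's own method of computing the interval is not described here.

```python
import math
from typing import Tuple


def risk_ratio_wald(events_a: int, n_a: int, events_b: int, n_b: int,
                    z: float = 1.96) -> Tuple[float, float, float]:
    """Risk ratio (a versus b) with a Wald confidence interval
    computed on the log scale."""
    rr = (events_a / n_a) / (events_b / n_b)
    se_log = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


# Approximate counts implied by 4.42 and 2.70 per 1,000 screened
# in arms of 37,530 (annual) and 38,492 (triennial).
annual_events = round(4.42 * 37_530 / 1_000)
triennial_events = round(2.70 * 38_492 / 1_000)

rr, lo, hi = risk_ratio_wald(annual_events, 37_530, triennial_events, 38_492)
print(round(rr, 2), round(lo, 2), round(hi, 2))
```

Under these assumptions the result agrees with the reported estimate to two decimal places, which suggests the published interval is consistent with a conventional calculation of this kind.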

Table 6

Characteristics of Screen-Detected Invasive Cancers Diagnosed Following an Annual Versus Triennial Screening Frequency.

KQ2a. Does Comparative Effectiveness Differ by Population Characteristics and Risk Markers (e.g., Age, Breast Density, Race and Ethnicity, Family History)?

The Miglioretti BCSC NRSI reported the tumor characteristics and a prognostic characteristic variable for cancers diagnosed among individuals with an annual or biennial screening interval preceding their diagnosis, stratified by age (40 to 49, 50 to 59, 60 to 69, 70 to 85) and by menopausal status.154 The adjusted analysis presented in the study compared those with a biennial versus an annual interval preceding their diagnosis. Screen-detected and interval cancers (12 months followup for annual screened, 24 months followup for biennial screened) were included in the comparisons (Table 7). The relative risk of being diagnosed with a stage IIB or higher cancer was not statistically different for biennially compared with annually screened women in any of the age categories. The composite variable indicating less favorable prognostic characteristics (stage IIB+, tumor size >15 mm, or node-positive) also was not statistically different for any age group comparing those biennially versus annually screened before their diagnosis. Analyses comparing stage (IIB or higher) and less favorable prognostic tumor characteristics stratified by menopausal status showed statistically significant effects of the screening interval. The risk of a stage IIB or higher diagnosis was higher for premenopausal women screened biennially compared with annually (RR, 1.28 [95% CI, 1.01 to 1.63]; p=0.04). Similarly, a marginally significant increased risk of having a tumor with less favorable prognostic characteristics was seen for premenopausal women when they had been screened biennially versus annually prior to their diagnosis (RR, 1.11 [95% CI, 1.00 to 1.22]; p=0.047). For postmenopausal individuals (with and without hormone therapy use), tumor stage and prognosis were statistically similar when preceded by annual or biennial screening. The study did not conduct formal tests for interaction in the subgroup comparisons and did not adjust for multiple comparisons.
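The composite "less favorable prognostic characteristics" variable described above is a logical OR of three criteria. A minimal sketch follows; the simplified stage ordering in `STAGE_ORDER` and the function name are assumptions for illustration and may not match the staging categories used in the study's data.

```python
# Simplified ordering of stage groups, lowest to highest (assumed).
STAGE_ORDER = ["0", "I", "IIA", "IIB", "IIIA", "IIIB", "IIIC", "IV"]


def less_favorable(stage: str, tumor_size_mm: float, node_positive: bool) -> bool:
    """True if the cancer meets any criterion named in the text:
    stage IIB or higher, tumor size over 15 mm, or positive nodes."""
    stage_iib_plus = STAGE_ORDER.index(stage) >= STAGE_ORDER.index("IIB")
    return stage_iib_plus or tumor_size_mm > 15 or node_positive


print(less_favorable("I", 10, False))    # no criterion met
print(less_favorable("I", 16, False))    # size criterion met
print(less_favorable("IIB", 10, False))  # stage criterion met
```

Because the composite is met by any single criterion, it captures more cancers than the stage IIB+ outcome alone, which may help explain why its relative risk estimate is closer to 1 with a narrower interval.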

Table 7

Characteristics of Cancers Diagnosed Following an Annual Versus Biennial Screening Frequency, by Population Subgroup.

Digital Breast Tomosynthesis

Study and Population Characteristics

We included three RCTs129,143,160 and two NRSIs144,168 that reported cancer detection across more than one round of screening and could therefore be used to assess invasive cancer detection and stage shift across screening rounds as an intermediate outcome to compare the effectiveness of DBT with DM screening (Table 4). Overall, population characteristics were sparsely reported for the trials (Table 5).

The fair-quality Proteus Donna RCT conducted in Italy reported screening results from two rounds of screening with randomization to DBT/DM (n = 30,844) or DM (n = 43,022) for the first round of screening and DM screening for all participants at the second round of screening.129 Participants in the national screening program from ages 46 to 49 were offered annual screening if they opted to participate in the screening program, and routine screening in the program was offered biennially for women ages 50 to 68. Recruitment began in December 2014 and the trial was completed in December 2017. The mean age of participants was 57 years; information on breast density was not reported. Independent double reading was used with participants recalled based on the recommendation of either radiologist.

A good-quality RCT conducted in northern Italy reported on the characteristics of cancers detected at two consecutive rounds of screening. The Reggio Emilia Tomosynthesis study (RETomo) prospectively randomized women to undergo DBT/DM (n = 13,356) or DM (n = 13,521) at baseline followed by DM screening for all eligible participants one or two years later.160 Women ages 45 to 49 were offered annual screening and those ages 50 to 69 were offered biennial screening. Followup is ongoing, and to date results have been reported over two rounds of screening, with an additional nine months of followup to obtain the final diagnosis for cancers detected at the second screening round. Participants were women ages 45 to 69 who had participated in the regional screening program but had never received a DBT examination. Just over one-third (38%) of participants were ages 45 to 49 at the first screening round in both study arms and the mean age was 55. Breast density category distributions were similar, with 9 percent of women classified as having very dense breasts. In both study arms, two radiologists independently read the images and a third reader made the final judgment in cases of disagreement (usual screening program practice). Followup evaluations and final diagnosis results were obtained from screening program and cancer registry databases.

The To-Be study is a good-quality RCT conducted in Norway that randomized participants to DBT/sDM screening (n = 14,380) or DM screening (n = 14,369) and followed them for two years, or until the next screening episode.143 The second screening round consisted of DBT/sDM for all participants. Therefore, outcomes at the second screening round with DBT/sDM were compared between those originally screened with DBT/sDM (n = 11,201) and those originally screened with DM (n = 11,105). The study was conducted within the population-based BreastScreen Norway program, which offers all women ages 50 to 69 biennial mammogram screening. The mean age of study participants was 60 years, with 7 percent of women classified as having very dense breasts. In this program, independent dual reading with consensus is standard and prior mammograms, if available, are used to assist image reading. The first round of screening was conducted in January 2016 through December 2017 and the second screening in January 2018 through January 2020.

A fair-quality NRSI using a concurrent geographical comparison cohort design was conducted within the BreastScreen national screening program in Norway. The Oslo-Vestfold-Vestre Viken (OVVV) cohort was used to compare cancer screening outcomes from one round of screening with DBT/sDM (n = 37,185) or DM (n = 61,742) and a second round of DM for all attending the consecutive round of screening (n = 72,017).144 Individuals screened in Oslo received DBT at the baseline screening round and DM in the consecutive round, and in Vestfold and Vestre Viken DM screening was provided at both rounds. Those ages 50 to 69 years presenting to be screened in Oslo, Vestfold, and Vestre Viken from February 2014 to December 2015 were included in the cohort. In this program, biennial screening is provided, so the second screening visit (for those not diagnosed at baseline) occurred two years later. Those participating in BreastScreen were assigned to the baseline screening modality based solely on their county of residence and were not given an option to select the screening type. The mean age of study participants was 59; data on breast density were not reported. In the BreastScreen Norway program, independent double reading of mammography images by random pairs of breast radiologists is used to determine the mammography result.

A fair-quality NRSI by Sprague et al. used retrospective cohort data from 58 clinical sites from five BCSC registries (Carolina, Chicago, New Hampshire, San Francisco, Vermont) to compare DBT with DM-only screenings conducted within 30 months of a prior screening mammogram.168 All women ages 40 to 79 obtaining such DBT and DM screenings from the years 2011 to 2020 were included (n = 504,843). Participants could contribute data from multiple consecutive screening rounds. Because all study participants were selected based on having a prior screening history, all screening results reported in the study represent subsequent screening visits, including those designated “round one.” Examinations were also excluded if supplemental screening with ultrasound (within 3 months) or MRI (within 12 months) was conducted. The screening modality was coded as DM only if there was no known prior history of DBT screening. Followup on screening outcomes was obtained using databases including SEER, BCSC, and state/regional tumor registries, and examinations with less than one year of followup data for cancer diagnosis were excluded. Statistical analyses accounted for the clustered data structure (women within radiologists within facilities) and for differences in breast cancer risk characteristics across modality and screening round. All study estimates were adjusted for age, breast density, race or ethnicity, time since last mammogram, BCSC 5-year invasive cancer risk, benign breast disease history, first-degree family history of breast cancer, and examination year. The analytic sample included 1,531,608 examinations after exclusions for diagnostic evaluations and other factors. Screening using DBT was more common among non-Hispanic White women with a family history of breast cancer and individuals with an annual screening frequency. Overall, women receiving DBT examinations were at higher 5-year invasive breast cancer risk and those with multiple rounds of DBT screening were more likely to be at high risk than those with just one subsequent DBT screening round.

Outcomes
Screen-Detected Invasive Cancer and Prognostic Characteristics by Screening Round

Three trials randomized participants to DBT or DM at a first round of screening, followed by a second round of screening with either DM for everyone (Proteus Donna, RETomo) or DBT for everyone (To-Be). One NRSI with a concurrent geographic comparison design compared people receiving DBT/sDM or DM at a first screening round and DM for everyone in the second round (OVVV). A second NRSI used BCSC data to compare individuals screened with DBT versus those receiving DM over at least two screening rounds; that study did not report which 2D image (DM or sDM) accompanied DBT (Table 8).

Table 8

Characteristics of Screen-Detected Invasive Cancers Diagnosed in Studies Comparing Digital Breast Tomosynthesis and Digital Mammography.

The three RCTs reported increased detection of invasive cancer with DBT at the first round of screening (pooled RR, 1.41 [95% CI, 1.20 to 1.64]; I2=7.6%; 3 trials; n = 129,492) and an effect in the opposite direction that was not statistically significant at the second round of screening (pooled RR, 0.87 [95% CI, 0.73 to 1.05]; I2=0%; 3 trials; n = 105,064) (Figure 3).129,143,160 Information on the characteristics of cancers detected at each screening round can help with indirect inferences about whether the additional or earlier cancer detection at the first round of screening would affect health outcomes. Two RCTs conducted in Italy reported detection of cancers stage II or higher and the same variable was obtained via author communication from the To-Be study. There was no statistically significant difference within any of the studies in the detection of stage II or higher cancers at either round of screening. Results were inconsistent at round two, with one trial nearing statistical significance for more stage II+ cancers and the other two trials in the direction of fewer stage II+ cancers in the DBT arm.

Figure 3 shows a pooled analysis of screen-detected invasive cancers diagnosed in the first versus second round of screening in trials comparing digital breast tomosynthesis and digital mammography.

Figure 3

Pooled Analysis of Screen-Detected Invasive Cancers Diagnosed in Trials Comparing Digital Breast Tomosynthesis and Digital Mammography. Abbreviations: CG=control group; DBT=digital breast tomosynthesis; DM=digital mammography; IG=intervention group; RETomo=Reggio (more...)

In the Proteus Donna trial, the DBT study arm detected more invasive cancers during the first round of screening (7.3 versus 5.0 per 1,000 screened; RR, 1.46 [95% CI, 1.21 to 1.77]) and at the second round the detection of invasive cancers was not statistically different between arms (RR, 0.85 [95% CI, 0.64 to 1.13]).129 For the RETomo trial, detection of invasive cancer was higher for the DBT/DM study arm at round one (RR, 1.60 [95% CI, 1.16 to 2.22]) with a rate of 6.3 versus 3.9 per 1,000 screened.160 Detection at second round screening (all DM) did not differ by study arm (RR, 0.90 [95% CI, 0.62 to 1.30]) (Figure 3). There were no statistical differences in the characteristics of screen-detected cancers at either screening round, including cancers detected at stage II or higher, tumor size, histologic grade, or node status (Table 8, Figures 4–7).
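
As an illustration of how a risk ratio and its confidence interval relate to the rates quoted above, the following minimal sketch recomputes the RETomo first-round comparison. It rests on stated assumptions: the event counts are not given in this section and are back-calculated here from the rounded rates (6.3 and 3.9 per 1,000) and the arm sizes (13,356 and 13,521), and the interval uses a standard Wald approximation on the log scale, which may not be the method used by the trial authors. The function name `risk_ratio_ci` is hypothetical.

```python
import math


def risk_ratio_ci(events_a, n_a, events_b, n_b, z=1.96):
    """Risk ratio (group a versus group b) with a Wald interval on the log scale."""
    rr = (events_a / n_a) / (events_b / n_b)
    # Standard error of log(RR) for two independent binomial proportions
    se_log = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


# Arm sizes as reported for RETomo
n_dbt, n_dm = 13_356, 13_521

# ASSUMPTION: counts reconstructed from rounded rates per 1,000 screened,
# so they may differ slightly from the trial's actual counts.
events_dbt = round(6.3 / 1000 * n_dbt)
events_dm = round(3.9 / 1000 * n_dm)

rr, ci_low, ci_high = risk_ratio_ci(events_dbt, n_dbt, events_dm, n_dm)
print(f"RR {rr:.2f} (95% CI, {ci_low:.2f} to {ci_high:.2f})")
```

Under these assumptions the ratio comes out at about 1.60, with an interval close to, but not identical to, the reported 1.16 to 2.22. The small discrepancy is expected given the rounded inputs.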

Figure 4 shows the proportion of screen-detected invasive cancers diagnosed at stage II or higher in the first versus second round of screening in trials comparing digital breast tomosynthesis and digital mammography.

Figure 4

Proportion of Screen-Detected Invasive Cancers Diagnosed at Stage II or Higher in Trials Comparing Digital Breast Tomosynthesis and Digital Mammography. Abbreviations: CG=control group; DBT=digital breast tomosynthesis; DM=digital mammography; IG=intervention (more...)

Figure 5 shows the proportion of screen-detected invasive cancers diagnosed with tumor size greater than 20 mm in the first versus second round of screening in trials comparing digital breast tomosynthesis and digital mammography.

Figure 5

Proportion of Screen-Detected Invasive Cancers Diagnosed with Tumor Size >20 mm in Trials Comparing Digital Breast Tomosynthesis and Digital Mammography. Abbreviations: CG=control group; DBT=digital breast tomosynthesis; DM=digital mammography; (more...)

Figure 6 shows the proportion of screen-detected invasive cancers diagnosed as Grade 3 in the first versus second round of screening in trials comparing digital breast tomosynthesis and digital mammography.

Figure 6

Proportion of Screen-Detected Invasive Cancers Diagnosed as Grade 3 in Trials Comparing Digital Breast Tomosynthesis and Digital Mammography. Abbreviations: CG=control group; DBT=digital breast tomosynthesis; DM=digital mammography; IG=intervention group; (more...)

Figure 7 shows the proportion of screen-detected invasive cancers diagnosed as node positive in the first versus second round of screening in trials comparing digital breast tomosynthesis and digital mammography.

Figure 7

Proportion of Screen-Detected Invasive Cancers Diagnosed as Node Positive in Trials Comparing Digital Breast Tomosynthesis and Digital Mammography. Abbreviations: CG=control group; DBT=digital breast tomosynthesis; DM=digital mammography; IG=intervention (more...)

The To-Be RCT randomized people to DBT/sDM or DM in the first round of screening followed by DBT/sDM for all at the second round of screening.143 There was not a statistically significant difference between study arms in the detection of invasive cancer at the first round of screening using DBT or DM (5.6 versus 4.9 per 1,000 screened, respectively; RR, 1.13 [95% CI, 0.82 to 1.55]) or the subsequent screening round using DBT for all participants (6.9 versus 7.8 per 1,000 screened, respectively; RR, 0.88 [95% CI, 0.65 to 1.19]). No statistical differences in tumor stage, tumor size, histologic grade, or nodal status were seen for cancers detected in the DBT/sDM arm compared with the DM arm (Table 8, Figures 4–7).
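
The pooled first-round estimate reported earlier in this section (RR, 1.41) can be approximately reproduced from the three trial-level risk ratios. The sketch below is illustrative only and makes two assumptions that are not confirmed by this section: it uses fixed-effect inverse-variance pooling on the log scale, and it recovers each standard error from the width of the published 95% confidence interval. The function name `pool_fixed_effect` is hypothetical.

```python
import math


def pool_fixed_effect(estimates, z=1.96):
    """Inverse-variance fixed-effect pooling of (rr, ci_low, ci_high) tuples."""
    weights, log_rrs = [], []
    for rr, low, high in estimates:
        # ASSUMPTION: SE of log(RR) recovered from the reported CI width
        se = (math.log(high) - math.log(low)) / (2 * z)
        weights.append(1 / se**2)
        log_rrs.append(math.log(rr))
    total = sum(weights)
    pooled_log = sum(w * y for w, y in zip(weights, log_rrs)) / total
    se_pooled = math.sqrt(1 / total)
    # Cochran's Q and the I-squared heterogeneity statistic (percent)
    q = sum(w * (y - pooled_log) ** 2 for w, y in zip(weights, log_rrs))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return (
        math.exp(pooled_log),
        math.exp(pooled_log - z * se_pooled),
        math.exp(pooled_log + z * se_pooled),
        i2,
    )


# First-round invasive cancer detection, DBT versus DM, as reported above
round_one = [
    (1.46, 1.21, 1.77),  # Proteus Donna
    (1.60, 1.16, 2.22),  # RETomo
    (1.13, 0.82, 1.55),  # To-Be
]

pooled, ci_low, ci_high, i2 = pool_fixed_effect(round_one)
print(f"Pooled RR {pooled:.2f} (95% CI, {ci_low:.2f} to {ci_high:.2f}); I2={i2:.0f}%")
```

With these rounded inputs the point estimate lands close to 1.41. The interval and the heterogeneity statistic will not match the reported values exactly, because the review pooled the underlying trial data and its model is not specified in this section.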

The OVVV NRSI reported on a single round of screening with DBT/sDM followed by DM at the subsequent round of screening two years later compared with a concurrently screened group from another region that was screened with DM at both rounds. More invasive cancers were detected at the first round of screening for those in the DBT/sDM screened region (7.6 versus 5.3 per 1,000 screened; RR, 1.43 [95% CI, 1.22 to 1.67]). During the second round of screening, where all received DM, the incidence of screen-detected invasive cancer was lower in the arm that received DBT/sDM at the first round (3.2 versus 4.5 per 1,000 screened; RR, 0.71 [95% CI, 0.55 to 0.92]) compared with those who received DM at both screens. The study did not report cancer stage but reported some characteristics of the screen-detected invasive cancers. No statistical differences were identified between cancers detected by either arm, including tumor diameter, histologic grade, and node status (Table 8).

The BCSC NRSI by Sprague et al. reported screening outcomes by round, with round one reflecting a screening examination occurring within 30 months of a prior examination and round two providing screening results from an additional screening round with either DBT or DM, and so on. In adjusted analyses, invasive cancer detection at round one was 4.8 per 1,000 (95% CI, 4.4 to 5.2) with DBT and 4.4 per 1,000 (95% CI, 3.8 to 5.2) with DM. Invasive cancer detection with DBT was not statistically different from DM at round one (absolute difference, 0.4 per 1,000 [95% CI, −0.4 to 1.2]) or at round two (absolute difference, 0.4 per 1,000 [95% CI, −0.2 to 0.9]). At round three, invasive cancer detection was 3.9 per 1,000 (95% CI, 3.6 to 4.4) with DBT and 3.4 per 1,000 (95% CI, 3.1 to 3.8) with DM, a difference of 0.6 per 1,000 more invasive cancers detected with DBT (95% CI, 0.2 to 1.1). The detection of advanced cancer was not statistically different for DBT compared with DM at round one (0.21 versus 0.26 per 1,000; difference of −0.05 per 1,000 [95% CI, −0.19 to 0.09]) or for rounds two and above (0.13 versus 0.20 per 1,000; difference of −0.06 per 1,000 [95% CI, −0.14 to 0.03]).

The number of examinations was highest at the first round (DBT, 207,280; DM, 355,944) and comparisons of less common outcomes combined examinations from rounds two and higher (DBT, 316,205; DM, 652,179). Characteristics of the two study groups diverged considerably at round three compared with round one in terms of the race and ethnicity composition of the women screened as well as the timing of screening and breast cancer risk (non-Hispanic White and more frequently screened women more heavily represented in later rounds). There were also differences in the composition of the screened population for both groups at higher rounds compared to the first round (e.g., higher BCSC 5-year risk at round three).

KQ2a. Does Comparative Effectiveness Differ by Population Characteristics and Risk Markers (e.g., Age, Breast Density, Race and Ethnicity, Family History)?

None of the included studies were designed to enroll populations to support comparisons in the screening outcomes of DBT and DM by race, ethnicity, or family history.

Age- and breast density–stratified analysis of cancers detected at the second round of screening was reported in the RETomo RCT. As in the overall population, DBT resulted in a higher invasive cancer detection at the first round of screening for women ages 50 to 69 (RR, 1.60 [95% CI, 1.10 to 2.30]) and for women with nondense breasts (RR, 1.80 [95% CI, 1.10 to 3.00]), but at the next round of screening when all were screened with DM, there was not a statistically significant difference in invasive cancer detection. For women ages 45 to 49 and women with dense breasts, there was no statistical difference in the detection of invasive cancers at either round of screening (Table 9). No test for interaction was conducted for either the age- or density-stratified analyses and no information on the characteristics of the screen-detected tumors was provided.

Table 9

Incidence of Screen-Detected Invasive Cancers Diagnosed in Trials Comparing Digital Breast Tomosynthesis and Digital Mammography, by Population Subgroup.

Density-stratified results were presented in the To-Be RCT. No statistical difference was seen for detection of invasive cancer using DBT or DM for any breast density subgroup at either round one or round two of screening (Table 9).

Magnetic Resonance Imaging

No eligible studies were identified that reported on cancer detection and characteristics over multiple rounds of screening comparing usual care mammography with mammography plus supplemental MRI screening.

Ultrasound

No eligible studies were identified that reported on cancer detection and characteristics over multiple rounds of screening comparing usual care mammography with mammography plus supplemental ultrasound screening.

Personalized Screening Programs Using Risk Assessment

No eligible studies were identified that reported on cancer detection and characteristics over multiple rounds of screening comparing usual care mammography with personalized screening programs using risk assessment.

KQ3. What Are the Comparative Harms of Different Mammography-Based Breast Cancer Screening Strategies?

Summary of Results

One NRSI with an emulated trial design used Medicare data to estimate the effects of screening beyond age 70 compared to stopping at ages 70 or 75.136 No difference was found in 8-year breast cancer mortality for screening beyond age 75 compared with stopping at that age. Cancers diagnosed in the stop screening strategy were more likely to receive aggressive treatment.

One RCT126 and four NRSIs140,153,154,159 reported potential harms of screening with respect to the screening interval. One RCT reported approximately 1 fewer interval cancer per 1,000 with annual screening compared with triennial screening. Data related to interval cancer risks were limited in the four NRSIs for comparisons of different screening periods.140,153,154,159 False-positive recall was more likely to occur with annual screening compared with longer intervals between screenings. The probability of false-positive recall and biopsy over 10 years of screening was higher with annual screening. The highest cumulative false-positive estimates occurred among young people with dense breasts screened annually.

Three large RCTs found no statistically significant difference in the rates of interval cancers following screening with DBT compared with DM (pooled RR, 0.87 [95% CI, 0.64 to 1.17]; I2=0%; 3 trials; n = 130,196) (Figure 10).129,143,160 Data on interval cancers from six NRSIs were mixed, and interpretation was limited by differences in study design. The effects of DBT screening on recall, false-positive recalls, and biopsy rates varied between trials and by screening round, with no or small statistical differences between study groups, not consistently favoring DBT or DM. The cumulative rates of false-positive recall and false-positive biopsy were slightly lower with DBT compared with DM screening, regardless of screening interval (cumulative probability of false-positive recall over 10 years: 50% versus 56% for annual screening, 36% versus 38% with biennial screening). No statistically significant differences were seen in the trials related to DCIS detection or adverse events. Radiation exposure was approximately two times higher when DBT was performed in addition to DM; however, this increase was not present in two studies using DBT to generate synthetic DM images (DBT/sDM). Data on subgroups were limited, with all but one of the studies providing stratified results only, without tests for interaction.

Figure 8 shows the cumulative probability of false-positive recall in one nonrandomized study (Ho et al., 2022) using Breast Cancer Surveillance Consortium (BCSC) data comparing annual versus biennial screening with digital breast tomosynthesis or digital mammography.

Figure 8

Cumulative Probability of False-Positive Recall in One NRSI Using BCSC Data Comparing Annual Versus Biennial Screening with Digital Breast Tomosynthesis or Digital Mammography. Abbreviations: BCSC=Breast Cancer Surveillance Consortium; DBT=digital breast (more...)

Figure 9 shows the cumulative probability of false-positive biopsy in one nonrandomized study (Ho et al., 2022) using Breast Cancer Surveillance Consortium (BCSC) data comparing annual versus biennial screening with digital breast tomosynthesis or digital mammography.

Figure 9

Cumulative Probability of False-Positive Biopsy in One NRSI Using BCSC Data Comparing Annual Versus Biennial Screening with Digital Breast Tomosynthesis or Digital Mammography. Abbreviations: BCSC=Breast Cancer Surveillance Consortium; DBT=digital breast (more...)

Figure 10 shows a pooled analysis of interval cancers diagnosed at first round followup in trials comparing digital breast tomosynthesis and digital mammography.

Figure 10

Pooled Analysis of Interval Cancers Diagnosed in Trials Comparing Digital Breast Tomosynthesis and Digital Mammography. Abbreviations: CG=control group; DBT=digital breast tomosynthesis; DM=digital mammography; IG=intervention group; RETomo=Reggio Emilia (more...)

One RCT reported on the effects of an invitation to screening MRI for women ages 50 to 75 with extremely dense breasts following a negative mammogram.164 The risk of invasive interval cancer was reduced by approximately half (RR, 0.47 [95% CI, 0.29 to 0.77]) after the first invitation and prior to the next screening round (2 years). MRI resulted in additional recall, false-positive recall, and biopsy (95, 80, and 63 per 1,000 screened, respectively) that did not occur for the DM-only group. An NRSI analysis of U.S. insurance claims data found that health care use related to conditions that were not breast-related (a measure of possible incidental findings) was higher following screening with MRI compared with receiving mammography screening only.135

One RCT158 of women ages 40 to 49 and one NRSI152 of BCSC data reported outcomes related to the potential harms of supplemental ultrasound screening. In the analyses comparing event rates presented in our review, there was not a statistically significant difference in interval cancer rates between study groups in either study. In the trial, additional recalls (48 per 1,000 screened) were experienced by those screened with ultrasound. In the BCSC analysis, referral to biopsy and false-positive biopsy results were twice as high for the group screened with ultrasound.

No eligible studies were identified that reported on the potential harms of personalized screening programs using risk assessment.

Detailed Results by Screening Intervention

Age to Start or Stop Screening

Study and Population Characteristics

One fair-quality NRSI (N = 1,058,013) analyzed data to emulate a trial of discontinuation of mammography screening between the ages of 70 and 84 compared with continued annual screening (described in detail for KQ1 above) (Table 4).136 Additional details on study design are available in Appendix E.

Outcomes
Overdiagnosis and Overtreatment

Overall, the 8-year cumulative risk of a breast cancer diagnosis was higher for the continued annual screening strategy after age 70 (5.5% overall; 5.3% ages 70 to 74, 5.8% ages 75 to 84) compared with the stop screening strategy (3.9% overall; same proportion for both age groups) (Table 10). Lumpectomy and radiotherapy were more common for cancers diagnosed in the continued annual screening strategy compared with those who stopped screening after age 70, whereas mastectomy and chemotherapy were more common for cancers diagnosed in those who discontinued screening after age 70 (Table 10). Overall, because fewer cancers were diagnosed under the stop screening strategy (ages 70 to 84), there was a lower risk of undergoing followup and treatment. For those ages 75 to 84, additional diagnoses did not contribute to a difference in the risk of breast cancer mortality.

Table 10

Overdiagnosis and Overtreatment in Studies of Age to Stop Screening in an Emulated Trial, by Population Subgroup.

KQ3a. Do Comparative Harms Differ by Population Characteristics and Risk Markers (e.g., Age, Breast Density, Race and Ethnicity, Family History)?

No studies of ages to start or stop screening presented data that would allow for testing of effect differences or stratification of results by different population characteristics or risk markers.

Screening Interval

Study and Population Characteristics

Three of the studies included to address potential harms of different screening intervals have been described previously in KQs 1 and 2.126,154,159 Two additional studies examined the potential cumulative harms across multiple rounds of screening (Table 4). One analysis of 2005 to 2018 data from the BCSC estimated the cumulative probability of a false-positive result after 10 years of screening with DM or DBT.140 The second additional study was the Know Your Risk: Assessment at Screening (KYRAS) study, which calculated the cumulative risk of false-positive screens over a median of 8.9 years at Columbia University Medical Center.153

Demographic characteristics were not commonly reported in the studies of screening interval (Table 5). The BCSC study population reported by Miglioretti was primarily White (78%), with the remaining participants reported as Black (5%), Asian (5%), Hispanic (5%), AI/AN (<1%), and 7% reported as “other” or unknown. In the KYRAS study, the population was majority Hispanic (76%), with the remaining reported as White (10%), Black (10%), or other (4%) including Asian, Pacific Islander, Native American, or Alaska Native. Twenty-four percent of the non-Hispanic White women were of Ashkenazi Jewish descent. Additional details on study design are available in Appendix G.

Outcomes
Interval Cancers

Three studies presented data on interval cancers by participant screening interval with mixed findings (Table 11). The UKCCCR RCT reported that the rate of interval cancers was significantly lower in the annual invitation group (1.84 per 1,000 women initially screened) than in the triennial invitation group (2.70 per 1,000 women initially screened) (RR, 0.68 [95% CI, 0.50 to 0.92]). The Parvinen et al. quasi-randomized study found that similar numbers of cases were reported in the annual screening and triennial screening groups, and the difference was not statistically significant (p=0.22). The Miglioretti et al. BCSC NRSI found that 22.2 percent of cancers diagnosed following an annual screening interval were interval cancers compared with 27.2 percent of cancers preceded by a biennial interval. However, the study did not provide adjusted comparisons, limiting the ability to draw inferences about differences in the interval cancer rate associated with biennial and annual screening from this study.

Table 11

Interval Cancer Rates in Studies Comparing Breast Cancer Screening Frequencies.

False-Positive Recall

Based on two studies, false-positive recall was more likely to occur with annual screening compared with longer intervals. An NRSI of BCSC data by Ho estimated that, for individuals screened with DBT, the 10-year cumulative probability of at least one false-positive recall was 49.6 percent for those screened annually and 35.7 percent for those screened biennially (proportion difference, −13.9% [95% CI, −14.9% to −12.8%]). The difference in cumulative false-positive recall between annual and biennial screening was larger for DM (−18.2% [95% CI, −18.6% to −17.7%]) (Figure 8, Appendix F Tables 3 and 4). In the KYRAS study, individuals screened with DM annually had 2.18 times the odds of having a false-positive result compared with those who screened biennially (odds ratio, 2.18 [95% CI, 1.70 to 2.80]) after controlling for total years of followup, age, race and ethnicity, BMI, breast density, and breast cancer risk status (Appendix F Table 4).
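
To give a feel for how a per-screen false-positive probability accumulates over 10 years, the sketch below back-calculates the constant per-screen probability implied by the 49.6 percent (annual) and 35.7 percent (biennial) cumulative figures above. This is a deliberate simplification and not the estimation method of the BCSC analysis: it assumes every screen is independent with the same false-positive probability, and it assumes 10 annual or 5 biennial screens over the 10-year period. Both function names are hypothetical.

```python
def cumulative_probability(per_screen_prob, n_screens):
    """P(at least one false positive) over n independent, identical screens."""
    return 1 - (1 - per_screen_prob) ** n_screens


def implied_per_screen(cumulative, n_screens):
    """Constant per-screen probability implied by a cumulative probability."""
    return 1 - (1 - cumulative) ** (1 / n_screens)


# ASSUMPTION: 10 screens (annual) or 5 screens (biennial) in 10 years
annual = implied_per_screen(0.496, 10)
biennial = implied_per_screen(0.357, 5)

print(f"Implied per-screen probability, annual:   {annual:.3f}")
print(f"Implied per-screen probability, biennial: {biennial:.3f}")

# Round trip: the implied values reproduce the cumulative inputs
print(f"{cumulative_probability(annual, 10):.3f}")
print(f"{cumulative_probability(biennial, 5):.3f}")
```

Under these simplifying assumptions the implied per-screen probabilities are on the order of 7 percent (annual) and 8 percent (biennial). The cumulative gap between intervals therefore comes mainly from the number of screens rather than from a much higher risk at any single screen.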

False-Positive Biopsy

The comparative NRSI from Ho used data from the BCSC and found that biennial screening compared with annual screening led to a 10-year cumulative false-positive biopsy rate approximately 5 percentage points lower, whether the screening was conducted with DBT or DM (Figure 9, Appendix F Tables 3 and 4). For individuals screened with DBT, the estimated cumulative probability of at least one false-positive biopsy recommendation was 11.2% for those screened annually and 6.6% for those screened biennially (proportion difference, −4.6% [95% CI, −5.2% to −3.9%]). For individuals screened with DM, the difference was similar (proportion difference, −5.0% [95% CI, −5.4% to −4.7%]).

KQ3a. Do Comparative Harms Differ by Population Characteristics and Risk Markers (e.g., Age, Breast Density, Race and Ethnicity, Family History)?

The Ho et al. BCSC NRSI reported 10-year cumulative false-positive and biopsy rates by age and breast density category. Annual screening was associated with higher cumulative false-positive recall and biopsy for most age and density groups (Figures 8 and 9). There was not a strong association between age and cumulative false-positive biopsy regardless of the screening interval among those with the lowest breast density (Figures 8 and 9, Appendix F Tables 5 and 6).

Digital Breast Tomosynthesis

Study and Population Characteristics

We identified 11 eligible studies, four RCTs (three good-quality, one fair-quality)129,139,143,160 and seven fair-quality NRSIs,79,132,140,144,147,162,168 that reported on potential harms of screening associated with the use of DBT (plus DM or sDM) compared to DM-only screening (Tables 4 and 5). Four large trials were conducted with individuals participating in organized screening programs in Germany, Italy, and Norway. Three of these trials were previously discussed in KQ2.129,143,160 One additional RCT was identified that addresses the potential harms of screening with DBT compared with DM. The TOmosynthesis plus SYnthesised MAmmography Study (TOSYMA) is a good-quality RCT conducted in Germany that assigned 99,634 women ages 50 to 69 to DBT/sDM (DBT with synthetic two-view imaging) versus DM alone between July 5, 2018, and December 30, 2020. Available results from the trial report on performance at a single round of screening, and for this review the trial was included only for rare or uncommonly reported harms (adverse events, radiation exposure).139 The seven NRSIs included for KQ3 were conducted using data from populations screened with DBT and DM in the United States,132,140,147,162,168 Sweden,79 and Norway144 (Tables 4 and 5). Additional details on study design and results are available in Appendix G.

Outcomes
Interval Cancers

Three trials reported interval cancers following screening with DBT or DM (Table 12).129,143,160 The three RCTs did not show statistically significant differences in the risk of interval cancer following screening with DBT or DM (pooled RR, 0.87 [95% CI, 0.64 to 1.17]; I2=0%; 3 trials; n = 130,196) (Figure 10). Six observational studies used data from medical systems, registries, and cancer screening and surveillance programs to compare interval cancers occurring after screening with DBT or DM (Table 12). Four of the NRSIs found no significant difference in the rate of interval cancers diagnosed following screening with DBT or DM (including data from the BCSC, PROSPR consortium, and the OVVV comparative cohort study),132,144,147,168 while one found a slightly increased risk with DBT screening162 and one found an unadjusted decreased risk with DBT screening.79 These studies differed in the timeline of followup and the method of identifying interval cancers (Appendix E Table 1), highlighting the variability in interval cancer definitions and data used to assess the outcome across the included NRSIs and the need for more standardization of definitions and study protocols.

Table 12

Rates of Interval Cancers (Invasive Cancer and DCIS) in Studies Comparing Breast Cancer Screening Modalities, per 1,000 Screened.

Recall

The same three RCTs and two NRSIs included for KQ2 reporting data across multiple rounds of screening were also included to assess screening recall rates and false-positive recalls (Tables 13 and 14). In the trials, results for recall and false-positive recall were mixed across the first round of screening, and statistical heterogeneity was high, so a pooled effect is not presented. The studies varied in their approaches to screening at round two: two RCTs used DM screening for both study groups (Proteus Donna, RETomo) and one used DBT for both study groups (To-Be) at round two. Results for round two were more consistent and did not suggest a difference in recall rates or false-positive recalls between study groups when combined using meta-analysis (Figures 11 and 12).

Table 13

Followup of Abnormal Screening in Studies Comparing Digital Breast Tomosynthesis and Digital Mammography.

Table 14

False Positives in Studies Comparing Digital Breast Tomosynthesis and Digital Mammography.

Figure 11 shows a pooled analysis of recall rates reported in the first versus second round of screening in trials comparing digital breast tomosynthesis and digital mammography.

Figure 11

Pooled Analysis of Recall Rates Reported in Trials Comparing Digital Breast Tomosynthesis and Digital Mammography. Abbreviations: CG=control group; DBT=digital breast tomosynthesis; DM=digital mammography; IG=intervention group; RETomo=Reggio Emilia Tomosynthesis (more...)

Figure 12 shows a pooled analysis of false-positive recalls reported in the first versus second round of screening in trials comparing digital breast tomosynthesis and digital mammography.

Figure 12

Pooled Analysis of False-Positive Recalls Reported in Trials Comparing Digital Breast Tomosynthesis and Digital Mammography. Abbreviations: CG=control group; DBT=digital breast tomosynthesis; DM=digital mammography; IG=intervention group; RETomo=Reggio (more...)

The fair-quality BCSC NRSI by Sprague et al. reported lower recall and lower false-positive recall at round one and round two screening with DBT compared with DM (Tables 13 and 14). The difference in false-positive recall at round one was 34 fewer per 1,000 examinations with DBT versus DM (95% CI, 22 to 47 fewer) and at round two was 18 fewer per 1,000 with DBT (95% CI, 7 to 30 fewer). False-positive recall was not statistically lower with DBT versus DM among those included for round three comparisons (−11 per 1,000 [95% CI, 23 fewer to 2 additional]).

The fair-quality NRSI OVVV study did not report a statistically significant difference in recall rates between the DBT and DM arms at round one (Tables 13 and 14). At round two, when both groups received DM, false-positive recall rates were lower in the group previously screened with DBT compared with the DM group (20 versus 25 per 1,000).

Biopsy and Surgical Followup

Two of the included RCTs reported on the rate of referral to biopsy143,160 and two reported on referral to surgery following screening129,160 (Table 13). At round one, when the trials compared screening with DBT and DM, there were mixed results, with one trial finding a significantly higher rate of referral to biopsy with DBT and another trial finding no difference in referral to biopsy or false-positive biopsy rates. Two trials found that the referral to surgery was higher among those screened with DBT. The RETomo RCT reported higher referrals to surgical followup, including open biopsy, following DBT screening (8.7 versus 5.0 per 1,000; RR, 1.70 [95% CI, 1.3 to 2.30]). Findings from the Proteus Donna RCT were similarly higher for surgical referrals following DBT/DM (9.9 versus 6.4 per 1,000; RR, 1.54 [95% CI, 1.31 to 1.82]).

The trials screened both study groups with an identical modality at the second round, so round-two effects should be interpreted as findings from screening that followed a previous round of screening with DBT or DM. Overall, no significant difference between arms was found for rates of biopsy at round two. The Proteus Donna trial129 found a lower risk of surgical referrals among those originally screened with DBT (4.3 versus 5.7 per 1,000; RR, 0.76 [95% CI, 0.59 to 0.97]); however, this finding was not confirmed by RETomo, where screening was with DM in both study groups at round two (5.3 versus 6.4 per 1,000; RR, 0.83 [95% CI, 0.60 to 1.10]).

One fair-quality BCSC NRSI by Sprague et al.168 reported slightly lower biopsy and false-positive biopsy at round one with DBT compared with DM (Tables 13 and 14). The difference in false-positive biopsy at round one was 3 fewer per 1,000 screening examinations with DBT (95% CI, 2 to 5 fewer). False-positive biopsy rates were similar for DBT and DM for those included for two or more screening rounds.

Cumulative False-Positive Recall and Biopsy

The comparative BCSC NRSI from Ho et al. reported the estimated cumulative probability of having at least one false-positive recall and biopsy over 10 years of screening with DBT or DM on an annual or biennial basis (Figures 8 and 9, Appendix F Tables 3 and 4). Probabilities were mostly lower with DBT screening compared with DM screening, regardless of the screening interval, but the difference was greater with annual screening. With annual screening, the 10-year cumulative probability of a false-positive recall was 49.6% with DBT and 56.3% with DM (difference, −6.7% [95% CI, −7.4 to −6.1]). The 10-year cumulative probability of a false-positive biopsy was 11.2% with DBT and 11.7% with DM (difference, −0.5% [95% CI, −1.0 to −0.1]). With biennial screening, the 10-year cumulative probability of a false-positive recall was 35.7% for DBT and 38.1% for DM (difference, −2.4% [95% CI, −3.4 to −1.5]), and the 10-year cumulative probability of a false-positive biopsy was 6.6% for DBT and 6.7% for DM (difference, −0.1% [95% CI, −0.5 to 0.4]).
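The reported differences can be checked with simple arithmetic. The following Python sketch recomputes each DBT minus DM difference from the percentages quoted in the text and flags whether the reported 95% confidence interval excludes zero. The dictionary layout and the function name `summarize` are illustrative choices, not part of the original analysis, and the biennial 35.7% versus 38.1% pair is read here as false-positive recall.

```python
# 10-year cumulative false-positive probabilities (%) as quoted in the text.
# Each value is (DBT, DM, reported difference, CI low, CI high).
REPORTED = {
    ("annual", "recall"): (49.6, 56.3, -6.7, -7.4, -6.1),
    ("annual", "biopsy"): (11.2, 11.7, -0.5, -1.0, -0.1),
    ("biennial", "recall"): (35.7, 38.1, -2.4, -3.4, -1.5),
    ("biennial", "biopsy"): (6.6, 6.7, -0.1, -0.5, 0.4),
}


def summarize(reported):
    """Recompute DBT - DM and check whether each CI excludes zero."""
    rows = []
    for (interval, outcome), (dbt, dm, diff, lo, hi) in reported.items():
        computed = round(dbt - dm, 1)      # percentage-point difference
        excludes_zero = lo > 0 or hi < 0   # True if the CI does not span 0
        rows.append((interval, outcome, computed, diff, excludes_zero))
    return rows


for interval, outcome, computed, diff, excludes_zero in summarize(REPORTED):
    label = "CI excludes 0" if excludes_zero else "CI includes 0"
    print(f"{interval:8s} {outcome:6s} computed={computed:+.1f} "
          f"reported={diff:+.1f} ({label})")
```

Only the biennial false-positive biopsy comparison has an interval that spans zero, which is consistent with the statement that probabilities were "mostly" lower with DBT.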

Overdetection and Overtreatment

In the three RCTs, rates of DCIS detected at each screening round and between study arms were similar, ranging from 0.7 to 1.3 per 1,000 screened at the first screening round and from 0.6 to 1.3 per 1,000 screened at the second screening round, with no statistical differences between the DBT and DM screened groups (Table 15). Meta-analysis was used to generate combined estimates that also did not show statistically significant differences at round one (pooled RR, 1.33 [95% CI, 0.92 to 1.93]; I²=0%; 3 RCTs; n = 130,196) or round two (pooled RR, 0.75 [95% CI, 0.49 to 1.14]; I²=0%; 3 RCTs; n = 130,196) (Figure 14). The OVVV NRSI reported higher DCIS detection at the first screening round in the DBT group compared with the DM group (1.8 versus 0.8 per 1,000 screened; RR, 2.16 [95% CI, 1.49 to 3.12]).
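For readers unfamiliar with how a pooled RR and an I² value arise, the following Python sketch shows a generic fixed-effect, inverse-variance pooling on the log scale. It is not the model used in this review, whose meta-analytic method is not specified in this section. The per-trial inputs in `HYPOTHETICAL_TRIALS` are invented for illustration only and are not the Table 15 values.

```python
import math

Z = 1.96  # normal quantile for a 95% confidence interval


def pool_fixed_effect(studies):
    """Inverse-variance pooling of relative risks on the log scale.

    studies: list of (rr, ci_low, ci_high) tuples, one per trial.
    Returns the pooled RR, its 95% CI, Cochran's Q, and I-squared (%).
    """
    log_rrs, weights = [], []
    for rr, lo, hi in studies:
        # Back out the standard error of log(RR) from the reported CI width.
        se = (math.log(hi) - math.log(lo)) / (2 * Z)
        log_rrs.append(math.log(rr))
        weights.append(1 / se ** 2)

    total_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, log_rrs)) / total_w
    se_pooled = math.sqrt(1 / total_w)

    # Cochran's Q and the I-squared heterogeneity statistic.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_rrs))
    df = len(studies) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    return {
        "rr": math.exp(pooled),
        "ci_low": math.exp(pooled - Z * se_pooled),
        "ci_high": math.exp(pooled + Z * se_pooled),
        "q": q,
        "i2": i2,
    }


# HYPOTHETICAL inputs for illustration only; not data from the included trials.
HYPOTHETICAL_TRIALS = [(1.40, 0.80, 2.45), (1.25, 0.70, 2.23), (1.35, 0.60, 3.04)]
result = pool_fixed_effect(HYPOTHETICAL_TRIALS)
print({k: round(v, 2) for k, v in result.items()})
```

When the individual estimates are close to one another, Q falls below its degrees of freedom and I² is truncated to 0%, the value reported for both screening rounds here.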

Table 15

Screen-Detected DCIS Diagnosed in Studies Comparing Digital Breast Tomosynthesis and Digital Mammography.

Figure 13 shows the cumulative probability of false-positive recall or biopsy in one nonrandomized study (Ho et al., 2022) using Breast Cancer Surveillance Consortium (BCSC) data comparing annual versus biennial screening with digital breast tomosynthesis or digital mammography among women with extremely dense breasts.

Figure 13

Cumulative Probability of False-Positive Recall or Biopsy in One NRSI Using BCSC Data Comparing Annual Versus Biennial Screening with Digital Breast Tomosynthesis or Digital Mammography, Among Women With Extremely Dense Breasts. Abbreviations: BCSC=Breast (more...)

Figure 14 shows a pooled analysis of ductal carcinoma in situ (DCIS) diagnosed in the first versus second round of screening in trials comparing digital breast tomosynthesis and digital mammography.

Figure 14

Pooled Analysis of DCIS Diagnosed in Trials Comparing Digital Breast Tomosynthesis and Digital Mammography. Abbreviations: CG=control group; DBT=digital breast tomosynthesis; DCIS=ductal carcinoma in situ; DM=digital mammography; IG=intervention group; (more...)

Adverse Events

The TOSYMA RCT reported on adverse events from a single round of screening using DBT/sDM compared with DM only.139 The study randomized 49,804 individuals to DBT/sDM and 49,830 to DM. Six adverse events were reported in each study arm, with none categorized as serious.

Radiation Exposure

Five studies (four RCTs, one NRSI) reported the median, mean, or relative radiation dose by study arm from a single screening round (Table 16). In three of these studies, participants underwent both DBT and DM screening (in one or two compressions), and in two studies, participants underwent DBT with a synthetic reconstruction of a 2D DM image.139,143 Studies using DBT/DM screening reported radiation exposure approximately two times higher in the intervention group compared with the DM-only control group.79,129,160 Differences between study groups in radiation exposure were smaller in studies using DBT/sDM. The TOSYMA RCT reported that the median glandular radiation dose was 1.86 mGy (interquartile range, 1.48 to 2.45) in the DBT/sDM group and 1.36 mGy (interquartile range, 1.02 to 1.85) in the DM group.139 In the To-Be RCT, which also used DBT/sDM, the mean radiation dose was 2.96 mGy compared with 2.95 mGy in the DM group.128

Table 16

Radiation Exposure in Studies Comparing Digital Breast Tomosynthesis and Digital Mammography.

KQ3a. Do Comparative Harms Differ by Population Characteristics and Risk Markers (e.g., Age, Breast Density, Race and Ethnicity, Family History)?

None of the included studies were designed to enroll populations to support comparisons in the screening outcomes of DBT and DM by race, ethnicity, or family history. Two RCTs143,160 and four NRSIs79,140,147,162 that compared DBT-based screening strategies with DM-only screening strategies presented results stratified by age and/or breast density. Most studies did not report interaction tests and were not designed to test these subgroup comparisons, making it difficult to draw conclusions about differences by age and/or breast density.

Age

The RETomo RCT reported the effects of DBT/sDM versus DM on recall, biopsy, and surgical procedures stratified by age category (45 to 49 versus 50 to 69) (Tables 17 and 18). Overall, these stratified results suggest some risk of increased biopsy or surgery with DBT screening at the first round for all, followed by lower rates at the next round for those ages 45 to 49. One trial160 and two NRSIs79,162 reported no significant findings related to the relationship between age and interval cancer outcomes (Table 19). Two of these studies did not report interaction tests, making it difficult to draw conclusions about differences by age group.

Table 17

Followup of Abnormal Screening in Randomized Trials Comparing Digital Breast Tomosynthesis and Digital Mammography, by Population Subgroup.

Table 18

False Positives in Randomized Trials Comparing Digital Breast Tomosynthesis and Digital Mammography, by Population Subgroup.

Table 19

Rates of Interval Cancers (Invasive Cancers and DCIS) in Studies Comparing Breast Cancer Screening Modalities, by Population Subgroup.

Breast Density

The To-Be trial reported recall and biopsy stratified by Volpara density grade categories (VDG1–VDG4). There was lower recall at the first screening round for those screened with DBT who had lower density breasts (VDG1 and VDG2) but not for those with higher density breasts (VDG3 and VDG4) (Table 17). Two trials143,160 and one analysis of BCSC data147 found no statistically significant differences in the incidence of interval cancer for the breast density–stratified comparisons (Table 19).

The To-Be RCT reported mean radiation doses for the study groups, stratified by breast density in a figure. The study reported that there were no statistically significant differences in radiation dose for DBT/sDM compared with DM for any of the density categories.

Age and Breast Density Subgroups

The Ho et al. BCSC NRSI presented 10-year cumulative false-positive recall and biopsy probabilities stratified by breast density and age, comparing DBT to DM screening. Overall, the study reported lower false-positive recall with DBT screening. In stratified analyses, however, there was not a statistical difference in cumulative false-positive recall or biopsy among those with extremely dense breasts in any age group (Figure 13).

Magnetic Resonance Imaging

Study and Population Characteristics

The Dense Tissue and Early Breast Neoplasm Screening (DENSE) trial is a good-quality RCT conducted in The Netherlands that enrolled participants from December 2011 to November 2015 (N = 40,373) (Table 4). The aim of the study was to determine whether an invitation to supplemental MRI screening after a negative mammogram for those ages 50 to 75 with extremely dense breast tissue would reduce the incidence of interval cancer. The baseline characteristics of the study groups were balanced on the reported characteristics (Table 5). Among those invited to MRI screening, 59 percent underwent the MRI examination (n = 4,783). While this study included two rounds of screening with MRI, findings from the second round of screening in the mammography-only arm have not been published. Therefore, this study was not eligible for inclusion in KQ2, but it is included for interval cancers and potential harms of supplemental MRI imaging.

A fair-quality NRSI compared commercially insured women ages 40 to 64 in the MarketScan database who had received at least one bilateral screening breast MRI (n = 9,208) or mammogram (n = 9,208) between January 2017 and June 2018 (Tables 4 and 5). Propensity score matching was used to compare cascade events (mammary and extramammary) in the six months following the MRI or mammogram that were potentially attributable to having a breast MRI. Additional details on study design and results are available in Appendix G.

Outcomes
Interval Cancers

In the DENSE RCT, the intention-to-treat analysis based on invitation to MRI screening found a rate of invasive interval cancers of 2.2 per 1,000 invited to screening in the DM+MRI group compared with 4.7 per 1,000 screened in the DM-only control group (RR, 0.47 [95% CI, 0.29 to 0.77]) (Table 12).
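As a rough illustration of how the reported relative risk relates to the two rates, the Python sketch below computes a crude rate ratio and an absolute difference from the rounded per-1,000 figures quoted above. The function name is hypothetical. The trial's own estimate comes from participant counts, so a ratio of rounded rates should only be expected to approximate it.

```python
def rate_ratio_and_difference(rate_intervention, rate_control):
    """Crude ratio and absolute difference of two rates per 1,000."""
    ratio = rate_intervention / rate_control
    difference = rate_intervention - rate_control
    return ratio, difference


# Invasive interval cancer rates per 1,000 quoted for DENSE:
# 2.2 (invited to supplemental MRI) versus 4.7 (mammography only).
ratio, difference = rate_ratio_and_difference(2.2, 4.7)
print(f"crude rate ratio: {ratio:.2f}")
print(f"absolute difference: {difference:.1f} per 1,000")
```

The crude ratio rounds to the reported RR of 0.47, and the absolute difference corresponds to about 2.5 fewer invasive interval cancers per 1,000.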

Adverse Events

In the DENSE RCT, eight adverse events (including five classified as serious adverse events) occurred during or immediately after the MRI screening. Adverse events included two vasovagal reactions and three allergic reactions to the contrast agent (serious adverse events) as well as two reports of extravasation (leaking) of the contrast agents and one shoulder subluxation. Twenty-seven individuals (0.6% of MRI arm) reported a serious adverse event within 30 days of the MRI.

Downstream Consequences of Supplemental Imaging, Including Incidental Findings

In the first round of the DENSE trial, the rate of recall among those who underwent additional imaging with MRI was 94.9 per 1,000 screens and the false-positive rate was 79.8 per 1,000 screened. The rate of biopsy for those undergoing supplemental MRI was 62.7 per 1,000 screened (Table 20). Among the cancers diagnosed by MRI, over 90 percent were classified as DCIS (stage 0) or stage 1 cancer. Without information for two rounds of screening from both arms of the study, there is not sufficient information to weigh the relative benefit versus harms of these diagnoses and downstream imaging consequences.

Table 20

Downstream Consequences of Supplemental Screening With MRI or Ultrasound.

In the U.S. insurance claims NRSI, individuals who had an MRI were more likely than those receiving only a mammogram to have additional cascade events in the subsequent six months (adjusted difference between groups, 19.6 per 1,000 screened [95% CI, 8.6 to 30.7]); these events mostly comprised additional health care visits (Table 20).

KQ3a. Do Comparative Harms Differ by Population Characteristics and Risk Markers (e.g., Age, Breast Density, Race and Ethnicity, Family History)?

No studies of supplemental MRI screening presented data that would allow for testing of effect differences or stratification of results by different population characteristics or risk markers.

Ultrasound

Study and Population Characteristics

The Japan Strategic Anti-cancer Randomized Trial (J-START) is a fair-quality RCT that randomly assigned asymptomatic women ages 40 to 49 in 23 prefectures in Japan to breast cancer screening with mammography plus handheld ultrasound (DM/US) (n = 36,859) or mammography only (DM) (n = 36,139) over two rounds of annual screening during 2007 to 2011 (Table 4).158 The two study groups were balanced across a range of characteristics (Table 5). The authors note that 58 percent of women were classified as having dense breasts. Only one round of screening has been reported; therefore, this study was not eligible for inclusion in KQ2, but it is discussed here for interval cancers and potential harms related to supplemental ultrasound imaging.

An NRSI by Lee et al. reported results of an analysis using data from two BCSC registries to compare screening outcomes for individuals receiving ultrasonography on the same day as a screening mammogram (DM/US) (n = 3,386, contributing 6,081 screens) compared with those who received only a mammogram (DM) (n = 15,176, contributing 30,062 screens) (Tables 4 and 5, see Appendix E for detailed methods).152 The majority of individuals included in the study were White (accounting for 80% of the screening examinations) and represent a higher-risk population, with a significant proportion of examinations among those with a first-degree family history of breast cancer or previous breast biopsy. Additional details on study design and results are available in Appendix G.

Outcomes
Interval Cancers

The interval cancer rates reported were not statistically significantly different in the J-START RCT when comparing the DM with ultrasound versus DM-only groups (RR, 0.58 [95% CI, 0.31 to 1.08]). The published results from the trial were population-average effects that included DCIS and statistical adjustments for the clustered data structure. The result presented is a calculated individual-level intervention effect for invasive interval cancer without adjustment for clustering based on the reported event rates. Adjustment for clustering would result in a greater imprecision since it would statistically compensate for the correlated variances with wider confidence intervals. In the NRSI using BCSC data, the confidence interval was wide and not statistically significant (adjusted relative risk [aRR], 0.67 [95% CI, 0.33 to 1.37]) (Table 12).
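The effect of clustering on precision can be sketched numerically. The Python example below backs out the standard error of log(RR) from the unadjusted J-START interval quoted above and inflates it by a design effect. The design effect of 1.5 is a hypothetical value chosen only to show the direction of the change; the actual value for J-START is not given in this section.

```python
import math

Z = 1.96  # normal quantile for a 95% confidence interval


def se_from_ci(ci_low, ci_high):
    """Standard error of log(RR) implied by a 95% confidence interval."""
    return (math.log(ci_high) - math.log(ci_low)) / (2 * Z)


def widen_for_clustering(rr, ci_low, ci_high, design_effect):
    """Inflate the variance of log(RR) by a design effect and rebuild the CI."""
    se = se_from_ci(ci_low, ci_high) * math.sqrt(design_effect)
    log_rr = math.log(rr)
    return math.exp(log_rr - Z * se), math.exp(log_rr + Z * se)


# Unadjusted estimate from the text: RR 0.58 (95% CI, 0.31 to 1.08).
# design_effect=1.5 is HYPOTHETICAL, used only for illustration.
low, high = widen_for_clustering(0.58, 0.31, 1.08, design_effect=1.5)
print(f"illustrative clustered 95% CI: {low:.2f} to {high:.2f}")
```

Any design effect above 1 widens the interval, so an unadjusted interval that already includes 1 would still include 1 after adjustment.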

Downstream Consequences of Supplemental Imaging

The rate of recall based only on ultrasound was 49.7 per 1,000 in the ultrasound arm, and 48.0 per 1,000 had a false-positive recall (Table 20). Of those cancers identified only by ultrasound, 76.2 percent were classified as stage 0 or 1 cancer. Without information for two rounds of screening from both arms of the study, there is not sufficient information to weigh the relative benefit versus harms of these diagnoses and downstream imaging consequences. In the BCSC analysis, the rates of referral to biopsy and false-positive biopsy recommendations were twice as high, and the rate of short-interval followup was three times as high, for the group screened with ultrasound (Table 20).

KQ3a. Do Comparative Harms Differ by Population Characteristics and Risk Markers (e.g., Age, Breast Density, Race and Ethnicity, Family History)?

A secondary analysis of J-START reported results for trial participants from a single screening center in one Japanese prefecture (Miyagi) to compare interval cancer rates for DM/US and DM screening among women ages 40 to 49.138 Analyses stratified by breast density did not show a statistically significant difference in interval cancer rates for any density category (Table 19). The rates of recall based only on ultrasound were 69.7 per 1,000 (95% CI, 63.3 to 76.6) among those with dense breasts and 39.4 per 1,000 (95% CI, 33.5 to 46.0) among those without dense breasts.

Personalized Screening Programs Using Risk Assessment

No eligible studies were identified that reported on the potential harms of screening comparing usual care mammography with personalized screening programs using risk assessment.
