
NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Henderson JT, Webber EM, Weyrich M, et al. Screening for Breast Cancer: A Comparative Effectiveness Review for the U.S. Preventive Services Task Force [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2024 Apr. (Evidence Synthesis, No. 231.)


Screening for Breast Cancer: A Comparative Effectiveness Review for the U.S. Preventive Services Task Force [Internet].


Chapter 4. Discussion

Overall Summary of Evidence

We conducted this review to inform the USPSTF update to its recommendation on breast cancer screening. The 2016 review updated the evidence on screening effectiveness and provided emerging evidence on comparative effectiveness questions related to DBT and supplemental screening modalities.76,95 This review includes only comparative effectiveness studies because the evidence on the effectiveness of mammography screening has been reviewed and updated numerous times over the past two decades as large trials of mammography screening were completed. Given the trials’ findings of a mortality benefit for women ages 50 to 69 (Appendix A), new trials comparing screening with no screening are unlikely except in groups for which there is equipoise or unclear evidence of benefit.

The results of this review are summarized in Table 21, with different comparisons considered separately within each KQ. We included 20 studies that met the review eligibility criteria (two of which were included in the previous review126,159); each compared active screening strategies that differed by the timing, frequency, or modality of screening. When available, eligible studies using more recent registry data were included in place of earlier studies from the previous review.


Table 21. Summary of Evidence.

While breast cancer screening is an active area of research, few longitudinal trials of screening have been conducted since the original effectiveness trials were completed. We included six new randomized trials in the review,129,139,143,158,160,164 including four comparing DBT with DM screening129,139,143,160 and two on supplemental screening compared with mammography only.158,164 Three of these trials are ongoing139,158,164 and have only reported preliminary results, and three are completed.129,143,160 Nonrandomized observational studies were also included; however, few followed a screening population over time to compare different screening approaches. Risk of bias due to confounding and selection in nonrandomized, nonexperimental studies limits the confidence in their findings.

For KQ1, two studies compared mortality outcomes for different screening strategies: one nonrandomized study of the age to stop screening136 and one older RCT comparing annual with triennial screening.159 For KQ2, seven included studies reported invasive cancer detection outcomes from more than a single round of screening. Breast cancer outcomes must be assessed over at least two rounds of screening to determine whether a screening approach shifts detection toward an earlier cancer stage; results from a single round of screening are subject to lead time bias. Two studies comparing different screening intervals (biennial or triennial versus annual) and five studies comparing mammography with DBT versus DM met this requirement. For KQ3, 20 studies provided data on the potential relative harms of different screening strategies, including supplemental screening. No studies compared screening strategies by population characteristics and risk markers for any of the KQs, although two relevant RCTs are ongoing, with estimated completion dates in 2025.170,171

Overall, evidence of the relative effectiveness or harms of different breast cancer screening strategies was limited. The completion of ongoing trials will add to this evidence base in the future.

Age to Start or Stop Screening

No randomized trials that assigned individuals to different ages to start or stop screening were identified for inclusion in this review. A nonrandomized study (N = 1,058,013) based on data from Medicare Part B enrollees ages 70 to 84 suggested that continuing screening beyond age 75 did not reduce breast cancer mortality compared with stopping screening (aHR, 0.78 [95% CI, 0.63 to 0.95]).136 The study used novel statistical methods to approximate a per-protocol trial effect estimate from observational data.172 It did not present subgroup comparisons to identify specific groups that might benefit from continued screening beyond age 75. In terms of potential harms, fewer breast cancers were diagnosed among those who stopped screening. Given the similar mortality rates among those ages 75 to 84 who continued versus stopped screening, this difference could indicate overdiagnosis with continued screening, or it could reflect the short followup period of the study (8 years). Cancers detected in those who continued screening were more likely to be treated with lumpectomy and radiotherapy than with mastectomy and chemotherapy.

Screening Interval or Frequency

Two older studies that compared triennial with annual screening did not find evidence of a mortality benefit with more frequent screening. Specifically, one nonrandomized experiment that assigned participants in the Finnish national screening program to annual or triennial screening found no difference in breast cancer mortality (RR, 1.14 [95% CI, 0.59 to 1.27]) or all-cause mortality (RR, 1.20 [95% CI, 0.99 to 1.46]).159 An RCT of annual versus triennial screening from a similar time period conducted in the United Kingdom reported more screen-detected invasive cancers over multiple rounds of screening, but no difference in invasive cancers overall (including interval cancers) or in their prognostic features.126 These studies were limited by potential risk of bias related to randomization and by limited applicability to the current U.S. screening population because of their study periods and settings.

No studies comparing annual with biennial screening reported breast cancer mortality or other health outcomes. Intermediate outcomes (KQ2) were reported in one nonrandomized study using BCSC data to compare the progression of tumors diagnosed following an annual or biennial screening interval.154 The study found no difference between annual and biennial screening, within each decade of age, in the adjusted risk of cancer diagnosed with less favorable prognostic characteristics (stage IIB or higher, tumor size >15 mm, or positive node status).

Harms related to screening intervals were evaluated in two nonrandomized studies, one using BCSC data140 and one using a health system data source with a majority Hispanic population,153 that provided estimates of cumulative false-positive recall and false-positive biopsy rates. Annual screening resulted in more false-positive recalls and biopsies than biennial screening; in one study, the odds were estimated to be about twice as high (odds ratio, 2.2 [95% CI, 1.7 to 2.8]). The most recent analysis of BCSC data estimated that 50 to 56 percent of women screened annually over 10 years would have at least one false-positive recall and approximately 12 percent would have at least one false-positive biopsy. Among those screened biennially, 36 to 38 percent would experience at least one false-positive recall and 7 percent at least one false-positive biopsy. Annual screening would thus result in approximately 50 more women with a false-positive biopsy per 1,000 women screened over a 10-year period. These estimates update previous BCSC analyses and account for the recent increase in DBT screening; their finding of higher false-positive rates with annual screening is consistent with the previous review.145,173
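The cumulative estimates above reflect probabilities that accumulate over repeated screens. As a rough illustration only, the sketch below assumes that every screen carries the same, independent false-positive probability p, so the chance of at least one false positive over n screens is 1 - (1 - p)^n. The per-screen probability of 0.07 is a hypothetical value, the function names are illustrative, and the BCSC analyses used more detailed models rather than this formula. The last line reproduces the arithmetic behind the difference of about 50 per 1,000 from the reported 12 percent and 7 percent 10-year false-positive biopsy probabilities.

```python
def cumulative_prob_at_least_one(per_screen_prob: float, n_screens: int) -> float:
    """Probability of at least one false positive over n screens.

    Simplifying assumption (not the BCSC method): screens are independent
    and share the same per-screen false-positive probability.
    """
    if not 0.0 <= per_screen_prob <= 1.0:
        raise ValueError("per_screen_prob must be between 0 and 1")
    return 1.0 - (1.0 - per_screen_prob) ** n_screens


def excess_per_1000(prob_a: float, prob_b: float) -> float:
    """Absolute difference between two probabilities, per 1,000 women."""
    return (prob_a - prob_b) * 1000


# Hypothetical per-screen false-positive recall probability (assumed value).
p_recall = 0.07
annual = cumulative_prob_at_least_one(p_recall, n_screens=10)   # 10 years, annual
biennial = cumulative_prob_at_least_one(p_recall, n_screens=5)  # 10 years, biennial
print(f"10-year probability: annual {annual:.2f}, biennial {biennial:.2f}")

# Reported 10-year false-positive biopsy probabilities: ~12% annual vs ~7% biennial.
print(round(excess_per_1000(0.12, 0.07)))  # 50
```

Under the independence assumption, halving the number of screens over the same period lowers the cumulative probability, but by less than half, because the first few screens contribute most of the risk of at least one false positive.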

Studies included for comparisons of annual and biennial screening were more applicable to the U.S. screening population but were not randomized and subject to considerable risk of bias due to confounding and selection.

Our review did not identify any updated information on the effect of screening interval on the lifetime impact of radiation. The 2016 review included modeling estimates of deaths due to radiation-induced cancer from DM screening, ranging from 2 per 100,000 women ages 50 to 59 years screened biennially to 11 per 100,000 women ages 40 to 59 years screened annually.89

Digital Breast Tomosynthesis Screening

No studies of breast cancer screening with DBT compared with DM reported mortality outcomes. Three RCTs (N = 130,195)129,143,160 and two NRSIs (N = 597,267),144,168 all conducted in Europe except for one NRSI, reported cancer detection outcomes from at least two rounds of screening. In the DBT intervention group, two of the trials added sDM images (synthetic views equivalent to DM) and one trial added DM images. The second round of screening was conducted after a biennial interval for most participants, and all participants in each trial were screened with the same modality during the second round (either DM or DBT/sDM). Some trialists have proposed that a common modality at round two is necessary to accurately determine whether stage shift is present; a stage shift would suggest that the intervention at the first round identified clinically significant cancers that would otherwise have progressed. Similarities in study designs and effect sizes, along with low statistical heterogeneity, supported estimating pooled effects for some outcomes.

A potential benefit of a more sensitive breast cancer screening technology is that it might detect small, clinically important tumors before they progress to advanced disease. Results from the trials were inconclusive as to whether the additional first-round detection with DBT would reduce the incidence of advanced cancers and thereby improve health outcomes. Across the three trials comparing screening with DBT versus DM, DBT was associated with increased detection at the first screening round in two of the three trials (pooled RR, 1.41 [95% CI, 1.2 to 1.6]; I2=8%; 3 trials; n = 129,492), but in none of the trials at the second screening round (pooled RR, 0.87 [95% CI, 0.7 to 1.1]; I2=0%; 3 trials; n = 105,244). Tumor characteristics and prognostic characteristics were inconsistently reported or had heterogeneous effects across the studies, precluding meta-analysis of most outcomes related to breast cancer stage at detection. Overall, there was no statistically significant evidence of stage shift in the individual studies or in meta-analyses of outcomes with sufficiently consistent data. The trials primarily reported dichotomous outcomes to categorize early versus advanced disease, most commonly using stage >II or tumor size >20 mm as cutpoints, which may not be sensitive enough to identify clinically important differences in cancer detection.
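Pooled relative risks such as those above combine trial-level estimates on the log scale. The sketch below shows one common approach, fixed-effect inverse-variance pooling with Cochran's Q and the I2 statistic, with each study's standard error back-calculated from its 95% CI. It is a generic illustration under that assumption: this section does not state which pooling model the review used (a random-effects model, for example, would weight studies differently), the function name is illustrative, and the three input estimates are hypothetical rather than the trial data.

```python
import math
from typing import List, Tuple

Z_975 = 1.959963984540054  # two-sided 95% normal critical value


def pool_log_rr(
    rrs: List[float], lowers: List[float], uppers: List[float]
) -> Tuple[float, float, float, float]:
    """Fixed-effect inverse-variance pooling of relative risks.

    Each study's standard error is back-calculated from its 95% CI on the
    log scale. Returns (pooled RR, lower 95% CI, upper 95% CI, I2 percent).
    """
    log_rr = [math.log(r) for r in rrs]
    se = [(math.log(u) - math.log(lo)) / (2 * Z_975) for lo, u in zip(lowers, uppers)]
    weights = [1 / s ** 2 for s in se]
    pooled = sum(w * y for w, y in zip(weights, log_rr)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    # Cochran's Q and I2 describe between-study heterogeneity.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_rr))
    df = len(rrs) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return (
        math.exp(pooled),
        math.exp(pooled - Z_975 * pooled_se),
        math.exp(pooled + Z_975 * pooled_se),
        i2,
    )


# Hypothetical first-round detection RRs from three trials (NOT the review's data).
rr, lo, hi, i2 = pool_log_rr([1.3, 1.5, 1.4], [1.0, 1.1, 1.0], [1.7, 2.0, 2.0])
print(f"pooled RR {rr:.2f} (95% CI {lo:.2f} to {hi:.2f}); I2 = {i2:.0f}%")
```

Low I2 values, such as those reported for the DBT trials, indicate that the study estimates vary little beyond what chance alone would produce, which is what makes a single pooled estimate reasonable to report.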

The absence of changes in the distribution of tumor characteristics or stage at detection at round two could also mean that the additional detection with DBT at round one would have little to no effect on health outcomes, such as breast cancer morbidity and mortality. If the additional cancers detected were mostly indolent, with longer sojourn times, diagnosis may simply be shifted earlier without a change in mortality risk. A nonrandomized study using BCSC data from over a million screening examinations at U.S. clinical sites found, consistent with the trial evidence from Europe, no differences in the incidence of advanced cancer at subsequent screening rounds with DBT, which supports the generalizability of the trial evidence to the U.S. setting. Screen-detected advanced cancer is a relatively rare outcome, however, so comparisons for this outcome remain somewhat imprecise even with evidence from large trials and nonrandomized studies.

Interval cancer results were evaluated as potential harms in this review because interval cancers arise either from false-negative screening (a harm arising from low sensitivity) or from cancers that progressed to clinical significance during the gap between screenings. The same European RCTs (n = 130,196) and six nonrandomized comparison studies (N = 5,832,513) were included to assess the risk of interval cancer associated with DBT screening compared with DM only. Several studies have documented that interval cancers differ in tumor characteristics from screen-detected cancers and have a worse prognosis; therefore, a screening program that reduced the risk of interval cancers could be more effective in preventing breast cancer mortality. The three large RCTs found no statistically significant difference in interval cancer rates following screening with DBT compared with DM. The data on interval cancers from the six NRSIs were mixed, and interpretation was limited by differences in study design. Together with the similar cancer detection results for DBT and DM, the interval cancer findings suggest similar screening effectiveness for the two technologies based on the available evidence.

Overdiagnosis and overdetection are important potential harms of screening. The 2016 breast cancer screening effectiveness review for the USPSTF reviewed a broad literature including the effectiveness trial evidence and modeling studies and found overdiagnosis rates ranging from 11 to 22 percent in trials and 1 to 10 percent in observational studies.174 These outcomes are difficult to estimate even in the setting of large effectiveness trials because of differences in definitions and data collection. Rates of DCIS are considered one measure of overdiagnosis in screening studies because DCIS is generally treated but has unclear malignant potential. The three trials with multiple screening rounds did not show statistically significant differences in DCIS detection in meta-analysis, although this outcome is only one theorized source of potential overdetection that could lead to overtreatment.

Additional harms include recall for additional imaging, false-positive recall, and false-positive biopsy; however, results for these outcomes were inconsistent across studies comparing DBT with DM. An included study using BCSC data estimated the 10-year probability of at least one false-positive recall to be slightly lower with DBT when screening was conducted annually; however, rates were high in both groups, with 50 percent of those screened with DBT and 56 percent of those screened with DM experiencing at least one false-positive recall over 10 years of screening. Limited evidence from other studies addressed less commonly reported harms, including adverse events associated with screening, which were rare, and radiation exposure. In studies using DBT together with DM, radiation exposure was twofold higher than in the DM-only group, but in two studies using DBT with synthesized DM images created from the DBT scan, the dose was similar between study groups.

Current studies with more than one screening round do not provide evidence that DBT has an advantage over DM by detecting cancer at earlier stages. Breast cancer encompasses a range of disease, from indolent, slow-growing tumors to rapidly progressive disease that may have a short window for detection before metastatic disease develops. A tumor stage shift, if observed, could contribute to improved health outcomes, but imprecise estimation and inconsistencies in the few studies reporting detection and tumor characteristics limit conclusions. These limitations increase uncertainty about the effect of small improvements in test performance on health outcomes.

Overall, the included studies indicated no or minor differences between DBT and DM screening in effectiveness and potential harms. Small improvements in false-positive recall observed for DBT in initial or early screening rounds may dissipate over longer time horizons, underscoring the importance of evidence on the cumulative effects of different screening programs over the lifetime. Very few randomized trials have completed more than a single round of screening, and neither RCTs nor nonrandomized studies reported the morbidity or mortality outcomes needed to estimate the health consequences or potential overdiagnosis associated with different screening programs.

Test Performance Characteristics of Digital Breast Tomosynthesis

A large volume of evidence on DBT comes from single-round test performance studies, including paired design studies that report the detection yield for readings on the same person with DBT/DM versus DM only. The literature on the test performance of screening tests can be helpful for evaluating new technologies and their potential contribution to a screening program. Three systematic reviews (including randomized trials, prospective cohorts, and diagnostic accuracy studies) reported pooled estimates of positive predictive value (PPV) (i.e., the percentage of those with a positive mammogram result who are diagnosed with cancer) and false-positive recalls (i.e., the proportion recalled who were not diagnosed with cancer) among average-risk women screened with DBT or DBT/sDM versus DM.175–177 Overall, the reviews included relatively few eligible studies (4 to 13), with fewer available for meta-analysis of most outcomes. Statistical heterogeneity was also high for most analyses, raising questions about the validity of the pooled estimates. Results of the reviews were mixed, but small increases in PPV were reported with DBT/sDM or DBT/DM compared with DM alone.

A 2020 review of 10 studies (three randomized trials, one prospective cohort study, and six diagnostic accuracy studies) estimated a difference in PPV (invasive breast cancer and DCIS combined) between participants screened with DBT/sDM versus DM (pooled RR, 1.26 [95% CI, 1.09 to 1.46]; I2=52%; 6 trials; n = 213,927 screening recalls).177 A 2022 individual participant data meta-analysis including four prospective studies found that PPV (invasive breast cancer and DCIS combined) improved with DBT compared to DM (pooled RR, 1.31 [95% CI, 1.07 to 1.61]; I2=70%; n = 7,274 screening recalls).176 The 2020 meta-analysis showed no difference in false-positive recalls (invasive breast cancer and DCIS combined) between women screened with DBT versus DM (RR, 1.06 [95% CI, 0.85 to 1.32]; I2=85%; 6 trials; n = 96,970 screening examinations) or between DBT/sDM versus DM (RR, 1.02 [95% CI, 0.85 to 1.23]; I2=90%; 6 trials; n = 213,927).177 A 2022 systematic review of 13 studies (one RCT, 12 observational cohorts) also reported meta-analyses with very high statistical heterogeneity that suggested improved PPV and recall with DBT/sDM. It was unclear whether the results were for invasive cancer detection or invasive cancer and DCIS detection.175
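The PPV and false-positive recall measures summarized above are simple ratios of screening counts. The sketch below computes them for two hypothetical screening arms (the counts are invented for illustration and are not taken from any included study) and forms the ratio of the two PPVs, the kind of quantity summarized by an RR for PPV. The `ScreeningArm` class and its method names are illustrative only, and false-positive recalls are expressed here per 1,000 screening examinations, which is one of several denominators used in this literature.

```python
from dataclasses import dataclass


@dataclass
class ScreeningArm:
    """Hypothetical counts for one screening arm (illustrative, not study data)."""

    screened: int
    recalled: int
    cancers_among_recalled: int

    def ppv(self) -> float:
        """Positive predictive value: share of recalls diagnosed with cancer."""
        return self.cancers_among_recalled / self.recalled

    def false_positive_recalls_per_1000(self) -> float:
        """Recalls without a cancer diagnosis, per 1,000 screening examinations."""
        return (self.recalled - self.cancers_among_recalled) / self.screened * 1000


dbt = ScreeningArm(screened=10_000, recalled=400, cancers_among_recalled=30)
dm = ScreeningArm(screened=10_000, recalled=500, cancers_among_recalled=25)

ppv_ratio = dbt.ppv() / dm.ppv()
print(f"PPV DBT {dbt.ppv():.3f}, DM {dm.ppv():.3f}, ratio {ppv_ratio:.2f}")
print(
    f"False-positive recalls per 1,000: DBT {dbt.false_positive_recalls_per_1000():.1f}, "
    f"DM {dm.false_positive_recalls_per_1000():.1f}"
)
```

Because PPV uses recalls as its denominator, it can rise either because more cancers are found or because fewer women without cancer are recalled, so PPV results are easier to interpret alongside detection and recall rates.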

Data from the BCSC can provide estimates of screening performance from data on U.S. populations screened in select breast cancer care systems that contribute to the registry. A 2020 publication by Lowry et al. used 2010 to 2018 data from five BCSC registries to assess the performance of DM (1,273,492 screening examinations) versus DBT mammography (310,587 screening examinations) among women ages 40 to 79 years. Improvements in cancer detection and recall with DBT were observed at baseline screening (prevalence screen) across all age groups. At subsequent screening visits (incidence screens), screening performance improvements were not uniform. Only women with heterogeneously dense breasts and women ages 50 to 79 with scattered fibroglandular breast density had reduced recall relative to cancers detected. Younger women with extremely dense breasts experienced higher recall with DBT at subsequent screens and no improvement in cancer detection. The main analyses presented were adjusted for a range of demographic and breast cancer risk characteristics, but the observational design cannot fully account for differences in the reasons women may have received DBT screening; risk of bias from selection into the study groups and potential unmeasured confounding remain even after statistical adjustments.178

Supplemental Screening With Ultrasound or Magnetic Resonance Imaging

No studies comparing women screened with mammography only with those receiving supplemental MRI screening reported health outcomes or evidence of reduced progression to advanced cancer in subsequent screening rounds. Harms were reported in one RCT (N = 40,373) that found fewer interval cancers diagnosed in the two years following the first round of screening among a group with dense breasts invited to MRI after a negative screening mammogram result (2.2 per 1,000) compared with those with dense breasts who did not receive the invitation (4.7 per 1,000) in the intention-to-treat analysis (RR, 0.47 [95% CI, 0.29 to 0.77]). The reduction in interval cancers serves as an intermediate outcome suggesting potential benefit, but the likelihood and magnitude of differences in breast cancer morbidity and mortality are not yet known. Although the study was designed to include three MRI screening rounds, second-round results for both study groups have not been published.164
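An RR such as the one reported for interval cancers compares event risks between randomized groups. The sketch below computes an RR with a 95% CI using the Katz log method, a standard approach that may differ from the trial's own analysis; the function name is illustrative and the event counts are hypothetical, chosen only to show the calculation. As a simple check on the reported figures, 2.2 divided by 4.7 per 1,000 gives approximately 0.47, matching the reported point estimate.

```python
import math

Z_975 = 1.959963984540054  # two-sided 95% normal critical value


def risk_ratio_ci(events_a: int, n_a: int, events_b: int, n_b: int):
    """Risk ratio (group A vs group B) with a 95% CI by the Katz log method."""
    if min(events_a, events_b) == 0:
        raise ValueError("Katz method needs at least one event per group")
    rr = (events_a / n_a) / (events_b / n_b)
    se_log = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    lower = math.exp(math.log(rr) - Z_975 * se_log)
    upper = math.exp(math.log(rr) + Z_975 * se_log)
    return rr, lower, upper


# Hypothetical counts: 20 interval cancers among 10,000 invited vs 40 among 10,000 not invited.
rr, lo, hi = risk_ratio_ci(20, 10_000, 40, 10_000)
print(f"RR {rr:.2f} (95% CI {lo:.2f} to {hi:.2f})")

# Point-estimate check using the rates quoted in the text (per 1,000).
print(round(2.2 / 4.7, 2))  # 0.47
```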

Harms from MRI screening identified in the review included additional recalls and biopsies from the supplemental imaging. The acceptability of screening was also limited in the trial that randomized participants with dense breasts to an invitation for MRI after a mammography screen with negative findings: 40 percent of those randomized to the MRI invitation did not present for screening. A nonrandomized study using insurance claims data (N = 18,416) compared cascade events (mammary and extramammary) in the six months following screening and found no difference between those screened with MRI and those screened with mammography.

One randomized trial conducted in Japan (N = 72,717) was designed to estimate the effectiveness of DM plus ultrasound screening compared with DM only for women ages 40 to 49, since this group tends to have higher breast density. The study has published results from the first round of screening with followup for interval cancers, and second-round findings are currently being analyzed for future publication (personal communication). There was not a statistically significant difference in interval cancers following first-round screening in this trial (0.4 versus 0.8 per 1,000 screened) based on the event rates reported (unadjusted), but the estimate was imprecise. There was also no difference in a nonrandomized study by Lee et al. using data from two BCSC registries with propensity score matching to adjust comparisons for confounding and selection bias (1.5 versus 1.9 per 1,000 screened).152 These studies also reported additional followup testing attributed to ultrasound screening. The Japanese trial found 48 per 1,000 additional false-positive screens from ultrasonography. The BCSC study reported false-positive biopsy rates that were more than twice as high in the group with supplemental ultrasound compared with having only a mammogram (52.0 versus 22.2 per 1,000 screened). The BCSC analysis also did not report statistically significant differences in detection or sensitivity with supplemental ultrasound screening compared to DM, but these outcomes were not included in our review since the study did not report results compared across multiple screening rounds.

Differences in detection of cancer with supplemental screening in addition to mammography have been reported in studies that were not eligible for our review for the reasons outlined above (e.g., paired studies in which individuals serve as their own controls through blinded readings). Two recent systematic reviews that included such paired-design studies reported pooled estimates of sensitivity and specificity for women with dense breasts receiving supplemental screening with MRI or ultrasound.

A 2022 systematic review of 42 studies that included a wide range of study designs and settings reported on the performance of various supplemental breast cancer screening modalities for women with dense breasts.179 Test performance characteristics were estimated primarily from observational studies using sequential testing in which participants served as their own controls. For supplemental screening with handheld ultrasound, meta-analysis of nine studies estimated 86 percent sensitivity (95% CI, 77 to 92) and 87 percent specificity (95% CI, 75 to 93) for diagnosed breast cancer, with low statistical heterogeneity (I2=0.09%; 9 trials; n = 42,242). Test performance results for MRI supplemental screening were limited and inconsistent and were therefore not summarized using meta-analysis.
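Sensitivity and specificity alone do not show how many positive supplemental screens will turn out to be cancer; that also depends on how common undetected cancer is in the screened group. The sketch below applies Bayes' theorem to the pooled handheld ultrasound estimates reported above (86 percent sensitivity, 87 percent specificity) with an assumed prevalence of 3 per 1,000 screens. The prevalence is a hypothetical input chosen for illustration, not a figure from the review, and the function names are illustrative.

```python
def ppv_from_accuracy(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value via Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def expected_counts_per_1000(sensitivity: float, specificity: float, prevalence: float):
    """Expected (true positives, false positives) per 1,000 screens."""
    n = 1000
    return sensitivity * prevalence * n, (1 - specificity) * (1 - prevalence) * n


sens, spec = 0.86, 0.87  # pooled ultrasound estimates quoted in the text
prev = 0.003             # ASSUMED prevalence of undetected cancer (hypothetical)
tp, fp = expected_counts_per_1000(sens, spec, prev)
print(f"per 1,000 screens: {tp:.1f} true positives, {fp:.1f} false positives")
print(f"PPV: {ppv_from_accuracy(sens, spec, prev):.3f}")
```

Under these assumptions, roughly 50 false positives occur for every true positive, which illustrates why a test with reasonably high specificity can still add many recalls relative to the number of additional cancers detected when prevalence is low.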

Overall, the review concluded that supplemental screening with handheld ultrasound or MRI could increase cancer detection by 2 to 3 per 1,000 women with dense breasts but would also substantially increase recall by 73 to 134 per 1,000 screens and biopsy by 33 to 73 per 1,000 screens among women without cancer. The authors noted the lack of studies reporting breast cancer mortality outcomes or intermediate outcomes that could be used to assess the health impact of the additional cancers detected.179 A 2020 meta-analysis estimated higher pooled sensitivity and specificity for supplemental screening with ultrasound compared with DM alone among women with dense breasts, but the pooled result was based on five studies exhibiting very high statistical heterogeneity.180 A more broadly scoped 2020 systematic review that was not limited to studies of women with dense breasts included 12 studies evaluating supplemental ultrasound test performance following a negative mammogram.181 Sensitivity estimates ranged from 0.62 to 1.00 and specificity from 0.69 to 0.98; the pooled estimate for cancer detection was 3.0 additional cancers per 1,000 women screened (95% CI, 1.8 to 4.6), with high statistical heterogeneity (I2=85.1%). Across the seven studies reporting cancer type, 73.9 percent of the cases detected were invasive cancers (70.9% node-negative invasive cancer). Additional recall and biopsy with supplemental screening were estimated to be 8.7 per 1,000 and 3.9 per 1,000 in pooled analysis (again with very high statistical heterogeneity [I2>95%]). The review authors noted limitations in the available literature on test performance for ultrasound supplemental screening, and the importance of studies to evaluate the longer-term consequences of this screening approach in terms of possible health benefits and risks.181 Without comparative studies, performance studies cannot determine whether cancers detected would progress and pose morbidity and mortality risks. A population program of supplemental ultrasound screening would also lead to more women in the screened population needing followup testing and biopsy, including among women without cancer.

No systematic reviews were identified reporting pooled estimates of the positive predictive value or false-positive recalls for women receiving supplemental MRI or ultrasound.

The previous review of supplemental screening noted the shortcomings of test performance data on this topic for establishing the clinical net benefits of screening programs.95 Comparative studies that report health outcomes are important for establishing whether breast cancer screening tests, including supplemental tests, lead to improved health outcomes or instead contribute to false positives, overdiagnosis, and unnecessary treatment.

Screening in Different Population Subgroups

No studies used valid, rigorous methods to evaluate potential differences in screening effectiveness and harms across population subgroups. Subgroup comparisons were not adequately powered or assessed with statistical tests for interaction; instead, they were based on stratified results, primarily by age, breast density, and breast cancer risk and, less commonly, by hormonal status. Some trends were consistent across subgroup analyses, but limitations in study designs and analyses weakened the findings (Appendix F Table 7). In general, higher breast density and younger age were associated with more false-positive results with screening. However, the absence of interaction tests, the lack of correction for multiple comparisons, and the possibility of unmeasured confounding in observational comparisons precluded conclusions. Evidence from BCSC and other registry studies was generally consistent with the broader literature.140,147,154
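Stratified results that are statistically significant in one subgroup but not in another do not by themselves show that effects differ; a formal test for interaction compares the subgroup estimates directly. The sketch below shows one simple version, comparing two independent ratio estimates on the log scale with a z test, with standard errors back-calculated from 95% CIs. The subgroup estimates and function names are hypothetical, and the included studies did not report this analysis.

```python
import math

Z_975 = 1.959963984540054  # two-sided 95% normal critical value


def se_log_from_ci(lower: float, upper: float) -> float:
    """Standard error of a log ratio, back-calculated from its 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_975)


def interaction_test(est1, ci1, est2, ci2):
    """Compare two independent subgroup ratio estimates.

    Returns (ratio of ratios, z statistic, two-sided p value).
    """
    diff = math.log(est1) - math.log(est2)
    se_diff = math.hypot(se_log_from_ci(*ci1), se_log_from_ci(*ci2))
    z = diff / se_diff
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p value
    return math.exp(diff), z, p


# Hypothetical subgroup estimates (e.g., dense vs nondense breasts).
ratio, z, p = interaction_test(1.8, (1.2, 2.7), 1.2, (0.9, 1.6))
print(f"ratio of ratios {ratio:.2f}, z = {z:.2f}, p = {p:.3f}")
```

In this hypothetical example, the first subgroup's CI excludes 1 and the second's does not, yet the interaction test does not reach conventional significance (p is about 0.11), which is why stratified results alone cannot establish that screening works differently across subgroups.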

No comparative effectiveness studies reported differences in estimates by race or ethnicity. Nearly all of the included studies were conducted in majority non-Hispanic White populations and were not powered with adequate numbers of Black, Hispanic, Asian, or AI/AN women for meaningful comparisons.

Inequities in Breast Cancer Incidence and Outcomes (CQ1)

A pronounced inequity in breast cancer mortality in the United States is seen among non-Hispanic Black women compared with all other people. Although breast cancer incidence among Black women overall is lower than among non-Hispanic White women, breast cancer mortality is 40 percent higher for Black women (27.6 per 100,000 compared with 19.7 per 100,000 for White women) based on the most recent U.S. surveillance data (2016–2020).14 Relative risks of mortality accounting for age and stage at diagnosis have been estimated to be 71 percent higher for non-Hispanic Black women and 28 percent higher for AI/AN women compared with non-Hispanic White women.182 Mortality from breast cancer was similar between Black and White women before the 1980s, after which mortality rates abruptly diverged. The introduction of mammography screening and new treatment interventions, particularly adjuvant endocrine therapy, around the same time suggests that health care inequities underlie the emergence and persistence of the disparity.14,20

Currently, most research on health inequities compares non-Hispanic Black women to non-Hispanic White women. Many of the issues outlined below may similarly affect care and outcomes for other populations in the United States, although some inequities may result from causal pathways unique to specific populations. For example, there are longstanding and substantial inequities in breast cancer survival for populations living in rural areas of the United States.183

The National Institute on Minority Health and Health Disparities (NIMHD) framework184 was developed to guide research investigating health disparities and is helpful for examining sources of inequities in breast cancer survival, particularly higher mortality for Black women. The framework recognizes the roles of the health care system,185 the sociocultural environment, the built environment, behavioral factors, and genetic factors in contributing to health inequities. Inequities in breast cancer mortality can be examined with these factors in mind at each step along the cancer screening, diagnosis, treatment, and survival pathway.186 The higher mortality rate seen for Black women diagnosed with breast cancer in the United States aligns with other health inequities that are attributed to the effects of structural racism, which results in inequalities in resources and exposures, including disparities in access to high-quality health care.187–189 For example, worse breast cancer survival has been associated with racialized residential segregation driven by historical and ongoing discriminatory housing policies.190–193 Racialized and classist segregation has also been associated with exposure to cancer risk from toxic environments, including air pollution, industrial waste, built environments that do not support health, and stressful life conditions.190,192 Although interrelated factors contribute to inequities in breast cancer mortality, the primary focus of this report is on structural, systemic, and individual factors related to health care that are within the USPSTF purview.

Research is ongoing to disentangle the factors that may contribute to the observed higher rates of cancer subtypes with worse prognoses among Black women, who are more likely to present with advanced cancer compared with non-Hispanic White women.20,193,194 Based on national SEER surveillance estimates (2016–2020), HR-negative breast cancers are more common among non-Hispanic Black women than among White women (30.6 versus 17.4 per 100,000). The higher incidence of HR-negative cancer contributes to worse outcomes because these subtypes are less readily detected through screening and do not respond to adjuvant endocrine therapy.195 Triple-negative cancers (i.e., ER-, PR-, HER2-) are also more likely to be diagnosed at younger ages and among Black women (24.1 per 100,000) compared with White women (12.4 per 100,000) based on data from 2015 to 2019. These cancers tend to be particularly aggressive and more likely to be diagnosed at later stages than other subtypes. Sub-Saharan African ancestry may contribute a genetic component to this difference, but HR-negative cancers have decreased for all racial and ethnic groups in the United States, and variability in rates of decline by region suggests a more complex etiology.196 Observed regional differences in the incidence of HR-negative cancer within and between racial groups suggest that environmental and social determinants of health may contribute to the risk of developing HR-negative cancer.20,196 Although differences in the incidence of cancer subtypes explain some of the differences in breast cancer mortality (an estimated 56%), race differences in mortality within subtypes point to barriers to obtaining high-quality health care and disparities in screening followup and treatment initiation.20

Differences in recent trends in breast cancer incidence are difficult to attribute to specific factors due to the complex interactions of structural and environmental conditions, health care, and individual health-mediated processes that can be associated with cancer detection and diagnosis. Breast cancer incidence trends show slight increases from 2005 to 2019 for non-Hispanic Black women and non-Hispanic White women ages 50 to 74 (average annual percent change [AAPC], 0.9 and 0.4, respectively) and similar increases among those ages 40 to 49 (0.6 AAPC for both groups).16 Other racial and ethnic groups have experienced steeper increases in incidence since 2015. Average annual increases in incidence were higher, and similar to each other, among Asian/Pacific Islander women (2.0 AAPC) and Hispanic women (1.7 AAPC) ages 50 to 74. Incidence among AI/AN women has also risen by at least 1.7 percent on average each year, but the trend is not precisely estimated for all age groups (ages 40 to 64, 1.7 AAPC; ages 50 to 74, 6.1 AAPC [p=0.14]; ages 75+, 1.8 AAPC). At younger ages (40 to 49), increasing trends have been steepest among Asian/Pacific Islander women (4.0 AAPC), followed by AI/AN women (2.0 AAPC) and Hispanic women (1.6 AAPC).16 Overall, however, among women below age 40, Black women have the highest breast cancer incidence (27.6 per 100,000 women).10
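For readers less familiar with the trend metric, an annual percent change (APC) is conventionally estimated by fitting a least-squares regression of the log rate on calendar year and transforming the slope b as APC = (e^b - 1) × 100. The AAPC reported by surveillance programs additionally averages segment-specific APCs from a joinpoint model, which this sketch does not reproduce. The minimal Python sketch below assumes a single trend segment and uses hypothetical rates, not SEER data; the function name `annual_percent_change` is illustrative only.

```python
import math


def annual_percent_change(years, rates):
    """Estimate APC from a least-squares fit of ln(rate) on calendar year.

    Single-segment sketch only: a joinpoint AAPC would average several
    segment APCs weighted by segment length, which is not done here.
    """
    n = len(years)
    if n != len(rates) or n < 2:
        raise ValueError("need at least two (year, rate) pairs")
    logs = [math.log(r) for r in rates]  # rates must be positive
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    sxx = sum((x - mean_x) ** 2 for x in years)
    slope = sxy / sxx
    return (math.exp(slope) - 1.0) * 100.0


# Hypothetical rates per 100,000 that grow by exactly 2 percent per year
years = list(range(2015, 2020))
rates = [100.0 * 1.02 ** (y - 2015) for y in years]
print(round(annual_percent_change(years, rates), 2))  # 2.0
```

Fitting on the log scale is what makes the slope interpretable as a constant proportional change per year, which is why incidence trends are summarized as percentages rather than absolute differences in rates.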

Structural and contextual factors affect the well-being, health, and resources (e.g., financial, health literacy) of individuals when they enter the health care system, and factor into their experiences obtaining care.197 The next sections focus on inequities that accumulate along the health care pathway that contribute to mortality disparities, drawing on a conceptual framework presented by Nelson et al. for a systematic review on interventions to address inequities in preventive health services.197

Inequities in Access to Screening

Despite having a higher rate of breast cancer mortality, non-Hispanic Black women report the highest rates of mammography screening. Based on self-reported Behavioral Risk Factor Surveillance System data from 2020, 78 percent of all women ages 50 to 74 reported having a mammogram in the past two years. For non-Hispanic Black women, the rate was 84.5 percent, followed by Hispanic women (79.8%), Native Hawaiian/Pacific Islander women (79.7%), non-Hispanic White women (77.8%), and AI/AN women (68.7%). Non-Hispanic Black women also reported higher levels of screening than non-Hispanic White women at ages 40 to 44 (60% versus 54%) and ages 45 to 49 (76% versus 68%).198 Self-report data from the 2015 and 2018 National Health Interview Survey indicate lower, but similar, rates of breast cancer screening for non-Hispanic Black and non-Hispanic White women (72.9% and 71.7%, respectively).199

Although evidence remains unclear regarding the relative benefit of DBT compared with DM screening, adoption of DBT occurred most rapidly in regions with proportionally larger non-Hispanic White populations.200 In addition, even as the availability of DBT increased, Black, Asian, and Hispanic women remained less likely to be screened with DBT compared with White women. Analysis of data from the BCSC indicates that when both technologies were available at the screening site, over half of White women (53%), and smaller percentages of Black (38%), Hispanic (44%), and Asian women (43%) were screened with DBT.201 Out-of-pocket costs often required for DBT screening may contribute to these differences, as well as inequities in the geographic distribution of health resources and clinician behaviors.202–204

Although there are not currently recommendations for supplemental screening in the general screening population, barriers to access for individuals at increased risk of breast cancer could contribute to mortality risks. Uneven access to supplemental screening modalities (e.g., MRI, ultrasound) has been documented in the United States and is most likely to impact American Indian women and those living in rural areas.205

Inequities in Diagnostic Followup and Access to Evidence-Based Cancer Treatments

Health outcome benefits from mammography screening require initiation and completion of appropriate and effective followup and treatment. Microsimulation modeling and other population-based studies have suggested that treatment advances have had a greater impact on reducing breast cancer mortality than screening.36 These advances have been most pronounced for HR-positive cancer subtypes. Delays and inadequacies in the diagnostic and treatment pathway likely contribute to increased mortality relative to those receiving prompt, effective care.197

Disparities in followup after screening have been observed for Black, Hispanic, and Asian women compared with White women.186,206–213 Interventions to address delays in followup of abnormal screening results, treatment initiation, and treatment completion, especially for Black women for whom delays and reduced access to timely care are most pronounced, could address disparities in the care pathway following a positive screening mammogram. The use of navigators, shown to improve cancer screening rates, deserves investigation for potential effects on reducing inequities in followup and treatment.186

Adjuvant endocrine therapy reduces the risk of cancer recurrence among individuals with HR-positive cancers by up to 30 percent, but long-term adherence can be difficult. Adherence has been associated with factors such as health literacy, comorbidities, depression, cognitive function, and social support, as well as the types of side effects experienced with therapy.214 Black women are more likely to discontinue adjuvant endocrine therapy compared with White women, in part due to greater physical symptom (vasomotor, musculoskeletal, cardiorespiratory) and psychological symptom (distress, despair) burdens and owing to structural and contextual factors such as neighborhood and community resources and supports.215,216 Improved symptom management and social support could improve adherence and help reduce cancer outcome inequities. Improvements in access to effective health care, removal of financial barriers, and use of support services for followup and treatment of breast cancer could reduce mortality risks for individuals experiencing disparities related to their race or ethnicity, rural location, low income, or other factors associated with lower breast cancer survival.

Additional Findings From Original Effectiveness Trials (CQ2)

A detailed overview of the findings of the original effectiveness trials of mammography screening from the 2016 evidence review can be found in Appendix A. These trials include the Canadian Breast Cancer Screening Studies (CNBSS-1 and CNBSS-2), the United Kingdom Age trial, and four trials from Sweden, including the Stockholm trial, Malmö Mammographic Screening Trial (referred to separately as MMST I and MMST II), Gothenburg (Göteborg) trial, and Swedish Two-County Study (referred to separately as Östergötland and Kopparberg).111 We conducted a literature scan that identified updated estimates of effectiveness for four of the trials reporting on extended followup.77,78,217

A single 2017 publication presented an updated analysis of mammography effectiveness from a series of Swedish screening trials (the Malmö [MMST I and MMST II], Stockholm, and Göteborg [Gothenburg] trials) with over 20 years of followup data (30, 22, 25, and 24 years, respectively).217 These analyses focus on the difference in breast cancer mortality between screening and control groups among women with breast cancers diagnosed between randomization and completion of the first screening round of the control group (time varied by trial from 4.3 to 12.4 years). The previous review classified these analyses as using the "short case accrual method" (sometimes referred to as the "evaluation method" in trial publications). This method of analysis reduces the risk of contamination in the control group after the screening phase of a trial is completed but includes fewer cases in the analysis. Overall, the combined results from the Swedish trials retained the originally reported statistically significant effect of screening. The updated estimate from these trials showed a 15 percent relative reduction in breast cancer mortality for women ages 40 to 74 years (RR, 0.85 [95% CI, 0.73 to 0.98]). When the age-stratified results were compared with the study-specific estimates for short case accrual from the previous review, the point estimates were similar, although confidence intervals included 1.0 for all age groups: ages 40 to 49 years at randomization (RR, 0.79 [95% CI, 0.62 to 1.0]), ages 50 to 59 years at randomization (RR, 0.89 [95% CI, 0.71 to 1.1]), and ages 60 to 70 years at randomization (RR, 0.73 [95% CI, 0.58 to 1.2]).
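As a worked check of the arithmetic above, the relative reduction is the complement of the rate ratio, and its confidence limits follow from the RR limits in reverse order (the upper RR limit gives the lower bound on the reduction):

```latex
\text{Relative reduction} = (1 - \mathrm{RR}) \times 100\%
  = (1 - 0.85) \times 100\% = 15\%,
\qquad
95\%\ \text{CI: } (1 - 0.98) \times 100\% \text{ to } (1 - 0.73) \times 100\% = 2\% \text{ to } 27\%.
```

By the same conversion, an RR confidence interval that reaches 1.0, as in each age stratum, corresponds to a reduction interval that reaches zero.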

The U.K. Age trial of mammography effectiveness among women ages 40 to 49 years published final results incorporating nearly 23 years of participant followup data.77,78 In addition to the short case accrual method used in the Swedish trials, the U.K. Age trial also presented results using the long case accrual method, which counts breast cancer deaths among all cases diagnosed over the course of both the screening intervention period and the followup period. The long accrual method is considered least biased because it accounts for lead time and detection bias inherent in studies of cancer mortality.

The U.K. Age trial recruited women ages 39 to 41 years for random assignment to yearly screening up to and including the calendar year that they reached age 48 years (intervention group), or to usual care that included no screening until entering the National Health Service Breast Screening Program (NHSBSP) at approximately 50 years of age (control group). The primary endpoint was mortality from breast cancer diagnosed in the intervention period for both groups (all breast cancer diagnosed after randomization but before first NHSBSP invitation).

Based on a median of 22.8 years of followup, the final primary analysis showed no statistically significant difference in breast cancer mortality from starting screening at ages 39 to 41 (RR, 0.88 [95% CI, 0.74 to 1.03]). An analysis based on long-term case accrual also resulted in no statistically significant impact on breast cancer mortality (RR, 0.90 [95% CI, 0.79 to 1.03]) or all-cause mortality (RR, 1.01 [95% CI, 0.96 to 1.05]). In addition to the protocol specified primary analyses, the publication provided findings from several secondary post-hoc analyses stratified by followup periods. These analyses suggested a reduction in breast cancer mortality when followup was limited to the first 10 years of the trial (RR, 0.75 [95% CI, 0.58 to 0.97]), but no differences with followup from 10 years postrandomization and beyond or overall. These stratified analyses were not prespecified for the trial and use different definitions of the intervention period than previous analyses from the trial.

New publications reporting long-term outcomes are consistent with findings summarized in the 2016 evidence review. Results of nine RCTs individually and collectively indicate no statistically significant reduction in breast cancer mortality for women screened at ages 40 to 49. Breast cancer mortality is reduced in trials of women ages 50 to 69, although results of individual trials are mixed, and the magnitude of effect is small. Results for women ages 70 to 74 are inconclusive because few women in this age group were enrolled in the screening trials. Application of these findings to current practice remains questionable; however, few other preventive health services are supported by effectiveness trials with mortality outcomes, and clinical practice assumes benefits of screening regardless of the trial limitations.

Risk Assessment Tools to Personalize Breast Cancer Screening (CQ3)

Models estimating risk for breast cancer generally include common clinical risk factors, such as age, age at menarche, age at birth of first child, number of first-degree relatives with breast cancer, and number of previous breast biopsies. Additional variables, which differ between models, include race, BMI, breast density, menopause status, use of hormone therapy, additional family histories, and others. Risk factors are categorized and weighted differently in each model. While all models published to date include age and number of first-degree relatives with breast cancer in their calculations, they vary in their complexity. Examples include the Gail,66 Claus,67 and Breast Cancer Surveillance Consortium (BCSC v2) models.218

A systematic review for the USPSTF published in 2019 included 25 studies of the diagnostic accuracy of 18 risk assessment methods to predict risk for breast cancer based on data from more than 5 million women.48 The most studied methods include the Gail model and its variations, including versions specific to Black and Asian women and versions that include breast density. Studies also evaluated four versions of the BCSC model; two versions of the Rosner-Colditz model; two versions of the Tyrer-Cuzick model; a model based on data from Italian women; the Chlebowski model; and a model to predict ER-positive and ER-negative breast cancer.

Results of studies indicated modest discriminatory accuracy in predicting incidence of breast cancer in individual women, with area under the curve (AUC) values ranging from 0.55 to 0.65.48 Studies of models specific to Black or Asian women showed similar results.219–221 These values are generally considered too low for clinical applications, although they have been used as entry criteria and for risk stratification in research studies. The only study reporting AUC values above 0.70 for both the Gail-2 model (AUC, 0.74) and the Tyrer-Cuzick model (AUC, 0.76) was small and did not include a primary care population, limiting its clinical applicability.222 Studies also indicated that adding variables, such as breast density, race, or BMI to simple models had little effect on improving accuracy.223,224 Performance characteristics of individual models varied when applied across different validation samples.48 In studies where multiple models were validated in the same population, different models predicted different results.225,226
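The AUC reported for these risk models is a concordance probability: the chance that a randomly chosen woman who develops breast cancer was assigned a higher predicted risk than a randomly chosen woman who does not, with ties counted as one-half. An AUC of 0.5 is no better than chance, which is why values of 0.55 to 0.65 are described as modest. The minimal Python sketch below implements this pairwise (Mann-Whitney) definition; the risk scores are hypothetical and are not drawn from any included study, and the function name `auc` is illustrative only.

```python
def auc(scores_cases, scores_noncases):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    Over all case/non-case pairs, counts how often the case has the higher
    predicted risk; ties contribute 0.5. O(n*m), adequate for illustration.
    """
    if not scores_cases or not scores_noncases:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for c in scores_cases:
        for n in scores_noncases:
            if c > n:
                concordant += 1.0
            elif c == n:
                concordant += 0.5
    return concordant / (len(scores_cases) * len(scores_noncases))


# Hypothetical predicted 5-year risks (%), not taken from any study
cases = [2.1, 1.8, 3.0, 1.2]          # women who developed breast cancer
noncases = [1.5, 2.2, 1.0, 1.8, 1.3]  # women who did not
print(auc(cases, noncases))  # 0.675
```

Because the AUC depends only on how scores rank cases above non-cases, two models can have similar AUCs yet assign quite different absolute risks to the same woman, which is consistent with different models predicting different results in the same population.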

Diagnostic accuracy studies published since the 2019 systematic review further confirm the limited accuracy of risk models. These include studies of the BCSC (AUC, 0.63 to 0.68)194,227; Gail (AUC, 0.59)228; International Breast Cancer Intervention Study (IBIS) (AUC, 0.66)229; and Breast and Ovarian Analysis of Disease Incidence and Carrier Estimation Algorithm (BOADICEA) (AUC, 0.56)229 risk models; and a new model incorporating family history of breast cancer, proliferative benign breast disease, and previous breast calcifications (AUC, 0.587 to 0.647).230 A study comparing the performance of commonly used models in predicting breast cancer risk among 35,921 women ages 40 to 84 years in a U.S. community screening population indicated AUC values of 0.61 for BRCAPRO, 0.64 for Gail, 0.64 for BCSC, and 0.62 for the Tyrer-Cuzick models.231 These values are consistent with previous validation studies.

Additional new models include approaches or technologies not currently clinically available. These include a model with image-derived risk factors combined with clinical risk factors (AUC, 0.67)232; an image-based model (AUC, 0.73)233; a model using artificial intelligence and thermal radiomics (AUC, 0.89)234; models enhanced by machine learning (AUC, 0.88 to 0.90)235; mammography-based deep learning models (AUC, 0.79 to 0.81)236; and models incorporating single-nucleotide polymorphisms.237,238 However, these models have not been applied to routine population screening and require further validation.

There is not yet evidence from trials of screening informed by validated risk assessment instruments to tailor screening initiation, intervals, or modalities. Such evidence would be important to inform changes to clinical practice. Risk estimation from genome-wide association studies is also being used to develop polygenic risk scores that may be used to further personalize screening; however, the clinical utility of these models is unknown.74

Patient Perspectives on Balance of Benefits and Harms of Screening (CQ4)

Few studies directly examine how persons at average risk for breast cancer and eligible for routine screening incorporate personal preferences into their decisions about screening for breast cancer. Informed decisions about whether to undergo mammography may draw upon a range of factors, including values, cultural influences, personal experience, risk factors, and awareness of screening benefits and harms. A 2019 systematic review of 22 studies239 cited logistical challenges, psychological distress associated with the screening process, fear of a positive result, embarrassment, and not receiving services that align with cultural and/or religious beliefs as factors that influenced screening use. A 2022 systematic review including 66 studies focused on a broad set of individual social and structural factors that could influence screening use and access.240 Several social and structural determinants of health were associated with reduced screening attendance in the review, including lacking access to a vehicle, living in crowded housing conditions, living farther from a screening center, and being unemployed. Very few included studies in these reviews were conducted in the United States, however, limiting generalizability and comparisons between different U.S. population groups.

Experiencing screening harms, such as a false-positive result, has been identified as a potential deterrent to screening.240 A 2021 systematic review that pooled six NRSIs conducted in very high Human Development Index settings (none in the United States) estimated reduced return to subsequent screening among women who had been previously screened and then experienced a false-positive result (pooled odds ratio, 0.77 [95% CI, 0.68 to 0.88]). Older research on the phenomenon had mixed findings.241 A more recent study conducted in a Chicago area health system (n = 261,767) found that women receiving a false-positive screening result were less likely to return for their next scheduled mammogram (22.1% versus 15.0%; aHR, 0.74; p<0.001) compared with women who had experienced a true-negative result; findings were more pronounced when a biopsy was conducted (aHR, 0.66; p<0.001). Women who experienced a false-positive result were more likely to be non-Hispanic Black, younger, premenopausal, or to have dense breasts.242

Little is known about patient awareness of the possible benefits and harms of mammographic screening and how it may differ across patient populations. A small nationally representative survey of U.S. women ages 30 to 59 (n = 557)243 found that most women were familiar with the benefits of mammography (>85%); fewer were aware of some of the potential harms. Nearly three-quarters were aware of psychological anxiety and the risk of false-positive results, but fewer than one-third were aware of the possibility of overdiagnosis. Personal preferences vary with respect to communication about the benefits and harms of mammography. A series of in-depth interviews (n = 58)244 with older Latina, non-Hispanic White, and non-Hispanic Black women found that most participants (regardless of age, race and ethnicity, or education) preferred to hear about the benefits and harms of mammographic screening, including information about overdiagnosis, when deciding whether to continue screening beyond the age of 75 years. Highlighting the personal nature of such preferences, however, the study found that some participants preferred being encouraged to continue screening without discussion of possible harms, and some participants felt it was important to avoid discussing the prospect of overdiagnosis with older women, as it might deter them from getting mammograms.

To improve breast cancer screening programs, develop decision aids, and inform screening implementation, studies have investigated how mammography screening is perceived by various groups. Qualitative studies and survey research provide evidence on factors that influence how people weigh their personal risks and the potential benefits and harms of screening, which factor into the decision to be screened. For instance, Ro et al. conducted a series of interviews with Asian (n = 4), Black (n = 4), Hispanic (n = 7), and non-Hispanic White (n = 10) patients at average risk of developing breast cancer.245 Most reported having an annual mammography schedule, primarily influenced by their physician’s recommendation, and several described receiving automatic annual reminders. Other factors that influenced their decisions included having a family history of breast cancer (n = 9), an interest in early detection (n = 5), and age (n = 5). For some, biennial screening intervals were considered acceptable if recommended by their physician and when they did not consider themselves to be at high risk for breast cancer. Others considered two years to be too long between screening visits regardless of physician recommendations or risk status. The study also identified a theme related to confusion about screening due to conflicting and frequently changing guidelines. Similar themes were noted in other qualitative studies regarding confusion about screening recommendations and desires for additional information to help make more informed decisions.203,246,247

Decision aids have been developed to help inform the shared decision-making process, which may help address some of the confusion described by patients in qualitative studies. A 2021 systematic review by Esmaeli et al.248 reviewed 16 unique breast cancer screening decision aids that had been developed and tested in the United States, Australia, Germany, Spain, United Kingdom, France, Taiwan, Italy, and The Netherlands. The review found that the decision aids improved patient knowledge and decreased decisional conflict but had little or no influence on mammography participation rates, attitude, perceived risk of breast cancer, or anticipated regret.

Knowledge of inequities in breast cancer risks, mortality, and access to treatment could influence some individuals' preferences for and decisions about breast cancer screening. Unfortunately, relatively few studies have focused on the populations that experience inequities in screening access and breast cancer mortality, such as non-Hispanic Black women, recent immigrants, and people living in rural settings. A focus group study (n = 39) including Black and Latina participants described diverse perspectives on breast, colon, and cervical cancer screening.247 Some participants had strong, positive feelings towards preventive screenings, considering them a critical tool for staying healthy. Others were more inclined to wait until a health issue became visible or problematic before seeking care, citing cultural norms, cost barriers, or personal history of challenges with affording care. Women who were born outside the United States described feeling less acquainted with preventive health care, such as screening, as it was less likely to be offered or considered a cultural norm in the country in which they were raised. Trust in health care systems was also influenced by personal experiences with culturally insensitive or incompetent care, or awareness of historical concerns involving medical maltreatment (e.g., U.S. Public Health Service Syphilis Study at Tuskegee and forced sterilization in the early to mid-20th century). These factors can serve as barriers to receiving preventive screenings; however, having a positive, trusting relationship with a physician that encourages screening was described as helpful in rebuilding trust.

Perceptions and awareness of personal breast cancer risk can inform decisions about mammographic screening. In a study with Black and Latina focus group participants, lack of knowledge of family medical history served as a challenge in assessing individual risk for cancer.247 In a focus group study203 with Asian American (n = 3), non-Hispanic Black (n = 8), Hispanic (n = 2), and non-Hispanic White (n = 30) women with dense breasts, many were not aware that they were identified as having dense breasts, and almost none were aware that having dense breasts was an independent risk factor for breast cancer. Some study participants were receiving supplemental screening (such as ultrasound or MRI) in addition to DM but described having little knowledge of any specific benefit or possible harms of additional screening.

More research is needed to better understand whether individuals are aware of the benefits and harms when determining whether or when to pursue breast cancer screening. It is particularly important to understand how race, ethnicity, gender identity, and cultural influences shape these decisions, to better inform shared decision-making practices and provide culturally competent care.

Breast Cancer and DCIS Treatment Harms (CQ5)

Breast cancer treatment regimens are highly individualized according to each patient's clinical status, cancer stage, tumor biomarkers, clinical subtype, and personal preferences, and vary in terms of potential side effects and morbidity.33 For individuals with early-stage cancer (stage I, IIA, and some stage IIB cancers), treatment generally involves lumpectomy with radiotherapy or mastectomy with or without radiotherapy.249 Depending on patient and tumor characteristics, adjuvant systemic therapy may be used to reduce the risk of recurrence. Locally advanced cancers (stage IIB and stage IIIA to IIIC disease) are generally treated with neoadjuvant systemic therapy prior to surgery, and some patients receive additional adjuvant therapy following surgical treatment.249 Most patients with metastatic breast cancer receive systemic medical therapy along with supportive care measures.250

Complications following breast surgery include seroma formation, infection, pain, and arm morbidity (either directly attributable to the surgery or through a combination of surgery and adjuvant radiation).251 The risk of postoperative complications increases with age251 and is greater with mastectomy than with lumpectomy.249,251,252 Additional adverse events associated with mastectomy may include skin flap necrosis (in 10 to 18 percent of cases), which may require additional surgery or delays in adjuvant treatment, nipple necrosis (in 3 to 22 percent of cases), and phantom breast syndrome (the sensation of residual breast tissue).252

Whole breast radiation is associated with uncommon acute toxicities (e.g., severe breast pain, moist desquamation) involving the treatment area. In addition, radiation may result in longer-term complications of cardiotoxicity, lung injury, or secondary malignancies. Improvements in radiotherapy techniques have reduced these risks over time.253 Chemotherapy is associated with acute toxicity resulting in side effects that usually resolve after treatment and differ based on the individual agents used; they most often include motor and sensory neuropathy, nausea, vomiting, hair loss, fatigue, vasomotor symptoms, and depression.254 Longer-term adverse effects of systemic therapy (including chemotherapy, trastuzumab, and hormonal therapy) vary by agent, but may include neuropathy,255 cardiovascular disease,256–258 osteoporosis, cognitive dysfunction, and secondary malignancies.259

Long-term complications of primary treatment of breast cancer can include recurrent pain and skin infections in the chest wall, musculoskeletal issues (particularly reduced arm mobility), neurologic morbidity (including nerve injury, peripheral neuropathy, and cognitive dysfunction), cardiovascular disease, menopausal symptoms, psychological effects, fatigue, and an increased risk of second cancers associated with breast irradiation, chemotherapy, or tamoxifen.260

Given the uncertainty regarding the prognostic importance of DCIS, there is clinical variability in the treatment approach taken when DCIS is identified at screening. DCIS treatment (which may include surgery, radiation, and endocrine treatment) is intended to reduce the risk for future invasive ipsilateral (same side) breast cancer and consequent breast cancer mortality but is associated with harms. Prevention of future invasive cancer does not seem to be greater among those who undergo mastectomy in lieu of less invasive DCIS treatments.261,262 Despite lacking evidence of improved health outcomes, an analysis of SEER data from women diagnosed with unilateral primary DCIS between 2000 and 2014 found that over one-quarter of those referred for surgery chose mastectomy and the remaining 73 percent chose lumpectomy. Among those selecting mastectomy, most (75%) opted for removal of the affected breast, while the remainder opted for removal of both breasts.263 Treatment of DCIS with mastectomy was associated with younger age, having health insurance, and living in a region with fewer radiation oncologists.44 Research is ongoing to identify biomarkers and risk factors for progression, and to understand differences in the effectiveness of management and treatment options for reducing the risk of invasive cancer.264 Three clinical trials of active surveillance without surgery as a management strategy for low-risk DCIS are being evaluated within the international PREvent ductal Carcinoma In Situ Invasive Overtreatment Now (PRECISION) collaboration. These include two RCTs in the United States (COMET)265 and United Kingdom (LORIS)266 and a patient preference trial in The Netherlands (LORD).267 Until these trials are complete (estimated 2029–2030), the effectiveness of treatment of screen-detected DCIS to reduce breast cancer mortality remains unclear, and the extent to which it represents overdiagnosis and overtreatment is unknown.124

Treatment harms are of greatest concern when occurring among people who would not have otherwise experienced negative health consequences had their cancer not been screen-detected and treated. For some proportion of individuals participating in a screening program, the program could pose a greater risk to health than not participating. Unfortunately, it is very difficult to estimate the extent to which a screening program contributes to overdiagnosis. Based on the effectiveness studies from the 2016 evidence review, estimates of overdiagnosis ranged from nonexistent to nearly 50 percent of diagnosed breast cancer cases. Methods for estimating overdiagnosis varied in many ways, particularly by the type of comparison groups, assumptions about lead time, and the denominator used to calculate rates. In general, most adjusted estimates of overdiagnosis based on trials ranged from 11 to 22 percent. Estimates from observational and aggregated data range more widely, from nearly zero to over half of cases being overdiagnosed.76,123 Estimates from statistical models ranged from 0.4 to 50 percent. In the context of these findings, a recently published analysis using a statistical model based on BCSC data estimated that 15.4% (95% uncertainty interval, 9.4% to 26.5%) of screen-detected cancer cases would be overdiagnosed in a program of biennial screening from ages 50 to 74 years.122
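The sensitivity of overdiagnosis estimates to the choice of denominator can be illustrated with hypothetical counts. The sketch below assumes an excess-incidence approach, in which overdiagnosed cases are the cancers diagnosed in a screened group beyond those in a comparable unscreened group once lead time has elapsed. All counts are invented for illustration, do not come from any trial, and the function name `overdiagnosis_percent` is illustrative only.

```python
def overdiagnosis_percent(excess_cases, denominator_cases):
    """Express a fixed number of excess (overdiagnosed) cases as a percentage."""
    if denominator_cases <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * excess_cases / denominator_cases


# Hypothetical counts per 10,000 women, followed long enough for lead time to pass
screened_total = 330    # all cancers diagnosed in the screened group
unscreened_total = 300  # all cancers diagnosed in a comparable unscreened group
screen_detected = 150   # cancers detected at screening in the screened group

excess = screened_total - unscreened_total  # 30 excess cases under this assumption

for label, denom in [
    ("of screen-detected cancers", screen_detected),
    ("of all cancers in the screened group", screened_total),
]:
    print(f"{overdiagnosis_percent(excess, denom):.1f}% {label}")
# 20.0% of screen-detected cancers
# 9.1% of all cancers in the screened group
```

The same 30 excess cases yield estimates that differ by more than twofold depending only on the denominator, before any differences in comparison groups or lead-time assumptions are considered.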

Limitations of Our Review

Our review scope was developed following USPSTF procedures for assessing the comparative effectiveness of screening for eligible populations (not high risk) seen in settings reasonably comparable in terms of technology and practice to the U.S. health care environment (very high Human Development Index settings). Comparative studies were included to inform USPSTF refinement of its guidelines on screening intervals, age to begin and end screening, screening modalities, and supplemental or personalized screening strategies. The literature on breast cancer screening is vast. We conducted a comprehensive search of the literature, reviewed the reference lists of key studies and review articles, and sought expert input. Although unlikely, our review may have missed relevant eligible studies published in English or in other languages.

Some included studies did not report complete data or provided results that were coded or described in ways inconsistent with other included studies. We sent inquiries to trial authors seeking additional information or data on key outcomes, but not all authors responded or were able to provide the needed results.

The study design inclusion criteria for this review contributed to the low number of included studies and may be considered a limitation of our approach, despite its adherence to USPSTF procedures. The included NRSI literature was limited to studies that compared screening approaches in at least two study groups either assigned or selected into different screening programs. This criterion meant that our review excluded single-arm studies often used to examine screening test performance. The review also did not include questions about the accuracy of screening for detection of invasive cancer and therefore did not include data on the commonly reported metrics of cancer detection rate or positive predictive value.

Detection rates from a single screening round were not an included outcome for this review because improvements in detection do not necessarily reduce cancer mortality, and studies considering only a single screening round are prone to bias. Additional detection, especially of DCIS and early-stage cancers, might extend lead time without altering health outcomes or might contribute to overdiagnosis. In studies considering more than one round of screening, reduced mortality could be inferred if subsequent screening rounds had fewer advanced cancers in the intervention group. This would suggest that the intervention was effective for better detection of early cancers that would otherwise have progressed, affecting treatment morbidity and breast cancer mortality. Similarly, commonly reported potential harms, such as recall and biopsy rates, were not taken from studies reporting only a single round of screening.

Limitations of the Evidence and Future Research Needs

Inherent challenges limit research and the availability of evidence on breast cancer screening. Most of the literature related to screening mammography comes from trials conducted in the 1970s through 1990s. The availability of more effective treatments and changes to screening technology could have implications for the estimated benefits and harms of screening obtained from earlier cohorts. Estimates of mortality benefits from historical trials could be greater or smaller than what is obtained with present-day screening programs in the United States. While new trials on approaches to breast cancer screening could help inform screening programs, mortality from breast cancer is low at the population level, and therefore large sample sizes and long followup times would be needed to evaluate screening program effectiveness. Because of these challenges, much of the newer literature on breast cancer screening is focused on single-round comparative or diagnostic accuracy studies. Such studies have limitations for estimating the ultimate health effects of screening, given potential overdiagnosis and the improvements in breast cancer survival in recent decades, whether the cancer is screen detected or presents clinically with symptoms or a palpable mass.

Very few trials evaluate the comparative effectiveness of screening with different screening tests, intervals, or at different ages, and none have been conducted in the United States. Much of the recent literature on mammography screening has been aimed at estimating the test performance characteristics of different screening modalities, especially whether the use of DBT screening alongside mammography might be more sensitive (detecting cancers early) and more specific (reducing the likelihood of false-positive results), or whether certain subgroups may especially benefit from the new technology. Such studies can be informative for determining whether a new test is as good as or better than an older test. In the case of breast cancer, however, the advantages of earlier detection may be mitigated by the fact that treatment has grown increasingly effective for cancers detected at later stages, and small indolent or slow-growing cancers could have similar outcomes if detected later. This makes it difficult to determine from test performance alone whether a modest gain in detection of smaller, early-stage cancers would necessarily lead to improved health or simply lengthen the time women live with a diagnosis. Studies reporting health outcomes are needed to resolve these questions. Finally, more robust measures and data collection on potential screening harms, including the patient perspective on false-positive experiences and harms from treatment, are needed.

The trials from Europe comparing DBT to DM and the nonrandomized study from the United States using BCSC data included in this review do not show evidence of a stage shift, which would be anticipated if a health or mortality benefit were likely to follow. Additional rigorous studies could contribute to the literature, ideally using experimental designs with randomization or quasi-randomization to reduce the risk of confounding and selection bias common to nonrandomized observational studies on the topic. Importantly, such studies should actively recruit enough Black, Asian, Hispanic, AI/AN, and Pacific Islander participants to investigate how differences in screening, diagnosis, and treatment vary and affect outcomes. DBT has increasingly been adopted for routine screening in the United States, and there are disparities in access to this technology for Black women, rural women, and others. Even if DBT itself has not yet been shown to confer a screening advantage over DM, limited access to this newer technology may be a marker for broader inequities in followup and treatment that contribute to higher breast cancer mortality for Black women. Studies comparing the health outcomes of DBT and DM screening are often biased by selection and confounding, meaning that populations with less access to comprehensive evidence-based health care are also less likely to be screened with DBT. Studies employing randomization are critical for obtaining unbiased effect estimates.

As discussed above, research is needed to identify the underlying causes of inequities in breast cancer mortality along the clinical pathway. National data show similar screening rates for Black women and White women, although for some vulnerable groups living in resource-limited areas, rates are lower and inequities greater. In addition to supporting guideline-concordant screening, research is needed to identify and address factors other than access to screening. The importance of inequities along the clinical pathway following screening, including diagnostic followup, treatment, and support services, is increasingly recognized. Research is needed to identify where inequities exist and to develop interventions that close the care gaps following a positive screening result.

A consistent definition of advanced cancer has not been established in the literature, but stage II or higher (II+) and stage IIB or higher (IIB+) are the most common cutoffs. Greater uniformity of reporting would improve the comparability and interpretation of breast cancer screening studies. Because stage II includes localized cancers with average survival rates of 99.1 percent, their inclusion in study-reported definitions of advanced cancer may limit conclusions; treatment approaches and clinical outcomes differ for localized cancers. Reporting whether cancers were staged according to an anatomic or a prognostic staging system would add further insight, as predicted mortality rates can vary slightly between the two.268
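The practical consequence of the two cutoffs can be sketched in Python: the same set of staged cancers yields a different count of "advanced" cases depending on whether the threshold is stage II or stage IIB. The simplified stage list, the helpers `is_advanced` and `count_advanced`, and the cohort are all hypothetical and for illustration only; they are not a standard coding scheme or data from any included study.

```python
# Simplified ordering of anatomic stage groups (assumption: no finer substages).
STAGE_ORDER = ["0", "IA", "IB", "IIA", "IIB", "IIIA", "IIIB", "IIIC", "IV"]


def is_advanced(stage: str, threshold: str) -> bool:
    """Return True if `stage` falls at or above `threshold` in STAGE_ORDER."""
    return STAGE_ORDER.index(stage) >= STAGE_ORDER.index(threshold)


def count_advanced(stages: list[str], threshold: str) -> int:
    """Count cancers classified as advanced under the given threshold."""
    return sum(is_advanced(s, threshold) for s in stages)


# Hypothetical cohort of screen-detected cancers.
cohort = ["IA", "IA", "IIA", "IIA", "IIB", "IIIA"]

print("Stage II+ :", count_advanced(cohort, "IIA"))  # stage II+ begins at IIA
print("Stage IIB+:", count_advanced(cohort, "IIB"))
```

Here the stage II+ definition counts four advanced cancers and the IIB+ definition counts two, so a study's reported rate of advanced cancer depends partly on which cutoff it uses.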

Additional studies with longer-term followup, preferably extended randomized trials allowing for comparisons across multiple rounds of screening, are needed to understand the impact of supplemental testing in women with dense breasts or other factors associated with increased risk on important breast cancer outcomes, including morbidity and mortality. Only RCTs and longer-term followup can address risks of bias due to lead-time bias (earlier detection of cancer not resulting in improved outcomes) as well as the impact of overdiagnosis (leading to unnecessary treatment).

Our review did not identify any completed studies comparing outcomes for people with different screening initiation ages that met the review inclusion criteria. Study design challenges limit rigorous research on this topic. Studies comparing a cohort that began screening in their 40s with a cohort that began screening at age 50 a decade later are at risk of bias, because cancers detected and treated a decade apart are subject to different screening and treatment protocols. In the United States, many people begin screening at age 40, in part due to discordant screening recommendations among leading guideline groups. This further reduces opportunities to randomize people in this age group to begin or delay screening. Newer methods for analyzing observational data, such as emulated trials,169 propensity scoring, or Mendelian randomization,269 may better address confounding and selection biases.

Ongoing Studies

We identified several ongoing studies relevant to this review that are examining individualized risk-based screening, screening interval, and use of DBT with DM (Appendix H).

The current review did not identify any completed experimental studies that incorporated a personalized approach to decisions about when to begin screening. The ongoing WISDOM trial should provide new evidence to improve our understanding of the effect of practical implementation of personalized screening on cancer detection, health outcomes, patient satisfaction, and screening adherence. Recruitment is ongoing, with a target of enrolling 100,000 women ages 40 to 74 consenting to be randomized to either annual screening or individualized, risk-based screening. The trial is expected to be completed in March 2025. Another ongoing trial will contribute data on breast cancer–specific survival to a combined analysis with the WISDOM trial. The My Personalized Breast Screening study (MyPeBS, expected completion in 2025) is randomizing 85,000 women in Europe and Israel to standard screening (based on current national or regional guidelines) or screening with DM and/or DBT every 1 to 4 years (with or without ultrasound depending on breast density) based on estimated 5-year risk of developing breast cancer. These two trials will provide valuable data to address research gaps identified in the current review.

The comparative effectiveness of different screening intervals will be assessed in the ongoing MISS trial (expected completion in 2026). The trial will randomize 60,000 women ages 45 to 49 years presenting for their first or second mammography screening to one of three arms: annual screening according to Italian screening program guidelines, biennial screening with DBT/sDM, or a tailored screening interval based on breast density (women with dense breasts screened annually and women with nondense breasts screened biennially). Participants will be followed for six years to compare the cumulative incidence of advanced breast cancer (stage II or higher), recall from screening, and interval cancers between screening intervals. A second Italian trial (Tailored Screening for Breast Cancer in Premenopausal Women, or TBST) planned to randomize 33,000 women ages 44 to 45 years to annual screening or tailored screening based on breast density; the results of this study will be part of a pooled analysis with the MISS trial.

Two ongoing Italian trials are comparing use of DBT/sDM versus DM. The MAITA trial is randomizing 8,000 women ages 45 to 65 years to one round of screening with DBT/sDM or DM. After one year for women ages 45 to 49 years and two years for women ages 50 to 65, all participants will be re-screened with DM. The similarly designed IMPETO trial aims to randomize 6,000 women ages 45 to 46 years to one round of screening with DBT/sDM or DM; after one year, all women will be re-screened with DM. The primary outcome is the cumulative incidence of advanced breast cancer (stage II or higher). Recall rates and benign biopsy rates will also be assessed. The MAITA trial is expected to be completed in 2026; enrollment in the IMPETO trial was postponed due to COVID-19, and the completion date has not been updated. Additionally, the PROSPECTS trial, set in the United Kingdom, is randomizing 100,000 women ages 49 to 71 years to one round of screening with DBT plus sDM or DM versus DM alone. The primary outcomes are invasive cancer detection rates and interval cancer rates. Recall rates and benign biopsy rates will also be assessed.

Finally, a trial to evaluate the comparative effectiveness of DBT and DM is currently underway in the United States and Canada (expected completion 2030).270 The Tomosynthesis Mammographic Imaging Screening Trial (TMIST) is randomizing 128,905 women ages 45 to 74 years. Individuals who are premenopausal at baseline will be screened annually (four times) and those who are postmenopausal biennially (twice) for four years with either DM or DBT during the trial and followed for four additional years (total eight years). The primary outcome is the incidence of advanced breast cancer (defined according to combinations of tumor size; ER, PR, and HER2 status; and tumor spread).268 Secondary outcomes include breast cancer–specific mortality, test performance, interval cancers, and recall and biopsy rates. Potential differences in intervention effects by age, menopausal and hormonal status, breast density, race, ethnicity, and family cancer history will also be tested for the study endpoints.

Future comparative effectiveness reviews will benefit from the publication of additional followup from the included trials and of new trials currently underway. Studies using existing registry and cohort data analyzed using advanced statistical methods may also contribute to addressing current evidence gaps.

Conclusions

Previous reviews of breast cancer screening for the USPSTF, and the basis for its current screening recommendations, were grounded in evidence from effectiveness trials that showed decreased breast cancer mortality with mammography screening for women ages 50 to 69. Newer publications with long-term followup to trial endpoints would not change the previous conclusion, based on these trials, that screening benefits this age group. No new trials of the effectiveness of breast cancer screening are forthcoming, yet unanswered questions remain about the features of an optimal screening program designed to save the most lives without subjecting healthy people to screening-related harms.

Trials comparing different screening modalities have not reported mortality outcomes, but among those with results from multiple rounds of different screening interventions, an effect on mortality might be inferred if subsequent screening rounds had fewer advanced cancers. The trials from Europe comparing DBT to DM that were included in this review do not show a signal suggestive of stage shift, however, which would be anticipated if a health benefit is ultimately to be obtained. Overall, the studies indicated no or minor differences between DBT and DM screening in effectiveness and potential harms. Results from studies comparing screening programs involving supplemental imaging were too limited to evaluate potential benefits that could be inferred from signs of stage shift, but false-positive and biopsy harms increased with supplemental screening.

The current evidence synthesis reflects the progression of the science from questions of effectiveness toward questions of comparative effectiveness. Also, while related questions on test performance were examined in previous reviews, the current review uses different selection parameters to include studies. Applying the USPSTF review procedures and evidence requirements to the comparative effectiveness literature on breast cancer screening intentionally narrowed the focus, resulting in fewer included studies relative to prior reviews. Changes in screening recommendations could arise from evidence on the effectiveness of new screening technologies, improved understanding of the differential effects of the ages at which screening starts and stops, or evidence on supplemental screening for women based on their breast cancer risks and personal preferences. Our review found little evidence to guide these refinements in breast cancer screening. Ongoing trials and future comparative studies may help fill the research gaps we have outlined, ideally in populations reflective of the U.S. demographic composition with respect to race and ethnicity. Notably, nearly all breast cancer screening trials have been conducted outside of the United States, most enrolling mainly White European populations. Studies are needed that focus on and enroll adequate numbers of underrepresented populations that face increased risk of breast cancer mortality. Finally, research and programs to identify and address factors underlying inequities in breast cancer survival, especially for Black women, are needed to improve interventions along the clinical pathway, including screening, timely diagnostic evaluation, and high-quality treatment programs, that could lead to better health and survival.
