NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Evans DG, Astley S, Stavrinos P, et al. Improvement in risk prediction, early detection and prevention of breast cancer in the NHS Breast Screening Programme and family history clinics: a dual cohort study. Southampton (UK): NIHR Journals Library; 2016 Aug. (Programme Grants for Applied Research, No. 4.11.)

Chapter 4 PROCAS: Predicting Risk of Breast Cancer at Screening

Introduction

Individual breast cancer risk is highly variable, and yet the interval for mammography in the NHSBSP is the same (3-yearly) for all women regardless of risk. The only exception is women who identify themselves through cancer genetic services and are assessed as high risk according to the 2013 updated NICE guidance. These women are now entitled to annual screening between the ages of 50 and 59 years, although this is still not fully embedded in the NHSBSP. During the 2011–12 screening round, overall NHSBSP coverage was 77.0%.156 Uptake of routine invitations for women aged 50–70 years was 73.1%,156 with comparatively lower uptake (67.7%) in the 71–74 years age group, including the programme’s extension to women aged 71–73 years.156 A total of 15,749 women aged ≥ 45 years had cancers detected by the screening programme in 2011–12, a rate of 8.1 cases per 1000 women screened, with the cancer detection rate being highest among women aged > 70 years (13.9 per 1000 women screened).156 Uptake to breast screening in Greater Manchester is typically slightly lower than the national average, with 69.5% of eligible women screened during the 2011–12 screening round.

The Greater Manchester NHSBSP covers five main areas of Greater Manchester: Tameside, Oldham, Salford, Manchester and Trafford. In each of these areas there are several local screening sites. Attendance at screening varies broadly, both across each main area and by local screening site. Table 27 shows the overall uptake in each screening area, along with figures for the sites with the lowest and highest uptake in each area.

TABLE 27

Uptake to breast screening in the Greater Manchester NHSBSP in the last screening round for each site (as at 31 May 2014)

Evidence suggests that approximately 10% of women of screening age are likely to be at moderate or high risk of breast cancer. Although those with a significant family history of breast cancer may be aware that they are at increased risk, a large number of women without significant family history may be at increased risk because of lifestyle factors and be unaware of this risk. Furthermore, with the exception of those who request referral to a family history or genetics clinic, there is currently no way for women to receive risk assessment and access appropriate risk-based surveillance and preventative interventions.

Risk factors

There are a number of breast cancer risk factors, which fall broadly into the following categories: family history, hormonal and reproductive risk factors, genetic factors, lifestyle risk factors and MD.

Family history of breast cancer in relatives12

When assessing risk based on family history of breast cancer, the following factors must be taken into consideration: degree of relationship (first degree or greater); whether the cancer is unilateral or bilateral; age at onset of breast cancer; number of cases in the family and whether or not they occur on the same side; other related early-onset tumours (e.g. ovarian); and number of unaffected relatives (large families with many unaffected relatives will be less likely to harbour a high-risk gene mutation).

Hormonal and reproductive risk factors

Exposure to endogenous oestrogens is an important risk factor for breast cancer. Early age at menarche (aged < 12 years) and late menopause (aged > 53 years) increase breast cancer risk by increasing years of oestrogen and progesterone exposure.13–21

Long-term combined HRT use after the menopause (> 5 years) is associated with a significant increase in breast cancer risk, although the risk from oestrogen-only HRT appears much lower and may be risk neutral.16–19 The results of a meta-analysis suggest that HRT appears to increase risk cumulatively by 1–2% per year, although the risk disappears within 5 years of cessation.15 A meta-analysis suggested that both during current use of the combined oral contraceptive and 10 years post use, there may be a 24% increase in risk of breast cancer.13

The age at first full-term pregnancy influences the RR of breast cancer, with early first full-term pregnancy offering some protection by transforming breast parenchymal cells into a more stable state, potentially resulting in less proliferation in the second half of the menstrual cycle. Women having their first full-term pregnancy over the age of 30 years have double the risk of women who had their first full-term pregnancy under the age of 20 years, and this is the case for both women in the general population and those at highest risk from a BRCA1/BRCA2 mutation.20,21

Genetic factors

Mutations in breast cancer genes such as BRCA1 and BRCA2 are too infrequent to affect risk prediction appreciably in the models for the general population. However, recently identified SNPs in many genes and outside genes (n = 77),27 which individually confer small changes in risk, may prove useful in predicting larger differences in risk when considered together. Four GWASs,8–10,27 published before our programme grant, found common genetic variants (SNPs) each carried by 28–44% of the population associated with a 1.07–1.26 RR of breast cancer. These variants, linked to four genes (FGFR2, TOX3, MAP3K1 and LSP1), confer as much as a 1.17–1.64 risk if two copies are carried. When combined in an individual, they give higher than additive risk of breast cancer.29 Another variant, CASP8, is associated with reduced breast cancer risk.30 There are now 77 genetic variants associated with breast cancer risk, but their application requires further validation and assessment of interactions. Therefore, to improve the accuracy of existing risk prediction models, it is necessary to investigate validated SNPs as they are discovered, and, where possible, incorporate these genetic factors into the best-performing risk models.

Lifestyle risk factors

Expert reports estimate that 15–40% of breast cancer may be preventable by lifestyle change: weight control, exercise and reduced alcohol intake.157,158 These estimations are consistent with recent cohort studies which report 25–30% less breast cancer among women who adhere to cancer prevention recommendations, that is, those who are a healthy weight (BMI of 18.5–25 kg/m²), limit alcohol (< 1–2 units per day) and engage in regular moderate or vigorous physical activity (100–200 minutes per week), than among women who do not follow these recommendations.159–161 We and others have demonstrated that losing and maintaining a weight loss of ≥ 5%, either before or after the menopause, reduces post-menopausal breast cancer risk by 25–40% in the general population.162–164 Weight control also appears to be important for reducing the risk of post-menopausal breast cancer among women at moderate risk with a family history with low-penetrance genetic variants (15–30% lifetime risk),159,160,165 and among high-risk BRCA1/BRCA2 carriers (up to 60–80% lifetime risk). BMI is the only lifestyle factor that is currently incorporated into any risk model.

Mammographic density

The relationship between MD and breast cancer risk was first established using subjective categorical classifications, based on descriptors or on the percentage of the breast area occupied by radio-opaque (dense) connective and epithelial tissue, as opposed to the radiolucent fatty component of the breast.23,26 Women with high density were found to have a risk 4–6 times higher than that of those with predominantly fatty breasts.33 Subsequently, semiautomated methods were developed to quantify more accurately the percentage area of dense tissue. In particular, Cumulus software34 became the accepted gold standard for quantification of density for research purposes. Cumulus is an interactive program in which an operator selects a threshold to separate dense from fatty regions. It has never been widely adopted in routine clinical practice as it requires a skilled operator and is time-consuming, but Cumulus measurements have been shown to relate strongly to breast cancer risk.26 In the USA, the most widely used categorisation of density is the American College of Radiology’s BI-RADS (Breast Imaging Reporting and Data System), a subjective visual assessment into four classes which aims to identify mammograms in which the sensitivity of mammography is reduced because of MD.166 In the UK, the use of BI-RADS is not common, and other techniques such as estimation of percentage density using a VAS have been successfully related to risk of cancer.167

Subjective visual estimates of MD and thresholding methods suffer from the limitation that the assessment is in two dimensions, as the mammogram is a projection image of the three-dimensional structure of the breast. The same volume of dense tissue could, therefore, give rise to different density estimates, depending on compression and imaging. To overcome this and reduce dependence on imaging parameters, researchers developed methods to calculate the three-dimensional volumes of fatty and dense fibroglandular tissue in digitised film mammograms, either by a process of calibration involving the acquisition of images of, for example, a calibrated stepwedge168,169 or by modelling the physics of the imaging process.170 With the advent of FFDM, it has become possible to routinely quantify the volumes of dense fibroglandular and fatty tissue in the breast, using models of the imaging process and information from the Digital Imaging and Communications in Medicine (DICOM) header associated with the images. The first two commercial products to achieve this were Quantra, from Hologic, and Volpara, from Volpara Solutions. Although the body of research associating such methods to risk is not as extensive as that for visual and semiautomated methods because of the limited longitudinal FFDM data available, these methods have been shown to correlate with BI-RADS and Cumulus, and agree with MR-based assessments of volumes of fat and gland.171–175

Risk estimation models

Breast cancer risk is generally assessed using models that include a combination of the previously stated risk factors.11,22,149,150 Breast cancer risk models perform well at predicting the overall number of breast cancer cases arising in a particular population, but are poor at identifying the specific individuals.22

In the USA, the Gail model is widely used.149,150 Until recently, the two most frequently used models were the Gail model and the Claus model. More recently, in the UK, the TC and BOADICEA models have been used.

Gail model

The Gail model was originally designed to determine eligibility for the Breast Cancer Prevention Trial, and has since been modified (in part to adjust for race)11,22 and made available on the National Cancer Institute website (http://bcra.nci.nih.gov/brc/q1.htm). The model has been validated in a number of settings and probably works best in general assessment clinics,176 where family history is not the main reason for referral. The Gail model is based on age, first-degree family history, the number of surgical biopsies of the breast and reproductive factors such as age at menarche, age at first pregnancy and age at menopause. The major limitations of the Gail model are that it includes only first-degree relatives, which results in underestimating risk in the 50% of families with cancer in the paternal lineage, and that it takes no account of age at onset of breast cancer.57,176 As a result, it performed less well in our own validation set from a FHC (Table 28), substantially underestimating risk overall and in most subgroups assessed.

TABLE 28

Known risk factors and how they are incorporated into existing risk models

Claus model

Three years after the publication of the Claus model, lifetime risk tables for most combinations of affected first- and second-degree relatives were published.43 Although these do not give figures for some combinations of relatives, such as for mother and maternal grandmother, an estimation of this risk can be garnered by using the mother–maternal aunt combination. An expansion of the original Claus model estimates breast cancer risk in women with a family history of ovarian cancer.177 The major drawback of the Claus model is that it does not include any of the non-hereditary risk factors, and agreement between the Gail and Claus models has been shown to be relatively poor.178–180 Although the tables make no adjustments for unaffected relatives, the computerised version is able to reduce the likelihood of the ‘dominant gene’ with increasing number of unaffected women. However, the tables give consistently higher risk figures than the computer model, suggesting either that a population risk element is not added back into the calculation or that the adjustment for unaffected relatives is made from the original averaged figure rather than assuming that each family will have already had an ‘average’ number of unaffected relatives.57 The latter appears to be the likely explanation, as inputting families with zero unaffected female relatives gives risk figures close to the Claus table figure. Another potential drawback of the Claus tables is that these reflect risks for women in the 1980s in the USA. These are lower than the current incidence in both North America and most of Europe. As such, an upwards adjustment of 3–4% for lifetime risk is necessary for lifetime risks < 20%. Our own validation of the Claus computer model showed that it substantially underestimated risks in the FHC. However, manual use of the Claus tables provided accurate risk estimation (see Table 28).
A modified version of the Claus model has now been validated as ‘Claus extended’, by adding risk for bilateral disease, ovarian cancer and three or more affected relatives.76

BRCAPRO

Parmigiani et al.181 developed a Bayesian model that incorporated published BRCA1 and BRCA2 mutation frequencies, cancer penetrance in mutation carriers, cancer status (affected, unaffected or unknown) and age of the consultee’s first- and second-degree relatives. An advantage of this model is that it includes information on both affected and unaffected relatives. In addition, it provides estimates for the likelihood of finding either a BRCA1 or a BRCA2 mutation in a family. An output that calculates breast cancer risk using the likelihood of BRCA1/BRCA2 can be utilised. None of the non-hereditary risk factors can yet be incorporated into the model (see Table 28). The major drawback from the breast cancer risk assessment aspect is that no other ‘genetic’ element is allowed for.15 Therefore, in breast cancer-only families it will underestimate risk. As a result, BRCAPRO produced the least accurate breast cancer risk estimation from our FHC validation. It predicted only 49% of the breast cancers that actually occurred in the screened group of 1900 women.57

Tyrer–Cuzick model

Until recently, no single model integrated family history, surrogate measures of endogenous oestrogen exposure and benign breast disease in a comprehensive fashion. The TC model, based partly on a data set acquired from IBIS-I and other epidemiological data, has now done this.11 The major advantage over the Claus model and BRCAPRO is that the model allows for the presence of multiple genes of differing penetrance. It does give a read-out of BRCA1/BRCA2, but also allows for a lower-penetrance BRCAX. As can be seen in Table 28, the TC model addresses many of the pitfalls of the previous models: significantly, the combination of extensive family history, endogenous oestrogen exposure and benign breast disease (atypical hyperplasia). In our original validation, the TC model performed by far the best at breast cancer risk estimation.57

Model validation

In a previous study, the goodness of fit and discriminatory accuracy of the above four models was assessed using data from 1317 women. The main analysis was on data from 1933 women attending our Family History Evaluation and Screening Programme in Manchester, UK, who underwent ongoing screening, of whom 52 developed cancer.57 All models were applied to these women over a mean follow-up of 5.27 years to estimate risk of breast cancer. The ratios of expected to observed numbers of breast cancers (95% CI) were 0.48 (0.37 to 0.64) for Gail, 0.56 (0.43 to 0.75) for Claus, 0.49 (0.37 to 0.65) for Ford (BRCAPRO) and 0.81 (0.62 to 1.08) for TC (see Table 28). The accuracy of the models for individual cases was evaluated using receiver operating characteristic curves. These showed that the AUC was 0.735 for Gail, 0.716 for Claus, 0.737 for Ford and 0.762 for TC. The TC model was the most consistently accurate model for prediction of breast cancer. Gail, Claus and Ford all significantly underestimated risk, although with a manual approach the accuracy of Claus tables may be improved by making adjustments for other risk factors (‘Manual method’) by subtracting from the lifetime risk for a positive endocrine risk factor (e.g. a lifetime risk may change from 1 in 5 to 1 in 4 with late age of first pregnancy). The Gail, Claus and BRCAPRO models all underestimated risk, particularly in women with a single first-degree relative affected with breast cancer. TC and the Manual model were both accurate in this subgroup. Conversely, all of the models accurately predicted risk in women with multiple relatives affected with breast cancer (i.e. two first-degree relatives and one first-degree plus two other relatives). This implies that the effect of a single affected first-degree relative is higher than may have been previously thought.
The Gail model is likely to have underestimated in this group, as it does not take into account age of breast cancer, and most women in our single first-degree relative category had a relative diagnosed at < 40 years of age. The Ford, TC and Manual models were the only models to accurately predict risk in women with a family history of ovarian cancer. As these were the only models to take account of ovarian cancer in their risk assessment algorithm, this confirmed that ovarian cancer has a significant effect on breast cancer risk.
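The expected-to-observed ratios quoted above can be illustrated with a short calculation. The following is a minimal sketch only: it assumes the observed count is Poisson distributed and uses a log-scale approximation for the interval, which is one common approach and not necessarily the method used in the original validation. The function name and the expected count of 42.1 are hypothetical; only the 52 observed cancers come from the text.

```python
import math


def expected_observed_ratio(expected, observed, z=1.96):
    """Return (ratio, lower, upper) for an expected/observed (E/O) ratio.

    Assumption: the observed count is Poisson, so the standard error of
    log(E/O) is approximately 1 / sqrt(observed).
    """
    if expected <= 0 or observed <= 0:
        raise ValueError("expected and observed must be positive")
    ratio = expected / observed
    half_width = z / math.sqrt(observed)
    lower = ratio * math.exp(-half_width)
    upper = ratio * math.exp(half_width)
    return ratio, lower, upper


# 52 observed cancers is taken from the text; 42.1 expected cases is a
# hypothetical value chosen only to give a ratio near 0.81.
ratio, lower, upper = expected_observed_ratio(expected=42.1, observed=52)
print(f"E/O = {ratio:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

A ratio below 1 with an interval that excludes 1 indicates that a model predicted fewer cancers than actually occurred, which is the sense in which a model is said to underestimate risk.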

The Gail, Claus and BRCAPRO models all significantly underestimated risk in women who were nulliparous or whose first live birth occurred after the age of 30 years. Moreover, the Gail model appeared to increase risk with pregnancy under the age of 30 years in the familial setting. It is not clear why such a modification to the effects of age at first birth should be made, unless it is as a result of modifications to the model made after early results suggested an increase with BRCA1/BRCA2 mutation carriers. However, the Gail model has determined an apparent increase in risk with early first pregnancy and it would appear to be misplaced from our results, and from subsequent studies published on BRCA1/BRCA2. Furthermore, the Gail, Claus and BRCAPRO models also underestimated risk in women whose menarche occurred after the age of 12 years. The TC and Manual models accurately predicted risk in these subgroups. These results suggest that age at first live birth also has an important effect on breast cancer risk, while age at menarche perhaps has a lesser effect. The effect of pregnancy under the age of 30 years appeared to reduce risk by 40–50%, compared with an older first pregnancy or late age/nulliparity, whereas at the extremes of menarche there was only a 12–14% effect. Our study remains the only one to validate risk models prospectively and, clearly, further such studies are necessary to gauge the accuracy of these and newer models. Indeed, the tendency to modify models to adapt for new risk factors without prospective validation in an independent data set is a problem, and can lead to erroneous risk prediction.

BOADICEA

Using segregation analysis, a group in Cambridge, UK, has derived a susceptibility model (BOADICEA) in which susceptibility is explained by mutations in BRCA1 and BRCA2 together with a polygenic component reflecting the joint multiplicative effect of multiple genes of small effect on breast cancer risk.182 The group has shown that the overall familial risks of breast cancer predicted by the model are close to those observed in epidemiological studies. The predicted prevalences of BRCA1 and BRCA2 mutations among unselected cases of breast and ovarian cancer were also consistent with observations from population-based studies. The group also showed that its predictions were closer to the observed values than those obtained using the Claus model and BRCAPRO. The predicted mutation probabilities and cancer risks in individuals with a family history can now be derived from this model. Early validation studies have been carried out on mutation probability but not yet on cancer risk prediction.183

Model selection in NHS Breast Screening Programme

The Claus, BRCAPRO and BOADICEA models are unsuitable for population prediction, as they are entirely based on family history risk factors. Therefore, the models being assessed as part of this project are Gail and TC.

National Institute for Health and Care Excellence guidelines for moderate- and high-risk women

Women are considered to be at moderate risk of breast cancer if they have a 10-year risk of between 5% and 8%, as measured by TC, and at high risk if they have a 10-year risk of ≥ 8%. Current NICE guidelines for management of women at risk of familial breast cancer recommend that women between the ages of 50 and 59 years who are at increased risk but do not have a BRCA1/BRCA2 mutation be offered annual mammography and advice about menstrual, reproductive, hormonal and lifestyle risk factors.142 Chemoprevention with tamoxifen or raloxifene is also recommended for those at increased familial risk of breast cancer.142
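The thresholds described above can be written as a simple classification rule. This is a sketch under stated assumptions: the function name and the label for women below 5% are hypothetical, and the lower bound of the moderate band is treated as inclusive.

```python
def nice_risk_category(ten_year_risk_percent):
    """Classify a 10-year TC risk (in per cent) using the thresholds in the text.

    Assumptions: 5% is included in the moderate band, and the label
    "below moderate" is a placeholder, not a term defined by the guideline.
    """
    if not 0.0 <= ten_year_risk_percent <= 100.0:
        raise ValueError("risk must be a percentage between 0 and 100")
    if ten_year_risk_percent >= 8.0:
        return "high"
    if ten_year_risk_percent >= 5.0:
        return "moderate"
    return "below moderate"


for risk in (1.5, 5.0, 7.99, 8.0):
    print(risk, nice_risk_category(risk))
```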

Methods

Collection of risk information

Risk information was collected by running a large-scale, regional study entitled PROCAS within the Greater Manchester NHSBSP. Recruitment was carried out in two phases: for the first 3 years of recruitment, all women invited for breast screening were sent an invitation to participate in the PROCAS study; the second phase of recruitment involved inviting only those who had not previously attended screening in the area. As screening is triennial, this meant that all women attending screening during the recruitment period were invited once during this time.

A two-page questionnaire (see Appendix 1) was devised to collect the risk information required to calculate individual breast cancer risk. Family history information, including number and ages of sisters, current age or age at death of mother and details of any relatives affected by breast or ovarian cancer, was collected. It was not possible to collect information on unaffected second-degree relatives, as this would have required a much longer questionnaire, which we believe would have deterred women from participating. Hormonal risk factors, namely age at menarche, menopausal status, HRT use and parity, were collected. The following lifestyle information was also collected: current BMI, BMI aged 20 years, clothes size, alcohol consumption and exercise habits.

Women were mailed the questionnaire and a consent form in the interval between receiving the call for screening and attendance, in five screening areas of Greater Manchester. Women were consented to the study when they attended their screening appointment. The vast majority of participants were consented by a radiographer, rather than by a dedicated member of the study team. Completed consent forms and questionnaires were sent to the study team, based at the University Hospital of South Manchester, where the questionnaire data were entered onto a study database and a 10-year TC risk score for each individual was automatically produced. Participants were also asked to indicate whether or not they wished to be informed of their individual risk of breast cancer.

Uptake to screening in the Greater Manchester NHSBSP across all screening sites during the first phase of recruitment was 68%, and overall uptake to PROCAS across all sites during this time was 37%. There was wide variation in uptake to screening and to PROCAS in the various screening sites, as demonstrated in Table 29.

TABLE 29

Percentage uptake to screening and PROCAS by site in recruitment phase 1

In the second phase of recruitment, uptake to screening across all sites was 58% and uptake to PROCAS was 47%. As recruitment phase 2 involved recruiting only those attending screening for the first time, it is not possible to report uptake by site, as these data are gathered collectively for all attendees and so cannot be obtained for specific groups of women. It also means that there will be a significantly higher number of younger participants than in recruitment phase 1.

Demographics of PROCAS participants

Tables 30 and 31 show the demographics of the PROCAS participants recruited in each recruitment phase. During recruitment to phase 1, the proportion of women in each of the age groups 50–54, 55–59 and 60–64 years was 20–25%; fewer women were in the 65–69 age group (17%) and in the under-50 (7%) and over-70 age groups (6%). The majority were white (91%) and almost 4% did not report their ethnicity. Initial TC scores were low (< 2), average (2–4), moderate (5–7) and high (≥ 8) for 19.6%, 70.6%, 8.6% and 1.2%, respectively. As expected, women recruited in phase 2 were younger, with 43% aged < 50 years and 49% aged 50–54 years, and the majority were white (89%). Equally high proportions in both phases stated a preference to be informed of their risk.

TABLE 30

Demographics of PROCAS participants recruited in recruitment phase 1

TABLE 31

Demographics of PROCAS participants recruited in recruitment phase 2

The higher percentage of Asian participants in phase 2 may simply reflect the ethnicity spread to younger ages in those of Asian origin in the UK and Greater Manchester. The NHSBSP does not have ethnicity data to determine which ethnicities have been invited for screening.

Table 32 shows further characteristics of the whole PROCAS population. Approximately one-third of the PROCAS population were in the normal BMI category range; this was a greater proportion than expected from the general population (p < 0.0001). However, the PROCAS population, although containing fewer overweight women (p < 0.0001), also contained a larger proportion of obese women (p < 0.0001) (Table 33).

TABLE 32

Characteristics of PROCAS population (n = 53,596)

TABLE 33

Body mass index of PROCAS population compared with 2012 averages for Greater Manchester

Approximately two-thirds of women in the PROCAS study were postmenopausal, with a further 18% being perimenopausal. Nineteen per cent reported being current or previous users (within the last 5 years) of HRT. The majority were parous (87%) and did not have an affected first-degree relative (87%).

The national UK average alcohol consumption for women aged ≥ 45 years in 2009 was 8.5 units per week.185 Assuming that the alcohol intake was in the middle of the range and that those drinking over 28 units averaged 35 units, the average intake of PROCAS women was 6.3 units per week, a little lower than the national average (see Table 32). The majority of women were inactive (81%) as defined by the EPIC study.22
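The banded-midpoint assumption described above can be sketched as follows. The band boundaries and counts in the example are hypothetical placeholders, not the Table 32 data; only the rule (midpoint of each closed band, 35 units for the open-ended band above 28 units) is taken from the text.

```python
def banded_mean(bands, open_ended_mean=35.0):
    """Estimate mean weekly alcohol units from banded questionnaire counts.

    Each band is (lower, upper, count). A band with upper=None is treated
    as open-ended and assigned open_ended_mean; every other band is
    assigned its midpoint.
    """
    total_units = 0.0
    total_women = 0
    for lower, upper, count in bands:
        value = open_ended_mean if upper is None else (lower + upper) / 2
        total_units += value * count
        total_women += count
    if total_women == 0:
        raise ValueError("no respondents in any band")
    return total_units / total_women


# Hypothetical bands and counts, for illustration only.
example_bands = [
    (0, 0, 200),      # non-drinkers
    (1, 7, 500),      # midpoint 4 units
    (8, 14, 200),     # midpoint 11 units
    (15, 28, 80),     # midpoint 21.5 units
    (28, None, 20),   # open-ended band, assumed to average 35 units
]
print(f"Estimated mean intake: {banded_mean(example_bands):.1f} units per week")
```

The estimate is sensitive to the value assumed for the open-ended band, so that assumption is worth stating alongside any reported mean.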

The mean Index of Multiple Deprivation score for all women in the PROCAS population was 24.53.186 This ranged from 13.09 in Trafford to 38.44 in the Manchester district, where a higher Index of Multiple Deprivation score indicates a higher level of deprivation.

Risk feedback

All participants recruited to the study were asked to specify whether or not they wished to be informed of their personal 10-year TC breast cancer risk. Participants were advised that the majority of women would receive their risk via letter at the end of the study, but that all women who were found to be at high risk of breast cancer (≥ 8% 10-year TC risk) and a small number of those at low and moderate risk would receive an invitation for a risk feedback appointment with a study clinician experienced in risk communication. All participants had at least two opportunities to opt out of receiving personal breast cancer risk information: first at the time of initial consent (by not ticking a box labelled ‘I wish to know my risk’) and later by contacting the study co-ordinator. Those who received an invitation for a risk appointment were also given a further opportunity to opt out of receiving their risk by declining an appointment.

The majority of women recruited to PROCAS have received their risk feedback via letter in 2014 and 2015. This has been done as part of an externally funded study, which has allowed us to work directly with participants to co-design a letter that effectively communicates individual breast cancer risk while minimising negative psychological impact; to explore the acceptability of receiving this information; and to explore intentions to change behaviour. However, all women found to be at high risk of breast cancer (TC 10-year risk of ≥ 8%) and a proportion of those at moderate (TC 10-year risk of 5–7.99%) and low risk (TC 10-year risk of ≤ 1.5%) have received their risk feedback either in person or via telephone.

In total, to June 2014, 984 participants across the low-, moderate- and high-risk categories were invited for a risk feedback appointment and 687 (69.82%) participants attended their appointment. Uptake of risk feedback varied across the different risk categories, with highest uptake among those at highest risk. Pairwise comparisons showed that there were significant differences between attendance in those in the high- and moderate-risk categories versus those in the low-risk category (p < 0.001, p < 0.005, respectively) (Table 34).

TABLE 34

Uptake to risk feedback invite by risk category

It became apparent during the risk feedback process that information provided by participants on their PROCAS questionnaire was not always accurate. As a result, participants’ risk often changed following a risk consultation, and in some cases participants’ risk changed to such an extent that they no longer remained in the same risk category. Table 35 shows how the numbers of participants in each risk category change following risk counselling. It is apparent that there were fewer changes in risk categorisation in those who were originally assessed as being low risk. This is to be expected because the majority of errors are with recording of relatives’ cancer diagnoses and the low-risk participants are very unlikely to have had any affected relatives. The greatest proportion of changes in risk occurred in those originally assessed as having a 10-year TC risk of ≥ 8%. This is largely because the PROCAS questionnaire, owing to space restrictions, did not collect information on unaffected female relatives, which is an important factor in risk prediction.

TABLE 35

Changes to risk following risk counselling

The process of risk feedback, although part of the planned PROCAS study, was not part of the originally funded programme grant. Therefore, full analyses for all participants recruited to PROCAS are still ongoing. However, based on the first 40,000 participants, we are able to report that referral for 18-monthly screening was offered to 330 women, of whom two declined, and referrals from primary care providers have been received for 260 women, with 23 already receiving screening through the FHC. Thus, 283 out of 330 women (86%) have commenced additional breast screening. Three breast cancers have now been detected on the interval 18-monthly breast screen: (1) a multifocal DCIS with 8 mm and 4 mm foci of grade 2 node-negative invasive ductal carcinoma in a 57-year-old; (2) a 51-year-old with a 15 mm grade 2 node-negative invasive ductal carcinoma; and (3) a 54-year-old with a 7 mm grade 1 node-negative invasive ductal carcinoma. Of the first 40,000 participants, 10 of the 575 with ≥ 8% 10-year risk were identified with breast cancer on the first mammogram; three of the five detected subsequently were picked up on the interval mammogram; one was an interval cancer before risk counselling and the other was picked up on the 3-year mammogram. Therefore, 15 (2.6%) cancers have occurred in those with a TC ≥ 8% 10-year risk. Six cancers (2.6%) have occurred in the 232 women with a 5–8% 10-year risk with ≥ 60% density: four at prevalent screen and two on the 3-year mammogram. Only nine breast cancers (0.64%) have occurred in the 1395 women with < 1.5% 10-year risks. After confirmation of the high-risk category, there have been 15 out of 441 (3.4%) who developed breast cancer.

Reattendance at screening

Of the first 40,000 participants, for high-risk women attending their risk feedback appointment the reattendance rate at the next 3-year screening was 93% (200/215). However, this rose to 99% (200/202) for those actually invited (six were aged over 70 years and therefore were not invited, one died and six had moved area). For low-risk women the reattendance rate was 81% (43/53). In addition, we were able to assess reattendance in those women who did not have risk feedback. Of those due to have a further mammogram, 112 out of 143 (78%) had reattended. Overall reattendance at the next 3-yearly screen was 411 out of 454 (90.5%) for those risk counselled, with high-risk reattendance significantly higher (p = 0.0006; p < 0.0001 for those invited) than for usual reattendance rates but reattendance rates for low-risk and those not counselled were not significantly lower (p = 0.65 and 0.065). Figures from the 2012–13 Greater Manchester NHSBSP showed that among women who attended their previous mammogram and whose last screen was within the last 5 years, 39,058 were invited and 32,925 (84.3%) attended. Overall reattendance at the next 3-yearly screen for women who attended their risk appointments over all three risk categories (high, moderate and low) was 90.5%. Reattendance was significantly higher for high-risk women invited for feedback (p = 0.015) than usual reattendance rates in Greater Manchester, but was not significantly lower for low-risk women and non-counselled women.

Risk perception

Of the first 40,000 participants, 253 out of 459 (55%) high-risk women and 56 out of 100 (56%) low-risk women filled in the risk perception questionnaire. Thirty-one high-risk women had previously attended the FHC for risk assessment between 1991 and 2011 (median 1997), some 1–20 years prior to their risk assessment in PROCAS. There was a clear trend for high-risk women to ascribe higher levels of risk to themselves than did low-risk women, and previously counselled women had more accurate risk perceptions for themselves and the general population than low-risk women. Women expressed risk in both descriptive categories and ORs as being higher in the high-risk group, although their estimates of the population risk were similar. Only a minority (37% high, 29% low) gave the ‘correct’ current lifetime risk range for the general population of 1 in 8–10, apart from those seen previously in the FHC (57%).185

There have been no untoward adverse events in women at high risk that we are aware of, and women were content to receive their risk information and keen to take action despite 86% learning their risk for the first time. Women at low risk were pleased to be informed of their risk, but only one (1%) expressed a desire to cease screening.

The present study has shown that it is possible to collect and feed back risk information to women at both high and low risk of breast cancer from a large population-based mammography screening programme. We believe that this is the first study to both assess and feed back breast cancer risk information on a population basis on this scale. Women at high risk were more likely than those at low risk to perceive that they were high risk before risk counselling. This is most likely due to the almost certain presence of a family history of breast cancer in those at high risk and the absence of this in those at low risk. Accordingly, both attendance at risk counselling (69% vs. 52%) and reattendance at the subsequent mammography screen were significantly higher in women counselled about their high risk than in those told about their low risk (p < 0.0001). Indeed, high-risk women were more likely than the entire screening population to reattend. Low-risk women, the majority of whom had received prior screening, were reluctant to discontinue screening. It is reassuring to screening programmes, which are judged by reattendance rate, that there was not a significant drop-off in attendance at the subsequent screen when such a programme as PROCAS is introduced.

Women at high lifetime risk of breast cancer are now recommended in the UK to be offered annual mammography screening between 40 and 60 years of age.142 There was a high take-up of the offer of additional screening in high-risk women in PROCAS; for the 14% not referred for extra screening by their GP, we are aware that this decision not to refer might have been the GP’s rather than the woman’s. It is reassuring that not only does the TC programme reliably identify women at high risk (the 2.6% detection rate can be considered to represent a 3-year period, including lead time: thus, 0.9% annually), but > 1% have been detected with extremely good prognosis stage 1 cancers at the interval screen. Although these numbers are small for the interval screens, they represent an extremely high rate. As all three of the cancers occurred in women in their fifties, even the grade 1 cancer would probably have presented during the woman’s lifetime and might not be considered an ‘overdiagnosis’.

This study has also assessed risk perception. Unlike many previous studies such as our own that were based mainly on women coming forward concerned about their breast cancer risks owing to family history,187–189 the present study addresses risk perception in women at either end of the risk spectrum from the general population, the great majority of whom had not been assessed previously. Risk perception was, as reported previously, not overly accurate;187,189 however, high-risk women were significantly more likely than low-risk women to assess their risks as above average in both a verbal and an OR format. Perception of population risk was not statistically significantly different between the two previously uncounselled groups. However, those seen previously in the FHC had better overall risk perceptions, as we have reported before.188

There are some limitations to the present study. Although the study represents sampling of the whole screening population, only 43% of those screened joined the study. This could have biased the population to women with higher risks. A survey alongside our FHC did not suggest that this was the case, with the proportion identified as moderate risk from family history alone not being higher than those already identified in the 40–49 years age group in our region.190 We have not conducted formal assessment of the impact of risk information on anxiety and intention to change behaviour, although funding has been sought for this and this is planned in a new prospective arm. There were some inaccuracies in women’s filling-in of the questionnaires, particularly in relation to bilateral disease and the timing of menopause. In future, using an online version with prompts and pop-up questions to confirm these areas is likely to improve accuracy. Certainly, if only a paper questionnaire is used, confirming details in those whose management will change is important, as this may change risk category substantially. This means that, for those identified at high risk, further assessment is necessary. This is likely in practice, as women identified at high risk from questionnaire data would usually be offered referral.

Measurement of mammographic density in PROCAS

In the initial proposal, we planned to undertake visual assessment of mammograms using three breast density measures: VASs, Cumulus thresholding and the Manchester Stepwedge (for mammograms recorded on film).

Assessment using VASs had previously been employed by mammographic film readers in Manchester in CADET (Computer-Aided Detection Evaluation Trial),191 and a significant association with cancer risk was found.167 Cumulus thresholding has been widely used in research studies and the relationship with risk of developing cancer is well established.26,87 However, the research that underpins these relationships was based on analysis of film mammograms, and the NHSBSP was in the process of a transition to FFDM at the time the PROCAS study commenced. Work on risk relationships for digital mammography is at an early stage because of a lack of longitudinal data. The appearance of FFDM images differs significantly from that of film mammograms, despite the application of post-processing. VAS and Cumulus are both area-based methods; they are assessments of a two-dimensional image of a three-dimensional structure. Consequently, they provide estimates of the proportion of the breast area occupied by dense fibroglandular tissue, and the estimates can vary depending on the way in which the breast is positioned. The third method we proposed to use was one we developed ourselves and which was evaluated in a pilot trial.168 This involved the analysis of digitised film mammograms by imaging a calibrated stepwedge alongside the breast. Preliminary evaluation of our own method and a similar one developed by colleagues in California192 suggests that calibration techniques provide risk information. Such methods were particularly attractive in the context of the NHSBSP, as an automatic read-out could be obtained from digitised mammograms, which could be integrated with risk information by appropriate computer programs.

As PROCAS progressed, two other breast density methods became available to us. These were both designed for use with FFDM images, provided that the raw (‘for processing’) data were available. The first of these is Quantra, from Hologic, which we obtained in July 2010. We started to run Volpara from Volpara Solutions in September 2010. Both of these methods were based on the work of Highnam and Brady,170 originally for film mammograms, which models the physics of the imaging process and enables the computation of volumetric breast density.

Visual estimation of percentage density using visual assessment scores

Screening mammograms are routinely reviewed by pairs of readers, with arbitration by a further pair, if required. Reader pairs generally comprise a consultant radiologist or a breast physician working with an advanced practitioner radiographer, but pairings are pragmatic, with the proviso of a maximum of one advanced practitioner radiographer in a pair. In analogue (film) mammography, the mammograms are displayed on an illuminated viewer, and for digital mammography, the images are presented on high-resolution monitors. In PROCAS, readers assess MD at the time of reading the screening films, recording estimates for all four views on a single paper form containing four 10-cm horizontal VASs, labelled 0% and 100% at the left and right ends, respectively. The two readers complete VAS forms independently. The forms are digitised and processed by custom software which reads the patient identification number, finds the positions of the scales and marks, converts them to percentage densities and outputs the results in a spreadsheet. As visual assessment is subjective, it suffers from intra- and interobserver variability. To improve consistency between readers, we developed a method for correcting values adjusting for each reader.193
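The conversion from a mark on a 10-cm VAS to a percentage density is a linear rescaling between the two ends of the scale. The following is a minimal sketch of that step only, not the custom software used in PROCAS; the function name, the pixel coordinates and the clamping of stray marks are all assumptions made for illustration.

```python
def vas_mark_to_percent(mark_x: float, scale_left_x: float, scale_right_x: float) -> float:
    """Convert the horizontal position of a reader's mark to percentage density.

    All positions are in the same units (e.g. pixels in the scanned form).
    The left end of the scale is 0% and the right end is 100%.
    """
    length = scale_right_x - scale_left_x
    if length <= 0:
        raise ValueError("scale_right_x must be greater than scale_left_x")
    fraction = (mark_x - scale_left_x) / length
    # Assumption: marks drawn slightly past either end are clamped to 0% or 100%.
    return 100.0 * min(1.0, max(0.0, fraction))


# Hypothetical scanned form: scale detected from x = 100 to x = 1100 pixels,
# with the reader's mark found at x = 350.
density = vas_mark_to_percent(350, 100, 1100)
print(density)  # 25.0
```

Because the result depends only on the position of the mark relative to the detected ends of the scale, it is unaffected by the scanning resolution of the form.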

Cumulus thresholding

Cumulus thresholding is applicable to both digitised film mammograms and FFDM images; for film mammograms we used a Vidar CAD-PRO digitiser. Cumulus software was obtained for use in this project, and training was undertaken in January 2010 by one of us (RW) (see Contributions of authors) who had been trained by the Toronto team that developed the software. Although several readers were trained, assessment using Cumulus was undertaken by a single reader (JS), whose performance was validated on test sets of data developed for this purpose by RW and the Toronto team. It takes approximately 1 day to analyse 200 images, so a single MLO view (the contralateral breast for cancer cases) was analysed for each woman in the FH-Risk cohort and a case–control set in the PROCAS-screening women. Analysis involves delineating the pectoral muscle and adjusting thresholds to identify the breast area and glandular component. The method produces measures of breast area, dense tissue area and hence fat area and area-based percentage density. As Cumulus is based on an operator’s assessment, it is subject to intra- and interobserver variability.
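The outputs listed above follow from two operator-chosen thresholds and a pectoral muscle mask. The sketch below illustrates the arithmetic on a toy image; it is not the Cumulus software, and the function name, the pixel values and the representation of the pectoral muscle as a set of excluded pixel coordinates are assumptions.

```python
def area_density(image, edge_threshold, dense_threshold, excluded=frozenset()):
    """Compute area-based density measures from a greyscale image.

    image: list of rows of pixel values (brighter = denser).
    edge_threshold: pixels at or above this value are inside the breast.
    dense_threshold: breast pixels at or above this value count as dense tissue.
    excluded: (row, col) pairs inside the delineated pectoral muscle.
    """
    breast_area = 0
    dense_area = 0
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            if (r, c) in excluded or value < edge_threshold:
                continue  # background or pectoral muscle
            breast_area += 1
            if value >= dense_threshold:
                dense_area += 1
    if breast_area == 0:
        raise ValueError("no breast pixels found; check the edge threshold")
    return {
        "breast_area": breast_area,
        "dense_area": dense_area,
        "fat_area": breast_area - dense_area,
        "percent_density": 100.0 * dense_area / breast_area,
    }


# Toy 3 x 4 image; the top-right pixel is treated as pectoral muscle.
toy_image = [
    [0, 10, 40, 90],
    [0, 20, 80, 95],
    [0, 15, 30, 99],
]
result = area_density(toy_image, edge_threshold=5, dense_threshold=70,
                      excluded={(0, 3)})
print(result)
```

In this toy case eight pixels fall inside the breast and three of them exceed the dense threshold, giving an area-based percentage density of 37.5. Moving either threshold changes the result, which is the source of the operator variability noted above.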

Manchester Stepwedge

The Manchester Stepwedge method is applicable only to digitised mammograms on which the calibrated aluminium stepwedge and thickness markers (on the breast compression plate) have been imaged. Calibration data were obtained for each analogue mammography unit used in PROCAS. To analyse the images, digitisation was undertaken using a Vidar CAD-PRO digitiser. An operator then ran custom software which locates the stepwedge and the compression markers in each image, providing an opportunity to review the automated detections and correct them as necessary. Analysis of the distance between pairs of markers in the mammogram enables accurate measurement of compressed breast thickness, taking into account tilt of the compression plate. The brightness of each pixel in the mammogram can be matched to the stepwedge image and, with the compressed thickness and calibration data, this enables computation of the thickness of fibroglandular tissue. The method accounts for differences in compression, plate tilt, imaging parameter changes and the drop-off in breast thickness where the breast loses contact with the compression plate. It outputs a measure of the volume of fat and gland in the breast and hence percentage density by volume.
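Once the thickness of fibroglandular tissue and the compressed breast thickness are known at each pixel, the volumes and the percentage density by volume follow by summation. The sketch below shows only this final step under simplified assumptions; it does not model the stepwedge calibration, plate tilt or breast edge correction, and the function name and toy thickness maps are hypothetical.

```python
def volumetric_density(gland_thickness_mm, breast_thickness_mm, pixel_area_mm2):
    """Sum per-pixel thickness maps into volumes and percentage density by volume.

    gland_thickness_mm: rows of fibroglandular tissue thickness at each pixel.
    breast_thickness_mm: rows of total breast thickness at each pixel.
    pixel_area_mm2: area represented by one pixel.
    """
    gland_mm3 = sum(sum(row) for row in gland_thickness_mm) * pixel_area_mm2
    breast_mm3 = sum(sum(row) for row in breast_thickness_mm) * pixel_area_mm2
    if breast_mm3 <= 0:
        raise ValueError("breast volume must be positive")
    return {
        "gland_volume_cm3": gland_mm3 / 1000.0,  # 1 cm3 = 1000 mm3
        "fat_volume_cm3": (breast_mm3 - gland_mm3) / 1000.0,
        "percent_density_by_volume": 100.0 * gland_mm3 / breast_mm3,
    }


# Toy 2 x 2 thickness maps (mm) with 1 mm2 pixels.
gland = [[5.0, 10.0], [0.0, 5.0]]
breast = [[50.0, 50.0], [40.0, 60.0]]
volumes = volumetric_density(gland, breast, pixel_area_mm2=1.0)
print(volumes)
```

Here the gland thickness sums to 20 mm and the breast thickness to 200 mm, so the percentage density by volume is 10%.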

Quantra

Quantra, from Hologic, is applicable to FFDM images obtained on Hologic 2D Systems (Hologic, Inc., Marlborough, MA, USA), GE 2D Systems (GE Healthcare Life Sciences, Buckinghamshire, UK) or Siemens Mammomat Novation Systems (Siemens Medical Solutions USA, Inc., Malvern, PA, USA) for which the raw (‘for processing’) data are available. It is a fully automated method that uses a model of the physics of the imaging process along with data from the DICOM image header to calculate the thickness of fibroglandular tissue at each pixel position in the image. It provides values for each screening view, each breast and per patient, giving the volume of the breast in cubic centimetres, volume of fibroglandular tissue (in cubic centimetres), percentage density by volume, a BI-RADS-like score and the area of dense tissue as a percentage of breast area. During the course of PROCAS, the version of software changed, and we are now using version 2.0.

Volpara

Volpara, from Volpara Solutions, is able to process images from a range of manufacturers (Hologic, GE, Siemens and Fuji). It is a fully automated method in which knowledge of tissue attenuation coefficients, the physics of the imaging process and information in the DICOM header are used to compute glandular thickness at each pixel position. Volpara uses a relative physics model which reduces the need for accurate imaging physics data, but depends on locating a suitable fatty reference area within the image.172,194 Volpara outputs the fibroglandular tissue volume, total breast tissue volume, percentage density by volume, and a Volpara Density Grade correlated with BI-RADS. The software evolved during the course of the PROCAS study, with the most recent version used being 1.4.5.

Mammogram data in PROCAS

The mammogram data used in the PROCAS study have been obtained using analogue mammography and two different types of FFDM system [Fischer Senoscan (Carestream Health Inc., Rochester, NY, USA) and GE Essential (GE Medical Systems Ltd, Chalfont St Giles, UK)]. At the outset of the study, the raw digital data were not collected. Table 36 gives a summary of the data available for analysis and the MD methods available for each data source.

TABLE 36

Number of PROCAS cases acquired from each data source and MD methods applicable for each image type

Figure 25 shows the mammographic percentage density distributions for VAS, Quantra and Volpara for all women recruited to the PROCAS study. In total, 50,831 women had VAS MD assessment with a mean percentage density of 27.4, compared with 38,706 women who had Volpara measurement and mean density of 7.05 and 36,014 women who had Quantra measurement and a mean density of 12.01.

FIGURE 25

Measurement of density assessment for VAS, Volpara and Quantra. (a) Mean = 27.396, SD = 17.079, n = 50,831; (b) mean = 7.05, SD = 4.037, n = 38,706; and (c) …

Analysis of density data

We have undertaken a number of different analyses of MD data in PROCAS. The most comprehensive is a cohort analysis comparing the density of cancer cases with that of non-cancer cases. We have also undertaken a more rigorous case–control study which compares the density of the contralateral breast of women with unilateral screen-detected cancer in FFDM images with the density of the same breast in matched controls. Ideally, we would have evaluated density in the priors, that is, the screening mammograms taken before the mammogram on which the cancer was detected, as this would have provided a genuine assessment of the ability of MD to predict cancer. Within PROCAS this is not currently possible for digital data, as insufficient women have had a cancer at their second screen, or in the interval following the first screen; however, as the data become available we will undertake this analysis. In the meantime, the contralateral breasts of women with unilateral cancers are used as a surrogate for the priors, and MD is assessed by all available methods. We also present a small case–control analysis of the film priors of cancer cases compared with matched controls.

We have also undertaken analyses of the impact of HRT, parity and menopausal status on volumetric MD; of the inter- and intraobserver variability in visual assessment of MD;195 of the relationship of MD to ethnicity;196 and of factors affecting density assessment.197 These analyses are described below, along with a method that we have developed which enables the adjustment of visual assessments to compensate for differences in practice between observers.193 Finally, we have evaluated the potential of using automated measures of breast volume to predict self-reported weight and BMI;198 the acquisition of weight data is problematic, and an automated, objective approach would be helpful.

Cohort study

Aim

The aim of this study was to compare the MD of women who developed breast cancer with that of those who did not, and hence to evaluate the performance of the different density assessment methods (VAS, Volpara and Quantra) employed in the PROCAS study.

Methods

Design

The study design was a large cohort study of women from the Greater Manchester area who were invited for breast cancer screening from October 2009 to June 2014. Cases were those who developed breast cancer while taking part in the PROCAS study, while controls were those who did not develop breast cancer while taking part in the PROCAS study. Density was assessed using all mammographic views. Mammograms were mainly analogue in the initial 12-month period (October 2009–September 2010), switching to completely digital thereafter (October 2010–June 2014). The date of entry to the study was the date of mammogram at study entry. Although the study was notified of breast cancers by the three methods below, a definitive ‘last date of follow-up’ could not be determined, as the case ascertainment methods did not allow the study to be certain that a participant was free of breast cancer on any given date.

Identification of cases

Cancer cases were identified by one of three methods:

  1. Matching the PROCAS data set to the Somerset Cancer Registry. The Somerset Cancer Registry is a ‘real-time’ database that collects information about the patient journey.199
  2. Matching with the NWCIS database, a histological database of breast cancer diagnoses, for cancers diagnosed in the north-west.200
  3. Notifications from participants that they had been diagnosed with breast cancer.

The three sources of information were cross-checked and all cancer diagnoses were validated.

Inclusion criteria

To enable processing with Volpara and Quantra, all women in this study had to have GE digital screening mammograms with raw (‘for processing’) image data as well as MD assessment on VAS.

Exclusion criteria

All women with a previous diagnosis of cancer were excluded from this study, as were those who did not have MD assessment by all three methods. These were the only exclusion criteria.

Outcome

The outcome for this study was the development of breast cancer by June 2014.

Density assessment

Mammographic density for the first mammogram while taking part in the PROCAS study was assessed by an area-based method (VAS) and volumetric methods (Volpara and Quantra). The VAS measures were made by two independent readers per case, drawn from a pool of 17 readers, and averaged. Version 1.4.5 of Volpara and version 2.0 of Quantra were used to obtain volumetric density data. For Volpara, the mean density across the four views was used, while for Quantra the maximum was used, in accordance with recommended practice.
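The two per-woman summaries described above differ only in how the four view-level values are combined. A minimal sketch follows, assuming that each method supplies one percentage density per view; the function name, the view labels and the example values are hypothetical.

```python
from statistics import mean


def summarise_views(per_view_density: dict, method: str) -> float:
    """Combine view-level densities into a single value per woman.

    Volpara: mean across the four views. Quantra: maximum across views.
    """
    values = list(per_view_density.values())
    if not values:
        raise ValueError("no view-level densities supplied")
    if method == "volpara":
        return mean(values)
    if method == "quantra":
        return max(values)
    raise ValueError(f"unknown method: {method}")


# Hypothetical percentage densities for the four standard screening views.
views = {"LCC": 6.0, "LMLO": 7.0, "RCC": 8.0, "RMLO": 9.0}
volpara_value = summarise_views(views, "volpara")
quantra_value = summarise_views(views, "quantra")
print(volpara_value, quantra_value)  # 7.5 9.0
```

The same inputs give a higher summary under the maximum than under the mean, which should be borne in mind when density values from the two methods are compared.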

Statistical analysis

In order to examine the relationship between density methods and case–control status, analysis was performed using logistic regression (in SPSS). Univariate associations were performed using quartiles for each density measure, with the lowest quartile as the referent category. Further multivariate associations were performed adjusting for age, menopausal status and BMI.
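For a univariate comparison of one quartile against the referent quartile, the OR from logistic regression equals the cross-product ratio of the corresponding 2 × 2 table, with a Wald 95% CI computed on the log scale. The sketch below illustrates quartile assignment and this calculation; it is not the SPSS analysis used in the study, the counts and density values are invented for illustration, and the adjusted (multivariate) ORs cannot be obtained in this way.

```python
import math
from statistics import quantiles


def quartile_index(value: float, cut_points: list) -> int:
    """Return 0 (lowest quartile) to 3 (highest), given three cut points."""
    return sum(value > cut for cut in cut_points)


def odds_ratio_vs_referent(cases_q, controls_q, cases_ref, controls_ref):
    """Unadjusted OR for one quartile versus the referent, with a 95% CI."""
    odds_ratio = (cases_q * controls_ref) / (controls_q * cases_ref)
    se_log_or = math.sqrt(1 / cases_q + 1 / controls_q
                          + 1 / cases_ref + 1 / controls_ref)
    lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical density values used only to demonstrate quartile assignment.
densities = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
cuts = quantiles(densities, n=4)  # three cut points
assigned = [quartile_index(v, cuts) for v in densities]

# Hypothetical counts: highest quartile versus lowest (referent) quartile.
or_high, ci_low, ci_high = odds_ratio_vs_referent(
    cases_q=40, controls_q=960, cases_ref=20, controls_ref=980)
print(assigned)
print(f"OR {or_high:.2f}, 95% CI {ci_low:.2f} to {ci_high:.2f}")
```

An interval that excludes 1 corresponds to a statistically significant association at the 5% level.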

Results

In total, 33,543 women had MD assessment by all three density measures, of whom 401 had a previous diagnosis of cancer and were, therefore, excluded from this particular study. This left 33,142 women, of whom 437 developed breast cancer (1.32%).

Table 37 shows the number and percentage of cases and controls in each quartile for each density measure, as well as the univariate and multivariate associations. In the univariate analysis, all density measures, with the exception of Volpara percentage density, were associated with an increased risk of developing breast cancer. The strongest association was for VAS, with those in the highest quartile having twice the odds of developing breast cancer of those in the lowest quartile. Corresponding ORs for Volpara dense volume, Quantra dense volume and Quantra percentage density were in the region of 1.5–1.7. Further adjustment for age, menopausal status and BMI made the associations with the third quartile of Volpara dense volume and the second quartile of Quantra percentage density non-significant, but the highest quartile of Volpara percentage density became statistically significant (OR 1.60, 95% CI 1.15 to 2.23). The other associations with Volpara and Quantra were of similar magnitude to those in the univariate analysis. On the other hand, the ORs for VAS increased further after adjustment for age, menopausal status and BMI, with those in the highest quartile having an OR of developing cancer of 2.75 compared with those in the lowest quartile.

TABLE 37

Number of cancer and control cases by quartiles of density measures, univariate and multivariate ORs for developing breast cancer

Discussion

In the PROCAS cohort, for whom VAS assessment and the two volumetric methods were used, VAS showed the strongest associations with the development of breast cancer, but all methods showed some associations. For the cancer cases, the image including the cancer was included in the analysis. We have established that in the majority of cases the difference in volumetric density between mammographic images with cancer and the opposite (cancer-free) breast is small,197 but the inclusion of diagnostic images showing cancer might have slightly increased the average density of the cases.

One limitation of this analysis is that it includes a mix of prior and diagnostic mammograms; however, owing to the transition of mammography from film screen to digital during the course of the PROCAS project and the necessity of using FFDM images for this study, most of the mammograms analysed for cancer cases are diagnostic mammograms. Until we have a larger temporal data set, we are unable to comment on the way in which density changes prior to and at the time of diagnosis, but this will remain a longer-term aim. As the questionnaire was administered at the time of initial mammography in PROCAS, the covariate information may be less accurate for those women with cancers detected as interval cancers or at a subsequent screen.

In this analysis we carried out adjustment for a limited number of factors (age, BMI and menopausal status); however, further adjustment for other factors such as HRT and parity will be an important next step. Another issue with these data is that for Volpara the mean density assessed across the four views was used, while for Quantra the maximum was used. It would be interesting to evaluate both of these strategies on both volumetric methods to establish which produces the most predictive estimates of density.

Case–control study

Aim

The aim of this study was to compare MD in the contralateral breast of screen-detected cancers at the time of diagnosis with that of matched controls and hence to evaluate the performance of different density assessment methods employed in the PROCAS study.

Methods

Design

The study design was a case–control study, in which cases were those who developed unilateral breast cancer during their initial screening round while taking part in the PROCAS study. Cases were matched to controls whose mammograms were deemed cancer free at both the initial and the subsequent screening rounds. For controls, the mammograms from the initial screening round were analysed.

Inclusion criteria

To enable processing with Volpara and Quantra, all women had GE digital screening mammograms with raw (‘for processing’) image data. For analysis with Cumulus, processed (‘for presentation’) images were required. For inclusion as a cancer case, breast cancer was identified at the first screen following recruitment to the PROCAS study. For inclusion as a control, a cancer-free screening mammogram subsequent to the initial screen in the PROCAS study was required. These criteria ensured that risk information was current for the mammograms analysed, and that the control mammograms were unlikely to show early signs of cancer.

Exclusion criteria

All women with a previous diagnosis of cancer were excluded from the study. Cancer cases were excluded if they had bilateral breast cancer or unknown laterality. Women with breast implants and those with unacceptable values for BMI calculated from self-reported height and weight (< 10 kg/m² or > 60 kg/m²) were also excluded from analysis.
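The BMI exclusion can be expressed as a simple range check on BMI computed from self-reported height and weight. This is a minimal sketch; the function names and example values are illustrative, and the thresholds are those stated above.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m2 from weight in kilograms and height in metres."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def bmi_acceptable(value: float) -> bool:
    """Apply the stated exclusion: BMI < 10 or > 60 kg/m2 is unacceptable."""
    return 10.0 <= value <= 60.0


plausible = bmi(70.0, 1.65)    # about 25.7 kg/m2
implausible = bmi(70.0, 165.0)  # height entered in cm by mistake

print(round(plausible, 1), bmi_acceptable(plausible))
print(bmi_acceptable(implausible))
```

A check of this kind catches gross recording errors, such as a height entered in the wrong units, but not small inaccuracies in self-reported values.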

Questionnaire data

PROCAS questionnaire data were used to obtain age, menopausal status and self-reported height and weight.

Identification of cases

Cancer cases were identified by one of three methods:

  1. Matching the PROCAS data set to the Somerset Cancer Registry. The Somerset Cancer Registry is a ‘real-time’ database that collects information about the patient journey.199
  2. Matching with the NWCIS database, a histological database of breast cancer diagnoses, for cancers diagnosed in the north-west.200
  3. Notifications from participants that they had been diagnosed with breast cancer.

The three sources of information were cross-checked and all cancer diagnoses were validated.

Cancers were categorised into those detected at the initial screen in PROCAS and those detected at a subsequent screen, and then those which were obtained on a GE system and had raw data were identified. A total of 324 cancer cases were identified who matched the inclusion and exclusion criteria.

Matching

Controls were identified from the existing PROCAS study database. As they were required to have two screens in the PROCAS study, women with two screening appointments more than 180 days apart were identified. These women were then matched to the PROCAS data set on NHS number and date of the initial mammogram after recruitment to PROCAS, and the exclusion criteria listed above were applied.

Cancer cases were matched to three controls on the basis of age (within 6 months), menopausal status (premenopausal, perimenopausal, postmenopausal or unknown), HRT use (current, never or previous) and BMI categories (underweight < 18.5 kg/m², normal weight 18.5–24 kg/m², overweight 25–29 kg/m² and obese > 30 kg/m²). When an exact match was not possible, the matching criteria were relaxed, for example age matched within 1 year or BMI matched to the next category.
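The matching rules can be sketched as a series of passes over the pool of eligible controls, each pass relaxing the criteria a little further. The code below is an illustration only and not the procedure used in the study: the class and function names are hypothetical, the BMI boundaries of 18.5, 25 and 30 kg/m² are an assumption needed to place every value in a category, and the order in which the criteria are relaxed is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class Woman:
    id: str
    age_years: float
    menopausal_status: str  # premenopausal, perimenopausal, postmenopausal, unknown
    hrt_use: str            # current, never, previous
    bmi: float


def bmi_category(bmi: float) -> int:
    """0 = underweight, 1 = normal, 2 = overweight, 3 = obese (assumed cut-offs)."""
    if bmi < 18.5:
        return 0
    if bmi < 25.0:
        return 1
    if bmi < 30.0:
        return 2
    return 3


def is_match(case, control, max_age_gap, max_bmi_steps):
    return (abs(case.age_years - control.age_years) <= max_age_gap
            and case.menopausal_status == control.menopausal_status
            and case.hrt_use == control.hrt_use
            and abs(bmi_category(case.bmi) - bmi_category(control.bmi)) <= max_bmi_steps)


def select_controls(case, pool, n_controls=3):
    """Pick controls with exact criteria first, then progressively relaxed ones."""
    chosen = []
    # Assumed relaxation order: exact, then age within 1 year, then adjacent BMI category.
    for max_age_gap, max_bmi_steps in [(0.5, 0), (1.0, 0), (1.0, 1)]:
        for control in pool:
            if control not in chosen and is_match(case, control, max_age_gap, max_bmi_steps):
                chosen.append(control)
                if len(chosen) == n_controls:
                    return chosen
    return chosen


case = Woman("case", 60.0, "postmenopausal", "never", 27.0)
pool = [
    Woman("c1", 60.3, "postmenopausal", "never", 26.0),    # exact match
    Woman("c2", 60.9, "postmenopausal", "never", 28.0),    # needs age within 1 year
    Woman("c3", 59.8, "postmenopausal", "never", 31.0),    # needs adjacent BMI category
    Woman("c4", 60.1, "premenopausal", "never", 27.0),     # menopausal status differs
    Woman("c5", 60.2, "postmenopausal", "current", 27.0),  # HRT use differs
]
matched_ids = [w.id for w in select_controls(case, pool)]
print(matched_ids)  # ['c1', 'c2', 'c3']
```

In a full matching run, each selected control would also be removed from the pool so that it could not be matched to a second case.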

Density assessment

Mammographic density was assessed by area-based methods (VAS and Cumulus) and volumetric methods (Volpara and Quantra). The VAS measures were made by two independent readers per case, drawn from a pool of 17 readers. Cumulus was undertaken by a single trained and validated reader (JS) who assessed a single MLO view of 180 cancers and 540 controls, presented in random order in four batches, each containing approximately 50 cancers and 150 matched controls. The assessor was blinded to case–control status. Version 1.4.5 of Volpara and version 2.0 of Quantra were used to obtain volumetric density data.

Statistical analysis

The data were merged into a single database for statistical analysis. The demographic characteristics were reported as number and percentage by case–control status. Comparisons of categorical data were made using the chi-squared test. For those variables for which the data were ordinal, a chi-squared test for trend was also conducted. Continuous variables were assessed by means of an unpaired sample t-test when the data were normally distributed or by the Mann–Whitney U-test when they were not.

To examine the relationship between density methods and case–control status, analysis was performed using conditional logistic regression (in SPSS) owing to the matched nature of the data set. Univariate associations were performed initially, and multivariate associations were performed adjusting for breast area (for Cumulus) and for breast volume (for Volpara and Quantra).

Results

Table 38 shows the composition of the case–control data set. There was no significant difference between cases and controls in any of the descriptors listed, apart from the TC risk score computed at entry to the PROCAS study, which was significantly higher for cases than for controls (p < 0.05).

TABLE 38

Composition of the case–control data set

The mean age of the women was 60 years. Seventy-four per cent were postmenopausal, and 94% indicated that they were not current users of HRT. The mean BMI was approximately 28 kg/m² (overweight), with about one-third of women in each of the BMI categories. The majority of women declared their ethnicity as white. In the cancer group, and hence in the controls, there was an equal split with regard to the laterality of the cancer.

Table 39 shows univariate analysis of MD measured by the area-based methods.

TABLE 39

Mammographic density by area-based methods: VAS and Cumulus

Density measured using VAS was significantly associated with cancer status, and showed a dose–response relationship with increasing density (χ² trend 33.3; p < 0.001). Those in the highest quartile of dense area and percentage density for Cumulus had an increased likelihood of cancer (OR 1.76 and 1.93, respectively), compared with those in the lowest quartile. Adjustment for breast area made little difference to the ORs for Cumulus dense area and Cumulus percentage density. For Cumulus dense area, the OR for the highest category became 1.87, with 95% CI 1.10 to 3.19. For Cumulus percentage density, the OR for the highest density group, following adjustment for breast area, was 1.80 with 95% CI 1.03 to 3.15. Table 40 shows univariate analysis of the volumetric MD measures.

TABLE 40

Mammographic density by volumetric methods: Volpara gland volume and MD and Quantra gland volume and MD

Volpara percentage density showed an association with cancer status and a dose–response relationship with increasing density (χ2 trend 9.2; p = 0.002). The relationship with Volpara gland volume was less clear.

Adjustment for breast volume for Volpara increased the OR of the highest percentage density group to 2.61 with 95% CI 1.55 to 4.39, and for Volpara gland volume the OR of the highest group was increased to 1.72 with 95% CI 1.12 to 2.64. There was no association between MD measured by Quantra and cancer status, even after adjustment for breast volume.

Discussion of mammographic density measures

We performed a matched case–control analysis using the contralateral breast images of women with unilateral breast cancer to determine which MD method showed the strongest association with the presence of cancer. Much of the literature on MD and risk is based on relative, area-based measures applied to film mammograms, but in the PROCAS study the vast majority of mammograms are FFDM images and hence are amenable to processing by automated volumetric density software. This had the potential additional benefits of allowing absolute rather than relative density measures, which should be less susceptible to change in weight, which has previously been associated with a change in the fatty content of the breast.201 Such methods enable objective measurement that is independent of observer bias and imaging parameters, and is reproducible and feasible on a large scale. However, in this analysis, subjective assessment by mammographic readers demonstrated the strongest relationship with cancer, despite known interobserver variability.202

Volumetric measures fared less well, with the exception of percentage density measured using Volpara. Commercial volumetric measures were developed to fulfil a need for density assessment in the USA, where readers in many states are obliged by law to inform women of their MD; most readers currently use the subjective BI-RADS categorisation, but FDA-approved volumetric methods are an attractive alternative in a litigious environment. Volumetric measures are thus used most often not to identify risk of developing cancer, but to identify women for whom mammography is less effective. However, the volumetric software from both manufacturers is evolving to quantify density more accurately in response to the drive for personalised screening. It is possible that the relationship of dense tissue to fat in area-based measures is more strongly related to cancer risk than that in volumetric measures, and that using volumes of fat and gland independently and in different proportions may provide improved risk prediction. Furthermore, manufacturers of volumetric density software have recently begun to output area-based measures of density. Our results also indicate that correcting MD measures for breast volume may help in strengthening the association with cancer.

Visual assessments made by the readers in the PROCAS study are subject to inter- and intraobserver variability.202 The VAS density readings were undertaken in a pragmatic fashion at the time of radiological assessment of the images rather than in a carefully managed, artificial environment. Despite this, the average VAS reading from the pair of readers was found to be associated with cancer, with the OR increasing for higher density estimates. VAS reading is relatively time-consuming and required subsequent automated analysis of the VAS forms to convert the markers into percentages. It would, however, be straightforward to computerise the process, with readers sliding a cursor to indicate percentage. As such, VAS was chosen for incorporation into the best-performing prediction model for the purposes of this report. VAS was also available on all subjects.

The semiautomated thresholding approach showed some relationship with cancer, but was not as effective as either VAS reading or Volpara percentage density. All of the Cumulus assessments took place in a limited time period by a single observer blinded to case–control status, but this was a considerable time after reader training and validation, and the images were acquired using FFDM, unlike those used for validation. Cumulus is impractical for large-scale use, as it is labour intensive and requires a skilled operator.

Density case–control study of film priors using the Manchester Stepwedge and visual assessment score

Aim

The aim of this case–control study was to compare MD in the screening round prior to detection of breast cancer between women who went on to develop cancer and matched controls. MD was measured using an area-based method (VAS) and a volumetric method (Manchester Stepwedge).

Methods

The study design was case–control, in which cases were those who developed unilateral breast cancer after their first screening round while taking part in the PROCAS study. Cases, therefore, had to have a ‘normal’ screen prior to developing breast cancer during their second screen or between screens. Cases were matched to controls who were deemed to be cancer free at both the initial and the subsequent screening rounds. For cases and controls, the mammograms from the initial screening round were analysed.

To enable processing with the Manchester Stepwedge, only data from women who were imaged using analogue mammography with the stepwedge calibration object in position at the first screen following recruitment to the PROCAS study were included. All women with a previous diagnosis of cancer were excluded from the study. Cancer cases were excluded if they had bilateral breast cancer or unknown laterality. Controls were also identified from the PROCAS study database as women with two screening appointments more than 180 days apart, and an initial PROCAS film mammogram showing a stepwedge imaged alongside the breast. For controls, the subsequent screening mammogram was read as cancer free. Women with breast implants and those with infeasible values for BMI calculated from self-reported height and weight (< 10 kg/m2 or > 60 kg/m2) were also excluded from analysis. Questionnaire data were used to obtain age, menopausal status and self-reported height and weight.

Cancer cases were matched to one control on the basis of age (within 1 year), menopausal status (premenopausal, perimenopausal, postmenopausal or unknown), HRT use (current, never or previous) and BMI categories (underweight < 18.5 kg/m2, normal weight 18.5–24 kg/m2, overweight 25–29 kg/m2 and obese > 30 kg/m2). When an exact match was not possible, the matching criteria were relaxed, for example age matched within 18 months or BMI matched to the next category.
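The matching rules described above can be sketched in code. This is a minimal illustration only, not the procedure used in the study: the `Woman` record and the function names are hypothetical, and because the BMI bands as listed leave gaps (e.g. between 24 and 25 kg/m2), the sketch assumes the conventional cut-off points of 18.5, 25 and 30 kg/m2. The relaxed mode widens both the age and the BMI tolerance at once, whereas the text gives these as alternative examples of relaxation.

```python
from dataclasses import dataclass


# Hypothetical record; field names are illustrative, not taken from the PROCAS database.
@dataclass
class Woman:
    age: float        # years
    menopausal: str   # "pre", "peri", "post" or "unknown"
    hrt: str          # "current", "never" or "previous"
    bmi: float        # kg/m2


def bmi_band(bmi: float) -> int:
    """Band index 0-3 (underweight, normal, overweight, obese).

    Assumes cut-offs at 18.5, 25 and 30 kg/m2 (an assumption, see lead-in).
    """
    if bmi < 18.5:
        return 0
    if bmi < 25:
        return 1
    if bmi < 30:
        return 2
    return 3


def is_match(case: Woman, control: Woman, relaxed: bool = False) -> bool:
    """Exact matching by default; relaxed matching widens age and BMI tolerances."""
    max_age_gap = 1.5 if relaxed else 1.0   # 18 months vs. 1 year
    max_band_gap = 1 if relaxed else 0      # next BMI category allowed when relaxed
    return (
        abs(case.age - control.age) <= max_age_gap
        and case.menopausal == control.menopausal
        and case.hrt == control.hrt
        and abs(bmi_band(case.bmi) - bmi_band(control.bmi)) <= max_band_gap
    )


# Hypothetical example: rejected under strict matching, accepted when relaxed.
case = Woman(age=59.0, menopausal="post", hrt="never", bmi=24.0)
control = Woman(age=60.2, menopausal="post", hrt="never", bmi=26.0)
print(is_match(case, control), is_match(case, control, relaxed=True))
```

Menopausal status and HRT use are kept as exact matches in both modes, as the text gives no example of relaxing them.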

Density assessment

Mammographic density of cases was assessed in the prior mammogram of the breast that developed breast cancer; for controls, density of the same breast as that of their matched case was used. The VAS measures were made by two independent readers per case, drawn from a pool of 17 readers. The Manchester Stepwedge software was used to produce results of volumetric breast densities for both groups of patients. The program enabled the operator to identify the stepwedge and the positions of radio-opaque markers along the edges of the mammogram. These data were used along with calibration data to estimate the thickness of dense tissue at all points in the compressed breast image. The software outputs breast volume, dense volume and percentage of dense tissue (dense volume as a proportion of breast volume).

Statistical analysis

Demographic characteristics were reported as number and percentage by case–control status. Comparisons of categorical data were made using the chi-squared test. For those variables where the data were ordinal, a chi-squared test for trend was also conducted. Continuous variables were assessed by means of an unpaired sample t-test when the data were normally distributed or by the Mann–Whitney U-test when they were not.

To examine the relationship between density methods and case–control status, analysis was performed using conditional logistic regression (in SPSS) owing to the matched nature of the data set.

Results

In total, 104 women with analogue mammograms developed breast cancer during the course of the study. Forty-four of these were diagnosed at their first screen while taking part in the PROCAS study and were therefore not eligible for this particular case–control study. The remaining 60 women were eligible for inclusion; however, following exclusion of those for whom there was no calibrated stepwedge on the mammograms and those with missing analogue mammograms, the available sample was 49 women. For one further subject, the software failed, and this subject was subsequently excluded from the study.

Women were matched to one control, and Table 41 shows the demographic characteristics for cases and controls. The matching criteria were adequate, with women in the case and control groups being of similar age (mean approximately 59 years) and BMI (mean approximately 27 kg/m2), and with similar proportions of women who were postmenopausal (65% both groups) and current users of HRT (23% of cases and 25% of controls). Study participants were also similar with regard to other characteristics, including parity (approximately 85% in each group were parous), initial TC score (mean: cases 3.02, controls 2.84; p = 0.46), ethnicity, previous breast biopsies and year of mammogram.

TABLE 41

Demographic characteristics for PROCAS stepwedge case–control study

Table 42 shows the number and percentages of cancer and control subjects for each density method. Density methods were split into quartiles, with the lowest quartile used as the referent group. There were no statistically significant associations with any density method; however, the VAS for the CC view did approach statistical significance for the second quartile (OR 3.24, 95% CI 0.99 to 10.54). We did not adjust for any other factors.
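To see why the second-quartile VAS result is described as approaching significance, an approximate p-value can be recovered from the reported OR and CI. This is a back-of-envelope sketch, not a figure reported by the study: it assumes the interval is a symmetric Wald interval on the log-odds scale, and the function name is illustrative.

```python
import math


def approx_p_from_ci(or_est: float, lower: float, upper: float) -> float:
    """Two-sided p-value, assuming a Wald 95% CI on the log-odds scale."""
    # Width of the 95% CI on the log scale is 2 * 1.96 standard errors.
    se = (math.log(upper) - math.log(lower)) / (2 * 1.96)
    z = math.log(or_est) / se
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


# Values taken from the text: OR 3.24, 95% CI 0.99 to 10.54.
p = approx_p_from_ci(3.24, 0.99, 10.54)
print(f"approximate p = {p:.3f}")
```

Under these assumptions the p-value falls slightly above 0.05, which is consistent with the lower confidence limit lying just below 1.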

TABLE 42

Number and percentages of case and control subjects for quartiles of density measures, with associated ORs (95% CIs)

Discussion

These data are interesting because, by using the prior mammograms of women who developed cancer rather than the contralateral mammogram at the time of diagnosis, the density data genuinely assess risk of developing cancer. However, the numbers are very small and no density measure achieved statistical significance.

There were limitations of the method that affected the viability of some of the results. With regard to the markers which are used to measure compressed breast thickness, ideally two pairs should be located to enable the measurement of paddle tilt. In most cases the software identified at least two pairs of markers, but in some instances incomplete or poorly located pairs were identified (e.g. when a marker coincided with a patient identification label). This produces inaccurate thickness estimates and hence errors in density assessment.

Ideally, we would have liked to have analysed MLO views as well as CC views with the stepwedge method, but the process of digitisation and analysis is time-consuming, and we decided to begin with an investigation of a single view. We did, however, analyse VAS results for both mammographic views and for the CC view alone to enable comparison with the stepwedge method. VAS for a single view approached statistical significance for those in the second quartile. Ideally, we would also match each case to more controls to increase the power to detect a significant effect. These analyses do not correct for other factors such as BMI.203 Although we did not correct for BMI explicitly, data were matched on BMI category.

The data set contained both interval (n = 11) and screen-detected cancers (n = 38). Had any signs of abnormality been missed when the prior mammogram was first read, this might have had an impact on the density at the initial screen.

From these data we are unable to predict the presence of cancer from MD, with either area-based or volumetric density methods. However, the evaluation of a larger set of images is required; previous research has demonstrated the ability of visual and computer-assisted density assessment to predict later cancers in more extensive data sets.167,204 Longitudinal assessment (such as that employed by Kerlikowske et al.,205 but using continuous objective density assessment) may also be important.

The stepwedge method has previously been evaluated in a screening population,206 and was found to be a feasible but time-consuming method of obtaining volumetric estimates from film mammograms. The main drawback of the technique is that the stepwedge and markers have to be imaged at the time of mammography, and images without these objects cannot be analysed. The numbers of cancers evaluable in the study were too small to make any meaningful evaluation. As it is now possible to calibrate digital mammograms without using a stepwedge, this method is unlikely to be used in the future.

The relationship of volumetric and area-based mammographic density to age and hormonal factors

Introduction and aims

Percentage breast density estimated visually or assessed by computer-assisted area-based measures declines with age, menopausal status and parity and increases with current HRT use.58,207–211 Automated volumetric density measurement methods, including Quantra and Volpara, remove subjectivity; it is important to determine how these methods relate to age and endocrine changes, and here we describe these associations. For comparison we also present VAS measurements.

Methods

Women undergoing routine screening in the NHSBSP who agreed to enter the PROCAS study completed questionnaires concerning personal information, including weight, height, parity, menopausal status and HRT use.

There were originally 50,929 subjects, of whom 731 were excluded owing to a previous diagnosis of cancer and a further 506 owing to a current diagnosis of cancer. At the time of analysis, volumetric density measures were available for 23,253 subjects (Quantra) and 11,947 subjects (Volpara). From these cohorts, a further 1692 (7.3%) and 826 (6.9%) were excluded from the Quantra and Volpara groups, respectively, because they had a BMI outside the range 17.5–60 kg/m2. A further 188 and 95 subjects were excluded because their average breast density distribution had not returned a reading. Finally, a further three in each group were excluded owing to discrepancies in the reporting of ‘ever used HRT’ and ‘still on HRT’. Thus, the final numbers of subjects in this substudy were 21,370 in the Quantra data set and 11,023 in the Volpara 1.4.0 data set.
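The exclusion steps can be checked with simple arithmetic. The sketch below uses only the counts given in the text; the dictionary keys and function name are illustrative.

```python
# Counts taken from the text; key names are illustrative.
cohorts = {
    "Quantra": {"available": 23_253, "bmi_out_of_range": 1692,
                "no_density_reading": 188, "hrt_discrepancy": 3},
    "Volpara": {"available": 11_947, "bmi_out_of_range": 826,
                "no_density_reading": 95, "hrt_discrepancy": 3},
}


def final_count(c: dict) -> int:
    """Subjects remaining after the three successive exclusions."""
    return (c["available"] - c["bmi_out_of_range"]
            - c["no_density_reading"] - c["hrt_discrepancy"])


for name, c in cohorts.items():
    pct_bmi = 100 * c["bmi_out_of_range"] / c["available"]
    print(f"{name}: {pct_bmi:.1f}% excluded on BMI, final n = {final_count(c)}")
```

The totals reproduce the final sample sizes of 21,370 (Quantra) and 11,023 (Volpara), and the BMI exclusions correspond to 7.3% and 6.9% of the available subjects.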

Results

Descriptive statistics for the subjects for each of the two density methods are shown in Table 43. Density significantly declined with age (p < 0.001) by both methods. Figure 26 shows plots of gland volume and percentage density for Quantra and Volpara. With both methods, percentage density declines until the 56–60 years age group; a further decline, in gland volume, until the 61–65 years age group is seen only with Volpara.

TABLE 43

Descriptive statistics for subsets analysed by Quantra and Volpara

FIGURE 26

Plots of gland volume and percentage MD by volume by age for Quantra and Volpara. (a) Gland volume vs. age (Quantra); (b) gland volume vs. age (Volpara); (c) percentage density vs. age (Quantra); and (d) percentage density vs. age (Volpara).

The effect of hormonal factors on glandular volume is illustrated in Figure 27. In women aged < 50 years the mean percentage density by Quantra was 19.23 for premenopausal women and 16.29 for postmenopausal women (p < 0.05), and in women aged 51–55 years these figures were 18.06 and 15.72, respectively (p < 0.05). For density measured by Volpara, the corresponding figures are 7.74 and 6.64 (p < 0.05) for women aged < 50 years, and 7.54 and 6.00 (p < 0.05) for women aged 51–55 years.

FIGURE 27

Plots of gland volume and percentage MD by age and menopausal status, HRT use and parity for Quantra and Volpara. Gland volume vs. menopausal status by age group by (a) Quantra and (b) Volpara (black, premenopausal; blue, perimenopausal; green, postmenopausal).

Current HRT use was associated with significantly higher percentage density (p < 0.05) for those over 50 years (Volpara) and from 51 to 70 years (Quantra). Nulliparity was associated with higher density at all ages (p < 0.05 for Quantra above the age of 50 years and for Volpara for age 51–65 years).

Conclusion

These data indicate that volumetric percentage density by both methods is related to age and endocrine factors in the same directions as area-based methods. In women under the age of 55 years, density was significantly higher for premenopausal than postmenopausal women. For those aged over 50 years, current use of HRT also showed an increase in density, as did nulliparity.

Ethnicity and mammographic density

Introduction and aims

Increased mobility of the world population has resulted in many countries having a diverse ethnic mix, now apparent in the breast screening age group in Greater Manchester.212 Ethnicity is related to risk of breast cancer, with women of white ethnicity being more likely than women in other racial groups to develop breast cancer.213 In one study, approximately 141 per 100,000 women of white ethnic origin were found to have developed the disease, compared with 119 per 100,000 for African American women, 96 per 100,000 for Asian American women, 90 per 100,000 for Hispanic/Latina women and 50 per 100,000 for Native American/Alaskan native women.214

Published data relating MD to ethnicity have yielded mixed results. A UK study of 428 symptomatic patients using Quantra showed significant differences between white, Asian and black women, but did not control for any confounding factors such as age or HRT use.215 White, Hispanic, Asian, Native American and black women participated in a study of 28,501 women in a breast-screening programme in Washington, USA.216 After adjusting for age, differences in MD were found between Native American and white women, and between white and Asian women. However, when BMI, HRT use, menopausal status and parity were taken into account, the difference between Native American and white women was no longer significant. More recent research in similar ethnic groups evaluated the breast density of 442 women.217 African American women were found to have higher density than Asian American women after adjusting for BMI, family history, menstrual and reproductive factors. In this work, Asian American and white women were found to have similar MDs. In contrast with this, a British study found that Asian women had significantly lower breast density, assessed using Wolfe grades, than Caucasian participants.218 However, in a study of 15,292 women of Asian, white, African American and other (Native American and Caribbean) racial backgrounds, no significant differences were found when confounding factors, including bra size, were taken into account.219 The picture is, thus, unclear; previous studies have evaluated different populations using a variety of methodologies, including subjective assessment of density.

Use of automated digital volumetric measurement of breast density may offer advantages over visual and semiautomated methods, including objectivity, reproducibility, suitability for population-based studies, resolution and the ability to assess absolute, rather than relative, breast density.220 Regardless of the degree of association with risk, the identification of women with high MD is important because the detection of cancers using conventional mammography is more difficult in this case,221 and it may be appropriate to use alternative screening methodologies.

The aim of this substudy was to determine whether or not there is an association between MD and ethnicity in the PROCAS cohort.

Method

This substudy used data from women for whom GE FFDM images with raw (‘for processing’) image data and questionnaire data on ethnicity, date of birth, HRT use, weight and height were available. Records were examined for all participants recruited to PROCAS before 15 June 2011 who did not report white British or Irish ethnicity, and the first 1038 white British or Irish PROCAS participants with suitable data were also included. Women diagnosed with cancer at the time of screening and women with previous breast cancers were excluded.

Mammograms were analysed using Hologic’s Quantra software, which provided measures of breast volume, glandular volume and percentage density by volume for the left and right breasts. These were averaged to provide a single measure of each type per woman.

Questionnaire data were extracted from the PROCAS study database. BMI was calculated from self-reported height and weight. One-way analysis of variance was used to determine whether or not a relationship existed between the breast density measures and ethnicity. Pairwise comparisons were carried out on each ethnic group versus white British/Irish (the largest group which was used as a reference) using Scheffé’s test. A general linear model was used to further investigate the link between average breast density and ethnicity while adjusting for HRT use, BMI and age. Univariate analysis of the variables was performed and pairwise comparisons were done using Bonferroni’s test.

Ethnicities

The ethnic categories available for participants to select on the questionnaire were Asian or Asian British – Bangladeshi, Indian, Pakistani, Chinese; black or black British – African or Caribbean; Jewish origin; Jewish Ashkenazi; mixed – white and black African/Asian/black Caribbean; white – British or Irish; and other – please specify. Women were instructed to ‘please tick all that apply’ on the questionnaire. In subsequent analysis, the Jewish Ashkenazi women were included in the ‘Jewish origin’ category.

Results

The age of participants ranged from 46 to 74 years. The mean BMI for all the ethnic groups in the study was > 25 kg/m2, in the overweight range. Mean ages and BMIs for each group are shown in Table 44.

TABLE 44

Mean BMI and age in each ethnic group

Just over one-third of the women in the study had ever used HRT. Usage was highest in the Jewish group and lowest in women of black origin and those of Asian or mixed race (Table 45). The mean age of women who reported ever using HRT (61.41 years) was significantly higher than that of women who had never used it (57.19 years) (p < 0.01). This is likely to relate to the menopausal status of the individuals.

TABLE 45

Hormone replacement therapy use for women of different ethnicities

The volumetric MDs of women in the different ethnic groups are presented in Table 46, along with fat and gland volumes.

TABLE 46

Gland and fat volumes (cm3) and percentage density by volume for women of different ethnicities

Using Scheffé’s test, slight differences were observed in the average volumetric density in all ethnic groups. However, only women of Jewish ethnic origin had significantly higher breast density than the white British or Irish population (p = 0.012).

Once adjusted for age, BMI and HRT use, the results showed that the difference between average breast density of the Jewish participants and that of the white British or Irish women was of only borderline significance (p = 0.053).

Discussion

Investigation of the relationship between breast density and ethnicity, although facilitated by the availability of automated methods of measuring density, remains difficult because of the many confounding factors such as the possible impact of a change in lifestyle on second-generation immigrants, and wide variations between definitions of ethnic groups. This substudy is the first that has specifically compared the volumetric breast density of Jewish women with that of white British or Irish women; this comparison is particularly interesting because of the known difference in genetic susceptibility to breast cancer of Ashkenazi Jewish women.222 The high rate of HRT use found in this group is of interest.

The population studied is unlikely to be representative of women of screening age in Greater Manchester, as attendance at screening is not uniform across all ethnic groups, with women of non-white origin less likely to present for screening.223 Further, the sample was selected from the PROCAS study on a pragmatic basis aimed at maximising the proportion of non-white British participants. The mobile units used for screening relocate to facilitate access, and the uptake of screening and the proportion of women consenting to take part in PROCAS vary according to location, with lower rates in less affluent areas of the city.

The work reported here uses Quantra for MD assessment. This holds several potential advantages over visual and semiautomated methods including objectivity, reproducibility, suitability for population-based studies, resolution and the ability to assess absolute, rather than relative, breast density.220 Regardless of the degree of association with risk, the identification of subgroups of women with high MD is important because the detection of cancers using conventional mammography is more difficult in this case,221 and it may be appropriate to use alternative screening methodologies.

In this sample we found that the only ethnicities for which, after adjusting for potential confounding factors, there was some limited evidence of a difference in breast density were white British or Irish and Jewish women. This is in contrast to recently reported data from the UK, which found that Asian women had lower breast density as measured by Quantra; however, that research was carried out in a symptomatic population rather than a screening population, and did not adjust for confounding factors such as age and BMI.215 Quantra also provides data on volume of glandular tissue in the breast. This may be more reliable than percentage density, as it is affected less by the weight of the women at the time of imaging.201

Using breast volume as a surrogate for weight/body mass index

Introduction

Mammographic density is one of the strongest modifiable risk factors for breast cancer and is usually reported as a relative measure, describing the proportion of the breast area or volume occupied by radio-dense tissue. However, there is evidence that when women gain weight, they gain breast fat and hence gain breast volume.201 This results in a decrease in percentage breast density and hence a decrease in the apparent risk of developing breast cancer, despite the gain in weight conferring an actual increase in risk for postmenopausal women. Similarly, loss of weight (which is beneficial in terms of breast cancer risk) leads to an increase in percentage density and apparent risk. For this reason, breast density measurements made for the purpose of assessing cancer risk are often corrected for BMI or weight as well as for age.211,224
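The effect of weight gain on a relative density measure can be shown with a small worked example. The volumes below are hypothetical and chosen only to illustrate the direction of the effect; the sketch assumes that weight gain adds fat while the dense volume stays the same.

```python
def percent_density(dense_cm3: float, fat_cm3: float) -> float:
    """Dense volume as a percentage of total (dense + fat) breast volume."""
    return 100.0 * dense_cm3 / (dense_cm3 + fat_cm3)


# Hypothetical volumes, for illustration only.
dense = 100.0        # cm3 of dense tissue, assumed unchanged by weight gain
fat_before = 500.0   # cm3 of fat before weight gain
fat_after = 700.0    # cm3 of fat after weight gain

before = percent_density(dense, fat_before)
after = percent_density(dense, fat_after)
print(f"before: {before:.1f}%, after: {after:.1f}%")
```

Percentage density falls even though the absolute amount of dense tissue has not changed, which is why relative measures are often corrected for BMI or weight.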

As weight is not routinely recorded at mammographic screening, and the weight of many women changes between screens, it would be advantageous to find a surrogate measure that could be used to correct relative density measures. It has been proposed that the breast volume or fat volume computed by commercial breast density software could be used in this way.225 The aim of this evaluation198 is to determine whether or not this would be appropriate.

Method

We used PROCAS questionnaire data, including height and weight at the time of screening, and the corresponding digital screening mammograms were analysed with Quantra (version 1.3) and Volpara (version 1.3.1). Quantra outputs total breast volume and dense tissue volume for both breasts, combining data from the CC and MLO views. A single average total breast volume was calculated for each case. The fat volume was obtained by subtracting the dense tissue volume from the total breast volume. Volpara gave measures of average breast volume and average fat volume from both the left and the right breasts. It also provided average total breast volume and average fat volume for each case.
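The derivation of a per-woman fat volume from the Quantra outputs can be sketched as follows. The function name and input values are hypothetical, and the sketch assumes that the left and right breast volumes are averaged before subtraction; averaging after subtraction gives the same result, because both operations are linear.

```python
def per_woman_volumes(left_total: float, right_total: float,
                      left_dense: float, right_dense: float):
    """Average total, dense and fat volume (cm3) across the two breasts."""
    avg_total = (left_total + right_total) / 2
    avg_dense = (left_dense + right_dense) / 2
    avg_fat = avg_total - avg_dense  # fat volume = total volume - dense volume
    return avg_total, avg_dense, avg_fat


# Hypothetical inputs in cm3.
total, dense, fat = per_woman_volumes(620.0, 580.0, 90.0, 110.0)
print(total, dense, fat)
```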

Women were excluded if they had had a previous breast cancer or tissue biopsy, if essential data were missing or invalid or if their recorded clothes size was unrealistic for their calculated BMI. We thus analysed data from 7398 women, out of which 500 data sets were set aside for evaluation, leaving a sample of 6898 data sets. Weight ranged from 36 kg to 172 kg and BMI ranged from 15.17 kg/m2 to 62.38 kg/m2, with 95.7% of women declaring themselves to be white British or Irish.

To test the significance of a possible association between weight or BMI and the volumetric breast measures, Pearson’s correlation was used. This was run as a two-tailed test, and a correlation coefficient (r) > 0.4 was taken as a positive correlation between two variables. Owing to the large population size, an additional criterion of r > 0.7 was used to indicate a significant association; at this threshold the correlation squared (r2) is 0.49, which would indicate that almost half of the variation in ‘true’ BMI (or weight) between women could be explained by the prediction model. Linear regression was used to produce predictive models for weight and BMI from the sample population; these were applied to the test set data in order to predict weight and BMI for these 500 women. Predicted values were compared with self-reported weight and BMI values and analysed by calculating intraclass correlation (ICC).
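The correlation and regression steps can be illustrated with a short sketch. It is not the analysis code used in the study: the data values are hypothetical, the function names are illustrative, and only a single predictor is fitted by ordinary least squares.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical values: breast volume (cm3) and self-reported weight (kg).
volumes = [400.0, 650.0, 800.0, 1100.0, 1500.0]
weights = [55.0, 62.0, 70.0, 74.0, 90.0]

r = pearson_r(volumes, weights)
slope, intercept = fit_line(volumes, weights)
predicted = [intercept + slope * v for v in volumes]

print(f"r = {r:.3f}, r squared = {r * r:.3f}")
print(f"a threshold of r = 0.7 corresponds to r squared = {0.7 ** 2:.2f}")
```

In the study, the model was fitted on the sample population and applied to a separate test set; here the same five hypothetical points are used for both, purely to keep the example short.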

Finally, to increase the data available for testing, the predictive models were applied to the entire data set available and histograms were plotted to show the differences between self-reported and calculated weights and BMIs.

Results

All volumetric breast measurements showed a positive correlation with either weight or BMI (Table 47). Figure 28 shows an example plot for weight versus Volpara breast volume.198 In general the points on the graphs become more scattered with increasing volumetric measurements, suggesting that predictive models may perform better for women of lower weight/BMI.

TABLE 47

Correlations between density measures and self-reported weight and BMI in a population of 6898 women

FIGURE 28

Self-reported weight plotted against breast volume measured by Volpara.

For illustration, self-reported and predicted values for the minimum, maximum and mean self-reported weights and BMIs in a separate group of women were obtained and are presented for Quantra in Table 48 and for Volpara in Table 49.

TABLE 48

Self-reported and predicted weight and BMI in a separate group of 408 women

TABLE 49

Self-reported and predicted weight and BMI in a separate group of 237 women

Intraclass correlation was used to assess agreement between predicted and self-reported values. For all predicted values, the ICCs indicated moderate agreement. For weight, the ICC ranged between 0.609 and 0.634. The lowest ICC agreement was obtained when using the values predicted using Volpara breast volume data, for which the ICC = 0.609 (95% CI 0.522 to 0.683; p < 0.001). The highest ICC agreement was obtained when using the values predicted using Quantra fat volume data, for which the ICC = 0.634 (95% CI 0.573 to 0.689; p < 0.001).

For BMI, the ICC ranged between 0.594 and 0.629. This indicates a slightly weaker agreement between these predicted values and self-reported values than for weight. The lowest ICC agreement between self-reported and predicted values was obtained with values predicted using Volpara breast volume data, for which the ICC = 0.594 (95% CI 0.505 to 0.671; p < 0.001). The highest ICC was the same for two of the sets of predicted values analysed against self-reported values, the values predicted using Quantra breast volume data and those predicted using Quantra fat volume data, for which the ICC = 0.629 (95% CI 0.567 to 0.685; p < 0.001).

Frequency histograms showing the difference between actual (self-reported) weight and BMI values and those predicted by applying the predictive models to the entire data set are shown in Figure 29.

FIGURE 29. Frequency histograms (a) showing the difference between actual (self-reported) and predicted weight for Volpara; (b) showing the difference between actual (self-reported) and predicted weight for Quantra; (c) showing the difference between actual (self-reported) and predicted BMI for Volpara; and (d) showing the difference between actual (self-reported) and predicted BMI for Quantra.

Discussion

One of the limitations of this work is that we have assessed the volumetric measures against self-reported, rather than measured, values. It is known that self-reported data are subject to errors, particularly in people who are overweight,226 and this could potentially have contributed to the greater spread of data seen with increasing weight. For similar reasons, it is unsurprising that prediction of BMI was less successful than prediction of weight as the calculation of BMI involved the use of two self-reported data items, weight and height. It is, however, difficult to obtain measured values for weight and height in a screening setting, in which appointments are short and space and privacy on mobile units are limited, and so the use of self-reported data is a pragmatic solution. Should risk-adapted screening be introduced, it is likely that many of the data collected will be self-reported, and verification will only be implemented should women fall into high-risk groups, or on the borderline between average and high risk.

Our results indicate that volumetric breast measurements made from mammograms are not an adequate surrogate for self-reported weight and BMI in models of individual risk. A correlation of 0.634 does not provide evidence of accurate prediction (the correlation squared is only 0.40, which can be interpreted as the proportion of variation in ‘true’ BMI between women explained by the prediction model – this is not a large proportion). However, volumetric measurements could be used as a sanity check on self-reported data, rather than asking about clothes size, for example, which is itself error-prone because of variations in size between clothing manufacturers. They could also be used in cases where women fail to provide self-reported data.

Repeatability of visual assessment score assessment of mammographic density

Visual assessment of MD is the only method that was applicable to all mammograms in PROCAS. This is a subjective, relative, area-based method, and in PROCAS it was implemented as a visual estimate of percentage density by an expert observer recorded on a VAS. This form of MD estimation has been shown to have a strong association with risk of breast cancer when both the MLO and CC views of the breast are used,167 and there is evidence that measurements of the relative area of dense tissue are more predictive of breast cancer than categorical measurements of MD.33 However, the inherent subjectivity of visual assessment is of concern when it is applied for risk stratification and the reliability of identifying individuals at increased or reduced risk of breast cancer is important. Although there is a literature on the variability of categorical assessment of MD by observers,205,227 or using computer-assisted thresholding methods such as Cumulus,204 there was a lack of data on variability of visual assessment recorded on a continuous scale until we undertook this analysis for the PROCAS study.202

Aim

The aim of this substudy was to examine the repeatability of breast density assessment using VASs.

Method

Seven of the PROCAS VAS density assessors (five consultant radiologists and two breast physicians) each repeated the assessment of breast density for 100 sets of mammograms that they had previously assessed for MD during the PROCAS study. The level of agreement between the original and repeat assessments was investigated.

The readers had between 2 and > 10 years’ experience of MD assessment using VAS and had all completed between 3000 and 7604 density assessments in 2011, with the exception of the reader with > 10 years’ experience, who had undertaken 661 VAS assessments that year.

A set of 100 cases was selected for each reader by randomly sampling 10 sets of mammograms from each decile of the distribution of VAS density as assessed in the PROCAS study by that reader during the period May 2010–May 2011. Selection in this manner ensured the inclusion of cases across a range of densities, and an interval of at least 12 months between the initial assessment of the images and the repeat assessment undertaken for this substudy in May and June 2012. The mammograms were cases that had been read as normal, and were from women without a previous history of breast surgery of any kind, or previous breast biopsy. All of the mammogram images were produced by GE Senographe Essential FFDM systems (GE Healthcare Ltd, Chalfont St Giles, UK); in PROCAS we have observed differences in mean visually assessed density between different types of mammograms (GE Digital, Fischer Digital and analogue), so for consistency we used images from a single platform.

Both the initial MD estimation undertaken for the PROCAS study and the repeat assessment for this substudy took place under similar conditions. Readers viewed the MLO and CC images of both breasts and marked their density estimates for each view on a single paper form with a set of four 10-cm VASs, labelled 0% and 100% at the ends. The forms were scanned and density estimates were converted into numeric values using custom software. All assessment took place in the same clinical reporting room, with images displayed on Planar Dome E5 5MP self-calibrating monitors (Ampronix Imaging Technology, Irvine, CA, USA). Readers were blinded to the assessment of other readers and to their own previous assessments.

Agreement between the initial and repeat sets of density results for each reader was examined using the Bland–Altman limits of agreement framework.228,229 Differences between paired readings were plotted against their mean, with horizontal lines indicating the mean difference and 1.96 standard deviations (SDs) above and below the mean difference. These lines represent the limits of agreement, between which 95% of differences are expected to lie.229 This approach is often used when the level of agreement between two distinct methods of measurement is evaluated. In this substudy, we treated pairs of density assessments as replicates by the same method, that is, repeat observations at different times. We can thus also examine a reader’s coefficient of repeatability, 2.77 sw, where sw is the within-subject SD, estimated by the square root of the residual mean square in a one-way analysis of variance.229 Replicates by the same method are expected to be within one coefficient of repeatability of each other for 95% of subjects.
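The quantities described above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code used in the substudy: the function name `bland_altman` and the five pairs of readings are invented for the example. It relies on the fact that, with exactly two replicates per case, the residual mean square of a one-way analysis of variance reduces to the sum of squared differences divided by 2n.

```python
import math

def bland_altman(old, new):
    """Return mean difference, 95% limits of agreement, within-subject SD and
    coefficient of repeatability for paired readings (difference = new - old)."""
    n = len(old)
    diffs = [b - a for a, b in zip(old, new)]
    mean_diff = sum(diffs) / n
    # Sample SD of the differences (n - 1 denominator).
    sd_diff = math.sqrt(sum((d - mean_diff) ** 2 for d in diffs) / (n - 1))
    limits = (mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff)
    # With two replicates per case, the one-way ANOVA residual mean square
    # equals sum(d^2) / (2n); its square root is the within-subject SD.
    s_w = math.sqrt(sum(d * d for d in diffs) / (2 * n))
    repeatability = 1.96 * math.sqrt(2) * s_w  # approximately 2.77 * s_w
    return mean_diff, limits, s_w, repeatability

# Invented readings (percentage density) for five cases; not PROCAS data.
old = [10.0, 25.0, 40.0, 55.0, 70.0]
new = [12.0, 24.0, 45.0, 58.0, 71.0]
mean_diff, limits, s_w, repeatability = bland_altman(old, new)
print(mean_diff, limits, s_w, repeatability)
```

The factor 2.77 arises as 1.96 × √2, because the difference between two replicates has variance equal to twice the within-subject variance.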

Results

In Table 50 we present a summary of the differences between the initial (‘old’) density assessments of each reader and those (‘new’) assessments produced for this repeatability substudy. Results are shown by mammographic view for each of the seven readers, along with estimates of the 95% limits of agreement, within-subject SD and coefficient of repeatability. Six of the seven readers have positive mean differences for all views. This indicates, on average, an increase in density estimates between the initial assessment and repeat assessment. For the seventh reader, on average, a decrease in density estimates was observed.

TABLE 50

Differences between repeated breast density estimates, by mammographic view and reader

Reader 4, who had only 2 years’ experience reading using VAS, showed the widest limits of agreement (–13.95 to 40.43 for the left mammogram lateral oblique view) and largest coefficient of repeatability (38.60 for the right CC view). The sizes of the mean differences for reader 4 (10.19 to 14.66 percentage points) are considerably larger than those for the other six readers (0.34 to 5.78 percentage points). Reader 1 (the most experienced VAS reader with the lowest annual VAS workload) showed the narrowest limits of agreement (–11.15 to 17.35, left CC view) and smallest coefficient of repeatability (14.40, left mammogram lateral oblique view).

Scatterplots of the new and old estimates, and Bland–Altman plots of the differences between them, are presented for all readers in Figures 30 and 31. There is little variation between the different mammographic views within readers. The high proportion of points above the lines of perfect concordance in Figure 30 shows the increase from the initial to repeat assessments by most readers, while Figure 31 illustrates the variation around the mean difference between the estimates for each reader.

FIGURE 30. Scatterplots of new (y-axes) and old (x-axes) density estimates by reader and view. All axes range from 0% to 100%. Solid diagonal lines are the lines of perfect concordance. LCC, left CC; LMLO, left mammogram lateral oblique; RCC, right CC; RMLO, right …

FIGURE 31. Bland–Altman plots of the difference (new – old) against the mean of new and old readings. Solid horizontal lines are the mean differences and dashed horizontal lines are the 95% limits of agreement. LCC, left CC; LMLO, …

Discussion

We have investigated the repeatability of MD assessment by visual assessment recorded on VASs. Six of the seven readers’ density estimates were higher at the repeat assessment; this may at least partly be explained by the effect of the gradual transition from screen film mammography to digital mammography during the period of investigation. Screen film mammography was still being used during the initial PROCAS assessments of density, so at that time readers were assessing images from both modalities. We have observed that visual assessments of film mammograms tend to be higher, on average, than those of digital mammograms. It is possible that once the film comparator was removed (as was the case by the time of the repeat assessments) the readers may have realigned their baseline for digital mammography and hence increased their estimates.

Variation around the mean difference between current and previous readings was considerable, with 95% of differences expected to fall, at best, within 14.25 percentage points of the mean. Such a high degree of variability is problematic where MD is used to identify individuals at increased risk of breast cancer or where visual assessment of breast density is used to predict response to a preventative intervention. For example, a 12- to 18-month reduction in visual breast density of at least 10 percentage points was found to be a predictor of a reduction in risk of breast cancer in a study of tamoxifen as a preventative treatment in high-risk women.230 This result was obtained using a single, highly experienced, observer and the level of intraobserver variability found in our pragmatic study would not be clinically acceptable in this context. However, it is worth noting that a previous study using synthetic mammograms found that reader accuracy in assessing change in VAS breast density could be improved by viewing the current images and priors simultaneously.231

Interobserver variability of visual assessment score assessment and correction method

Introduction

Interobserver variability is inherent in the visual assessment of breast density, and was investigated in PROCAS using a common set of images assessed by 12 readers.195 To account for the differences, we developed a method for adjusting readers’ estimates.193

Methods

To assess interobserver variability, the MD of 120 screening cases with GE Senographe Essential FFDMs was independently assessed by 12 experienced mammographic readers (consultant radiologists, breast physicians and advanced practitioner radiographers). Assessment was performed using a separate VAS for each projection (CC and MLO) of each breast. These were scanned and converted to percentages, and then averaged. A Cumulus density result was also produced for each image by a trained and validated user. The level of agreement between readers was assessed using Bland–Altman limits of agreement228 and the concordance correlation coefficient (CCC). The VAS percentage densities were also converted to BI-RADS breast composition categories [(1) < 25% glandular; (2) 25–50% glandular; (3) 51–75% glandular; and (4) > 75% glandular] and agreement on this ordinal scale was measured with Cohen’s weighted kappa.
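As an illustration of two of the steps above, the following sketch computes Lin's concordance correlation coefficient for two readers and maps a VAS percentage to a BI-RADS composition category. The function names are invented for the example, and the handling of values that fall between the stated category boundaries (for instance 50.4%) is an assumption, since the text gives the categories only as whole-number ranges.

```python
def concordance_cc(x, y):
    """Lin's concordance correlation coefficient for two sets of paired readings."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # Penalises both poor correlation and systematic shifts between readers.
    return 2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)

def birads_category(percent_density):
    """Map percentage density to BI-RADS composition category 1-4.
    Boundary handling between 50-51 and at exactly 75 is an assumption."""
    if percent_density < 25:
        return 1
    if percent_density <= 50:
        return 2
    if percent_density <= 75:
        return 3
    return 4

# Invented readings: reader B scores every case one point higher than reader A.
reader_a = [1.0, 2.0, 3.0]
reader_b = [2.0, 3.0, 4.0]
ccc = concordance_cc(reader_a, reader_b)
print(ccc)
```

In this toy example the two readers are perfectly correlated, yet the constant offset lowers the CCC well below 1, which is why the CCC is better suited than a plain correlation coefficient to measuring agreement.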

We developed a two-stage method to adjust different observers’ estimates of breast density in order to make them comparable.193 First, results from all observers are transformed onto the same distribution. Individual readers produce VAS density results on their own distribution; we compute the empirical cumulative distribution function (ECDF) separately for each view by each reader, and then construct the overall ECDF by averaging the individual ECDFs, weighting each reader equally. We transform an original ‘raw’ VAS by a reader to its position in the ECDF of the reader, and then transform that position in the overall ECDF back to the 0–100% density scale.
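The first stage can be sketched as follows. This is a simplified illustration under stated assumptions rather than the published implementation: the function names and the two readers' scores are invented, a single view is used, and the overall ECDF is inverted by taking the smallest pooled score whose overall ECDF value reaches the required position (the source does not say how the inversion was performed).

```python
from bisect import bisect_right

def ecdf(sorted_scores, x):
    """Fraction of scores less than or equal to x."""
    return bisect_right(sorted_scores, x) / len(sorted_scores)

def stage_one_adjust(scores_by_reader):
    """Map each reader's raw scores onto the equally weighted overall distribution."""
    sorted_by_reader = {r: sorted(s) for r, s in scores_by_reader.items()}
    # Candidate output values: every distinct score given by any reader.
    grid = sorted({v for s in scores_by_reader.values() for v in s})
    n_readers = len(sorted_by_reader)
    # Overall ECDF = average of the individual ECDFs, each reader weighted equally.
    overall = [sum(ecdf(s, x) for s in sorted_by_reader.values()) / n_readers
               for x in grid]
    adjusted = {}
    for reader, scores in scores_by_reader.items():
        out = []
        for raw in scores:
            position = ecdf(sorted_by_reader[reader], raw)
            # Assumed inverse: smallest pooled score whose overall ECDF
            # reaches the reader-specific position.
            out.append(next(x for x, f in zip(grid, overall)
                            if f >= position - 1e-12))
        adjusted[reader] = out
    return adjusted

# Invented scores: reader B systematically scores higher than reader A.
raw = {"A": [10.0, 20.0, 30.0, 40.0], "B": [30.0, 40.0, 50.0, 60.0]}
adjusted = stage_one_adjust(raw)
print(adjusted)
```

After the transformation both hypothetical readers have the same distribution of adjusted scores, which is the purpose of the first stage; differences in the cases each reader happened to see are left to the second stage.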

Different readers perform assessment on different sets of cases, and each case is assessed by two readers. The second stage of the process is to account for any differences in case mix seen by different observers, exploiting differences in pairwise assessment after the first stage of the process. We can then estimate a correction factor for each reader to deal with differences in case mix.

We applied this two-stage approach to 12 experienced mammographic readers assessing a total of 13,694 screening cases from the PROCAS study. We then investigated the effect of using the adjustment method on risk stratification by examining the numbers of women who would be reclassified into a different risk group after adjustment.

Results

This section contains text reproduced with permission from Sergeant JC, Walshaw L, Wilson M, Seed S, Barr N, Beetles U, et al. ‘Same task, same observers, different values: the problem with visual assessment of breast density’, Proc SPIE 8673, Medical Imaging 2013: Image Perception, Observer Performance, and Technology Assessment, 86730T, March 28 2013.202

The greatest difference between any two readers’ estimates for the same case was 67.75 percentage points, while the mean difference between two readers ranged from 0.76 to 28.58 percentage points. The 95% limits of agreement between pairs of readers were –6.96 to 18.62 at their narrowest and –59.13 to 1.97 at their widest. Pairwise CCC values ranged from 0.44 to 0.92, while the overall CCC across the 12 readers was 0.70. Pairwise kappa values for the BI-RADS classification ranged from 0.37 to 0.84, with a mean of 0.65.

An illustration of the variability of reader estimations of density is shown in Figure 32. Here, the readings of each of the 12 readers are plotted against each other, and against the Cumulus estimates of density. Figure 33 shows Bland–Altman plots illustrating pairings with the poorest and best agreement. For Cumulus, the widest 95% limits of agreement were for observer 9 (–3.94 to 49.71) and the narrowest were for observer 3 (3.29 to 32.55); these are shown in Figure 34. CCC values ranged from 0.48 to 0.87 and kappa values ranged from 0.40 to 0.80.

FIGURE 32. Scatterplot matrix of density results by observers (labelled 1–12) and Cumulus, with all axes scaled 0–100% and lines of perfect concordance shown.

FIGURE 33. Example Bland–Altman plots for (a) the pair of readers with the widest 95% limits of agreement (observers 1 and 8); and (b) the pair with the narrowest 95% limits of agreement (observers 2 and 7).

FIGURE 34. Example Bland–Altman plots for (a) the widest 95% limits of agreement with Cumulus (observer 9); and (b) the narrowest 95% limits of agreement with Cumulus (observer 3).

We applied the two-stage density adjustment approach to 13 experienced mammographic readers assessing a total of 13,694 screening cases. Figure 35 shows box plots of the readers’ estimates of MD for the cases they read. A scatterplot matrix of pairwise density assessments corresponding to these is shown in Figure 36.

FIGURE 35. Box plots showing reader VASs for 13 readers, each reading a subset of 13,694 screening cases.

FIGURE 36. A scatterplot matrix of pairwise density assessments corresponding to the assessments in Figure 35. Blank cells in the scatterplot matrix indicate readers who never assessed the same mammograms. All axes are scaled 0–100% and lines of perfect …

In PROCAS, women are categorised as having a high risk of developing breast cancer if their 10-year risk as computed by a validated family history-based risk model is between 5% and 8% and their breast density is at least 46%, the 90th percentile observed in the study population. In this substudy, 1125 of the 13,694 women had a 10-year breast cancer risk of at least 5% but < 8%. Of these, 126 had initial breast density estimates of at least 46%, and were therefore classified as high risk. Following VAS density adjustment, 147 women were classified as high risk. Of these, 35 had been newly reclassified as high risk (3.5% of the 999 women initially classified as non-high risk). Fourteen women who were initially classified as high risk had their risk category reduced after adjustment (11.1% of those initially classified as high risk). This is illustrated in the reclassification table (Table 51).

TABLE 51

Reclassification of women into risk categories following adjustment of VASs
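The reclassification counts quoted in the text can be cross-checked with a short calculation. The variable names are illustrative; the figures are those reported in the paragraph preceding the table.

```python
eligible = 1125        # women with a 10-year risk of at least 5% but < 8%
high_before = 126      # density of at least 46% before adjustment
moved_up = 35          # non-high risk before, high risk after adjustment
moved_down = 14        # high risk before, non-high risk after adjustment

non_high_before = eligible - high_before
high_after = high_before - moved_down + moved_up
pct_moved_up = 100 * moved_up / non_high_before
pct_moved_down = 100 * moved_down / high_before

print(non_high_before, high_after)
print(round(pct_moved_up, 1), round(pct_moved_down, 1))
```

The totals are internally consistent: 126 − 14 + 35 gives the 147 women classified as high risk after adjustment.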

Discussion

Substantial lack of agreement was found between readers visually assessing percentage breast density, and between the readers and Cumulus assessments. This study demonstrates the need for reader harmonisation, either by standardised training before reading commences or by adjustment of results after reading has taken place, should VAS density assessment be used for risk stratification.

Adjustment of VAS estimates of percentage breast density to take account of interobserver variation thus had a substantial effect on which women were classified as being at high risk of developing breast cancer owing to the combination of their 10-year risk estimate and breast density. If VAS assessment of density is to be used to help assess cancer risk in order to inform screening strategies and preventative interventions, adjustment must be considered.

Selection of best performing risk prediction model in PROCAS and incorporation of density

Aim

To assess, using incident and prevalent breast cancers identified on the first mammography screen as part of PROCAS, the predictive performance and characteristics of risk stratification categories (1) from the TC and Gail risk models; (2) from MD; and (3) when combined. The aim is to assess their rank-ordering performance at baseline.

Methods

Data

Sample selection

As of 3 March 2014 there were 53,184 women and 632 confirmed breast cancers (DCIS and invasive) that occurred after enrolment. The following exclusions were made for this report:

  • 769 who had a previous diagnosis of breast cancer (22 cancers in PROCAS)
  • nine who had a bilateral breast cancer diagnosis, because the breast density measurement would be affected
  • 45 who had missing data on the side of breast cancer diagnosis
  • 2400 who had no visual assessment of breast density available (eight cancers)
  • 14 who were older than 73 years at enrolment (no cancers)
  • nine with BMI of > 80 kg/m2 and 22 with BMI of < 10 kg/m2 (one cancer).

This left 49,916 women who were breast cancer free at baseline, of whom 547 were diagnosed with breast cancer after enrolment into PROCAS and had a valid breast density measurement.
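The exclusion arithmetic can be reproduced as a simple tally. The sketch assumes that the exclusion groups do not overlap and that every woman excluded for a bilateral diagnosis or a missing side of diagnosis was a cancer case; both assumptions are inferred from the wording of the list rather than stated in it.

```python
enrolled_women, enrolled_cancers = 53184, 632

# (reason, women excluded, cancers excluded)
exclusions = [
    ("previous diagnosis of breast cancer", 769, 22),
    ("bilateral breast cancer diagnosis", 9, 9),        # assumed all cancers
    ("missing side of diagnosis", 45, 45),              # assumed all cancers
    ("no visual assessment of density", 2400, 8),
    ("older than 73 years at enrolment", 14, 0),
    ("BMI > 80 or < 10 kg/m2", 9 + 22, 1),
]

women_left = enrolled_women - sum(w for _, w, _ in exclusions)
cancers_left = enrolled_cancers - sum(c for _, _, c in exclusions)
print(women_left, cancers_left)
```

Under these assumptions the tally matches the 49,916 women and 547 cancers reported.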

Tyrer–Cuzick risk model

The 10-year absolute risk was computed by applying version 6.0 of the algorithm to data as provided on the questionnaires before quality control was complete.

Gail risk model

The 10-year absolute risk (which includes competing mortality) was obtained using the code from the NCI website in April 2014.232 It was applied to data as provided on the questionnaires. The number of first-degree relatives was based only on mothers and sisters. Ethnicity was taken to be white unless reported as black, in which case black was used. The number of previous biopsies was not recorded in the questionnaire; it was taken to be 1 if the woman had reported a previous biopsy of her breast.

Breast density

Mean percentage density was obtained from two readers and eight views. For cancers this was obtained from the mean of the scores on the contralateral breast.

Statistical methods

Age, BMI, TC and Gail 10-year risk, and breast density were tabulated by percentiles. The relationship between density and age and BMI was assessed. To combine breast density with the TC and Gail absolute risks, a residual was obtained by fitting a linear regression of density against age and BMI, for those with BMI information. The model also included a term for type of mammogram (digital or film), because visual density measurements from digital images are known to be systematically lower than those from film. The difference between the density expected given age and BMI and that observed was used as a measure of breast density that is by definition independent of age and BMI and zero when the woman has mean density for her BMI (if known) and age. Logistic regression was used to assess the significance of TC and the density residual (DR), and to calibrate the RR from the density measure. The calibrated RR from VAS was combined with the TC score to produce TC + density absolute 10-year risks. Tables showing the breakdown of cancers in 1–2%, 2–3%, 3–5%, 5–8% and ≥ 8% 10-year risk groups showed the stage, grade and lymph node positivity data by risk group for both TC and TC + density predictions. To compare against the TC groups, a table with the same number of women in each group was produced.

Results

Table 52 summarises the distribution of age, BMI, the TC 10-year risk and VAS density by case status. Breast cancer cases had marginally higher BMI but were well matched on age. Breast cancer cases also had higher overall 10-year risk based on the TC and Gail models.

TABLE 52

Quantiles of risk factors used in the present analysis

Figure 37 shows that MD was strongly influenced by both age and BMI.

FIGURE 37. Box plots of visually assessed MD against age (years) for (a) breast cancers diagnosed; and (b) the complete cohort; and box plots of visually assessed MD against BMI (kg/m2) for (c) breast cancers diagnosed; and (d) the complete cohort.

Table 53 also shows that density was affected by the type of mammogram. Old analogue film mammograms were associated with higher VASs than digital mammograms.

TABLE 53

Quantiles of density by other factors

A linear model was fitted by regressing density on age, BMI and type of mammogram (digital or not). The DR was the difference between the expected value of density from this model and that observed. In other words, the DR is interpreted as the difference between a woman’s density and the average density for women of the same age and BMI. For example, if DR = +10, then the woman has density of 10 percentage points more than would be expected based on her age and BMI.
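A minimal sketch of the DR calculation is given below. It is not the study code: the six records are invented, the function names are illustrative, and ordinary least squares is solved directly through the normal equations to keep the example free of third-party libraries. The residual is taken as observed minus expected, in line with the DR = +10 example above.

```python
def fit_least_squares(rows, targets):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           + [sum(r[i] * t for r, t in zip(rows, targets))]
           for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]

def density_residuals(women):
    """women: list of (density, age, bmi, is_digital).
    Returns observed density minus density expected from the linear model."""
    rows = [(1.0, age, bmi, float(digital)) for _, age, bmi, digital in women]
    observed = [w[0] for w in women]
    coefs = fit_least_squares(rows, observed)
    expected = [sum(c * x for c, x in zip(coefs, row)) for row in rows]
    return [o - e for o, e in zip(observed, expected)]

# Invented records: (VAS density %, age in years, BMI in kg/m2, digital mammogram?)
women = [
    (45.0, 50, 22.0, True), (30.0, 55, 27.0, True), (20.0, 60, 31.0, False),
    (38.0, 52, 24.0, False), (15.0, 68, 33.0, True), (28.0, 63, 26.0, False),
]
residuals = density_residuals(women)
print([round(r, 2) for r in residuals])
```

Because the model includes an intercept, the residuals sum to zero and are uncorrelated with age, BMI and mammogram type within the fitted sample, which is what makes the DR usable alongside a risk model that already accounts for age and BMI.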

Table 54 shows summary results of using TC and Gail with the DR to predict case status in the cohort. The Gail model only achieved an AUC c-statistic of 0.54, while TC was better, at 0.57. Adding DR to the models significantly improved the discrimination to a c-statistic of 0.58 for Gail and 0.60 for TC. Furthermore, adding DR to each model increased the ORs between upper and lower quantiles in both models.

TABLE 54

Adding the DR to Gail and TC models, summary measures of performance

The multivariate ORs are given for the difference between the upper 75th and lower 25th quantiles. The p-value is from the stepwise likelihood-ratio test for each predictor (TC or Gail first, then density); similarly, the AUCs are when TC or Gail are used alone, and then when the DR is added.
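For readers less familiar with the c-statistic, it can be read as the probability that a randomly chosen case has a higher predicted risk than a randomly chosen non-case, with ties counted as one-half. The sketch below computes it by direct pairwise comparison on invented scores; the function name is illustrative and the study's own AUCs came from the fitted logistic regression models, not from this code.

```python
def c_statistic(case_scores, control_scores):
    """Proportion of case-control pairs in which the case scores higher;
    tied pairs count as one-half."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

# Invented 10-year risk scores (%), not PROCAS data.
cases = [2.0, 3.5, 5.0]
controls = [1.5, 2.5, 3.0, 4.0]
auc = c_statistic(cases, controls)
print(auc)
```

A value of 0.5 corresponds to no discrimination, so c-statistics of 0.54 to 0.60 indicate modest separation of cases from non-cases.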

Figure 38 compares the receiver operating characteristic curves showing the improved performances with density added as DR.

FIGURE 38. Receiver operating characteristic curves showing performance of Gail and TC models with and without DR.

Figure 39 shows the histogram of ORs from fitted logistic regression models with (1) TC alone, (2) DR alone, and (3) TC and the DR combined.

FIGURE 39. Cancers and risk scores for Gail and TC.

Figure 40 shows a scatterplot of the 10-year risk scores from Gail and TC, with diagnosed breast cancers marked.

FIGURE 40. Gail and TC predictions (green X = breast cancer diagnosed).

Tables 55 and 56 show TC and Gail predictions broken down by cancer detection. Neither TC nor Gail was predictive of a higher proportion of higher stage, grade or lymph node involvement.

TABLE 55

Tyrer–Cuzick groups and cancer characteristics

TABLE 56

Gail groups and cancer characteristics

Table 57 repeats the analysis for the DR, restricting the number of women in each group to be the same as in the absolute risk groups from Tables 55 and 56. Density does appear predictive of increased proportions of higher-stage cancers.

TABLE 57

Density residual groups and cancer characteristics, with approximately the same number in each as observed in the TC groups (slight differences owing to ties)

Tables 58 and 59 show how the predictions from Tables 55 and 56 change when density is combined with TC and Gail. Combining density with TC means that the proportion of cancers in low-risk women that are high stage (stages 2b and 3) is substantially lower. Only 18 out of 272 (6.6%) breast cancers occurring in women with an MD-adjusted TC score of < 3.5% 10-year risk had high stage at diagnosis, whereas 28 out of 222 (12.6%) of those with TC risks of > 3.5% in 10 years had high-stage cancers (p = 0.029). If we assume that the inclusion of prevalent cases adds a sojourn time of 2.5 years and that overall follow-up is, on average, 4 years, then in women with average or below-average risks (< 3.5% 10-year risk) the chance of getting a high-stage cancer with 3-yearly NHSBSP screening is 18 in 4 × 34,670 = 1.3 per 10,000 per year, compared with 28 in 4 × 14,699 = 4.76 per 10,000 per year in women at higher risk (p < 0.001). TC picked out only 29.8% of the population as having a 10-year risk of > 3.5%. While adjusting the Gail score with density also predicted higher-stage cancers, 49.8% of the population was deemed to have a 10-year risk of > 3.5%. Thirty-five high-stage cancers occurred in 333 (10.5%) women at < 3.5% 10-year risk, compared with only 11 in 191 (5.8%) (p = 0.077).
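The annual rates quoted for the TC + density groups follow directly from the counts in the paragraph. The sketch below reproduces them, using the assumed 4 years of average follow-up stated in the text; the function name is illustrative.

```python
def annual_rate_per_10000(events, women, years):
    """Events per 10,000 women per year, given total women and years of follow-up."""
    return 10000 * events / (women * years)

# High-stage cancers by MD-adjusted TC 10-year risk group.
low_risk_rate = annual_rate_per_10000(18, 34670, 4)    # < 3.5% 10-year risk
high_risk_rate = annual_rate_per_10000(28, 14699, 4)   # > 3.5% 10-year risk

# Share of the classified population placed in the higher-risk group.
share_high_risk = 100 * 14699 / (34670 + 14699)

print(round(low_risk_rate, 1), round(high_risk_rate, 2), round(share_high_risk, 1))
```

The same two group sizes give the 29.8% of the population identified as having a 10-year risk above 3.5%.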

TABLE 58

Tyrer–Cuzick + DR groups and cancer characteristics

TABLE 59

Gail + DR groups and cancer characteristics

Density was associated with stage, grade and nodal status, which drives the pattern seen in Tables 60 and 61. Table 60 shows a clear rising proportion of cancers with increasing DR. Further data on the cancers are given in Tables 61–65.

TABLE 60

Cancers with stage information, broken down by DR

TABLE 61

Cancers with stage information, broken down by age

TABLE 62

Logistic regressions of pathology vs. the DR and age

TABLE 63

Cancers with LN information, broken down by DR

TABLE 64

Cancers with LN information, broken down by age

TABLE 65

Cancers with grade information, broken down by age

These tables also show that breast cancers diagnosed after the age of 65 years were less likely to be high stage with lymph node involvement, and were of lower grade.

Discussion

The results from our analyses of prospective breast cancers in PROCAS indicate that TC, Gail and density are predictive for a screening population, and that density may be combined with a prediction model. TC appeared to perform better than Gail and, in particular, has good discriminatory value for overall prediction and high-stage prediction when combined with density. More cancers occurred with Gail and TC in higher-risk groups, but little difference by risk group was identified in predicting the probability of high stage, grade or lymph node status given a cancer diagnosis based on Gail or TC. High density was linked to higher stage and lymph node involvement. Although the addition of density significantly improved the AUCs for both Gail and TC, the overall AUC scores were still modest.

We have previously shown that TC is accurate in risk assessment57 in the FHC setting and the present study has shown that it predicts accurately in the general population. A number of breast cancer risk models have been developed in the past 25 years.22 These incorporate known genetic, reproductive and other risk factors to a greater or lesser extent (see Table 28). Gail et al.149,150 described a risk assessment model, which focuses primarily on non-genetic risk factors with limited information on family history. A model of RRs for various combinations of the utilised risk factors (see Table 28) was developed from case–control data from the Breast Cancer Detection Demonstration Project. Individualised breast cancer probabilities are generated from information on RRs and the baseline hazard rate. These calculations take into account competing risks and the interval of risk. The data depend on having periodic breast surveillance. The Gail model was originally designed to determine eligibility for the Breast Cancer Prevention Trial, and has since been modified (in part to adjust for race) and made available on the National Cancer Institute website.232 The model has been validated in a number of settings and probably works best in general assessment clinics, in which family history is not the main reason for referral,149,150,176 although it should also be useful in general population screening programmes. The major limitation of the Gail model is the inclusion of only first-degree relatives, which results in underestimating risk in the 50% of families with cancer in the paternal lineage; the model also takes no account of age at onset of breast cancer.

The Claus model177 and BRCAPRO181 are primarily genetic models calculating a likelihood of either a putative high-risk dominant gene177 or BRCA1/BRCA2.181 Breast cancer risks are imputed from this calculation. As such, given the rarity of BRCA1/BRCA2 or the putative dominant gene in the Claus model, these models are useful only in the familial setting and are not relevant to the current study. BOADICEA69 is another model primarily developed to assess genetic risk, but has been validated in a population-based series of breast cancers. Although the inclusion of non-genetic risks is anticipated, these are not yet available in the online model.

The TC model,11 based partly on a data set acquired from IBIS-I and other epidemiological data, incorporates both familial and non-genetic risk factors in a comprehensive way.11 The major advantage over the Claus model and BRCAPRO is that the model allows for the presence of multiple genes of differing penetrance. It does give a read-out of BRCA1/BRCA2, but also allows for a lower-penetrance BRCAX. As such, the TC model addresses many of the pitfalls of the previous models: significantly, the combination of extensive family history, endogenous oestrogen exposure and benign breast disease (atypical hyperplasia). It is unsurprising, therefore, that the model performs better than the simpler Gail model, this being particularly so in the familial setting57 and now, as we have shown, in the population setting.

Mammographic density is the single assessable risk factor with the largest population attributable risk and also has a substantial heritable component.233,234 The difference in risk between women with extremely dense, as opposed to predominantly fatty, breasts is approximately four- to sixfold.235 The incorporation of MD into standard risk prediction models has been associated with some improvement in precision of risk prediction.34,35 Here we have shown that adding an adjusted MD score to Gail and TC not only improves the discrimination significantly, but also predicts higher-stage cancers. It is likely that our results also indicate that TC should replace Gail in North America, as a study from Canada has shown that TC substantially outperformed the Gail model.75 The authors assessed 10-year absolute risk of breast cancer using prospective data from 1857 women, of whom 83 developed cancer over a mean follow-up of 8.1 years. The 10-year risks assigned by Gail and TC differed, with ranges at the extremes of 0.001% and 79.5%. The mean Gail- and TC-assigned risks of 3.2% and 5.5%, respectively, were lower than the cohort’s 10-year cumulative probability of developing breast cancer of 6.25%. Agreement between assigned and observed risks was better for TC [HL χ² (4 df) = 7.2; p = 0.13] than for Gail, with Gail significantly underpredicting cancers (p < 0.001). The TC model also showed better discrimination (AUC = 69.5%, 95% CI 63.8% to 75.2%) than the Gail model (AUC = 63.2%, 95% CI 57.6% to 68.9%). In almost all covariate-specific subgroups, Gail mean risks were significantly lower than the observed risks, whereas TC risks showed generally good agreement with observed risks, even in the subgroups of women considered at average risk (e.g. no family history of breast cancer, BRCA1/BRCA2 mutation negative).
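The Hosmer–Lemeshow (HL) statistic of 7.2 and the p-value of 0.13 quoted above for TC are mutually consistent if the statistic is referred to a chi-squared distribution with 4 degrees of freedom. The short check below is a sketch under that assumption (the degrees of freedom are read from the notation, not confirmed by the cited study); the helper name `chi2_sf_even_df` is hypothetical and uses the closed-form tail probability that exists for even degrees of freedom.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-squared variable.

    Valid only for even, positive df, where the tail has the closed form
    exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires even, positive df")
    half = x / 2.0
    terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * terms


# HL statistic reported for TC, assuming 4 degrees of freedom.
p_value = chi2_sf_even_df(7.2, 4)
print(f"p = {p_value:.2f}")
```

A non-significant HL result such as this one indicates no detectable disagreement between assigned and observed risks across the risk groups used in the test.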

A further study from Marin County using data from 12,843 participants, of whom 203 had developed breast cancer during a 5-year period, showed that TC achieved an AUC of 0.65 (95% CI 0.61 to 0.68), compared with 0.62 (95% CI 0.59 to 0.66) for Gail and 0.60 (95% CI 0.56 to 0.63) for BRCAPRO. The corresponding estimated expected versus observed ratios for the models were 1.08 (95% CI 0.95 to 1.25), 0.81 (95% CI 0.71 to 0.93) and 0.59 (95% CI 0.52 to 0.68). In women with an age at first birth of > 30 years, the AUC for the TC, Gail and BRCAPRO models was 0.69 (95% CI 0.62 to 0.75), 0.63 (95% CI 0.56 to 0.70) and 0.62 (95% CI 0.56 to 0.68), and the E : O ratio was 1.15 (95% CI 0.89 to 1.47), 0.81 (95% CI 0.63 to 1.05) and 0.53 (95% CI 0.41 to 0.68), respectively.
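The two measures reported in these comparisons, the AUC (discrimination) and the expected versus observed (E : O) ratio (calibration), can be computed as in the sketch below. The function names and the toy data are hypothetical, and the confidence interval uses a Poisson approximation on the observed count, which is a common choice but is an assumption here rather than the confirmed method of the cited studies.

```python
import math


def auc(risks, outcomes):
    """Probability that a randomly chosen case has a higher assigned risk
    than a randomly chosen non-case (ties count as half)."""
    cases = [r for r, y in zip(risks, outcomes) if y == 1]
    controls = [r for r, y in zip(risks, outcomes) if y == 0]
    wins = 0.0
    for case_risk in cases:
        for control_risk in controls:
            if case_risk > control_risk:
                wins += 1.0
            elif case_risk == control_risk:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def eo_ratio_ci(expected, observed, z=1.96):
    """E : O ratio with an approximate 95% CI, treating the observed
    number of cancers as a Poisson count."""
    ratio = expected / observed
    factor = math.exp(z * math.sqrt(1.0 / observed))
    return ratio, ratio / factor, ratio * factor


# Toy data: assigned risks and whether cancer developed (1) or not (0).
risks = [0.01, 0.04, 0.03, 0.02]
outcomes = [0, 0, 1, 1]
print(auc(risks, outcomes))

# Example: 203 observed cancers and an expected count 0.81 times as large.
print(eo_ratio_ci(0.81 * 203, 203))
```

An E : O ratio below 1 means the model assigned fewer cancers than were observed (underprediction), whereas an AUC of 0.5 means the model ranks cases no better than chance.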

The current study has, therefore, shown that TC is a reliable model for risk prediction in a UK general population screening programme. The discriminatory value is significantly improved by incorporating MD adjusted for age and BMI. These results have implications for national breast screening programmes: 3-yearly mammography screening appears effective for around 70% of the female population aged 47–73 years, whereas the higher rates of high-stage cancers in women with above-average risk would justify an 18-monthly screening interval, which may downstage such cancers.

Publications arising from work in Chapter 4

Sergeant JC, Wilson M, Barr N, Beetles U, Boggis C, Bundred S, et al. Inter-observer agreement in visual analogue scale assessment of percentage breast density. Breast Cancer Res 2013;15(Suppl. 1):17.195

Sergeant JC, Walshaw L, Wilson M, Seed S, Barr N, Beetles U, et al. Same task, same observers, different values: the problem with visual assessment of breast density. Proc SPIE 8673, Medical Imaging 2013: Image Perception, Observer Performance, and Technology Assessment, 86730T; 28 March 2013.202

Sperrin M, Bardwell L, Sergeant JC, Astley S, Buchan I. Correcting for rater bias in scores on a continuous scale, with application to breast density. Stat Med 2013;32:4666–78.193

Hashmi S, Sergeant JC, Morris J, Whiteside S, Stavrinos P, Evans DG, et al. Ethnic variation in volumetric breast density. In Maidment ADA, Bakic PR, Gavenonis S, editors. Breast Imaging 11th International Workshop. Berlin Heidelberg: Springer-Verlag; 2012. pp. 127–33.196

O’Donovan E, Sergeant J, Harkness E, Morris J, Wilson M, Lim Y, et al. Use of Volumetric Breast Density Measures for the Prediction of Weight and Body Mass Index. In Fujita H, Hara T, Muramatsu C, editors. Breast Imaging 12th International Workshop. Berlin: Springer-Verlag; 2014. pp. 282–9.198

Evans DG, Brentnall AR, Harvie M, Dawe S, Sergeant J, Stavrinos P, et al. Breast cancer risk in young women in the national breast screening programme: implications for applying NICE guidelines for additional screening and chemoprevention. Cancer Prev Res (Phila) 2014;7:993–1001.190

Copyright © Queen’s Printer and Controller of HMSO 2016. This work was produced by Evans et al. under the terms of a commissioning contract issued by the Secretary of State for Health. This issue may be freely reproduced for the purposes of private research and study and extracts (or indeed, the full report) may be included in professional journals provided that suitable acknowledgement is made and the reproduction is not associated with any form of advertising. Applications for commercial reproduction should be addressed to: NIHR Journals Library, National Institute for Health Research, Evaluation, Trials and Studies Coordinating Centre, Alpha House, University of Southampton Science Park, Southampton SO16 7NS, UK.

Included under terms of UK Non-commercial Government License.

Bookshelf ID: NBK379493
