Developing the risk score

Melanie J Davies; Laura J Gray; Dariush Ahrabian; Marian Carey; Azhar Farooqi; Alastair Gray; Stephanie Goldby; Sian Hill; Kenneth Jones; Jose Leal; Kathryn Realf; Timothy Skinner; Bernie Stribling; Jacqui Troughton; Thomas Yates; Kamlesh Khunti

NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Davies MJ, Gray LJ, Ahrabian D, et al. A community-based primary prevention programme for type 2 diabetes mellitus integrating identification and lifestyle intervention for prevention: a cluster randomised controlled trial. Southampton (UK): NIHR Journals Library; 2017 Jan. (Programme Grants for Applied Research, No. 5.2.)

Chapter 3 Developing the risk score

This chapter was based on previously published data, reproduced with kind permission from Springer Science+Business Media: Diabetologia, Detection of impaired glucose regulation and/or type 2 diabetes mellitus, using primary care electronic data, in a multi-ethnic UK community setting, vol. 55, 2012, pp. 959–66, Gray LJ, Davies MJ, Hiles S, Taub NA, Webb DR, Srinivasan BT, Khunti K, excerpts of text, tables 1, 2 and 3, and figures 1 and 2 (please note that minor edits have been made, with the permission of the authors, for consistency).¹¹³

Introduction

Risk scores are a way of stratifying a population for targeted screening. They use data from risk factors to calculate an individual’s score; a higher score reflects higher risk. Risk scores can be applied either to an individual as a questionnaire (these scores generally require only data from non-invasive risk factors, which would be known by members of the public) or to a population. Population risk scores are usually developed for use in primary care where a piece of software is used to calculate the score for everyone listed on the electronic medical records using routinely stored data. Screening invitations can then be sent to those at the highest risk.

Over the past decade, a plethora of risk scores have been developed and validated for detecting those at risk of T2DM. One of the first risk scores developed in this field was the FINDRISC score. This risk score was developed for use in Finland; it is questionnaire based and designed to be completed by members of the public to detect those at risk of developing T2DM in the future.²⁷ It includes eight questions relating to age, BMI, waist circumference, BP, history of high blood glucose, family history of diabetes mellitus, physical activity and consumption of vegetables, fruits or berries. This score has been shown to have acceptable levels of discrimination and, since its development in 2003, it has been validated for use in Greece,¹¹⁴ Bulgaria,¹¹⁵ Italy,¹¹⁶ Spain¹¹⁷ and Sweden.¹¹⁸ It was decided not to validate this score for use in this project for a number of reasons. First, the FINDRISC was not developed for detecting those with existing undiagnosed PDM (IFG and/or IGT) and T2DM. Second, it is well reported that risk scores that have been developed for a particular population tend to have low validity when used on another. In addition, the FINDRISC does not include ethnicity, which is an important risk factor when assessing risk in a multiethnic population such as in the UK.²⁸,¹¹⁹,¹²⁰ Third, the questionnaire nature of this risk score and the inclusion of patient-specific risk factors that would not be available routinely in primary care meant that this risk score could not be implemented in primary care for population-based stratification.

The Cambridge Diabetes Risk Score (CDRS) addresses some, but not all, of these issues.¹²¹ This score was developed to detect undiagnosed T2DM and it collects data on age, sex, BMI, steroid and antihypertensive medication, and family and smoking history. This score would be suitable for use in primary care but it does not detect current undiagnosed PDM and it does not reflect the higher incidence of T2DM in those from black and minority ethnic (BME) groups. The FINDRISC identifies people who are at risk of developing T2DM in the next 10 years, whereas the CDRS detects current undiagnosed T2DM only. To date, there is no evidence base for intervening in such a group for the prevention of T2DM. The evidence from the large pivotal trials for preventing T2DM is in people with IGT.³¹ The ultimate aim of this programme of work is to develop and test a pragmatic intervention, building on the learning from the previous trials, delivered in a UK primary care setting. Therefore, we wished to identify people who have PDM rather than those at risk of developing diabetes mellitus in the future. Hence, it was decided to derive and statistically validate a new risk score that detects PDM/T2DM for use in a multiethnic population using data from two existing population-based screening studies from Leicester and Leicestershire.¹²²,¹²³

The development and validation of the Leicester Practice Risk Score (LPRS) had three phases. Initially, a pilot score was developed and validated, and tested in two general practices (phase one). The aim of the pilot phase was not so much to assess the performance of a risk score per se, but to test the feasibility of a risk-score approach for identifying people with PDM in primary care. Owing to the milestones required for the programme of work, this feasibility testing needed to be completed before the final data set from the large population-based screening study was ready for analysis. A very simple pragmatic score was therefore derived to enable this approach to screening to be tested. Reporting of details of how this score was derived is outside the scope of this report. Following this pilot, complete data from a large-scale population-based screening study [Anglo–Danish–Dutch study of Intensive Treatment In people with screen detected diabetes in primary care (ADDITION)] became available; therefore, the score was redeveloped based on the learning from the pilot study. This score was subsequently used to identify those at high risk for screening within Let’s Prevent (phase two). Following this, the score was updated based on subsequent improvements in data completeness in primary care and the addition of HbA_1c to the diagnostic criteria for T2DM (phase three). Given that this score is published and used in clinical practice, full details of the development and validation are given for the final updated score.

Data sets

Data sets from two existing closely related screening studies were used throughout all three phases, that is, ‘Screening Those At Risk’ (STAR) and ADDITION. These are described briefly below and their shared methodology is outlined in the final section.

Screening Those At Risk

The STAR study aimed to identify the prevalence of PDM and undiagnosed T2DM in those with at least one recognised risk factor for diabetes mellitus. Between 2002 and 2004, 3225 individuals aged 40–75 years inclusive (25–75 years for those with South Asian, Afro-Caribbean and other ethnicity owing to the reported higher risk of T2DM) with at least one risk factor for T2DM were invited for screening from 17 general practices. Risk factors for inclusion into the study included a documented clinical history of coronary heart disease, hypertension, dyslipidaemia, cerebrovascular disease or peripheral vascular disease, previous history of IGT, gestational diabetes, polycystic ovary syndrome in those with a BMI of > 25 kg/m², a first-degree relative with T2DM or BMI of > 25 kg/m², and current or ex-smokers. Full details of the methodology and results are published.¹²²

Anglo–Danish–Dutch study of Intensive Treatment In people with screen detected diabetes in primary care-Leicester

This study has been described in detail elsewhere.¹²³ In summary, ADDITION-Leicester invited a randomly selected 30,950 people aged 40–75 years (25–75 years if non-white, although those aged 25–40 years are excluded from these analyses) without diagnosed diabetes mellitus from 20 practices from Leicester and the surrounding county for screening between 2004 and 2008; 6749 individuals attended screening (response rate 22%). All 6749 participants underwent an OGTT and, therefore, people with PDM and previously undiagnosed T2DM were identified. Those found to have undiagnosed T2DM were included in a RCT of intensive treatment versus standard care;¹²⁴ data from this trial are not included in this analysis. The analysis is based solely on the cross-sectional screening data and, therefore, includes people identified with normal glucose, PDM and T2DM.

Shared protocols

In both studies, all screened participants received an OGTT using 75 g of glucose. A trained member of research staff took biomedical and anthropometric measurements and recorded data such as medical history, medication, BMI and BP, and each participant completed a questionnaire. The questionnaire collected data on smoking status, alcohol consumption, occupational status, ethnicity, physical activity, the FINDRISC score and a number of scales to measure domains such as well-being and anxiety.

All participants were diagnosed with screen-detected IFG, IGT and T2DM according to WHO 1999 criteria,⁸ with PDM referring to the composite of IGT and/or IFG. HbA_1c was collected for all participants at baseline.

Anthropometric measurements were performed by trained staff following standard operating procedures, with height being measured to the nearest 0.1 cm using a rigid stadiometer (Seca, Hamburg, Germany) and weight in light indoor clothing measured to the nearest 0.1 kg with a Seca scale (Seca UK, Birmingham, UK). BMI was defined as weight in kilograms divided by height in metres squared (kg/m²). Waist circumference was measured at the mid-point between the lower costal margin and the level of the anterior superior iliac crest to the nearest 0.1 cm.

Data sets used for development and statistical validation of the risk scores

The use of each data set across the three phases is outlined in Table 3. Given its larger sample size and population-based approach, the ADDITION data set is preferable for the development of a risk score, with STAR then being used for temporal validation (i.e. evaluation on external data from the same centre). Because the ADDITION study data were not available in a format suitable for analysis in late 2007, when the pilot study commenced, the development of the initial risk score was divided into two phases. In phase one, a pilot risk score was developed using data from the STAR study, specifically for use in the pilot screening study. Temporal validation using the ADDITION study data set was carried out retrospectively. In phase two, the risk score for use in the Let’s Prevent study was developed. Its design is based on analysis of the ADDITION study data set, which, being larger than the STAR data set, allows greater sensitivity to the possible predictive values of potential risk factors. The same approach was used when the risk score was updated in 2010.

TABLE 3

Data used for the development and statistical validation across the three phases

The characteristics of those included in the two data sets are given in Table 4. The mean age in the ADDITION-Leicester data was 57.3 years, with 48% being male. Three-quarters of the cohort were white European, with 23.5% of other ethnicity (of whom the majority, 91%, were South Asian). Of the 6390 people aged ≥ 40 years screened as part of the ADDITION study, 927 (14.5%) were found to have PDM and 206 (3.2%) had undiagnosed T2DM based on an OGTT; the latter figure rises to 485 (7.6%) when HbA_1c is included in the diagnostic criteria. The STAR data set had similar characteristics but with slightly more people reporting that they were smokers (25% vs. 14%).

TABLE 4

Characteristics of data sets used for model building and temporal validation

Statistical methods

The purpose of the risk scores was to identify those at greatest risk of glucose intolerance, defined as those with either T2DM or PDM (which includes IFG and/or IGT) who, up until screening with an OGTT, had been undiagnosed. All of the scores developed and validated as part of this project used similar methodology. To avoid repetition this is detailed below. Where differences occurred, these are also summarised.

Development

Variables considered

The variables considered for inclusion in the score are limited to those that are held in the ‘typical’ general practice database with a good level of reliability and completeness. The consensus is that the following items satisfy these conditions: age, sex, BMI, ethnicity (white European or other), family history (of type 1 or type 2 diabetes mellitus), smoking status (current smoker or ex or non), prescribed antihypertensives, statins or steroids, history of CVD (myocardial infarction, stroke, heart valve disease, atrial fibrillation, angina, angioplasty or peripheral vascular disease) and deprivation [measured using the Index of Multiple Deprivation (IMD) calculated from the individual’s postcode]. This pool of variables covers the majority of those included in previously developed screening tools and screening guidelines.¹²⁵,¹²⁶

Modelling

All modelling was carried out in Stata (version 11.1) using logistic regression with the composite of PDM [defined as IFG or IGT on OGTT (not including HbA_1c 6.0–6.4% at this stage)] or T2DM [OGTT or HbA_1c ≥ 6.5% (48 mmol/mol)] versus normal as the dependent variable. A staged approach to variable selection was taken. First, we assessed the association between each variable and the outcome (PDM/T2DM) independently. Those that were significantly (p < 0.05) associated with the outcome were then assessed in combination and those that became non-significant when adjusted for other variables in the model were removed. This process was then repeated. Each combination of variables was compared in terms of the area under the receiver operating characteristic (ROC) curve, with the aim of maximising this. The effect of adding each previously excluded variable into the model was assessed to make sure that no potentially important variables were missed; again, their significance and their effect on the area under the ROC curve were assessed. Once a final model was established we assessed all possible two-way interactions and the addition of polynomial terms, although we acknowledged that we would have limited power to explore these. The importance of introducing functional polynomial terms was also assessed using the Akaike information criterion.¹²⁷ Throughout the analysis, missing data were not imputed and analysis was carried out on a complete-case basis.

The updated risk score (described in phase three) also included HbA_1c of ≥ 6.5% in the definition of T2DM, given that HbA_1c was recommended as a diagnostic tool by WHO in 2011.¹⁰ HbA_1c was not used in the definition of PDM because, although a range of 6.0–6.4% has been recommended for identifying those at high risk of developing diabetes mellitus in the future,¹² WHO concluded that there was insufficient evidence for classifying PDM using HbA_1c.¹⁰
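The HbA_1c cut points quoted in this chapter can be illustrated with a short sketch. The function names and category labels below are hypothetical. The percentage-to-mmol/mol conversion uses the standard NGSP-to-IFCC master equation, which is not stated in this chapter; it is included only because it reproduces the paired values quoted here (6.0% = 42 mmol/mol, 6.4% = 46 mmol/mol and 6.5% = 48 mmol/mol).

```python
def hba1c_percent_to_mmol_mol(percent: float) -> int:
    """Convert HbA1c from % to mmol/mol, rounded to the nearest integer.

    Uses the NGSP-to-IFCC master equation (an addition, not from the chapter).
    """
    return round(10.929 * (percent - 2.15))


def hba1c_category(percent: float) -> str:
    """Label an HbA1c value using the cut points quoted in the text.

    The labels are illustrative. The 6.0-6.4% band is described in the text as
    a high-risk range and was NOT used to define PDM when the score was built.
    """
    if percent >= 6.5:
        return "T2DM range"
    if percent >= 6.0:
        return "high-risk range (6.0-6.4%)"
    return "below 6.0%"


for value in (5.6, 6.0, 6.4, 6.5):
    print(value, hba1c_percent_to_mmol_mol(value), hba1c_category(value))
```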

Creating a scoring system

Once a final model has been developed a risk score needs to be devised from this. For the pilot score a crude, easy-to-calculate score was developed (see Phase one: pilot risk score results for details). For the initial and updated risk scores, the scores were derived by summing each of the β coefficients from the best fitting model.
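In symbols, a score built by summing coefficients can be written as below. The notation is an assumption introduced here for illustration: x_j is the value of risk factor j (0 or 1 for a binary factor, the measured value for age or BMI), \hat{\beta}_j is its fitted coefficient and \hat{\beta}_0 is the model intercept, which this chapter does not report.

```latex
S = \sum_{j=1}^{k} \hat{\beta}_j \, x_j ,
\qquad
\hat{p} = \frac{1}{1 + \exp\!\bigl(-(\hat{\beta}_0 + S)\bigr)}
```

Because the predicted probability \hat{p} increases monotonically with S, ranking people by the score S gives the same ordering as ranking them by predicted probability, so the intercept is not needed for ranking.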

Once a score had been devised, the discrimination of the score was assessed using the area under the ROC curve. Calibration was assessed using the Hosmer–Lemeshow statistic,¹²⁸ the Brier score¹²⁹ and a calibration plot of estimated prevalence of PDM and T2DM grouped by the predicted probability.
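The two simplest of these measures can be sketched in a few lines. This is a minimal illustration, not the Stata routines used in the analysis; the function names and the four-person data set are invented. The area under the ROC curve is computed through its rank interpretation (the probability that a randomly chosen case scores higher than a randomly chosen non-case), and the Brier score is the mean squared difference between predicted probability and observed outcome.

```python
def roc_auc(scores, outcomes):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    non_cases = [s for s, y in zip(scores, outcomes) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(non_cases))


def brier_score(probabilities, outcomes):
    """Mean squared difference between predicted probability and outcome (0/1)."""
    return sum((p - y) ** 2 for p, y in zip(probabilities, outcomes)) / len(outcomes)


# Invented example: predicted probabilities and observed outcomes (1 = PDM/T2DM).
predicted = [0.10, 0.40, 0.35, 0.80]
observed = [0, 0, 1, 1]
print(roc_auc(predicted, observed))      # 0.75
print(brier_score(predicted, observed))  # approximately 0.158
```

A Brier score closer to 0 indicates better agreement between predicted probabilities and outcomes, and an area under the ROC curve of 0.5 corresponds to no discrimination.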

Statistical validation

Each of the scores developed was temporally validated (see Table 3 for data sets used). Each score was validated against the outcome for which it was developed. For the updated risk score, temporal validation was carried out using six different outcomes that reflect how the score would be used in clinical practice (i.e. one method of diagnosis will be chosen): (1) T2DM diagnosed using OGTT; (2) T2DM diagnosed using HbA_1c; (3) PDM defined as IGT or IFG on OGTT; (4) HbA_1c between 6.0% and 6.4%; (5) T2DM or PDM on OGTT; and (6) HbA_1c of ≥ 6.0%.

The ROC curve was plotted for each outcome and the area under the curve was calculated. The sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), likelihood ratio for a positive test (LR+) and likelihood ratio for a negative test (LR−) with 95% CI were calculated, comparing each cut point on the score to the outcome.
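For a single cut point, these measures all follow from the four cells of a 2 × 2 table of score result against outcome. The sketch below uses the standard definitions; the function name and the example counts are invented, and confidence intervals are not computed.

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    """Accuracy measures for one cut point.

    tp/fp: people above the cut point with/without the outcome.
    fn/tn: people below the cut point with/without the outcome.
    Assumes no denominator is zero (no guard is included in this sketch).
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }


# Invented counts: 1000 people, 100 above the cut point, 100 with the outcome.
measures = diagnostic_accuracy(tp=20, fp=80, fn=80, tn=820)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```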

The results of each of the risk scores developed and validated are presented below (see Results). The pilot risk score and initial risk score are described briefly with full details of the updated practice risk score given along with details of the development of a piece of software to run this risk score in general practices.

Phase one: pilot risk-score results

The risk score developed to conduct the screening pilot in primary care was derived using the STAR data set. The final model is given in Table 5.

TABLE 5

Final model for the pilot risk score

From the coefficients of the final model a crude pilot risk score formula was defined as the sum of:

person’s age (in years)
twice their BMI (in kg/m²)
10 if they are male (no change if female).
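The pilot formula is simple enough to express directly. The function and parameter names below are hypothetical; the arithmetic is exactly the three-term sum listed above.

```python
def pilot_risk_score(age_years: float, bmi: float, is_male: bool) -> float:
    """Pilot score: age + 2 x BMI + 10 if male (nothing added if female)."""
    return age_years + 2 * bmi + (10 if is_male else 0)


# A 60-year-old man with a BMI of 30 kg/m2 scores 60 + 60 + 10.
print(pilot_risk_score(60, 30.0, True))   # 130.0
# A 50-year-old woman with a BMI of 25 kg/m2 scores 50 + 50.
print(pilot_risk_score(50, 25.0, False))  # 100.0
```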

A simplified score was preferred for the pilot study, as this would be easily calculated by researchers; the score for the main study was calculated using a piece of software and, therefore, the simplification of the relative weightings of the score components was not needed.

Using the score to identify the 10% most at risk for invitation to further screening identified 132 people with PDM in the STAR data set and 41 people with T2DM, representing a sensitivity of 20.1% and a specificity of 92.4%. Increasing the threshold so that the top 28% of people at risk were invited for screening increases the sensitivity to 43.6% and reduces the specificity to 75.2%.

Results from the pilot screening study

Methods

Two primary care practices were identified to test the pilot risk score in order to assess how effectively the tool could be used to identify those at highest risk.

Two contrasting practices were selected for the pilot screening study. Melton Mowbray is a large rural practice comprising a practice population of 36,000 with 20 general practitioners (GPs); 99% of the practice population are Caucasian and 829 patients were listed on the diabetes mellitus register (2.3% of the total practice). Spinney Hill, in contrast, is a large, inner-city practice with seven GPs comprising a total practice population of 16,000 patients, 98% of whom are of South Asian ethnicity. A total of 8% (1311) of patients were listed on the practice register for diabetes mellitus. The two practices used in the pilot study were selected from those that already had ethical permission for ongoing screening as part of the ADDITION study, but in which screening had not commenced. For this reason, recruitment of the practices was not necessary; in the Let’s Prevent study, practices would first need to be recruited to take part in the study.

For the pilot study, information needed for the risk score was obtained from Egton Medical Information Systems [(EMIS) EMIS Health, Leeds, UK] data searches. It was expected that all the general practices in the study area would be using the EMIS computer system. From the practice list data the risk score was calculated and, separately for each practice, used to classify the individuals in descending order of PDM/T2DM risk.

In the pilot study the individuals were then sent letters of invitation (in batches of 200) to take part in the study by their GP; this invitation included a questionnaire (which asked four basic questions, such as whether or not the individual was taking part in any other studies) and a reply slip together with a stamped, addressed envelope. Recruitment was stopped (for reasons of practicality) at the risk score of 125 in both practices. Individuals were invited according to their risk score. Individuals who had not replied were sent a reminder letter. A mobile clinic located on a double-decker bus was used for the majority of the pilot screening owing to its convenience and accessibility, although a small number of screening sessions for participants living in the Leicester area were held at the Leicester General Hospital.
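The invitation process described above (rank by descending score, stop at a score of 125, send letters in batches of 200) could be sketched as follows. The function name and data layout are hypothetical, and treating a score of exactly 125 as still invited is an assumption, because the text does not say whether the stopping point was inclusive.

```python
def invitation_batches(patients, min_score=125, batch_size=200):
    """Group patients into invitation batches, highest scores first.

    patients: list of (patient_id, score) pairs for ONE practice.
    min_score: scores below this are not invited (inclusive bound is assumed).
    """
    invited = sorted(
        (p for p in patients if p[1] >= min_score),
        key=lambda p: p[1],
        reverse=True,
    )
    return [invited[i:i + batch_size] for i in range(0, len(invited), batch_size)]


# Tiny invented practice list, with a batch size of 2 to keep the output short.
practice = [("a", 130), ("b", 120), ("c", 140), ("d", 125)]
print(invitation_batches(practice, batch_size=2))
# [[('c', 140), ('a', 130)], [('d', 125)]]
```

Running the function separately for each practice matches the per-practice ranking described in the text.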

Those individuals who had agreed to take part in the pilot study were sent information on the date, time and place of their appointment as well as instructions that from the previous midnight they should eat nothing and drink only water. The surgery sessions were held only in the morning. Participants were given a telephone reminder a day or two before their session and this had the effect of cutting the number of ‘no-shows’ down from 46% in ADDITION to almost zero. Overall, there tended to be more no-shows from the inner-city Leicester practice than from the practice located in the nearby market town.

Results

In total, 2168 people were found to be at high risk and invited to be screened. A total of 686 people gave a positive reply, representing 31.6% of those invited. At the time of analysis, 264 of those who gave a positive response had been screened (38.5%). Therefore, 12.2% of those originally invited provided data for the pilot screening study.
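The three percentages in this paragraph follow directly from the counts given; the short check below reproduces them (variable names are illustrative).

```python
invited = 2168            # people identified as high risk and invited
replied_positively = 686  # people who gave a positive reply
screened = 264            # positive responders screened at the time of analysis

reply_rate = 100 * replied_positively / invited
screened_of_replies = 100 * screened / replied_positively
screened_of_invited = 100 * screened / invited

print(f"{reply_rate:.1f}% {screened_of_replies:.1f}% {screened_of_invited:.1f}%")
# 31.6% 38.5% 12.2%
```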

Baseline characteristics of the 264 participants screened in the pilot study are shown in Table 6. This sample was predominantly (73%) male, somewhat older than those included in the STAR and ADDITION studies, with a mean age of 64.5 years, but with a similar proportion of South Asians and participants with similar mean BMIs. The differences in characteristics between the diagnostic groups are generally as would be expected.

TABLE 6

Anthropometric and clinical characteristics obtained at screening, by diagnostic category, from the two pilot practices combined

Overall, 20.8% of those screened had either PDM (15.2%) or T2DM (5.7%). This was slightly higher than the percentage found in the ADDITION-Leicester population-based screening programme (19.3%). It was anticipated that this difference could be increased further with the refinement of the risk score using the full ADDITION-Leicester data set. Of those found with PDM, the majority had isolated IGT (72.5%).

Overall, the pilot study showed that it was feasible to run a risk score using practice data stored in the EMIS system, to invite people by postal mail to come forward for screening, to get a reasonable response rate to the invitation, to screen those who attended, and to detect people with previously undiagnosed PDM or T2DM.

Phase two: initial Leicester Practice Risk Score results

Phase one showed that it was feasible to use a risk-score approach to identify people with undiagnosed PDM and T2DM in primary care. The next phase was to develop the risk score for use in the main Let’s Prevent study and to assess its validity using the STAR study data set that had been used for the development of the pilot risk score. The starting point for the development of this risk score was different from that for the pilot. The initial plan was to recruit 20 patients from each practice; data on ethnicity would be available in terms of proportions of the main ethnic groups at the practice level but not available from individuals. It became apparent that in order to fulfil the study aim of recruiting those eligible patients judged to be at highest risk of conversion to T2DM, a more efficient sampling plan would be to invite patients for an OGTT, starting from those with the highest risk score and moving down in order of score to a common level across all the practices that would result in the required study sample size. The number of patients invited for OGTT would, therefore, reflect the size and general risk levels of practices, and it would be valid to include a proportion of patients from ethnic minority groups when deriving the risk score.

From the modelling results (Table 7), the initial risk score was defined for the study as:

TABLE 7

Final model for the initial practice risk

Risk score = 0.0407 × age (years)
+ 0.296 (if male, no change if female)
+ 0.934 (ethnicity, as practice proportion of South Asians)
+ 0.0859 × BMI (kg/m²)
+ 0.440 (if family history of diabetes mellitus, no change otherwise)
+ 0.374 (if on antihypertensive medication, no change otherwise).
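The formula above can be sketched as a function. The function and parameter names are hypothetical. One reading is assumed for the ethnicity term: the coefficient 0.934 is multiplied by the practice proportion of South Asians expressed between 0 and 1. The text does not spell out this scaling, so it should be checked against the published model before any reuse.

```python
def initial_risk_score(
    age_years: float,
    bmi: float,
    is_male: bool,
    practice_south_asian_proportion: float,  # assumed to lie between 0 and 1
    family_history: bool,
    on_antihypertensives: bool,
) -> float:
    """Sum of coefficient x value over the six terms of the initial score."""
    score = 0.0407 * age_years
    score += 0.296 if is_male else 0.0
    score += 0.934 * practice_south_asian_proportion  # assumed scaling
    score += 0.0859 * bmi
    score += 0.440 if family_history else 0.0
    score += 0.374 if on_antihypertensives else 0.0
    return score


# Invented example: 60-year-old man, BMI 30 kg/m2, practice 50% South Asian,
# family history of diabetes, not on antihypertensive medication.
print(round(initial_risk_score(60, 30.0, True, 0.5, True, False), 3))  # 6.222
```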

Statistical validation of the initial risk score was carried out by examining its performance temporally on the STAR data and, additionally, by comparing its discrimination with other standard risk scores and the pilot risk score with respect to the area under the ROC curve (AUC). This was carried out for the total sample, and then separately for the South Asian and white European cohorts. Table 8 shows that the initial risk score had better discrimination, as measured by the AUC, than either the FINDRISC or the CDRS in both of the data sets on which it was tested; however, in common with the other risk scores, it performed worse on the South Asian subset in the STAR study sample.

TABLE 8

Discrimination of the initial risk score for glucose intolerance, in comparison with the pilot risk score, FINDRISC and CDRS, as assessed by the area under the ROC curve

Using the initial risk score to identify the 10% most at risk for invitation to further screening gave a sensitivity of 19.2% with a specificity of 89.3%; the sensitivity is increased to 46% if the top 25% of at-risk participants are screened.

Phase three: updated Leicester Practice Risk Score results

Given the poor reporting of ethnicity in primary care, the initial score used practice-level ethnicity as a proxy for individual-level ethnicity. This may overinflate the score of white Europeans living in areas with a large South Asian population and vice versa. Recording of ethnicity has since been included in the Quality and Outcomes Framework,¹³⁰ which has significantly improved the level of completeness for individual-level ethnicity, with > 90% of UK practices now recording ethnicity for all newly registered patients.¹³¹ In addition, HbA_1c has been used to diagnose T2DM since 2011. Therefore, it was decided to develop another score that would include individual-level ethnicity and define T2DM using OGTT or HbA_1c to reflect these important changes to clinical practice. Although this score was developed de novo (given the change in outcome and the definition of an important predictor), the same methodology as used to develop the initial score was employed. As this score is now used in practice, a more thorough description of the development and validation is given. The subsequent section outlines the development of the LPRS software for use in primary care for running this updated risk score.

Table 9 shows the model-building process. Of the variables considered for inclusion, prescription of steroids and statins, smoking status, history of CVD and deprivation were excluded from the final model based on their association with PDM/T2DM.

TABLE 9

Model-building process

Table 10 shows the final model produced. Age, sex (male vs. female), BMI, ethnicity (‘other’ vs. white European), antihypertensive therapy (yes vs. no) and family history of diabetes (any type, yes vs. no) were all found to be significant predictors of PDM or T2DM both when modelled separately and together. Adding other variables did not improve the area under the ROC curve. There were no statistically significant two-way interactions when significance was assessed at the 1% level, a threshold chosen because of the high number of comparisons. Polynomial terms were considered for age and BMI but this did not improve the fit of the model. The area under the ROC curve for the final model was 70.1% (95% CI 68.4% to 71.7%). Figure 4 shows the observed vs. the estimated prevalence of PDM and T2DM grouped by the predicted probability. This shows overall good agreement between the observed and predicted estimates. This is reflected in the result of the Hosmer–Lemeshow test based on 10 groups (χ² = 2.4, p = 0.97) and a Brier score of 0.15.

TABLE 10

The association between the set of risk factors included in the score and the glycaemic categories of PDM and T2DM

FIGURE 4

Comparison of the observed vs. the estimated prevalence of PDM or T2DM grouped by decile of predicted probability of PDM or T2DM. Adapted with kind permission from Springer Science+Business Media. Diabetologia, Detection of impaired glucose regulation (more...)

The performance of the score in differentiating between those who had PDM or T2DM diagnosed using either an OGTT or HbA_1c and those who had normal glucose tolerance in the temporal data set is shown in Table 11 and Figure 5. The score can be used in two ways: either by setting the sensitivity to a certain level or by deciding what percentage of the general practice to invite for further testing. If using an OGTT for diagnosis, then 50% of a general practice would need to be invited for testing to detect T2DM with 80% sensitivity; this rises slightly, to 54%, if using HbA_1c. To retain 80% sensitivity for the PDM outcomes, the percentage invited would need to be increased to 60% if using an OGTT and 66% for an HbA_1c between 6.0% (42 mmol/mol) and 6.4% (46 mmol/mol). If the top 10% were invited for testing, 9% of these would have T2DM using an OGTT [PPV 8.9% (95% CI 5.8%, 12.8%)] and 26% would have PDM [PPV 25.9% (95% CI 20.9%, 31.4%)]. Using HbA_1c increases the PPV to 19% for T2DM [PPV 18.6% (95% CI 14.2%, 23.7%)] and 28% for an HbA_1c between 6.0% and 6.4% [PPV 28.3% (95% CI 23.1%, 34.0%)]. If screening for both T2DM and PDM using an OGTT, inviting the top 10% for further testing gives a sensitivity of 17%. The high NPV (81.3%) suggests that this cut point is good for ruling out disease.
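The first way of using the score, fixing the sensitivity and reading off the share of the practice to invite, can be sketched as below. This is an illustration on invented data, not the analysis behind Table 11; the function name is hypothetical, and tied scores are not given special treatment (a real cut point would invite everyone sharing the threshold score).

```python
def fraction_invited_for_sensitivity(scores, outcomes, target_sensitivity):
    """Smallest fraction of people, taken in descending score order,
    that contains at least the target share of all cases (outcomes are 0/1)."""
    total_cases = sum(outcomes)
    if total_cases == 0:
        raise ValueError("no cases in the data")
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    found = 0
    for invited, (_, outcome) in enumerate(ranked, start=1):
        found += outcome
        if found / total_cases >= target_sensitivity:
            return invited / len(ranked)
    return 1.0


# Invented data: ten people, five of whom have the outcome.
scores = [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
outcomes = [1, 0, 1, 0, 1, 0, 0, 1, 0, 1]
print(fraction_invited_for_sensitivity(scores, outcomes, 0.8))  # 0.8
print(fraction_invited_for_sensitivity(scores, outcomes, 0.6))  # 0.5
```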

TABLE 11

Predictive performance of the score using the temporal (STAR) data set for identifying glycaemic categories using either an OGTT or HbA_1c at set levels of either sensitivity or the percentage of the population invited for further testing

FIGURE 5

Receiver operating characteristic curve for T2DM, PDM and T2DM or PDM using the OGTT and HbA_1c. OGTT: (a) area under ROC curve = 70.6%; (b) area under ROC curve = 66.3%; and (c) area under ROC curve = 68.5%. HbA_1c: (d) (more...)

Development of the Leicester Practice Risk Score software

It is well reported that many risk scores are not used in practice. One reason for this may be that little thought is given to implementing them at the development stage.¹³² To enable widespread use of the LPRS we developed a piece of software that uses existing medical records within primary care to calculate the LPRS for each patient within a practice population aged 40–75 years, having excluded people with known diabetes mellitus, the terminally ill and those with coded gestational diabetes (as they are already identified as being at higher risk and it is not necessary to screen them). When developing the software it came to light that many patients will already have been screened; as it is unnecessary to rescreen these people, the software analyses existing OGTT/glucose/HbA_1c data. This process also identifies any people with ‘missed’ diabetes mellitus, that is, people with glucose results in the diabetes mellitus range who have not been coded as diagnosed with diabetes mellitus. The output is presented in a single Microsoft Excel® version 10 (Microsoft Corporation, Redmond, WA, USA) spreadsheet that can be used to check records and recall patients for screening for diabetes mellitus. This software can be downloaded from http://leicesterdiabetescentre.org.uk/The-Leicester-Diabetes-Risk-Score.
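The eligibility rules described above can be sketched as a simple filter. This is not the LPRS software itself: the record structure and every field name are hypothetical, and the checks of existing OGTT/glucose/HbA_1c results are omitted because the text gives no thresholds for them.

```python
from dataclasses import dataclass


@dataclass
class PatientRecord:
    # All field names are hypothetical, chosen only for this illustration.
    patient_id: str
    age_years: int
    known_diabetes: bool
    terminally_ill: bool
    coded_gestational_diabetes: bool


def eligible_for_scoring(record: PatientRecord) -> bool:
    """Apply the age range and exclusions stated in the text."""
    return (
        40 <= record.age_years <= 75
        and not record.known_diabetes
        and not record.terminally_ill
        and not record.coded_gestational_diabetes
    )


practice_list = [
    PatientRecord("p1", 52, False, False, False),
    PatientRecord("p2", 38, False, False, False),  # outside the age range
    PatientRecord("p3", 61, True, False, False),   # known diabetes
]
print([r.patient_id for r in practice_list if eligible_for_scoring(r)])  # ['p1']
```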

Discussion

We have developed a simple and sensitive automated screening tool for use in multiethnic populations that will enable primary care practitioners to rank individuals by their risk of having undiagnosed PDM or T2DM and therefore allow targeting of screening resources. Ranking people by risk allows flexibility in the screening strategy chosen; practices can choose to home in on the top of the list and invite fewer people for screening for a bigger ‘hit’ rate or, if resources allow, to widen their inclusion criteria, giving greater sensitivity at the expense of specificity.
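The trade-off between the number of people invited and sensitivity can be illustrated with a short Python sketch that walks down a ranked list until a target sensitivity is reached. The function name `fraction_to_invite` and the toy data are hypothetical and are not taken from the study; real practice lists will give different fractions.

```python
def fraction_to_invite(scores, has_disease, target_sensitivity):
    """Smallest top-ranked fraction of the list that reaches the target sensitivity."""
    if len(scores) != len(has_disease):
        raise ValueError("scores and has_disease must have the same length")
    if not 0 < target_sensitivity <= 1:
        raise ValueError("target_sensitivity must be in (0, 1]")
    n_cases = sum(1 for d in has_disease if d)
    if n_cases == 0:
        raise ValueError("at least one case is needed to define sensitivity")

    # Rank from highest to lowest score (ties keep their input order).
    ranked = sorted(zip(scores, has_disease), key=lambda pair: pair[0], reverse=True)

    found = 0
    for n_invited, (_, diseased) in enumerate(ranked, start=1):
        if diseased:
            found += 1
        # Sensitivity so far = cases found / all cases in the population.
        if found / n_cases >= target_sensitivity:
            return n_invited / len(ranked)
    return 1.0


# Hypothetical toy data: 10 people, 5 with undiagnosed disease.
toy_scores = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
toy_disease = [True, False, True, True, False, False, True, False, True, False]

narrow = fraction_to_invite(toy_scores, toy_disease, 0.4)  # fewer invitations
wide = fraction_to_invite(toy_scores, toy_disease, 0.8)    # higher sensitivity
print(narrow, wide)
```

In the toy data, raising the target sensitivity from 40% to 80% means inviting more than twice as many people, which is the kind of choice a practice would weigh against its screening resources.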

Although some existing scores have been validated against HbA_1c,¹³³,¹³⁴ the updated score is the first to be developed that incorporates the new WHO diagnostic criteria into the outcome. Previous work has shown that different cohorts are detected when an OGTT or HbA_1c is used to diagnose T2DM.¹³⁵ Previously developed scores may now miss people who meet the new diagnostic criteria. This is also the first computer-based score developed in a multiethnic population within the UK to identify prevalent disease. The Cambridge risk score was designed to identify undiagnosed diabetes only and does not adjust for ethnicity.¹²¹ Although ethnicity was not taken into account in the original score, a post hoc study using data from both Caribbean and South Asian populations showed that using alternative ethnicity-specific cut points could give acceptable levels of prediction for undiagnosed hyperglycaemia in these groups, but that further work needed to be carried out to refine these.¹³⁶ The QDScore (a multivariable risk score to predict the 10-year risk of acquiring T2DM) includes similar variables to both the CDRS and the scores developed here, but with the addition of deprivation and CVD (both of these were found not to improve the fit of the models produced).¹³⁷ Compared with the Cambridge risk score, the QDScore showed greater levels of discrimination, but it detects only incident disease. In addition, the algorithm to compute the QDScore has not been published, and the score cannot be used to detect PDM. Other scores, including the Leicester Self Assessment score and the FINDRISC score, have been developed that rely on the person at risk completing a questionnaire themselves and attending the GP practice.²⁷,¹³⁸ The score developed here may increase the uptake of screening invitations by removing the need for people to calculate their own risk.

Although the score was developed using high-fidelity data from a randomly selected population who all received an OGTT, there are a number of limitations to be taken into account when applying the score. First, the cross-sectional nature of the data limits the score to detecting prevalent undiagnosed disease. The score cannot, therefore, be used to estimate the risk of future disease, although detecting PDM will identify a high-risk group that is likely to develop T2DM in the future. Although this could be viewed as a limitation, screening strategies may want to focus on those who have current undetected disease as a priority. In addition, those scores predicting incident disease may give biased estimates, as those variables that are included in the score are also those that prompt testing. Future work will look at validating the score on a prospective data set.

Second, only 22% of those invited for screening in the ADDITION-Leicester study attended. Although this is similar to other studies in similar populations¹³⁹ and reflects the difficulty in recruiting a multiethnic urban population with wide variations in socioeconomic status into research studies, this may have affected the representativeness of the data from which the score was derived. For example, those screened were slightly older than those invited.¹⁴⁰ It is difficult to predict the possible implications of the response rate to the initial study on the score produced. Reassuringly, the score contains a similar set of variables to other comparable risk scores.²⁷,¹²¹ Future work will further validate the score on other population-based data sets.

There are some limitations with the analysis performed to derive the risk scores. We used a variable selection procedure based on the statistical significance of each variable's association with the outcome to initially select variables for potential inclusion into the models produced. This is not the recommended approach to variable selection and can increase the risk of excluding variables that become important only after adjustment for other variables.¹⁴¹ We did, however, reassess each variable excluded in this manner to see if its inclusion in the final model would improve the discrimination of the model. Although the approach taken may have been suboptimal, the variables included in the model are similar to those included in previous scores and the score was shown to work adequately when tested in a temporal data set. There is a hierarchy of data sets to be used for evaluating a risk score's validity: (1) internal; (2) temporal (using an external data set from the same centre); and (3) external (using a truly external data set from a different centre).¹⁴² Here we have used a temporal data set; although these data incorporate an independent population, the data set was collected under the same standard operating procedures as the ADDITION study and within the same centre. Future work should focus on assessing the updated risk score in an external data set.

The method of dealing with missing covariate data could also have been improved. Here risk scores were developed on complete-case data; in hindsight, a better way to deal with missing data would be to use multiple imputation.¹⁴³ Studies have shown that risk scores produced using complete-case data may be biased and could produce scores that perform poorly when used in clinical practice.¹²⁵ Future work should consider the robustness of the scores produced to missing data.

Finally, the score was developed using data from Leicester (UK). The ethnic makeup of this area means that the ethnicity component of the score is based on data from South Asian participants (mostly of Indian descent). Although there were participants included from other ethnicities in both data sets (such as Chinese, Caribbean and African), there were insufficient data to model separate scores for each ethnicity. South Asians are known to have a high level of risk¹⁴⁴ and, therefore, assuming the same level of risk for all BME groups may overestimate risk for some, but this was thought to be preferable to underestimating risk or estimating risk based on insufficient data.

Summary

In summary, we have developed a valid and sensitive score for identifying those at the highest risk of prevalent PDM or T2DM within a multiethnic UK population. An automated tool is simple to implement and can be used to target screening approaches in a cost-effective manner. For example, in the UK, this tool could be used to complement the NHS Health Check programme, as the score has been developed using data that are reflective of the inclusion criteria of the health checks. The results from the screening study using the initial version of the risk score, which incorporated practice-level ethnicity data, are reported in Chapter 6.

Copyright © Queen’s Printer and Controller of HMSO 2017. This work was produced by Davies et al. under the terms of a commissioning contract issued by the Secretary of State for Health. This issue may be freely reproduced for the purposes of private research and study and extracts (or indeed, the full report) may be included in professional journals provided that suitable acknowledgement is made and the reproduction is not associated with any form of advertising. Applications for commercial reproduction should be addressed to: NIHR Journals Library, National Institute for Health Research, Evaluation, Trials and Studies Coordinating Centre, Alpha House, University of Southampton Science Park, Southampton SO16 7NS, UK.

Included under terms of UK Non-commercial Government License.

Bookshelf ID: NBK409312
