This review provided a critical examination of three components of commercially available apps for diabetes self-management: (1) available features, (2) usability, and (3) clinical efficacy, including harms. It bridges the gap between systematic reviews examining all types of mHealth (including apps that are proprietary or otherwise unavailable to consumers) and reviews that only examine the features or usability of commercially available apps. Other mHealth researchers have explored several of these components in a single review, such as summarizing evidence on commercially available apps66 or summarizing evidence on all mHealth technologies and analyzing those technologies’ adherence to clinical recommendations, features, and potential risk to patients.16 This review builds on previous work by assessing the usability of apps that are currently commercially available. Our goal was to synthesize relevant information in a consumer-friendly way, both to provide guidance to those currently choosing an app and to highlight research gaps that need to be addressed. Our focus on both evidence and user experiences aligns with the goals of leaders in the mHealth and diabetes fields, including the Digital Diabetes Congress67 and Xcertia.68

Limited Statistical Efficacy of Commercially Available Apps

Our results highlight that relatively few apps available through app stores have evidence of efficacy, which is consistent with the findings of other systematic reviews.15, 16 For example, we did not find evidence for many of the apps that appear first when searching the Google and Apple app stores, such as Diabetes: M, Diabetic Diet, MySugr, Blood Glucose Tracker, Sugar Sense, Diabetes and Blood Glucose Tracker, Carb Manager, or Diabetes In Check. Of the eight apps we identified as available for download in English in the United States, use of five apps (Glucose Buddy, Diabetes Manager, Diabetes Diary, Gather Health, and BlueStar) demonstrated improvement in at least one outcome compared to controls, including HbA1c26–29, 37, 38 and out-of-range hypo- and hyperglycemic episodes. Use of one app (BlueStar) was associated with an increase in medication dosage, identification of self-entered medication errors, and satisfaction with care. One app was only available in the United States in Mandarin (WellTang).35, 48 Use of this app demonstrated improvement in HbA1c, fasting blood glucose, 2-hour post-breakfast blood glucose, diabetes knowledge, and self-care behaviors.35 Two additional apps40, 52, 53 were not available in the United States; use of these apps demonstrated an improvement in HbA1c24, 36 and triglyceride levels,30, 31 as well as a reduction in the number of severe (grade 2) hypoglycemic episodes.30, 31

Limited Clinical Efficacy of Commercially Available Apps

We found a clinically meaningful reduction in HbA1c of at least 0.5 percentage points in studies of five apps when compared with usual care. Of the five apps, two were for type 1 diabetes (Diabeo Telesage and Glucose Buddy) and three were for type 2 diabetes (BlueStar, WellTang, and Gather Health). Of note, we could not determine the effect of two apps (Diabetes Diary and mDiab) on HbA1c due to a lack of information on between-group difference-in-differences. These findings demonstrate that only a few commercially available apps have clinical evidence supporting improved glycemic control.
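To illustrate the between-group difference-in-differences that several studies failed to report, the minimal sketch below works through the arithmetic with invented HbA1c values; none of the numbers come from the studies in this review.

```python
# Illustrative only: invented HbA1c values, not data from any study in this review.
app_baseline, app_followup = 8.9, 8.0      # mean HbA1c (%) in the app group
usual_baseline, usual_followup = 8.8, 8.6  # mean HbA1c (%) in the usual care group

app_change = app_followup - app_baseline        # pre-post change in the app group
usual_change = usual_followup - usual_baseline  # pre-post change under usual care

# The difference-in-differences isolates the change attributable to the app
# beyond what occurred under usual care.
did = app_change - usual_change  # here, -0.7 percentage points

# Applying this report's threshold of at least a 0.5-point reduction:
print(f"Difference-in-differences: {did:+.1f} percentage points; "
      f"clinically meaningful: {did <= -0.5}")
```

A study that reports only the pre-post change in each arm leaves this quantity, and therefore the app's marginal benefit, undetermined.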

Study Findings May Be Generalizable to Most Diabetes Patients

Findings from these short-term studies may be generalizable to most patients with type 1 and type 2 diabetes.

Study participants with type 1 diabetes were on average 33 to 40 years old with a diabetes duration of 16 to 25 years, making them comparable to the typical adult with type 1 diabetes, who is usually diagnosed as a child, adolescent, or young adult.69 However, participants may have had more severe diabetes than typical type 1 diabetes patients, as measured by insulin pump usage. Nationally, 20 percent of type 1 patients are estimated to use an insulin pump.70 Four studies with type 1 participants reported insulin pump usage ranging from 19 to 66 percent, while two studies of type 1 participants excluded participants with insulin pumps. In addition, multiple studies involved participants on complex management regimens (e.g., multiple daily injections or insulin pumps), which may have increased their interest in using an app for self-care management.

Study participants with type 2 diabetes were on average 48 to 55 years old, which falls within the age range in which diabetes is most often diagnosed (45 to 64 years).71 Participants had an average diabetes duration of 6.6 to 11 years, which may have made them more likely to use a patient tool, such as an app, for self-care management.

Older adults comprise an important subpopulation of patients with diabetes. The percentage of adults with type 2 diabetes increases with age, with the highest prevalence (25.2 percent) among those aged 65 years or older.1 Although the type 2 diabetes participants in our studies were on average 48 to 55 years old, older adults should be evaluated in future studies, as more than 40 percent of this group now own smartphones.72

Variation in Usability Scores

We were able to assign usability scores only to the eight apps that we could download and access. Of these eight apps, we rated two as “acceptable” (Glucose Buddy and BlueStar), three as “marginal” (Glucose Buddy Pro, Diabetes Manager, and Dbees), and three as “not acceptable” (mDiab Lite, mDiab, and Diabetes Diary). These results suggest that consumers may have a difficult time using these apps. However, usability is subjective, and unless a consumer can download and test all the evidence-based apps, they may not be able to tell which app is best suited for them.

It is also important to note that the apps we evaluated do not have the same pleasing aesthetics as some of the more popular diabetes apps in the app stores. Because we did not identify published evidence on some of the more popular apps, we did not formally evaluate them in this review. However, other researchers who evaluated the usability of commercially available apps had similar findings. A 2014 systematic review of currently available diabetes apps found that usability for those 50 years and older was “moderate to good” for apps offering a narrow range of functions but “considerably worse” for apps offering more functions.73 Another 2016 study examining four popular diabetes apps found “wide variability” in the ease of entering blood glucose, one of the simplest tasks examined.74

Limited Evidence To Detect Patterns Between Cost, Features, and Efficacy

Our discussions with experts drew attention to the fact that decisionmakers want information on the relationship between the costs, features, and efficacy of mobile apps. For example, do apps that require a fee or paid subscription result in larger benefits in outcomes? Are there specific features of apps that lead to improved health outcomes, and others that do not? Unfortunately, we could not make any judgments about the relationship between cost, features, and efficacy because we identified relatively few studies on commercially available apps, study quality was variable, and we could not empirically assess the features and usability of several apps.

Short Duration of Studies

Studies ranged from 2 to 12 months, which is relatively short compared with the lifelong duration of diabetes. It is unclear whether these apps impact long-term outcomes, including microvascular and macrovascular complications.

Methodological Issues With Available Evidence

Our risk of bias assessments revealed that there is a lack of consistency in how researchers are reporting their mHealth studies. Limited information on randomization, allocation, masking, and analysis of dropouts is a common methodological problem in studies of health care interventions. However, other methodological issues specific to mHealth made it difficult to interpret and apply findings.

In general, the RCTs we identified were inconsistent in what they considered to be a positive effect of an app (i.e., pre-post differences, between-group differences, or both). In some cases, this was because the main purpose of the study was to see if both groups had a change from baseline. For example, the study on NexJ34 was interested in whether a health coaching intervention was efficacious both with and without an app, so pre-post differences for both groups were presented. Still, the study authors calculated the difference-in-differences between groups for HbA1c.

Study design also made it difficult to determine what effect could be attributed to the app and what was attributable to the additional interactions with study personnel or providers. For example, the 2011 RCT29 on BlueStar included multiple intervention groups with varying degrees of provider support, but the main comparison was between the most intensive intervention and usual care, so it was impossible to isolate the effect of the additional support. In several studies, the intervention group had the ability to message providers or study staff and get an immediate response, while usual care participants had to go through standard channels like phone calls or monthly appointments. In these cases, the control group did not provide a sufficient degree of attention control, so it is not clear whether the app or the extra attention was causing the effect. This makes it difficult to interpret and apply findings across health care contexts where patients may not have as much support.

Additional issues that came up in several studies included inconsistent or missing information on how much participants used apps (i.e., the “dosage” of the intervention), limited information on the content of diabetes education provided by the app or provider, and not examining potential harms.

Most of the systematic reviews we included in this review commented that there is a lack of rigorous research on apps for diabetes.15, 21–23 Our conversations with key informants (KIs) revealed that there are many advocacy, research, and professional groups working to create guidance both on the reporting of mHealth studies and on the interpretation of what constitutes an “effective” app. During our research, we identified tools to standardize mHealth reporting, such as the CONSORT-EHEALTH checklist.75 These tools attempt to standardize the level of detail included in studies so that the results can be interpreted in a meaningful way; however, they do not appear to have been used consistently, even though the checklist was published in 2011, before a majority of the studies were published.

Limitations

In addition to limitations caused by the variable quality of the identified studies, there were three major limitations in this review: those inherent to the rapid review format, those caused by the lack of access to some of the commercially available apps, and those in how usability was assessed.

Rapid Review Limitations

We identified our list of potentially relevant studies from five recently published systematic reviews as well as from hand-searching. As a result, we may have missed eligible studies. Also of note, although we took steps to critically assess the potential for bias in these studies, we did not consider every potential source of bias. Specifically, we did not evaluate the primary and secondary outcomes as specified by study authors, so we could not tell whether outcomes were selectively reported.

Limitations From Lack of Access to Apps

We focused on commercially available apps accessible by the general public; however, defining “commercially available” proved difficult. Of the 13 apps we evaluated, only 10 were available on Apple platforms and 10 on Android platforms. Of the 10 Apple apps, we were unable to download one because it was only available from the French Apple App Store. This means we could not provide first-hand usability scores and consumer details about the app and had to rely on second-hand, potentially biased sources, mainly the developer Web sites. So, while we included the app because it was a commercially available app with evidence, it is not available for use in the United States.

On the Apple platform, we were able to download three apps that we could not subsequently log into. The Android platform had two apps that were unavailable from the United States Google Play Store and three that we could not log into. We were able to download one app on an Apple device, but it was not in English. For this app, we based our assessment of features on potentially biased information from the developer.

Due to limited funding, our evaluation of three paid apps’ characteristics (Diabetes Manager, mDiab, and Glucose Buddy Pro) was only conducted on one platform, an Apple iPad. Therefore, we were unable to report any discrepancies in features and functions across platforms.

Finally, the versions of the apps we assessed likely differed from the versions that were studied, as most (7 of 13) apps had been updated since the studies were published.

Limitations in Usability Assessment

Our SUS results may not generalize to the diabetes population. The SUS is typically administered to large numbers of an app’s actual users, in this case people with diabetes. Because none of our reviewers had diabetes, they may not have represented the experiences and preferences of people with diabetes. In addition, each app was assessed by only three reviewers.

In addition, reviewers had limited exposure to each app and were bound by the scope of the questions. This scoring tool consists of only 10 questions, available in Appendix D, and was designed to be a “quick and dirty” evaluation tool that assigns a single score to a process that is descriptive, nuanced, and subjective in nature.76
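For readers unfamiliar with the instrument, the sketch below shows the standard SUS scoring arithmetic (Brooke’s method: 1 to 5 Likert responses, odd items positively worded). The acceptability cutoffs shown (roughly 70 and 50) are the commonly cited ones and are an assumption here, since this section does not restate the report’s exact thresholds; the example responses are invented.

```python
def sus_score(responses: list[int]) -> float:
    """Convert ten 1-5 Likert responses into a 0-100 SUS score."""
    assert len(responses) == 10
    total = 0
    for item, r in enumerate(responses, start=1):
        # Odd-numbered items are positively worded, even-numbered negatively,
        # so their contributions are mirrored before summing.
        total += (r - 1) if item % 2 == 1 else (5 - r)
    return total * 2.5  # rescale the 0-40 sum to 0-100

def usability_band(score: float) -> str:
    """Map a SUS score to commonly cited acceptability bands (assumed cutoffs)."""
    if score >= 70:
        return "acceptable"
    if score >= 50:
        return "marginal"
    return "not acceptable"

# Example: average the scores of three reviewers, as done in this review.
reviewer_responses = [
    [4, 2, 4, 1, 4, 2, 5, 2, 4, 2],
    [3, 2, 4, 2, 3, 3, 4, 2, 3, 2],
    [4, 1, 5, 2, 4, 2, 4, 1, 4, 3],
]
mean_score = sum(sus_score(r) for r in reviewer_responses) / len(reviewer_responses)
print(f"Mean SUS score: {mean_score:.1f} ({usability_band(mean_score)})")
```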

We were also unable to examine all characteristics of apps that are important to patients. Most notably, we did not examine technology performance outcomes such as malfunctions or crash statistics. While reliability is an important consideration for patient decisionmaking, we were unable to address this characteristic in this review.

Next Steps

Future Research Needs

First, there is a need for longer-term studies (more than 1 year) on apps for diabetes. Diabetes is a chronic condition, and the risk of serious complications increases over time. These complications can take several months to years to develop and are some of the most important outcomes for studies to address. Therefore, longer-term studies are necessary to determine whether an app has an impact on the development of these complications. In addition, longer-term studies are important in determining whether patients continue to engage with these apps or eventually lose interest. Longer-term studies could also help determine whether the beneficial effects of apps on short-term outcomes hold up over time.

It is particularly difficult to assess long-term outcomes in studies of apps, since apps are constantly changing. In longer-term studies, or in multiple studies of one app, it is critical to report the app version, the timing of updates, and any significant changes to features or content. This helps to determine whether the results can be applied to the most recently updated app and the current health care context. Researchers should also consider study designs other than RCTs to answer questions pertaining to long-term outcomes. An example is a cohort study in which the outcomes of those who use an app and those who do not are tracked over several years. Interviews and surveys could be used to ask why patients continue to use an app or not, and how patients’ interest in an app changes over time.

Second, researchers should consistently include harms in studies of diabetes apps, particularly hypoglycemic episodes. Ideally, studies would separate hypoglycemic episodes by severity, distinguishing between self-reported mild episodes and those that require medical assistance. It is important to report mild and severe hypoglycemic episodes for both shorter and longer-term studies.

Third, researchers who use RCT methodology should carefully consider how much interaction with study personnel and providers each group receives, and control for these interactions as much as possible (i.e., attention control). This would help ensure that the findings represent the effect of the app, not of the additional support. Future researchers should also consider head-to-head comparisons of multiple apps. This study design would provide adequate attention control, and would be more patient-centered, as many patients know they want to use an app in care but do not know which one is most appropriate for them.

Fourth, researchers should consider evaluating the most popular apps from app stores and, conversely, making researched apps available to patients. As previously discussed, relatively few commercially available apps are supported by evidence, so patients do not know how these apps will affect their diabetes-related outcomes. Patients and physicians need evidence on the apps that are currently available to them if they are to make informed decisions on which app to use in care.

Last, there is a need for a broader research and dissemination agenda on diabetes apps. Depending on the privacy policy, app developers can collect enormous amounts of information on which apps are being downloaded and used, how use changes over time, and how patient data change over time. A registry that connects this information to other data sets, such as medical record data, would provide a wealth of knowledge that could move this field forward. This type of study is not likely to be funded by individual app developers; therefore, there is a need for collaboration among app developers, researchers, and consumers to develop this registry and update it as apps change over time.
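As a purely hypothetical sketch of what one record in such a registry might contain, the fragment below names fields that would support the questions raised in this report (engagement over time, intervention “dosage,” and linkage to medical record data). Every field name is invented for illustration; no such registry or schema exists in this review’s sources.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

@dataclass
class AppRegistryRecord:
    """One hypothetical registry entry tying app usage to outcome data."""
    app_name: str
    app_version: str         # versions matter, since apps change over time
    platform: str            # e.g., "iOS" or "Android"
    install_date: date
    last_active_date: date   # supports questions about sustained engagement
    sessions_per_week: float # a usage ("dosage") measure
    linked_ehr_id: Optional[str] = None  # opt-in link to medical record data
    consent_scope: list[str] = field(default_factory=list)  # data the user agreed to share
```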

Implications for Clinicians and Patients

Although there is limited evidence that commercially available mobile apps improve diabetes-related outcomes, patients are downloading and using them anyway. Strong evidence can help people make informed choices, but when evidence is limited, patients who use these apps are essentially experimenting on themselves. Given this, clinicians should consider asking their patients whether they use apps in their self-management and determining whether the information provided by these apps adheres to current guidance for diabetes self-management. Patients should be aware that there is little evidence supporting the effectiveness of these apps and should be wary of claims that these apps will improve their outcomes if those claims are not supported by evidence.

Evidence Should Be Available in App Stores

As previous researchers have noted, information on which apps have been studied is not readily available to patients through app stores. The result is that patients may be using apps that either do not impact health outcomes or actually cause harms. This problem extends to all health apps, and there should be greater efforts by app developers and app stores to present this information to users.

Patient-Centered Decision Tools

mHealth for diabetes is an important topic for researchers, patients, providers, health systems, and professional groups. There have been efforts by many different research and professional groups to summarize the evidence on this topic. There is now a need to interpret and apply the current findings in a patient-facing way. This could take the form of patient-centered decision tools that help patients judge and select apps based on their personal needs and preferences as well as evidence of efficacy. These types of tools could help patients by describing which apps have evidence of efficacy and which ones do not, and by indicating which outcomes may improve as a result of using the apps. Tailoring this information to patient preferences and needs, and updating the tools as more research is published, could empower patients as they navigate the vast amount of information available on these apps and direct them to the apps most likely to improve their health outcomes.
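To make the idea concrete, here is a toy sketch of how such a decision tool might rank apps against a curated evidence table. The app names, outcomes, and catalog are placeholders invented for illustration, not findings from this review.

```python
from dataclasses import dataclass

@dataclass
class AppEvidence:
    """One entry in a hypothetical, hand-curated evidence table."""
    name: str
    outcomes_improved: set[str]  # outcomes with supporting evidence
    platforms: set[str]
    free: bool

# Placeholder catalog; real entries would come from published studies.
CATALOG = [
    AppEvidence("ExampleApp A", {"HbA1c"}, {"iOS", "Android"}, free=True),
    AppEvidence("ExampleApp B", {"HbA1c", "hypoglycemic episodes"}, {"iOS"}, free=False),
    AppEvidence("ExampleApp C", set(), {"Android"}, free=True),
]

def recommend(platform: str, desired_outcomes: set[str], must_be_free: bool):
    """Return apps matching the patient's platform and budget, ranked by
    how many of the patient's desired outcomes have supporting evidence."""
    matches = [
        app for app in CATALOG
        if platform in app.platforms and (app.free or not must_be_free)
    ]
    return sorted(matches,
                  key=lambda a: len(a.outcomes_improved & desired_outcomes),
                  reverse=True)

for app in recommend("iOS", {"HbA1c"}, must_be_free=False):
    print(app.name, "- evidence for:",
          ", ".join(sorted(app.outcomes_improved)) or "none")
```

A production tool would of course need richer inputs (study quality, usability ratings, harms), but even this simple ranking illustrates how evidence and preferences could be combined in a patient-facing way.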