NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

National Academy of Medicine; The Learning Health System Series; Grossmann C, Chua PS, Ahmed M, et al., editors. Sharing Health Data: The Why, the Will, and the Way Forward. Washington (DC): National Academies Press (US); 2022.

Sharing Health Data: The Why, the Will, and the Way Forward.

12 CASE STUDY: MAYO-GOOGLE PARTNERSHIP

Interviewees: John Halamka, MD, MS, President, Mayo Clinic Platform; and Jeff Anderson, PhD, MBA, Director of Accounts, Clinical Data Analytics Platform

ABSTRACT

Security and privacy are foundational to digital health care innovation. In September 2019, Mayo Clinic entered a 10-year partnership with Google to design a framework for the ethical secondary use of clinical data. This new infrastructure has two components: 1) the Mayo Clinic Cloud, which houses patient records, and 2) the Mayo Clinic Platform, a controlled enclave in which Mayo can share a de-identified subset of its clinical data and allow collaborators to link to it or supplement it with their own data for advanced analytics, including the development and training of artificial intelligence (AI) systems. This unique approach, called “data under glass,” exemplifies a federated learning model. With algorithms permitted into the enclave and data never leaving the home institution, the Mayo-Google partnership illustrates how health systems and technology companies can partner to facilitate knowledge generation while addressing privacy and cybersecurity concerns. It also promotes data collaboration and knowledge generation by offsetting the costs of procuring, managing, and storing the large amounts of data needed for algorithmic development.

BOX. Case Study at-a-Glance: Mayo-Google Partnership.

BACKGROUND

Mayo Clinic is an academic medical center and integrated health system composed of three “shields”: patient care, education, and research. In September 2019, Mayo announced a 10-year partnership with Google. The partnership establishes a cloud-computing infrastructure, called Mayo Clinic Cloud, which is aimed at benefiting all three shields. The Cloud offers Mayo the ability to centrally store 1.2 million patient records (Furst, 2021). Through this partnership, Mayo also gains access to Google’s AI toolsets, engineering talent, and security experience. Although Mayo considered all cloud providers, it chose Google because of the technology company’s combined strengths in these areas. These factors were important given Mayo’s reliance on secondary use of data to support translational, clinical, and epidemiological research studies.

Overall, Mayo’s recognition of the increasing digitalization of health care prompted the health system to undertake this transformation. The Mayo Clinic Platform is Mayo’s strategic initiative to improve health care through insights and knowledge derived from data and partnerships. In 2020, it pursued three major initiatives:

  • Clinical Data Analytics Platform – the process by which internal and external collaborators access the de-identified data in Mayo Clinic Cloud to discover new cures and treatments
  • Home Hospital Platform – cloud-hosted components that enable serious and complex care at a distance
  • Remote Diagnostic and Management Platform – ingestion of novel data from wearables and home-based devices that is combined with AI algorithms to deliver care recommendations to providers and patients.

Prior to engaging with Google, Mayo Clinic unified data from 70 diverse care sites into a longitudinal patient data store called the Universal Data Platform. This critical step made it easier for Mayo to migrate its structured and unstructured data to a private cloud container within the Google Cloud. The arrangement is analogous to renting a storage unit within a warehouse, in which renters place their belongings and secure them with a lock. The warehouse owner cannot open the storage unit, and ownership of the belongings remains with the renter of the storage unit. Although Mayo Clinic built Mayo Clinic Cloud within the Google Cloud, Google is not able to access Mayo data independently since Mayo holds the key. Thus, Google is unable to combine Mayo patient data with data sourced from Google applications such as Search, Gmail, Google Maps, and YouTube.

For Mayo’s internal operations, a patient-identified copy of Mayo’s data is stored in a private cloud container that is under Mayo’s control and is not accessed by third parties. Another copy of the data, which is de-identified, is stored in a private cloud container that authorized third parties can access, under Mayo’s control, for analytics. Third parties can develop novel algorithms, validate existing algorithms, and perform data analyses, but the data never leaves the Mayo container. Third parties can only take wisdom with them in the form of the finished algorithm or completed analysis. This is called “data behind glass.”

The mechanism in its entirety represents a federated learning model (see Figure 6). Unlike a centralized data-sharing model, in which a single database hosts all of a person’s accumulated health information and is the locus of aggregation and computing, the federated learning model allows Mayo to maintain physical and logical control of its data while selectively inviting investigators in and retaining the ability to audit what they do. The resulting learnings are distributed across private, academic, and federal research entities and are exchanged accordingly for further development (McMahan and Ramage, 2017).

FIGURE 6. Diagram of a Federated Learning Model. SOURCE: Google AI Blog. 2021. Google Research: Looking Back at 2020, and Forward to 2021. Available at: https://ai.googleblog.com/2021/01/google-research-looking-back-at-2020.html (accessed December 6, 2021).
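
As a rough illustration of the federated idea described above, the sketch below trains a tiny linear model separately at each site and then combines only the resulting model parameters, weighted by the number of local examples. This is a generic federated averaging sketch, not a description of Mayo's or Google's actual implementation; the function names, the toy model, and the learning rate are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Weights = Dict[str, float]  # hypothetical model: y = w * x + b


def local_update(weights: Weights,
                 data: List[Tuple[float, float]],
                 lr: float = 0.1) -> Weights:
    """Run one pass of gradient descent on a site's private (x, y) pairs.

    Only the updated parameters are returned; the raw data stays local.
    """
    w, b = weights["w"], weights["b"]
    for x, y in data:
        err = (w * x + b) - y   # prediction error on this example
        w -= lr * err * x       # gradient step for the slope
        b -= lr * err           # gradient step for the intercept
    return {"w": w, "b": b}


def federated_average(updates: List[Tuple[Weights, int]]) -> Weights:
    """Combine site models, weighting each by its local example count."""
    total = sum(n for _, n in updates)
    keys = updates[0][0].keys()
    return {k: sum(wts[k] * n for wts, n in updates) / total for k in keys}


if __name__ == "__main__":
    # Two hypothetical sites with private data that never leaves them.
    site_a = [(1.0, 2.0), (2.0, 4.0)]
    site_b = [(3.0, 6.0)]
    global_model = {"w": 0.0, "b": 0.0}
    for _ in range(5):  # a few communication rounds
        updates = [
            (local_update(global_model, site_a), len(site_a)),
            (local_update(global_model, site_b), len(site_b)),
        ]
        global_model = federated_average(updates)
    print(global_model)
```

The coordinator in this sketch sees only parameters and example counts, which mirrors the principle that collaborators leave with a finished algorithm rather than with the underlying records.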

DESCRIPTION

Underpinning this partnership is a well-defined governance structure that consists of several layers of oversight for Mayo Clinic Cloud and Mayo Clinic Platform. A multi-stakeholder task force called “One Table” reviews data access requests and reports to Mayo’s board of governors, who also weigh in on decision making. Executive leadership notes that a multi-level governance structure comes at the cost of speed and efficiency, but that investing in clear-cut decision-making processes is time well spent. In addition, the Health Data and Technology Advisory (DaTA) Board was created in 2021 and now has 11 members. Members are a diverse group of Mayo patients who live in the Rochester area and are charged with providing perspectives and opinions on how potential AI and health technology applications, including the Google partnership and data sharing, will impact individual patients and the community as a whole.

Management of the partnership with Google is governed by a joint steering committee. Because third-party involvement raises privacy and ethical concerns, the steering committee is responsible for establishing a combination of policy and technical controls that govern data access and auditing. The technical controls block a third-party user’s access if they connect to the cloud in an unsanctioned way. Policy controls prohibit partners from combining Mayo patient data with other data that could increase the risk of re-identification.

De-identification comes with challenges. From January to April of 2020, Mayo de-identified its structured data, including problem lists, medications, allergies, laboratories, and demographics. While the Health Insurance Portability and Accountability Act (HIPAA) mandates the removal of 18 types of direct and indirect identifiers, such as the patient’s name, phone number, and, in some cases, ZIP code, to render the data sufficiently de-identified, Mayo encountered instances in which the identity of a patient could still be deduced from the combination of data available (HHS, 2015). As a result, Mayo employed both computer- and human-mediated mechanisms to redact datasets and used the concept of “bin size” to assess whether the data was sufficiently de-identified. After studying domestic and international privacy law, Mayo settled on a bin size of 10, meaning that a dataset was considered sufficiently de-identified if any given record could belong to any one of at least 10 individuals in the database. Once Mayo performed de-identification, the health system sought certification from a third-party expert to verify that the data had a low likelihood of re-identification.
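
The bin-size idea can be sketched as a simple grouping check: count how many records share each combination of potentially identifying attributes and flag any combination shared by fewer than 10 records. The code below is a minimal illustration, assuming records are plain dictionaries; the function name `small_bins`, the field names, and the choice of attributes are hypothetical and are not drawn from Mayo's actual process.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def small_bins(records: List[Dict[str, str]],
               quasi_identifiers: Sequence[str],
               min_bin_size: int = 10) -> Dict[Tuple[str, ...], int]:
    """Return attribute combinations shared by fewer than min_bin_size records.

    An empty result means every record is indistinguishable from at least
    min_bin_size - 1 others on the chosen attributes.
    """
    counts = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return {combo: n for combo, n in counts.items() if n < min_bin_size}


if __name__ == "__main__":
    # Hypothetical data: 10 records in one bin, 3 in another.
    records = (
        [{"age_band": "40-49", "zip3": "559"}] * 10
        + [{"age_band": "90+", "zip3": "559"}] * 3
    )
    # The second bin holds only 3 records, so it is flagged for review.
    print(small_bins(records, ["age_band", "zip3"]))
```

Flagged bins would then need further generalization or suppression before the dataset could pass the check. Which attributes count as potentially identifying is a judgment call that this sketch leaves to the caller.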

From April to August of 2020, Mayo de-identified the unstructured data, including clinical notes and reports, which program leaders admitted was a much more difficult task. The process required the removal of text that could enable easy re-identification. For example, if a note included the term “this senator” or specified “a star quarterback” from a named sports team, the note would not be considered sufficiently de-identified. Similarly, text may be typed into notes in a way that compromises privacy, such as phone numbers typed in a non-standard form (e.g., 5674328999). All such issues had to be addressed before the data could be certified as de-identified.
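
To illustrate why non-standard formatting matters, the sketch below uses one regular expression to redact 10-digit US-style phone numbers whether they are written with parentheses, with separators, or as a bare run of digits. It is an assumption-laden example rather than Mayo's method: the pattern, the `[PHONE]` placeholder, and the function name are hypothetical, and a pattern this simple would also redact other 10-digit numbers, which is one reason human review remains necessary.

```python
import re

# Matches (567) 432-8999, 567-432-8999, 567.432.8999, 567 432 8999,
# and the bare form 5674328999. The lookarounds stop it from matching
# a 10-digit slice of a longer digit string.
PHONE_PATTERN = re.compile(
    r"(?<!\d)(?:\(\d{3}\)\s*|\d{3}[\s.-]?)\d{3}[\s.-]?\d{4}(?!\d)"
)


def redact_phone_numbers(text: str, placeholder: str = "[PHONE]") -> str:
    """Replace anything that looks like a 10-digit phone number."""
    return PHONE_PATTERN.sub(placeholder, text)


if __name__ == "__main__":
    note = "Pt asked for a callback at 5674328999 or (567) 432-8999."
    print(redact_phone_numbers(note))
```

Contextual clues such as “this senator” cannot be caught by pattern matching of this kind, which is consistent with the text's point that unstructured data was the harder task.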

Operating amid growing skepticism and scrutiny of third-party collaborations, Mayo leadership has had to be conscientious about questions of feasibility and about fostering stakeholder buy-in. In the interest of building trust and transparency, Mayo shares details of partnerships with a number of leading publications as soon as agreements are finalized and holds seminars with the broader health care community to gather feedback on its policies and procedures. In partnership with the Healthcare Information and Management Systems Society, it conducted a nationwide survey gauging consumers’ attitudes toward data sharing. Mayo Clinic Platform also works with community advisory boards, composed of both patients and non-patients, to advise on topics such as Mayo’s genomics data sharing policy. Equally important was ensuring the comfort of Mayo’s own research community. This required educating internal researchers about the research benefit of these tools and addressing their concerns. Mayo also espouses the guiding principle of partnering only with external groups whose values align with its internal research practices.

FUTURE DIRECTIONS

The interviewees for this case study cite the use of a federated learning model as the key factor in affording Mayo Clinic the agility and functionality to meet not only its data analytics goals, but also its stewardship responsibilities to patients. For health systems looking to emulate the Mayo-Google model, the interviewees advise starting small, thinking big, and moving fast. All of Mayo’s data projects start as limited pilots, which are expanded only after lessons learned are thoroughly reviewed and risks are mitigated. External partnerships and coalitions bring diverse experiences to innovation projects, so it is important for health care institutions to seek alliances with others.

Copyright 2022 by the National Academy of Sciences. All rights reserved.
Bookshelf ID: NBK594445