Overview of Data Estate Strategy

Integrating healthcare and life sciences data from various systems and applications has been a costly and time-intensive undertaking. To address this, implementing a data estate strategy becomes crucial, as it establishes uniform standards for organizations to efficiently manage all their data, regardless of its storage location or format.

A data estate strategy is a comprehensive, structured approach that organizations adopt to manage their entire data ecosystem effectively. It involves developing a well-defined plan and a set of guidelines for acquiring, storing, processing, securing, and using data across sources, systems, and applications within and across an organization. Healthcare and life sciences institutions handle a diverse range of data, including clinical, imaging, operational, and research data. Effective data management is therefore crucial for maintaining confidentiality, meeting regulatory requirements, gaining a competitive edge, and providing effective care to patients.

Note

Microsoft Fabric is an all-in-one analytics solution for enterprises that covers everything from data movement to data science, Real-Time Analytics, and business intelligence. It offers a comprehensive suite of services, including data lake, data engineering, and data integration, all in one place. Healthcare data solutions in Microsoft Fabric enable healthcare organizations to break down data silos and harmonize their disparate healthcare data in a single unified store where analytics and AI workloads can operate at scale. By using the native capabilities of the platform, health organizations can create connected experiences at each point of care, empower their workforce, and unlock value from clinical and operational data. Healthcare data solutions in Microsoft Fabric are currently in preview, and this “Well-Architected” documentation will be updated in a future release to include them.

Data management in healthcare and life sciences

As the healthcare industry transitions to a value-based care model with an emphasis on patient-centered care, the volume of patient data generated through immersive experiences has grown significantly. The exponential growth of healthcare data at various touchpoints necessitates a robust data management strategy to effectively manage and utilize this data to generate actionable insights that can improve the overall health of members and patients.

Data management challenges

The healthcare and life sciences industries are complex and dynamic environments that require a high degree of integration and interoperability to function effectively. One main challenge is that these industries are traditionally siloed, with different providers and organizations using different systems and technologies. The absence of integration and interoperability between these systems and technologies has led to inefficiencies, errors, and a lack of continuity in care for patients. The following are some common data management challenges:

  • Data silos: The lack of data sharing between different systems leads to data silos. Healthcare providers have difficulty accessing and sharing patient data, which can lead to a lack of continuity of care.
  • Lack of standardization: Healthcare organizations and life sciences companies use different systems and technologies, making it difficult to communicate and exchange data seamlessly.
  • Complexity of systems: Healthcare and life sciences systems are increasingly complex, which makes it hard to integrate them and exchange data effectively. This complexity leads to increased costs and delays in care delivery.
  • Malformed or missing data: Malformed or missing data compromises the accuracy and reliability of the insights derived from it.
  • Security and privacy concerns: The security and privacy of patient data are critical to healthcare providers and life sciences companies. Sharing data across different systems can increase the risk of data breaches and compromise patient privacy.
  • Industry regulations: The healthcare and life sciences industries are subject to some of the most stringent regulations on data handling, which makes data sharing and access difficult.
  • De-identification of non-uniform data: De-identification of data is often required by law, and it's difficult and time-consuming when the data isn't uniform.
  • Geographically unique datasets: It's difficult to transform geographically unique datasets (for example, population health data) for research.
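
As an illustration of the "malformed or missing data" challenge above, the following minimal sketch flags records with empty required fields or a birth date that isn't in ISO format. The field names (`patient_id`, `birth_date`, `gender`) and the flat record layout are assumptions made for this example only; they don't come from any Microsoft data model.

```python
from datetime import date

# Hypothetical required fields for a flattened patient record.
REQUIRED_FIELDS = ("patient_id", "birth_date", "gender")


def find_problems(record: dict) -> list[str]:
    """Return human-readable data quality problems found in one record."""
    problems = []
    # A field counts as missing when it is absent, None, or blank.
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            problems.append(f"missing {field}")
    # A non-empty birth date must parse as an ISO 8601 date (YYYY-MM-DD).
    birth_date = record.get("birth_date")
    if birth_date:
        try:
            date.fromisoformat(str(birth_date))
        except ValueError:
            problems.append("malformed birth_date")
    return problems


records = [
    {"patient_id": "p1", "birth_date": "1980-04-12", "gender": "female"},
    {"patient_id": "p2", "birth_date": "12/04/1980", "gender": ""},
]
report = {r["patient_id"]: find_problems(r) for r in records}
```

Running a check like this before data reaches downstream analytics makes quality problems visible early instead of letting them silently distort the resulting insights.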

Data management stages

Managing large volumes of data effectively involves several stages, and each stage is equally important for generating high-quality, actionable insights from the underlying data.

The major stages are the following:

Discovery

In healthcare and life sciences, data discovery is the process of identifying data sources, determining their formats (structured or unstructured), and gaining access to them. Real-world data is one important category to discover: it's data that is routinely collected outside of traditional clinical trials, from sources such as electronic health records, claims and billing activities, prescription data, wearables, patient surveys, and other patient-generated methods. Real-world evidence is the clinical evidence derived from analyzing real-world data. The following image illustrates the most common healthcare data types, organized by taxonomy and data standards.

A diagram showing the data estate discovery for health.

Ingestion

Ingestion is the process of connecting, collecting, and controlling the flow of information from the sources identified in the discovery stage. The following images illustrate the options that Microsoft provides, such as Azure Functions, Logic Apps, and Azure Data Factory, to ingest various types of information.

  1. The following image shows an ingestion pipeline to ingest IoT data from medical devices such as smart wearables.

A diagram showing an ingestion pipeline to ingest IoT data from medical devices such as smart wearables.

  2. The following image illustrates that data generated by medical devices might not be in a standard format. It's therefore normalized first and then stored in a FHIR server as a FHIR Observation resource. The MedTech service in Azure Health Data Services performs the steps shown in the diagram automatically.

A diagram showing the MedTech data normalization.

  3. The following image illustrates the ingestion pipeline to work with clinical data, DICOM data, unstructured data, and SDoH.

A diagram showing the ingestion pipeline to work with clinical data, DICOM data, unstructured data, and SDoH.
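
To make the device data normalization step more concrete, the following sketch maps a heart rate reading to a FHIR R4 Observation expressed as a plain dictionary. It's an illustration only and not how the MedTech service is implemented. The incoming message shape (`device_id`, `patient_id`, `timestamp`, `bpm`) is a hypothetical payload invented for this example.

```python
from datetime import datetime, timezone


def heart_rate_to_observation(message: dict) -> dict:
    """Map a hypothetical device message to a FHIR R4 Observation dict."""
    # The device is assumed to report time as Unix epoch seconds.
    measured_at = datetime.fromtimestamp(message["timestamp"], tz=timezone.utc)
    return {
        "resourceType": "Observation",
        "status": "final",
        "code": {
            "coding": [
                {
                    "system": "http://loinc.org",
                    "code": "8867-4",  # LOINC code for heart rate
                    "display": "Heart rate",
                }
            ]
        },
        "subject": {"reference": f"Patient/{message['patient_id']}"},
        "device": {"reference": f"Device/{message['device_id']}"},
        "effectiveDateTime": measured_at.isoformat(),
        "valueQuantity": {
            "value": message["bpm"],
            "unit": "beats/minute",
            "system": "http://unitsofmeasure.org",
            "code": "/min",
        },
    }


# Hypothetical wearable payload.
message = {
    "device_id": "watch-01",
    "patient_id": "p1",
    "timestamp": 1700000000,
    "bpm": 72,
}
observation = heart_rate_to_observation(message)
```

The point of the sketch is the shape of the output: whatever format the device emits, the normalized result uses standard terminology (LOINC for the measurement, UCUM for the unit) so that downstream systems can interpret it consistently.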

Persistence

Ingested data must be stored on persistent storage so that other applications can use it. For example, a machine learning pipeline can generate insights from the data, and Power BI can visualize its distribution. Microsoft provides various persistence platforms, such as Azure Data Lake Storage, Azure Health Data Services, and Microsoft Dataverse, to store your healthcare data. Microsoft also provides APIs and services, such as the FHIR service API, DICOM service, MedTech service, and Dataverse healthcare APIs, to bring data into these platforms.

A diagram showing the CRUD actions.

A diagram showing the data estate persistence for health.
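
The CRUD actions against a FHIR server follow the standard FHIR REST conventions: `POST` to create, `GET` to read, `PUT` to update, and `DELETE` to delete. The following sketch only builds the request descriptions and sends nothing over the network. The base URL and the `FhirRequest` helper are hypothetical, and a real call would also need authentication headers.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical FHIR service endpoint; replace with your own.
FHIR_BASE_URL = "https://fhir.example.com"


@dataclass(frozen=True)
class FhirRequest:
    """Describes an HTTP request to a FHIR server (illustrative helper)."""
    method: str
    url: str
    body: Optional[dict] = None


def create(resource: dict) -> FhirRequest:
    # Create: POST [base]/[type]; the server assigns the resource id.
    return FhirRequest("POST", f"{FHIR_BASE_URL}/{resource['resourceType']}", resource)


def read(resource_type: str, resource_id: str) -> FhirRequest:
    # Read: GET [base]/[type]/[id]
    return FhirRequest("GET", f"{FHIR_BASE_URL}/{resource_type}/{resource_id}")


def update(resource: dict) -> FhirRequest:
    # Update: PUT [base]/[type]/[id]; the body carries the same id.
    url = f"{FHIR_BASE_URL}/{resource['resourceType']}/{resource['id']}"
    return FhirRequest("PUT", url, resource)


def delete(resource_type: str, resource_id: str) -> FhirRequest:
    # Delete: DELETE [base]/[type]/[id]
    return FhirRequest("DELETE", f"{FHIR_BASE_URL}/{resource_type}/{resource_id}")


patient = {"resourceType": "Patient", "id": "p1", "gender": "female"}
requests_to_send = [create(patient), read("Patient", "p1"), update(patient), delete("Patient", "p1")]
```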

Integration

Integration refers to the process of bringing together different systems, technologies, data sources, and processes to create a cohesive experience for patients, healthcare providers, and other stakeholders. Microsoft Cloud for Healthcare offers ready-to-use tools to integrate the Dataverse data repository with various healthcare data sources, such as Azure Health Data Services and non-Microsoft FHIR servers. The following image illustrates the integration of FHIR data to Azure Databricks Delta Lake in Azure Health Data Services. For more information, see Connecting FHIR Data to Azure Databricks Delta Lake in Azure Health Data Services.

A diagram showing the lakehouse integration
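
A core step in any lakehouse integration is turning nested FHIR resources into flat, tabular rows. The following sketch is a simplified stand-in and not the pipeline shown in the diagram: it assumes the input is NDJSON text (one JSON resource per line, the format produced by FHIR bulk export) and it flattens only a few Patient fields. The output column names are chosen for this example.

```python
import json


def flatten_patient(resource: dict) -> dict:
    """Flatten selected fields of a FHIR Patient into one tabular row."""
    # Use the first recorded name, if any; FHIR allows several.
    name = (resource.get("name") or [{}])[0]
    return {
        "id": resource.get("id"),
        "family_name": name.get("family"),
        "given_name": " ".join(name.get("given", [])),
        "gender": resource.get("gender"),
        "birth_date": resource.get("birthDate"),
    }


def ndjson_to_rows(ndjson_text: str) -> list[dict]:
    """Parse NDJSON and return one flat row per Patient resource."""
    rows = []
    for line in ndjson_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        resource = json.loads(line)
        if resource.get("resourceType") == "Patient":
            rows.append(flatten_patient(resource))
    return rows


sample = "\n".join([
    json.dumps({
        "resourceType": "Patient",
        "id": "p1",
        "name": [{"family": "Rivera", "given": ["Ana", "Maria"]}],
        "gender": "female",
        "birthDate": "1980-04-12",
    }),
    json.dumps({"resourceType": "Observation", "id": "o1"}),
    json.dumps({"resourceType": "Patient", "id": "p2"}),
])
rows = ndjson_to_rows(sample)
```

Rows in this shape can then be written to a columnar format and combined with other datasets, which is the main advantage of a lakehouse over querying the FHIR server directly.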

Intelligence

Intelligence refers to the process of applying machine learning and AI to healthcare data to draw deeper insights. Microsoft provides tools such as Azure Machine Learning, Azure AI services (formerly Cognitive Services), Azure Databricks, and Azure Synapse Analytics to add intelligence to healthcare data.
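
As one example of this stage, the sketch below collects the medical entities that Text Analytics for health extracts from clinical notes. It assumes `client` is a `TextAnalyticsClient` from the `azure-ai-textanalytics` Python package and relies on its `begin_analyze_healthcare_entities` method. The `extract_entities` helper and the shape of its return value are choices made for this example.

```python
def extract_entities(client, notes: list[str]) -> list[list[dict]]:
    """Return the extracted entities for each note, in input order.

    `client` is assumed to be an azure.ai.textanalytics.TextAnalyticsClient.
    """
    # The analysis is a long-running operation; result() waits for it.
    poller = client.begin_analyze_healthcare_entities(notes)
    results = []
    for doc in poller.result():
        if doc.is_error:
            # Keep positions aligned with the input notes.
            results.append([])
            continue
        results.append([
            {
                "text": entity.text,
                "category": entity.category,
                "confidence": entity.confidence_score,
            }
            for entity in doc.entities
        ])
    return results


# Example wiring (requires the SDK, an endpoint, and a key):
# from azure.core.credentials import AzureKeyCredential
# from azure.ai.textanalytics import TextAnalyticsClient
# client = TextAnalyticsClient("<your-endpoint>", AzureKeyCredential("<your-key>"))
# entities = extract_entities(client, ["Patient was prescribed 100 mg ibuprofen."])
```

Passing the client in as a parameter keeps credentials out of the helper and makes it possible to exercise the function with a stub in tests.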

Analytics

Analytics involves analyzing healthcare data to uncover trends and patterns. Tools like Power BI can visualize these trends and patterns to improve clinical decision support and enhance operational efficiency.
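
A trend is usually an aggregate tracked over time. As a minimal illustration, the following sketch computes the average heart rate per day from a handful of made-up readings. The record layout (`date`, `bpm`) is an assumption for this example; in practice a tool such as Power BI would perform this kind of aggregation over the persisted data.

```python
from collections import defaultdict
from statistics import mean

# Made-up readings for illustration only.
observations = [
    {"date": "2024-01-01", "bpm": 70},
    {"date": "2024-01-01", "bpm": 74},
    {"date": "2024-01-02", "bpm": 80},
]


def daily_average(readings: list[dict]) -> dict:
    """Group readings by day and return the mean heart rate per day."""
    by_day = defaultdict(list)
    for reading in readings:
        by_day[reading["date"]].append(reading["bpm"])
    # Sort by date so the result reads as a time series.
    return {day: mean(values) for day, values in sorted(by_day.items())}


trend = daily_average(observations)
```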

The following image shows the complete lifecycle of health data:

A diagram showing the complete lifecycle of health data

The following image illustrates the complete lifecycle of MedTech data:

A diagram showing the complete lifecycle of MedTech data in the data estate

Data management solutions offered by Microsoft

Microsoft offers a wide set of tools to handle and manage healthcare data. The following table provides a comprehensive list of tools that can be used to manage healthcare data. You can follow the reference link corresponding to each tool for more information.

| Data stage | Tools | Description | Benefits | Reference link |
|---|---|---|---|---|
| Ingestion | FHIR-Bulk Loader & Export | An Azure Function app solution that provides services for ingesting and exporting FHIR data. | FHIR-Bulk Loader can import hundreds of thousands of files per hour. | microsoft/fhir-loader: Bulk FHIR Data Loader |
| Ingestion | FHIR Converter | Enables conversion of health data from legacy formats to the FHIR standard. | Supports the following conversions: 1. HL7v2 to FHIR; 2. C-CDA to FHIR; 3. JSON to FHIR; 4. FHIR STU3 to FHIR R4. | microsoft/FHIR-Converter: Conversion utility to translate legacy data formats into FHIR |
| Ingestion | Healthkit-on-FHIR | HealthKitOnFhir is a Swift library that automates the export of Apple HealthKit Data to a FHIR® Server. | HealthKit data can be routed through the IoMT FHIR Connector for Azure for grouping high frequency data to reduce the number of Observation Resources generated. HealthKit Data can also be exported directly to a FHIR Server (appropriate for low frequency data). | microsoft/healthkit-on-fhir: HealthKitOnFhir is a Swift library that automates the export of Apple HealthKit Data to a FHIR Server |
| Persistence | Microsoft Cloud for Healthcare Data Model | The data models in Microsoft Cloud for Healthcare are based on the Fast Healthcare Interoperability Resources (FHIR) standards framework and are easily deployable in a Dataverse environment. | It eases implementation of new use cases and workflows without redefining the healthcare data architecture. The FHIR-based models make Dynamics 365 implementations for healthcare customers easier, quicker, and more secure. | Data model overview |
| Persistence | Dataverse Healthcare APIs | Supports writing FHIR data to Dataverse entities and reading data from Dataverse entities in FHIR format. | Transformation of FHIR data to the common data model and vice versa is automatically handled by the APIs. | Overview of Dataverse healthcare APIs |
| Persistence | Azure Health Data Services | A managed platform as a service (PaaS) that provides a unified platform to store FHIR, DICOM, and MedTech data. | It enables more secure and compliant paths to ingest, persist, and connect health data in the cloud. | Get started with Azure Health Data Services |
| Integration | Data Integration Toolkit | It provides an extensive collection of default entity maps and attribute maps, built to conform to the HL7 FHIR specification, which are deployed as Dataverse records. | It's highly configurable to accommodate various solution requirements. | Overview of Data integration toolkit - Microsoft Cloud for Healthcare |
| Integration | Virtual Health Data Tables | Supports bringing data directly from a FHIR server into a Microsoft Cloud for Healthcare solution without permanently storing the data in Dataverse entities. | Avoids duplication of data and saves storage cost. | Overview of virtual health data tables |
| Intelligence | Text Analytics for Health | A prebuilt feature offered by Azure AI Language. It's a cloud-based API service that applies machine-learning intelligence to extract and label relevant medical information from a variety of unstructured texts such as doctor's notes, discharge summaries, clinical documents, and electronic health records. | Text Analytics for health performs four key functions, which are named entity recognition, relation extraction, entity linking, and assertion detection, all with a single API call. | What is the Text Analytics for health in Azure AI Language? - Azure AI services |
| Persistence & Analytics | Healthcare Database Templates | Database templates in Azure Synapse are industry-specific schema definitions that provide a quick method of creating a database known as a lake database, which can accelerate building analytics-infused industry applications. | You can use these information blueprints to plan, architect, and design data solutions for data governance, reporting, business intelligence, and advanced analytics. | Use healthcare database templates with Microsoft Cloud for Healthcare |
| Analytics | FHIR Service Analytics with Azure Databricks Delta Lake Analytics | Data Lakehouse is an open data architecture that combines existing features from traditional data lakes and data warehouses. Delta Lake has emerged as the leading storage framework that enables building a Lakehouse architecture on top of existing data lake technologies. Azure Health Data Services enables Lakehouse architectures by exporting parquet files of FHIR data which align to the open SQL on FHIR standard. | Building a Lakehouse for FHIR data has these advantages: 1. Combining your FHIR data with other datasets. 2. Having a consistent location of enterprise-ready data, enabling more self-service across your organization. 3. Metadata management and versioning of data, simplifying data that is often updated. | healthcare-apis-samples/src/azuredatabricks-deltalake at main · microsoft/healthcare-apis-samples |
| Testing | Synthea | Synthea is a synthetic patient generator that models the medical history of synthetic patients. It outputs high-quality synthetic, realistic but not real, patient data and associated health records covering every aspect of healthcare. | The resulting data is free from cost, privacy, and security restrictions. It can be used without restriction for a variety of secondary uses in academia, research, industry, and government. | synthetichealth/synthea Wiki |

See also

There are some commonly used architectures for working with healthcare data on the Microsoft Cloud for Healthcare platform. You can use them as a reference to tailor the solutions you need to handle and manage healthcare data. For more information, see Microsoft Cloud for Healthcare reference architectures.

Learn more about Microsoft Fabric with an end-to-end scenario:
