

Overview of DICOM data transformation in healthcare data solutions

The DICOM data transformation capability in healthcare data solutions allows you to bring your Digital Imaging and Communications in Medicine (DICOM) data to Fabric OneLake. You can ingest, store, and analyze imaging metadata from various modalities, such as X-rays, computed tomography (CT) scans, and magnetic resonance imaging (MRI) scans. The feature enables collaboration, research and development (R&D), and AI innovation for diverse healthcare and life science use cases. Integrating imaging data with clinical data stored in FHIR (Fast Healthcare Interoperability Resources) format helps clinicians and researchers interpret imaging findings within the correct clinical context, which supports higher diagnostic accuracy, better-informed clinical decisions, and improved patient outcomes.

The healthcare data solutions pipelines transform DICOM (imaging) data into tabular formats that persist in the lake in FHIR (silver) and OMOP (Observational Medical Outcomes Partnership) (gold) formats. These tables support exploratory analysis as well as large-scale imaging analytics and radiomics. The data transformation process through the imaging ingestion pipeline consists of the following stages:

  1. The pipeline ingests and persists the raw DICOM imaging files, present in the native DCM format, in the bronze lakehouse.
  2. Then, it extracts the DICOM metadata (tags) from the imaging files and inserts them into the bronze lakehouse DICOM metastore for simple querying.
  3. The data in the DICOM metastore is converted to FHIR ImagingStudy NDJSON files, stored in OneLake, and then transformed to relational FHIR format in the ImagingStudy delta table (silver lakehouse).
  4. Finally, the data is transformed to the Image_Occurrence delta table in OMOP format (gold lakehouse).
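
To make stages 2 and 3 more concrete, the following is a minimal sketch of how per-file DICOM tags could be grouped into ImagingStudy-shaped NDJSON lines. It is an illustration only and not the pipeline's actual implementation: the `metastore_rows` layout, the UIDs, the patient IDs, and the function names are hypothetical, and the patient reference is simplified. The field names (`series`, `instance`, `sopClass`, and so on) follow the public FHIR ImagingStudy resource.

```python
import json

# Hypothetical metastore rows: one dict of DICOM tags per DCM file.
# Tag keywords are standard DICOM; the UIDs and patient IDs are made up.
metastore_rows = [
    {"StudyInstanceUID": "1.2.3", "SeriesInstanceUID": "1.2.3.1",
     "SOPInstanceUID": "1.2.3.1.1", "SOPClassUID": "1.2.840.10008.5.1.4.1.1.2",
     "Modality": "CT", "PatientID": "P001"},
    {"StudyInstanceUID": "1.2.3", "SeriesInstanceUID": "1.2.3.1",
     "SOPInstanceUID": "1.2.3.1.2", "SOPClassUID": "1.2.840.10008.5.1.4.1.1.2",
     "Modality": "CT", "PatientID": "P001"},
    {"StudyInstanceUID": "1.2.3", "SeriesInstanceUID": "1.2.3.2",
     "SOPInstanceUID": "1.2.3.2.1", "SOPClassUID": "1.2.840.10008.5.1.4.1.1.2",
     "Modality": "CT", "PatientID": "P001"},
    {"StudyInstanceUID": "1.2.4", "SeriesInstanceUID": "1.2.4.1",
     "SOPInstanceUID": "1.2.4.1.1", "SOPClassUID": "1.2.840.10008.5.1.4.1.1.4",
     "Modality": "MR", "PatientID": "P002"},
]

DCM_SYSTEM = "http://dicom.nema.org/resources/ontology/DCM"


def to_imaging_study(rows):
    """Build one ImagingStudy-shaped dict from the tag rows of a single study."""
    series = {}
    for row in rows:
        # Create the series entry the first time its UID is seen.
        entry = series.setdefault(row["SeriesInstanceUID"], {
            "uid": row["SeriesInstanceUID"],
            "modality": {"system": DCM_SYSTEM, "code": row["Modality"]},
            "instance": [],
        })
        entry["instance"].append({
            "uid": row["SOPInstanceUID"],
            "sopClass": {"system": "urn:ietf:rfc:3986",
                         "code": "urn:oid:" + row["SOPClassUID"]},
        })
    first = rows[0]
    return {
        "resourceType": "ImagingStudy",
        "status": "available",
        "identifier": [{"system": "urn:dicom:uid",
                        "value": "urn:oid:" + first["StudyInstanceUID"]}],
        # Simplification: a real pipeline would resolve the patient reference.
        "subject": {"reference": "Patient/" + first["PatientID"]},
        "numberOfSeries": len(series),
        "numberOfInstances": len(rows),
        "series": list(series.values()),
    }


def to_ndjson(rows):
    """Group rows by study and emit one JSON document per line (NDJSON)."""
    studies = {}
    for row in rows:
        studies.setdefault(row["StudyInstanceUID"], []).append(row)
    return "\n".join(json.dumps(to_imaging_study(group))
                     for group in studies.values())


ndjson_text = to_ndjson(metastore_rows)
```

Each line of `ndjson_text` is a complete JSON document for one study, which is what makes NDJSON convenient to append to and to load in bulk.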

This transformation facilitates scenarios such as:

  • Sharing research datasets with role-based access control.
  • De-identifying text and imaging data for research and collaboration.
  • Using DICOM data to train and validate machine learning models.
  • Using DICOM data for conducting clinical studies, epidemiological analyses, and educational activities.

DICOM data transformation is an optional capability under healthcare data solutions in Microsoft Fabric. You have the flexibility to decide whether or not to use it, depending on your specific needs or scenarios.

To explore this capability and learn about the deployment, configuration, and usage, see:

Conceptual architecture

As explained in Data architecture and management in healthcare data solutions, the capability's foundation lies in the medallion lakehouse architecture. Here's how this framework organizes and processes DICOM data across the three lakehouse layers:

  • Bronze: This first layer stores the source imaging data in its original DICOM format (DCM files) and a metastore that contains the full set of metadata (DICOM tags) extracted from the DCM files.

  • Silver: The silver layer (based on the FHIR specification) stores the imaging metadata sourced from the bronze lakehouse. It also stores referential file links to the DCM file locations in the bronze layer. The imaging metadata and file references are stored in the ImagingStudy delta table, whose schema is based on a flattened format of the ImagingStudy FHIR resource (R4.3).

  • Gold: The gold layer (based on the OMOP specification) stores imaging data transformed from the silver lakehouse ImagingStudy delta table. The imaging metadata and file references are stored in the gold Image_Occurrence delta table, whose schema is based on the latest development of data standardization for imaging-based observational research. For more information on this standardization, see OMOP Common Data Model Extension for medical imaging data.
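
As a rough illustration of the silver-to-gold step, the sketch below flattens one ImagingStudy-shaped record into one row per series. It rests on several assumptions: the column names are hypothetical and only loosely modeled on the OMOP imaging extension, the sample record is made up, and the real Image_Occurrence schema and mapping (for example, translating modality codes into OMOP concept IDs) may differ.

```python
# Hypothetical ImagingStudy record, shaped like the FHIR resource.
sample_study = {
    "resourceType": "ImagingStudy",
    "identifier": [{"system": "urn:dicom:uid", "value": "urn:oid:1.2.3"}],
    "started": "2024-03-05T10:15:00Z",
    "series": [
        {"uid": "1.2.3.1", "modality": {"code": "CT"},
         "instance": [{"uid": "1.2.3.1.1"}, {"uid": "1.2.3.1.2"}]},
        {"uid": "1.2.3.2", "modality": {"code": "CT"},
         "instance": [{"uid": "1.2.3.2.1"}]},
    ],
}


def strip_oid(value):
    """Remove the 'urn:oid:' prefix that FHIR uses for DICOM UIDs."""
    prefix = "urn:oid:"
    return value[len(prefix):] if value.startswith(prefix) else value


def to_image_occurrence_rows(study):
    """Flatten one study into series-level rows (hypothetical column names)."""
    study_uid = strip_oid(study["identifier"][0]["value"])
    # Keep only the date part of the timestamp; None when it is missing.
    study_date = study.get("started", "")[:10] or None
    rows = []
    for series in study.get("series", []):
        rows.append({
            "image_study_uid": study_uid,
            "image_series_uid": series["uid"],
            # Raw DICOM modality code; a real mapping would use a concept ID.
            "modality": series["modality"]["code"],
            "image_occurrence_date": study_date,
            "instance_count": len(series.get("instance", [])),
        })
    return rows


image_rows = to_image_occurrence_rows(sample_study)
```

Emitting one row per series keeps the study and series UIDs side by side, so downstream queries can join back to the silver ImagingStudy data without parsing nested structures.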

To understand how the DICOM metadata transforms across different lakehouses and review the transformation mapping, see DICOM metadata transformation in healthcare data solutions.