Deploy and configure Healthcare data foundations in healthcare data solutions (preview)

[This article is prerelease documentation and is subject to change.]

Healthcare data foundations offer ready-to-run data pipelines that are designed to efficiently structure data for analytics and AI/machine learning modeling. You can deploy and configure the Healthcare data foundations capability after deploying healthcare data solutions (preview) to your Fabric workspace.

Note

The Healthcare data foundations capability is required to run other healthcare data solutions (preview) capabilities. Ensure that you successfully deploy this capability before attempting to deploy other capabilities.

Deployment prerequisites

  • If using Azure Health Data Services as your FHIR data source, ensure you complete the setup steps in Use FHIR service.

  • If you don't have a FHIR server in your test environment, use the sample data instead. Follow the steps in Deploy sample data to download the sample data into your environment.

Deploy Healthcare data foundations

To deploy Healthcare data foundations to your workspace, follow these steps:

  1. Navigate to the healthcare data solutions home page in Fabric.

  2. Select the Healthcare data foundations tile.

    A screenshot displaying the Healthcare data foundations tile.

  3. On the capability page, select Deploy to workspace.

    A screenshot displaying how to deploy the capability to the workspace.

  4. The deployment can take a few minutes to complete. Refrain from closing the tab or the browser while the deployment is in progress. In the meantime, you can work in another tab.

  5. After the deployment completes, you'll be notified. Select the Manage capability button from the message bar to navigate to the Healthcare data foundations capability management page. Here, you can view, configure, and manage the following deployed artifacts (the image is for representational purposes only):

    A screenshot displaying the artifacts.

You can select each lakehouse and notebook artifact to open it and review the details.

Configure the global configuration notebook

The healthcare#_msft_config_notebook deployed with Healthcare data foundations is the global configuration notebook that helps you set up and manage the configuration necessary for all the data transformations in healthcare data solutions (preview). It covers tasks such as configuring workspace parameters and installing essential packages for data processing.

Features

  • Dynamic OneLake endpoint resolution: The script programmatically resolves the OneLake endpoint based on the runtime environment.
  • Fabric runtime validation: Ensures compatibility with the supported Fabric runtime for the Spark session.
  • Configurable parameters: Comes with a predefined set of configuration parameters.
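
To illustrate the endpoint-resolution idea, here's a minimal sketch. It assumes the public OneLake DFS endpoint as the default and a hypothetical environment-variable override; the shipped notebook derives the actual value from the Fabric runtime itself:

    import os

    # Default public OneLake endpoint for Fabric workspaces.
    DEFAULT_ONELAKE_ENDPOINT = "onelake.dfs.fabric.microsoft.com"

    def resolve_onelake_endpoint() -> str:
        # ONELAKE_ENDPOINT is a hypothetical override for illustration only;
        # the configuration notebook resolves this from the runtime environment.
        return os.environ.get("ONELAKE_ENDPOINT", DEFAULT_ONELAKE_ENDPOINT)

    one_lake_endpoint = resolve_onelake_endpoint()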

Configuration

You need to complete configuring the healthcare#_msft_config_notebook before executing any of the pipelines or notebooks included with the healthcare data solutions (preview) capabilities. All the required parameters need to be updated only once in this global configuration notebook.

Following are the key configuration parameters associated with this notebook:

  • Workspace Config: Specifies workspace and solution names, along with the OneLake endpoint. Use a consistent naming convention (name or GUID) for workspace and solution identifiers.

    • workspace_name: Identifier for the workspace, either its GUID or name.
    • solution_name: Identifier for the healthcare workload artifact, formatted as ArtifactName.ArtifactType or ArtifactId.
    • one_lake_endpoint: Identifier for the OneLake endpoint.
  • Lakehouse/Database Config: Information on bronze, silver, and OMOP databases. Use a consistent naming convention (name or GUID) as used in the Workspace Config section.

    • bronze_database_name: Bronze lakehouse identifier.
    • silver_database_name: Silver lakehouse identifier.
    • omop_database_name: OMOP or the gold lakehouse identifier.
  • Secrets and Keys Config: Secret information such as the key vault name and the application insights key.

    • kv_name: Specifies the name of the key vault service containing all the necessary secrets and keys for running the healthcare data solutions (preview) pipelines. This value should point to the key vault service deployed with the Healthcare data solutions in Microsoft Fabric Azure Marketplace offer.
  • Misc Config: Other configuration, such as whether to skip the package installation.

  • Workload Config: A toggle that you can set to either True or False. Setting the value to True uses the artifact workload folder for sample data and transformation configuration, and setting it to False uses the lakehouse instead.
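
For orientation, here's a minimal sketch of what a configuration cell with these parameters might look like. The parameter names follow the list above; all values are placeholders, and the names for the Misc and Workload toggles are hypothetical:

    # Placeholder values only; replace with your own identifiers, and use one
    # naming convention (name or GUID) consistently across all sections.
    workspace_name = "<workspace-name-or-guid>"
    solution_name = "<artifact-name>.<artifact-type>"   # or an ArtifactId
    one_lake_endpoint = "onelake.dfs.fabric.microsoft.com"

    # Lakehouse/Database Config.
    bronze_database_name = "healthcare1_msft_bronze"
    silver_database_name = "healthcare1_msft_silver"
    omop_database_name = "healthcare1_msft_gold_omop"

    # Secrets and Keys Config: key vault deployed with the Azure Marketplace offer.
    kv_name = "<your-key-vault-name>"

    # Misc and Workload Config (toggle names are hypothetical).
    skip_package_installation = False
    use_workload_folder = True   # True: artifact workload folder; False: lakehouse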

When you provision this notebook, the settings are configured automatically. However, you have to provide the kv_name value under the Secrets and Keys Config section, as explained in Configure the FHIR export service.

Important

Avoid executing this notebook directly; it's executed by other notebooks during setup.

Additional configuration

Deploying Healthcare data foundations also deploys the following lakehouses and notebooks in your environment in addition to the healthcare#_msft_config_notebook. However, these artifacts don't require any specific configuration changes after provisioning, unless you wish to use custom configuration or data.

Lakehouses

Deploying Healthcare data foundations provisions the following lakehouses in your environment:

  • healthcare#_msft_bronze
  • healthcare#_msft_silver
  • healthcare#_msft_gold_omop

The lakehouses enable you to:

  • Upload data from your local machine.
  • Prepare, clean, transform, and ingest data.
  • Ingest data at scale and schedule data workflows.
  • Transform and ingest data using code in Apache Spark.
  • Access data that resides in an external lake.
  • Automatically import tables filled with sample data.

The lakehouse provisioning creates a SQL analytics endpoint for querying and a default Power BI semantic model for faster reporting, which updates as tables are added to the lakehouse.
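
For example, once data is ingested you can query lakehouse tables from a notebook with Spark SQL. A minimal sketch, assuming a flattened Patient table exists in the silver lakehouse (the table and lakehouse names are illustrative):

    # Query a table in the silver lakehouse through Spark SQL. The spark
    # session is predefined in Fabric notebooks. "Patient" is an illustrative
    # table name; substitute any table in your lakehouse.
    df = spark.sql("SELECT * FROM healthcare1_msft_silver.Patient LIMIT 10")
    df.show()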

Notebooks

In addition to the healthcare#_msft_config_notebook, deploying Healthcare data foundations also provisions the following notebooks in your environment:

  • healthcare#_msft_raw_bronze_ingestion
  • healthcare#_msft_bronze_silver_flatten
  • healthcare#_msft_silver_sample_flatten_extensions_utility

healthcare#_msft_raw_bronze_ingestion

In the healthcare data solutions (preview) medallion architecture, data is processed using a multi-layered approach. The first layer, bronze, maintains the raw state of the data source. The second layer, silver, represents a validated, enriched version of the data. The third and final layer, gold, is highly refined and aggregated. This notebook ingests data into delta tables in the healthcare#_msft_bronze lakehouse.

The structure of the notebook is as follows:

  • Load data and set up: Begin by loading the necessary configuration details that you can specify.
  • Call BronzeIngestionService: After setting up the prerequisites, use the BronzeIngestionService module in the healthcare data solutions (preview) library to ingest the data. By default, the service is configured to use the provided sample data. To use your own FHIR data instead, update the source_path_pattern value to the location of your data (see the sketch after this list).
  • Verify results: View the ingestion results through a call to the newly created table.
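
As a rough sketch of that flow (the import path and call signature for BronzeIngestionService are assumptions for illustration only; the deployed notebook contains the real calls):

    # The import path below is assumed for illustration; the deployed
    # healthcare#_msft_raw_bronze_ingestion notebook shows the real one.
    # from <healthcare_data_solutions_library> import BronzeIngestionService

    bronze_database_name = "healthcare1_msft_bronze"   # placeholder identifier

    # 1. Load data and set up: configuration comes from the global
    #    configuration notebook.
    # 2. Call BronzeIngestionService (signature assumed for illustration):
    # BronzeIngestionService().run(source_path_pattern=source_path_pattern,
    #                              max_files_per_trigger=10)
    # 3. Verify results by querying a newly created bronze delta table
    #    ("Patient" is an illustrative table name).
    spark.sql(f"SELECT COUNT(*) AS row_count FROM {bronze_database_name}.Patient").show()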

Before running this notebook, ensure that you complete configuring the healthcare#_msft_config_notebook using the steps in Configure the global configuration notebook.

Following are the key parameters of the healthcare#_msft_raw_bronze_ingestion notebook:

  • max_files_per_trigger: Maximum number of new files to consider for every trigger. The data type of the value is integer.
  • source_path_pattern: The pattern to use for monitoring source folders. The data type of the value is variable.
    • Default value: The landing zone paths under abfss://{workspace_name}@{one_lake_endpoint}/{bronze_database_name}/Files/landing_zone/**/**/**/<resource_name>[^a-zA-Z]*ndjson
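
For example, to ingest your own FHIR data instead of the sample data, you could set source_path_pattern to your own landing-zone folder. The path below is a hypothetical layout that follows the default pattern above:

    # Hypothetical custom pattern; mirror the default landing-zone layout
    # with your own workspace, endpoint, and lakehouse identifiers.
    source_path_pattern = (
        "abfss://<workspace-name>@onelake.dfs.fabric.microsoft.com/"
        "healthcare1_msft_bronze/Files/landing_zone/**/**/**/Patient*.ndjson"
    )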

healthcare#_msft_bronze_silver_flatten

In this notebook, we use the SilverIngestionService module in the healthcare data solutions (preview) library to flatten FHIR resources in the healthcare#_msft_bronze lakehouse and to ingest the resulting data into the healthcare#_msft_silver lakehouse. By default, you aren't expected to make any changes to this notebook. If you prefer to point to different source and target lakehouses, change the values in the healthcare#_msft_config_notebook.

We recommend scheduling this notebook job to run every 4 hours. The initial run might not have data to consume due to concurrent and dependent jobs, leading to latency. Adjusting the frequency of higher layer jobs can reduce this latency.

healthcare#_msft_silver_sample_flatten_extensions_utility

Extensions are child elements that represent additional information and can be present in every element in a resource. To learn more about the extension element, go to FHIR extension element.

Currently, the schema supports extensions as strings. This notebook provides examples of how to access this extension data and use it within a DataFrame. There are two ways to use the data within extensions:

  • Use the parse_extension utility: This utility is used to retrieve specific fields from the full string extension.
  • Use the extension schema: Use the extension schema to parse the entire string extension.

Before you use this notebook, ensure that you complete the bronze and silver ingestion, as this notebook uses the silver database in the samples.

The structure of the notebook is as follows:

  • Load data and set up: Begin by loading the necessary configuration details that you can specify.
  • Parse extension using the parse_extension utility: Use the parse_extension utility to parse an extension and retrieve individual fields.
  • Parse extension using the provided extension schema: Utilize the provided extension schema to parse the entire string extension.

Following are the key parameters for the parse_extension utility:

  • extension: The full string extension column.
  • urlList: A comma-delimited list of URLs. Each comma-separated URL represents a nested level depth.
  • value: The value to be retrieved at the specified URL.
  • field: A comma-delimited list of fields, used when the value is a complex type. If you select multiple fields, they're concatenated with the <-> token.
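
To make the parameters concrete, here's a toy, self-contained re-implementation of what the parse_extension utility does, written for illustration only. The real utility ships with the healthcare data solutions library and is demonstrated in the notebook; the sample extension string below is hypothetical:

    import json

    def parse_extension_demo(extension: str, urlList: str, value: str, field: str = "") -> str:
        # Toy illustration of the documented parameters: walk the string
        # extension one comma-separated URL (nesting level) at a time, then
        # return the requested value. For complex types, concatenate the
        # requested fields with the <-> token.
        items = json.loads(extension)
        for url in urlList.split(","):
            for item in items:
                if item.get("url") == url:
                    items = item.get("extension", [item])
                    break
        target = items[0].get(value)
        if field and isinstance(target, dict):
            return "<->".join(str(target[f]) for f in field.split(","))
        return str(target)

    # Hypothetical string extension, as the silver schema stores it.
    sample = (
        '[{"url": "http://hl7.org/fhir/us/core/StructureDefinition/us-core-birthsex",'
        ' "valueCode": "F"}]'
    )
    print(parse_extension_demo(
        sample,
        "http://hl7.org/fhir/us/core/StructureDefinition/us-core-birthsex",
        "valueCode",
    ))  # F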
