Deploy and configure FHIR data ingestion in healthcare data solutions (preview)

[This article is prerelease documentation and is subject to change.]

FHIR data ingestion enables you to bring your Fast Healthcare Interoperability Resources (FHIR) data to OneLake from a FHIR service such as Azure Health Data Services. You can deploy and configure the FHIR data ingestion capability after you deploy healthcare data solutions (preview) and the Healthcare data foundations capability to your Fabric workspace.

Note

The FHIR data ingestion capability is required to run other healthcare data solutions (preview) capabilities if you're using your own FHIR data. The capability also has a direct dependency on the Healthcare data foundations capability. Before you deploy FHIR data ingestion, ensure that you've successfully deployed Healthcare data foundations.

Deployment prerequisites

Before you deploy FHIR data ingestion, make sure you've deployed healthcare data solutions (preview) to your Fabric workspace and successfully deployed the Healthcare data foundations capability.

Deploy FHIR data ingestion

To deploy FHIR data ingestion to your workspace, follow these steps:

  1. Navigate to the healthcare data solutions home page on Fabric.

  2. Select the FHIR data ingestion tile.

    A screenshot displaying the FHIR data ingestion tile.

  3. On the capability page, select Deploy to workspace. The deployment includes provisioning the FHIR export service notebook for bringing data from the Azure FHIR service to OneLake.

    A screenshot displaying how to deploy the FHIR data ingestion capability to your workspace.

  4. The deployment can take a few minutes to complete. Don't close the tab or the browser while the deployment is in progress, but you can work in another tab in the meantime.

  5. After the deployment completes, you'll be notified. Select the Manage capability button from the message bar to navigate to the FHIR data ingestion capability management page. You can view, configure, and manage the deployed FHIR export service notebook here.

    A screenshot displaying the capability management option.

Configure the FHIR export service

The healthcare#_msft_fhir_export_service notebook, deployed with FHIR data ingestion, uses the bulk $export API provided by Azure Health Data Services to export FHIR data to an Azure Storage container on a recurring basis. The FHIRExportService exports the data from the FHIR server and monitors the status of these exports according to the following protocol (a minimal code sketch follows the list):

  • Export function key extraction: Before export initiation, the service extracts the ExportFunctionKey secret from a user-specified Azure Key Vault.

  • Triggering the Azure Function: The service uses the extracted ExportFunctionKey secret and triggers the designated Azure Function (function app).

  • Continuous polling: After the bulk export request is accepted, the service periodically polls the URL returned in the Content-Location header for the status of the export.

  • Completion confirmation: Polling continues until the server returns an HTTP 200 status code, which indicates that the export operation completed. The operation can also partially succeed.
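
The following Python sketch illustrates this protocol end to end. It's a simplified approximation rather than the notebook's actual implementation: the function URL, the x-functions-key header, and the 60-second polling interval are assumptions for illustration.

import time
import requests

def run_fhir_export(function_url: str, function_key: str, max_polling_days: int = 3) -> str:
    # Trigger the designated Azure Function using the ExportFunctionKey secret
    # (assumed here to be extracted from the key vault and passed in as function_key).
    response = requests.post(function_url, headers={"x-functions-key": function_key})
    response.raise_for_status()

    # The accepted bulk export request returns a polling URL in the Content-Location header.
    status_url = response.headers["Content-Location"]

    # Poll until the server returns HTTP 200 (complete, possibly partially successful)
    # or the maximum polling window elapses.
    deadline = time.time() + max_polling_days * 24 * 60 * 60
    while time.time() < deadline:
        status = requests.get(status_url)
        if status.status_code == 200:
            return status.text
        time.sleep(60)  # assumed polling interval; HTTP 202 means still in progress

    raise TimeoutError(f"Export didn't complete within {max_polling_days} days.")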

Before you run this notebook, ensure that you finish configuring the healthcare#_msft_config_notebook as explained in Configure the global configuration notebook. Essentially, complete the following steps:

  • Update the key vault parameter value kv_name in the global configuration notebook. This value should point to the key vault service deployed with the Healthcare data solutions in Microsoft Fabric Azure Marketplace offer.

    kv_name = "%%keyvault_name%%"

  • Regardless of your chosen path to ingest data into the lake, ensure that you assign the required permissions to all the users intending to execute the data pipelines and notebooks.

The healthcare#_msft_fhir_export_service notebook has the following key configuration parameters:

  • spark: Spark session.
  • max_polling_days: The maximum number of days to poll the FHIR server for the export to complete. The default value is three days; valid values range from one to seven days.
  • kv_name: Name of the key vault service. Configure this value in the global configuration notebook.
  • function_url_secret_name: Name of the secret in the key vault service that contains the function URL. Also configure this value in the global configuration notebook.
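
For reference, a parameter cell for this notebook might look like the following sketch. The literal values are illustrative placeholders, and the spark session is supplied by the Fabric notebook runtime; replace the placeholders with your own settings.

max_polling_days = 3                                        # valid range: 1 to 7 days
kv_name = "%%keyvault_name%%"                               # set in the global configuration notebook
function_url_secret_name = "%%function_url_secret_name%%"   # assumed placeholder name; set in the global configuration notebook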

Important

By default, all new Fabric workspaces use the latest Fabric runtime version, which is currently Runtime 1.2. However, the solution currently supports only Runtime 1.1.

Hence, after deploying healthcare data solutions (preview) to your workspace, remember to update the default Fabric runtime version to Runtime 1.1 (Apache Spark 3.3.1 and Delta Lake 2.2.0) before executing any of the pipelines or notebooks. Otherwise, your pipeline or notebook executions fail.

For more information, see Support for multiple runtimes in Fabric Runtime.

A screenshot showing how to update the runtime version.

FHIR data ingestion options

To use the FHIR data ingestion capability, you can choose one of the following three data ingestion options:

  1. Use the sample data shipped with healthcare data solutions (preview).
  2. Bring your own data to the Fabric lakehouse.
  3. Ingest data using a FHIR service such as Azure Health Data Services.

Note

Ingesting data using a FHIR service only works with first-party Microsoft FHIR services.

If you wish to proceed with option 2 or 3, ensure that you adhere to the following requirements:

  • All data must be in NDJSON format.

  • Each file can only contain data related to one FHIR resource.

  • Each resource in the file must have a metadata field with a Meta.lastUpdated element value. For more information, go to Resource - FHIR v6.0.0-cibuild.

  • Data must be organized in one of the following two supported folder formats. The file names and folder names are case sensitive. An illustrative layout for both formats appears after this list.

    • Data can be distributed across multiple folders, with file names beginning with the resource name.

      An image displaying a sample for the first supported folder format.

    • Data can be stored in folders where the folder names correspond to the resource names.

      An image displaying a sample for the second supported folder format.
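
For illustration only, the two supported folder formats might look like the following for the Patient resource. All file and folder names shown here are hypothetical.

# Format 1: data distributed across multiple folders; file names begin with the resource name
Files/FHIRData/2024-01-01/Patient-part1.ndjson
Files/FHIRData/2024-01-02/Patient-part2.ndjson

# Format 2: folder names correspond to resource names
Files/FHIRData/Patient/part1.ndjson
Files/FHIRData/Patient/part2.ndjson

Each line in an .ndjson file must be one complete FHIR resource that carries the Meta.lastUpdated element, for example:

{"resourceType": "Patient", "id": "patient-example", "meta": {"lastUpdated": "2024-05-01T12:00:00Z"}, "active": true}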

Also, if you bring your own data to the Fabric lakehouse or use a FHIR service, review the following configuration sections based on your usage scenario.

Bring in your own data

Consider a scenario where you upload data to the healthcare#_msft_bronze lakehouse under a Files folder named FHIRData. Then, update the source_path_pattern value in the healthcare#_msft_raw_bronze_ingestion notebook as shown in the following examples:

Example 1

source_path_pattern = 'abfss://<workspace_name>@onelake.dfs.fabric.microsoft.com/{bronze_lakehouse_name}.Lakehouse/Files/FHIRData/**/<resource_name>[^a-zA-Z]*ndjson'

Example 2

source_path_pattern = 'abfss://<workspace_name>@onelake.dfs.fabric.microsoft.com/{bronze_lakehouse_name}.Lakehouse/Files/FHIRData/<resource_name>/*ndjson'
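
For instance, with a hypothetical workspace named MyWorkspace and the Patient resource, the pattern in Example 2 resolves to:

source_path_pattern = 'abfss://MyWorkspace@onelake.dfs.fabric.microsoft.com/{bronze_lakehouse_name}.Lakehouse/Files/FHIRData/Patient/*ndjson'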

Use FHIR service

When you run the healthcare#_msft_fhir_export_service notebook, the data is automatically exported to a container named export-landing-zone in the Azure Storage account. You must create a shortcut in the bronze lakehouse to this folder in your storage account.

For example, let's say the shortcut you create in the bronze lakehouse is named FHIRData. The Azure Health Data Services FHIR service always exports data in the pattern outlined in Example 1 in the previous section. So, specify the source_path_pattern value in the following format:

source_path_pattern = 'abfss://<workspace_name>@onelake.dfs.fabric.microsoft.com/{bronze_lakehouse_name}.Lakehouse/Files/FHIRData/**/<resource_name>[^a-zA-Z]*ndjson'

See also