Use Unstructured clinical notes enrichment in healthcare data solutions (preview)

[This article is prerelease documentation and is subject to change.]

Unstructured clinical notes enrichment is a capability that uses Azure AI Language's Text Analytics for health service to extract key Fast Healthcare Interoperability Resources (FHIR) entities from unstructured clinical notes and create structured data from these clinical notes. This structured data can then be analyzed to derive insights, predictions, and quality measures aimed at enhancing patient health outcomes.

To learn more about the capability and understand how to deploy and configure it, go to:

Unstructured clinical notes enrichment is an optional capability with healthcare data solutions in Microsoft Fabric (preview), which deploys the healthcare#_msft_silver_ta4h notebook within your Fabric workspace. However, the capability has a direct dependency on the Healthcare data foundations capability. Ensure that you successfully deploy, configure, and execute the Healthcare data foundations pipelines first.

Prerequisites

Ensure you meet the following requirements before you run the healthcare#_msft_silver_ta4h notebook:

NLP ingestion service

The healthcare#_msft_silver_ta4h notebook executes the NLPIngestionService module in the healthcare data solutions (preview) library to invoke the Text Analytics for health service. This service extracts unstructured clinical notes from the FHIR resource DocumentReference.Content to create a flattened output.
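The flattening step can be illustrated with a small, self-contained sketch. The response shape and the `flatten_entities` helper below are simplified stand-ins for illustration, not the service's exact payload schema or the NLPIngestionService module's actual code:

```python
# Illustrative sketch: flatten a (simplified, hypothetical) Text Analytics
# for health response into one row per extracted entity, which is the shape
# stored in the nlpentity silver table. The response dict is a stand-in,
# not the service's exact payload schema.
def flatten_entities(parent_id, response):
    rows = []
    for entity in response["entities"]:
        rows.append({
            "parent_id": parent_id,   # source DocumentReference content
            "text": entity["text"],   # term as it appears in the note
            "offset": entity["offset"],  # character offset in the note
            "length": entity["length"],
            "category": entity["category"],
        })
    return rows

response = {
    "entities": [
        {"text": "No Known Allergies", "offset": 294, "length": 18,
         "category": "Allergen"},
        {"text": "ibuprofen", "offset": 350, "length": 9,
         "category": "MedicationName"},
    ]
}

rows = flatten_entities("625", response)
print(rows[0]["text"])  # No Known Allergies
```

Each resulting row corresponds to one term extracted from the unstructured text, matching the one-row-per-term layout of the nlpentity table described in the next section.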

You can review the notebook configuration in Configure the healthcare#_msft_silver notebook.

Data storage in silver layer

After the natural language processing (NLP) API analysis, the structured, flattened output is stored in the following native tables within the healthcare#_msft_silver lakehouse:

  • nlpentity: Contains the flattened entities extracted from the unstructured clinical notes. Each row is a single term extracted from the unstructured text after performing the text analysis.
  • nlprelationship: Provides the relationship between the extracted entities.
  • nlpfhir: Contains the FHIR output bundle as a JSON string.

To track the last updated timestamp, the NLPIngestionService uses the parent_meta_lastUpdated field in all three silver lakehouse tables. This tracking ensures that the source document DocumentReference, which is the parent resource, is stored first to maintain referential integrity. This process helps prevent inconsistencies in the data and orphaned resources.
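The timestamp-based tracking can be sketched as a simple watermark filter. The row shape and field values below are simplified stand-ins for the silver tables, not the service's actual implementation:

```python
from datetime import datetime

# Illustrative incremental read: keep only rows whose parent
# DocumentReference was updated after the last processed watermark.
# Row shapes here are simplified stand-ins for the silver tables.
def rows_since(rows, watermark):
    return [r for r in rows if r["parent_meta_lastUpdated"] > watermark]

rows = [
    {"id": "625_294_18", "parent_meta_lastUpdated": datetime(2023, 5, 16)},
    {"id": "625_350_9",  "parent_meta_lastUpdated": datetime(2023, 5, 17)},
]

new_rows = rows_since(rows, datetime(2023, 5, 16, 12))
print([r["id"] for r in new_rows])  # ['625_350_9']
```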

Important

Currently, Text Analytics for health returns vocabularies listed in the UMLS Metathesaurus Vocabulary Documentation. For guidance on these vocabularies, see Import data from UMLS.

For the preview release, we'll use the SNOMED CT (Systematized Nomenclature of Medicine - Clinical Terms), LOINC (Logical Observation Identifiers Names and Codes), and RxNorm terminologies that are included with the sample dataset, based on guidance from Observational Health Data Sciences and Informatics (OHDSI).

OMOP transformation

Healthcare data solutions (preview) also provides a capability for Observational Medical Outcomes Partnership (OMOP) analytics. When you execute this capability, the underlying transformation from the silver lakehouse to the OMOP gold lakehouse also transforms the structured, flattened output of the unstructured clinical notes analysis. The transformation reads from the nlpentity table in the silver lakehouse and maps the output to the NOTE_NLP table in the OMOP gold lakehouse.

For more information, go to Overview of OMOP analytics.

Here's the schema for the structured NLP outputs, with the corresponding NOTE_NLP column mapping to the OMOP common data model:

| Flattened document reference | Description | NOTE_NLP mapping | Sample data |
|---|---|---|---|
| id | Unique identifier for the entity; a composite key of parent_id, offset, and length. | note_nlp_id | 1380 |
| parent_id | Foreign key to the flattened documentreferencecontent text the term was extracted from. | note_id | 625 |
| text | Entity text as it appears in the document. | lexical_variant | No Known Allergies |
| offset | Character offset of the extracted term in the input documentreferencecontent text. | offset | 294 |
| data_source_entity_id | ID of the entity in the given source catalog. | note_nlp_concept_id and note_nlp_source_concept_id | 37396387 |
| nlp_last_executed | Date of the documentreferencecontent text analysis processing. | nlp_date_time and nlp_date | 2023-05-17T00:00:00.0000000 |
| model | Name and version of the Text Analytics for health NLP system. | nlp_system | MSFT TA4H |
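The column mapping above can be sketched as a simple row transform. The field names follow the table; the function itself is illustrative, not the actual gold-lakehouse transformation code:

```python
# Illustrative transform of one flattened nlpentity row into an OMOP
# NOTE_NLP row, following the column mapping in the schema table above.
# This is a sketch, not the pipeline's actual transformation code.
def to_note_nlp(entity_row):
    return {
        "note_nlp_id": entity_row["id"],
        "note_id": entity_row["parent_id"],
        "lexical_variant": entity_row["text"],
        "offset": entity_row["offset"],
        "note_nlp_concept_id": entity_row["data_source_entity_id"],
        "note_nlp_source_concept_id": entity_row["data_source_entity_id"],
        "nlp_date_time": entity_row["nlp_last_executed"],
        "nlp_date": entity_row["nlp_last_executed"][:10],  # date part only
        "nlp_system": entity_row["model"],
    }

entity_row = {
    "id": "1380",
    "parent_id": "625",
    "text": "No Known Allergies",
    "offset": 294,
    "data_source_entity_id": "37396387",
    "nlp_last_executed": "2023-05-17T00:00:00.0000000",
    "model": "MSFT TA4H",
}

note_nlp = to_note_nlp(entity_row)
print(note_nlp["lexical_variant"], note_nlp["nlp_date"])  # No Known Allergies 2023-05-17
```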

An image displaying the schema with sample NLP data.

Service limits for Text Analytics for health

  • Maximum number of characters per document is limited to 125,000.
  • Maximum size of documents contained in the entire request is limited to 1 MB.
  • Maximum number of documents per request is limited to:
    • 25 for the web-based API.
    • 1000 for the container.
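Clients calling the web-based API need to batch documents within these limits. The following sketch shows one way to chunk a document list; the limit values come from the list above, while the chunking logic itself is illustrative (it also doesn't split individual documents that exceed the 125,000-character per-document limit, which would need separate handling):

```python
# Illustrative batching that respects the web API limits above:
# at most 25 documents per request and at most 1 MB of document text
# per request. The constants encode the documented limits; the chunking
# strategy is a sketch, not part of the healthcare data solutions library.
MAX_DOCS_PER_REQUEST = 25
MAX_REQUEST_BYTES = 1_000_000

def batch_documents(documents):
    batches, current, current_bytes = [], [], 0
    for doc in documents:
        doc_bytes = len(doc.encode("utf-8"))
        if current and (len(current) == MAX_DOCS_PER_REQUEST
                        or current_bytes + doc_bytes > MAX_REQUEST_BYTES):
            batches.append(current)
            current, current_bytes = [], 0
        current.append(doc)
        current_bytes += doc_bytes
    if current:
        batches.append(current)
    return batches

docs = ["note"] * 60  # 60 small documents
print([len(b) for b in batch_documents(docs)])  # [25, 25, 10]
```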

Enable logs

Follow these steps to enable request and response logging for the Text Analytics for health API:

  • Enable the diagnostic settings for your Azure Language service resource using the instructions in Enable diagnostic logging for Azure AI services. This resource is the same language service that you created during the Set up Azure Language service deployment step.

    • Enter a diagnostic setting name.
    • Set the category to Request and Response Logs.
    • For destination details, select Send to Log Analytics workspace, and select the workspace deployed with the Azure Marketplace offer for healthcare data solutions (preview). For more information, go to Deploy Azure Marketplace offer.
    • Save the settings.

    A screenshot displaying the language service diagnostic settings.

  • Navigate to the NLP Config section in the NLP ingestion service notebook. Update the value of the configuration parameter enable_text_analytics_logs to True. For more information about this configuration, see Configure the healthcare#_msft_silver_ta4h notebook.
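In the notebook's NLP Config section, the change amounts to flipping one flag. The parameter name is as documented; the assignment style shown here is illustrative:

```python
# NLP Config section of the healthcare#_msft_silver_ta4h notebook (sketch).
# Enables request and response logging for Text Analytics for health calls.
enable_text_analytics_logs = True
```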

View logs in Azure Log Analytics

To explore the log analytics data:

  • Navigate to the Log Analytics workspace.
  • Locate and select Logs. From this page, you can run queries against your logs.

Sample query

Following is a basic Kusto query that you can use to explore your log data. This sample query retrieves all the failed requests (HTTP 4xx and 5xx result codes) from the Azure Cognitive Services resource provider in the past day, grouped by result code:

AzureDiagnostics
| where TimeGenerated > ago(1d)
| where Category == "RequestResponse"
| where ResourceProvider == "MICROSOFT.COGNITIVESERVICES"
| where tostring(ResultSignature) startswith "4" or tostring(ResultSignature) startswith "5"
| summarize NumberOfFailedRequests = count() by ResultSignature

See also