Use Unstructured clinical notes enrichment in healthcare data solutions (preview)

[This article is prerelease documentation and is subject to change.]

Unstructured clinical notes enrichment is a capability that uses Azure AI Language's Text Analytics for health service to extract key Fast Healthcare Interoperability Resources (FHIR) entities from unstructured clinical notes and create structured data from these clinical notes. This structured data can then be analyzed to derive insights, predictions, and quality measures aimed at enhancing patient health outcomes.

To learn more about the capability and understand how to deploy and configure it, go to:

Unstructured clinical notes enrichment is an optional capability with healthcare data solutions in Microsoft Fabric (preview), which deploys the healthcare#_msft_silver_ta4h notebook within your Fabric workspace. However, the capability has a direct dependency on the Healthcare data foundations capability. Ensure that you successfully deploy, configure, and execute the Healthcare data foundations pipelines first.

Prerequisites

Ensure you meet the following requirements before you run the healthcare#_msft_silver_ta4h notebook:

NLP ingestion service

The healthcare#_msft_silver_ta4h notebook executes the NLPIngestionService module in the healthcare data solutions (preview) library to invoke the Text Analytics for health service. This service extracts unstructured clinical notes from the FHIR resource DocumentReference.Content to create a flattened output.
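The flattening step can be illustrated with a small, self-contained sketch. The response shape and the `flatten_entities` helper below are simplified stand-ins for illustration, not the service's exact payload schema or the NLPIngestionService module's actual code:

```python
# Illustrative sketch: flatten a (simplified, hypothetical) Text Analytics
# for health response into one row per extracted entity, which is the shape
# stored in the nlpentity silver table. The response dict is a stand-in,
# not the service's exact payload schema.
def flatten_entities(parent_id, response):
    rows = []
    for entity in response["entities"]:
        rows.append({
            "parent_id": parent_id,   # source DocumentReference content
            "text": entity["text"],   # term as it appears in the note
            "offset": entity["offset"],  # character offset in the note
            "length": entity["length"],
            "category": entity["category"],
        })
    return rows

response = {
    "entities": [
        {"text": "No Known Allergies", "offset": 294, "length": 18,
         "category": "Allergen"},
        {"text": "ibuprofen", "offset": 350, "length": 9,
         "category": "MedicationName"},
    ]
}

rows = flatten_entities("625", response)
print(rows[0]["text"])  # No Known Allergies
```

Each resulting row corresponds to one term extracted from the unstructured text, matching the one-row-per-term layout of the nlpentity table described in the next section.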

You can review the notebook configuration in Configure the healthcare#_msft_silver notebook.

Data storage in silver layer

After the natural language processing (NLP) API analysis, the structured, flattened output is stored in the following native tables within the healthcare#_msft_silver lakehouse:

  • nlpentity: Contains the flattened entities extracted from the unstructured clinical notes. Each row is a single term extracted from the unstructured text after performing the text analysis.
  • nlprelationship: Provides the relationship between the extracted entities.
  • nlpfhir: Contains the FHIR output bundle as a JSON string.

To track the last updated timestamp, the NLPIngestionService uses the parent_meta_lastUpdated field in all three silver lakehouse tables. This tracking ensures that the source document DocumentReference, which is the parent resource, is stored first to maintain referential integrity. This process helps prevent inconsistencies in the data and orphaned resources.
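The timestamp-based tracking can be sketched as a simple watermark filter. The row shape and field values below are simplified stand-ins for the silver tables, not the service's actual implementation:

```python
from datetime import datetime

# Illustrative incremental read: keep only rows whose parent
# DocumentReference was updated after the last processed watermark.
# Row shapes here are simplified stand-ins for the silver tables.
def rows_since(rows, watermark):
    return [r for r in rows if r["parent_meta_lastUpdated"] > watermark]

rows = [
    {"id": "625_294_18", "parent_meta_lastUpdated": datetime(2023, 5, 16)},
    {"id": "625_350_9",  "parent_meta_lastUpdated": datetime(2023, 5, 17)},
]

new_rows = rows_since(rows, datetime(2023, 5, 16, 12))
print([r["id"] for r in new_rows])  # ['625_350_9']
```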

Important

Currently, Text Analytics for health returns vocabularies listed in the UMLS Metathesaurus Vocabulary Documentation. For guidance on these vocabularies, see Import data from UMLS.

For the preview release, we'll use the SNOMED CT (Systematized Nomenclature of Medicine - Clinical Terms), LOINC (Logical Observation Identifiers Names and Codes), and RxNorm terminologies that are included with the sample dataset, based on guidance from Observational Health Data Sciences and Informatics (OHDSI).

OMOP transformation

Healthcare data solutions (preview) also provides a capability for Observational Medical Outcomes Partnership (OMOP) analytics. When you execute this capability, the underlying transformation from the silver lakehouse to the OMOP gold lakehouse also transforms the structured, flattened output of the unstructured clinical notes analysis. The transformation reads from the nlpentity table in the silver lakehouse and maps the output to the NOTE_NLP table in the OMOP gold lakehouse.

For more information, go to Overview of OMOP analytics.

Here's the schema for the structured NLP outputs, with the corresponding NOTE_NLP column mapping to the OMOP common data model:

| Flattened document reference | Description | NOTE_NLP mapping | Sample data |
|---|---|---|---|
| id | Unique identifier for the entity; a composite key of parent_id, offset, and length. | note_nlp_id | 1380 |
| parent_id | Foreign key to the flattened documentreferencecontent text the term was extracted from. | note_id | 625 |
| text | Entity text as it appears in the document. | lexical_variant | No Known Allergies |
| offset | Character offset of the extracted term in the input documentreferencecontent text. | offset | 294 |
| data_source_entity_id | ID of the entity in the given source catalog. | note_nlp_concept_id and note_nlp_source_concept_id | 37396387 |
| nlp_last_executed | Date of the documentreferencecontent text analysis processing. | nlp_date_time and nlp_date | 2023-05-17T00:00:00.0000000 |
| model | Name and version of the Text Analytics for health NLP system. | nlp_system | MSFT TA4H |
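The column mapping above can be sketched as a simple row transform. The field names follow the table; the function itself is illustrative, not the actual gold-lakehouse transformation code:

```python
# Illustrative transform of one flattened nlpentity row into an OMOP
# NOTE_NLP row, following the column mapping in the schema table above.
# This is a sketch, not the pipeline's actual transformation code.
def to_note_nlp(entity_row):
    return {
        "note_nlp_id": entity_row["id"],
        "note_id": entity_row["parent_id"],
        "lexical_variant": entity_row["text"],
        "offset": entity_row["offset"],
        "note_nlp_concept_id": entity_row["data_source_entity_id"],
        "note_nlp_source_concept_id": entity_row["data_source_entity_id"],
        "nlp_date_time": entity_row["nlp_last_executed"],
        "nlp_date": entity_row["nlp_last_executed"][:10],  # date part only
        "nlp_system": entity_row["model"],
    }

entity_row = {
    "id": "1380",
    "parent_id": "625",
    "text": "No Known Allergies",
    "offset": 294,
    "data_source_entity_id": "37396387",
    "nlp_last_executed": "2023-05-17T00:00:00.0000000",
    "model": "MSFT TA4H",
}

note_nlp = to_note_nlp(entity_row)
print(note_nlp["lexical_variant"], note_nlp["nlp_date"])  # No Known Allergies 2023-05-17
```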

An image displaying the schema with sample NLP data.

Service limits for Text Analytics for health

  • Maximum number of characters per document is limited to 125,000.
  • Maximum size of documents contained in the entire request is limited to 1 MB.
  • Maximum number of documents per request is limited to:
    • 25 for the web-based API.
    • 1000 for the container.
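Clients calling the web-based API need to batch documents within these limits. The following sketch shows one way to chunk a document list; the limit values come from the list above, while the chunking logic itself is illustrative (it also doesn't split individual documents that exceed the 125,000-character per-document limit, which would need separate handling):

```python
# Illustrative batching that respects the web API limits above:
# at most 25 documents per request and at most 1 MB of document text
# per request. The constants encode the documented limits; the chunking
# strategy is a sketch, not part of the healthcare data solutions library.
MAX_DOCS_PER_REQUEST = 25
MAX_REQUEST_BYTES = 1_000_000

def batch_documents(documents):
    batches, current, current_bytes = [], [], 0
    for doc in documents:
        doc_bytes = len(doc.encode("utf-8"))
        if current and (len(current) == MAX_DOCS_PER_REQUEST
                        or current_bytes + doc_bytes > MAX_REQUEST_BYTES):
            batches.append(current)
            current, current_bytes = [], 0
        current.append(doc)
        current_bytes += doc_bytes
    if current:
        batches.append(current)
    return batches

docs = ["note"] * 60  # 60 small documents
print([len(b) for b in batch_documents(docs)])  # [25, 25, 10]
```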

Enable logs

Follow these steps to enable request and response logging for the Text Analytics for health API:

  • Enable the diagnostic settings for your Azure Language service resource using the instructions in Enable diagnostic logging for Azure AI services. This resource is the same language service that you created during the Set up Azure Language service deployment step.

    • Enter a diagnostic setting name.
    • Set the category to Request and Response Logs.
    • For destination details, select Send to Log Analytics workspace, and select the workspace deployed with the Azure Marketplace offer for healthcare data solutions (preview). For more information, go to Deploy Azure Marketplace offer.
    • Save the settings.

    A screenshot displaying the language service diagnostic settings.

  • Navigate to the NLP Config section in the NLP ingestion service notebook. Update the value of the configuration parameter enable_text_analytics_logs to True. For more information about this configuration, see Configure the healthcare#_msft_silver_ta4h notebook.
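In the notebook's NLP Config section, the change amounts to flipping one flag. The parameter name is as documented; the assignment style shown here is illustrative:

```python
# NLP Config section of the healthcare#_msft_silver_ta4h notebook (sketch).
# Enables request and response logging for Text Analytics for health calls.
enable_text_analytics_logs = True
```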

View logs in Azure Log Analytics

To explore the log analytics data:

  • Navigate to the Log Analytics workspace.
  • Locate and select Logs. From this page, you can run queries against your logs.

Sample query

Following is a basic Kusto query that you can use to explore your log data. This sample query retrieves all the failed requests (HTTP 4xx and 5xx result codes) from the Azure Cognitive Services resource provider in the past day, grouped by result code:

AzureDiagnostics
| where TimeGenerated > ago(1d)
| where Category == "RequestResponse"
| where ResourceProvider == "MICROSOFT.COGNITIVESERVICES"
| where tostring(ResultSignature) startswith "4" or tostring(ResultSignature) startswith "5"
| summarize NumberOfFailedRequests = count() by ResultSignature

See also