Tutorial: Debug a skillset using Debug Sessions

Skillsets coordinate a series of actions that analyze or transform content, where the output of one skill becomes the input of another. When inputs depend on outputs, mistakes in skillset definitions and field associations can result in missed operations and missing data.

Debug Sessions in the Azure portal provide a holistic visualization of a skillset. Using this tool, you can drill down into specific steps to see where an action might be failing.

In this article, you'll use Debug Sessions to find and fix missing inputs and outputs. The tutorial is all-inclusive: it provides sample data, a Postman collection that creates objects, and instructions for debugging problems in the skillset.

Prerequisites

Before you begin, have the following prerequisites in place:

  • An Azure subscription
  • An Azure Cognitive Search service
  • An Azure Storage account
  • The Postman desktop app

Note

This tutorial also uses Azure Cognitive Services for language detection, entity recognition, and key phrase extraction. Because the workload is so small, Cognitive Services is tapped behind the scenes for free processing of up to 20 transactions. This means you can complete this exercise without creating a billable Cognitive Services resource.

Set up your data

This section creates the sample data set in Azure Blob Storage so that the indexer and skillset have content to work with.

  1. Download sample data (clinical-trials-pdf-19), consisting of 19 files.

  2. Create an Azure storage account or find an existing account.

    • Choose the same region as Azure Cognitive Search to avoid bandwidth charges.

    • Choose the StorageV2 (general purpose V2) account type.

  3. Navigate to your Azure Storage account in the portal and create a Blob container. As a best practice, set the access level to "private". Name your container clinicaltrialdataset.

  4. In the container, select Upload to upload the sample files you downloaded and unzipped in the first step.

  5. While in the portal, get and save the connection string for Azure Storage. You'll need it for the REST API calls that index data. You can get the connection string from Settings > Access keys in the portal.
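The connection string follows a well-known format; a representative sketch (your account name and key will differ):

    DefaultEndpointsProtocol=https;AccountName=<your-account>;AccountKey=<your-key>;EndpointSuffix=core.windows.net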

Get a key and URL

REST calls require the service URL and an access key on every request. A search service is created with both, so if you added Azure Cognitive Search to your subscription, follow these steps to get the necessary information:

  1. Sign in to the Azure portal, and in your search service Overview page, get the URL. An example endpoint might look like https://mydemo.search.windows.net.

  2. In Settings > Keys, get an admin key for full rights on the service. There are two interchangeable admin keys, provided for business continuity in case you need to roll one over. You can use either the primary or secondary key on requests for adding, modifying, and deleting objects.

    Screenshot of the URL and API keys in the portal.

A valid api-key is required on every request sent to your service. The key establishes trust, on a per-request basis, between the application sending the request and the service that handles it.
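For example, a request that lists the indexes on your service passes the key in an api-key request header. A minimal sketch, assuming the example endpoint above and a recent GA api-version:

    GET https://mydemo.search.windows.net/indexes?api-version=2020-06-30
    Content-Type: application/json
    api-key: <your-admin-key>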

Create data source, skillset, index, and indexer

In this section, you'll use Postman and a provided collection to create the Cognitive Search data source, skillset, index, and indexer. If you're unfamiliar with Postman, see this quickstart.

You will need the Postman collection created for this tutorial to complete this task.

  1. Start Postman and import the "DebugSessions.postman_collection.json" collection. Under File > Import, select the collection file.

  2. After the collection is imported, expand the actions list (...).

  3. Select Edit to set variables used in each request, and then Save.

    • searchService - The name of your search service (for example, if the endpoint is https://mydemo.search.windows.net, the service name is "mydemo").
    • apiKey - The primary or secondary admin key obtained from the Keys page of your search service.
    • storageConnectionString - The connection string obtained from the Access keys page of your Azure Storage account.
    • containerName - The name of the container you created for the sample data.
  4. Verify that the collection you imported contains four REST calls, used to create objects in this tutorial (a sketch of the first request appears at the end of this section).

    • CreateDataSource adds clinical-trials-ds
    • CreateSkillset adds clinical-trials-ss
    • CreateIndex adds clinical-trials
    • CreateIndexer adds clinical-trials-idxr
  5. Open each request in turn, and select Send to send each request to the search service. The last one will take several minutes to complete.

  6. Close Postman and return to the Azure portal.
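For reference, the CreateDataSource request sends a POST with a JSON body similar to the following sketch. The {{...}} placeholders are the Postman variables you set earlier; the exact JSON in the collection may differ slightly:

    POST https://{{searchService}}.search.windows.net/datasources?api-version=2020-06-30
    Content-Type: application/json
    api-key: {{apiKey}}

    {
      "name": "clinical-trials-ds",
      "type": "azureblob",
      "credentials": { "connectionString": "{{storageConnectionString}}" },
      "container": { "name": "{{containerName}}" }
    }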

Check results in the portal

The sample code intentionally creates a buggy index, a consequence of problems that occur during skillset execution. The problem is that the index is missing data.

  1. In Azure portal, on the search service Overview page, select the Indexes tab.

  2. Select clinical-trials.

  3. Enter this query string to return fields for specific documents (identified by the unique metadata_storage_path field): $select=metadata_storage_path, organizations, locations&$count=true

  4. Select Search to run the query. You should see empty values for "organizations" and "locations".

These fields should have been populated through the skillset's Entity Recognition skill, used to detect organizations and locations anywhere within the blob's content. In the next exercise, you'll debug the skillset to determine what went wrong.
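The same query can also be issued as a raw REST call against the documents collection. A sketch, assuming the example service name and a recent GA api-version:

    GET https://mydemo.search.windows.net/indexes/clinical-trials/docs?search=*&$select=metadata_storage_path,organizations,locations&$count=true&api-version=2020-06-30
    api-key: <your-admin-key>

The response should show null or empty values for "organizations" and "locations", matching what you saw in the portal.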

Another way to investigate errors and warnings is through the Azure portal.

  1. Open the Indexers tab and select clinical-trials-idxr.

    Notice that while the indexer job succeeded overall, there were warnings.

  2. Select Success to view the warnings (if there were mostly errors, the detail link would be Failed). You'll see a long list of every warning emitted by the indexer.

    Screenshot of view warnings.
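The same execution history, including warnings, is also available programmatically through the Get Indexer Status REST API. A sketch:

    GET https://mydemo.search.windows.net/indexers/clinical-trials-idxr/status?api-version=2020-06-30
    api-key: <your-admin-key>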

Start your debug session

  1. From the search service Overview page, select the Debug sessions tab.

  2. Select + New Debug Session.

  3. Give the session a name.

  4. Connect the session to your storage account.

  5. In Indexer template, provide the indexer name. The indexer has references to the data source, the skillset, and index.

  6. Accept the default document choice for the first document in the collection. A debug session only works with a single document. You can choose which document to debug, or just use the first one.

  7. Save the session. Saving the session will kick off the enrichment pipeline as defined by the skillset for the selected document.

    Screenshot of configuring a new debug session.

  8. When the debug session has finished initializing, it defaults to the AI Enrichments tab, highlighting the Skill Graph. The Skill Graph provides a visual hierarchy of the skillset and its order of execution, showing which skills run sequentially and which run in parallel.

    Screenshot of Debug Session visual editor.

Find issues with the skillset

Any issues reported by the indexer can be found in the adjacent Errors/Warnings tab.

Screenshot of the errors and warnings tab.

Notice that the Errors/Warnings tab provides a much smaller list than the one displayed earlier, because it details the errors and warnings for a single document only. Like the list displayed by the indexer, you can select a warning message to see its details.

Select Errors/Warnings to review the notifications. You should see four:

  • "Could not execute skill because one or more skill input was invalid. Required skill input is missing. Name: 'text', Source: '/document/content'."

  • "Could not map output field 'locations' to search index. Check the 'outputFieldMappings' property of your indexer. Missing value '/document/merged_content/locations'."

  • "Could not map output field 'organizations' to search index. Check the 'outputFieldMappings' property of your indexer. Missing value '/document/merged_content/organizations'."

  • "Skill executed but may have unexpected results because one or more skill input was invalid. Optional skill input is missing. Name: 'languageCode', Source: '/document/languageCode'. Expression language parsing issues: Missing value '/document/languageCode'."

Many skills have a "languageCode" parameter. By inspecting the operation, you can see that this language code input is missing from EntityRecognitionSkillV3.#1, the same Entity Recognition skill that is having trouble with the 'locations' and 'organizations' outputs.

Because all four notifications concern this skill, your next step is to debug it. If possible, solve input issues before moving on to output issues.
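Pieced together from the four messages, the relevant portion of the skill definition most likely resembles the following sketch. The inputs, outputs, and context shown here are inferred from the warnings above; the actual JSON in the sample skillset may differ in its details:

    {
      "@odata.type": "#Microsoft.Skills.Text.V3.EntityRecognitionSkill",
      "context": "/document/content",
      "inputs": [
        { "name": "text", "source": "/document/content" },
        { "name": "languageCode", "source": "/document/languageCode" }
      ],
      "outputs": [
        { "name": "organizations", "targetName": "organizations" },
        { "name": "locations", "targetName": "locations" }
      ]
    }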

Fix missing skill input values

In the Errors/Warnings tab, there are two missing inputs for an operation labeled EntityRecognitionSkillV3.#1. The detail of the first error explains that a required input for 'text' is missing. The second indicates a problem with an input value "/document/languageCode".

  1. In AI Enrichments > Skill Graph, select the skill labeled #1 to display its details in the right pane.

  2. Select the Executions tab and locate the input for "text".

  3. Select the </> symbol to open the Expression Evaluator. The displayed result for this input doesn't look like text. It's a series of new line characters, \n \n\n\n\n, instead of text. The lack of text means that no entities can be identified, so either this document fails to meet the prerequisites of the skill, or there is another input that should be used instead.

    Screenshot of Expression Evaluator for the text input.

  4. Switch the left pane to Enriched Data Structure and scroll down the list of enrichment nodes for this document. Notice that the \n \n\n\n\n value for "content" has no originating source, but another value, "merged_content", contains OCR output. Although there is no explicit indication, the content of this PDF appears to be a JPEG image, as evidenced by the extracted and processed text in "merged_content".

    Screenshot of Enriched Data Structure.

  5. In the right pane, select Executions for the #1 skill and open the Expression Evaluator </> for the input "text".

  6. Change the expression from /document/content to /document/merged_content, and then select Evaluate. Notice that the content is now a chunk of text, and thus actionable for entity recognition.

    Screenshot of Expression Evaluator for fixed merged_content input.

  7. Switch to Skill JSON Editor.

  8. Change /document/content to /document/merged_content.

  9. Select Save in the Skill Details pane.

    Screenshot of the Save command for skillset details.

  10. Select Run in the session's window menu. This will kick off another execution of the skillset using the document.

  11. Once the debug session execution completes, check the Errors/Warnings tab. The error for the text input is gone, but the other warnings remain. The next step is to address the warning about "languageCode".

    Screenshot of updated errors and warnings.

  12. Select the Executions tab and locate the input for "languageCode".

  13. Select the </> symbol to open the Expression Evaluator. Notice the confirmation that "languageCode" is not a valid input.

    Screenshot of Expression Evaluator for the language input.

There are two ways to research this error. The first is to look at where the input is coming from: which skill in the hierarchy is supposed to produce this result? The Executions tab in the skill details pane should display the source of the input. If there is no source, that indicates a field mapping error.

  1. In the Executions tab, check the INPUTS and find "languageCode". There is no source for this input listed.

  2. Switch the left pane to Enriched Data Structure and scroll down the list of enrichment nodes for this document. Notice that there is no "languageCode" node, but there is one for "language". So, there is a typo in the skill settings.

    Screenshot of Enriched Data Structure, with language highlighted.

  3. Still in the Enriched Data Structure, open the Expression Evaluator </> for the "language" node and copy the expression /document/language.

  4. In the right pane, select Skill Settings for the #1 skill and open the Expression Evaluator </> for the input "languageCode".

  5. Paste the new value, /document/language, into the Expression box and select Evaluate. It should display the correct input, "en".

  6. Select Save.

  7. Select Run.

After the debug session execution completes, check the Errors/Warnings tab. All of the input warnings are now gone; only the two warnings about the output fields for organizations and locations remain.
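At this point, the skill's inputs section should read as follows (a sketch showing just the two corrected inputs):

    "inputs": [
      { "name": "text", "source": "/document/merged_content" },
      { "name": "languageCode", "source": "/document/language" }
    ]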

Fix missing skill output values

The messages say to check the 'outputFieldMappings' property of your indexer, so let's start there.

  1. Go to Skill Graph and select Output Field Mappings. The mappings are actually correct, but normally you would check the index definition to ensure that fields exist for "locations" and "organizations".

    Screenshot of the output field mappings.

  2. If there is no problem with the index, the next step is to check skill outputs. As before, select the Enriched Data Structure, and scroll the nodes to find "locations" and "organizations". Notice that the parent is "content" instead of "merged_content". The context is wrong.

    Screenshot of Enriched Data Structure with wrong context.

  3. Switch back to Skill Graph and select the entity recognition skill.

  4. Navigate the Skill Settings to find "context".

    Screenshot of the context correction in skill setting.

  5. Double-click the setting for "context" and edit it to read '/document/merged_content'.

  6. Select Save.

  7. Select Run.

All of the errors have been resolved.
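With the context corrected, the skill writes its outputs under the merged content node, which is exactly where the indexer's output field mappings expect them. A sketch of how the two now line up (the mappings are inferred from the warning messages earlier):

    "context": "/document/merged_content"

    "outputFieldMappings": [
      { "sourceFieldName": "/document/merged_content/organizations", "targetFieldName": "organizations" },
      { "sourceFieldName": "/document/merged_content/locations", "targetFieldName": "locations" }
    ]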

Commit changes to the skillset

When the debug session was initiated, the search service created a copy of the skillset to protect the original on your search service. Now that you've finished debugging, you can commit the fixes, which overwrites the original skillset.

Alternatively, if you aren't ready to commit changes, you can save the debug session and reopen it later.

  1. Select Commit changes in the main Debug sessions menu.

  2. Select OK to confirm that you wish to update your skillset.

  3. Close Debug session and select the Indexers tab.

  4. Open your 'clinical-trials-idxr'.

  5. Select Reset.

  6. Select Run. Select OK to confirm.
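The portal's Reset and Run commands have direct REST equivalents that you could also send from Postman. A sketch, assuming a recent GA api-version:

    POST https://mydemo.search.windows.net/indexers/clinical-trials-idxr/reset?api-version=2020-06-30
    api-key: <your-admin-key>

    POST https://mydemo.search.windows.net/indexers/clinical-trials-idxr/run?api-version=2020-06-30
    api-key: <your-admin-key>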

When the indexer has finished running, there should be a green checkmark and the word Success next to the time stamp for the latest run in the Execution history tab. To ensure that the changes have been applied:

  1. In the search service Overview page, select the Indexes tab.

  2. Open the 'clinical-trials' index. In the Search explorer tab, enter this query string to return fields for specific documents (identified by the unique metadata_storage_path field): $select=metadata_storage_path, organizations, locations&$count=true

  3. Select Search.

The results should show that organizations and locations are now populated with the expected values.

Clean up resources

When you're working in your own subscription, it's a good idea at the end of a project to identify whether you still need the resources you created. Resources left running can cost you money. You can delete resources individually or delete the resource group to delete the entire set of resources.

You can find and manage resources in the portal, using the All resources or Resource groups link in the left-navigation pane.

If you are using a free service, remember that you are limited to three indexes, indexers, and data sources. You can delete individual items in the portal to stay under the limit.

Next steps

This tutorial touched on various aspects of skillset definition and processing. To learn more about concepts and workflows, refer to the following articles: