Best practices for conversational language understanding

Use the following guidelines to create the best possible projects in conversational language understanding.

Choose a consistent schema

A schema is the definition of your intents and entities. There are different approaches you can take when deciding what to create as an intent versus an entity. Start by asking yourself:

  • What actions or queries am I trying to capture from my user?
  • What pieces of information are relevant in each action?

You can typically think of actions and queries as intents, and the information required to fulfill them as entities.

For example, assume you want your customers to cancel subscriptions for various products that you offer through your chatbot. You can create a Cancel intent with various examples like "Cancel the Contoso service," or "stop charging me for the Fabrikam subscription." The user's intent here is to cancel, and the Contoso service or Fabrikam subscription is the subscription they want to cancel. Therefore, you can create an entity for subscriptions. You can then model your entire project to capture actions as intents and use entities to fill in those actions. This allows you to cancel anything you define as an entity, such as other products. You can then have intents for signing up, renewing, upgrading, and so on, that all make use of the subscriptions entity and other entities.

The above schema design makes it easy for you to extend existing capabilities (canceling, upgrading, signing up) to new targets by creating a new entity.
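As a rough illustration of the "actions as intents" design, the sketch below writes the schema and a few labeled utterances as plain Python data and checks that every label refers to something the schema defines. The dictionary layout, the intent and entity names, and the `validate` helper are hypothetical and chosen for readability; they are not the service's project format.

```python
# Hypothetical, simplified representation of an "actions as intents" schema.
schema = {
    "intents": ["Cancel", "SignUp", "Renew", "Upgrade"],
    "entities": ["Subscription"],
}

labeled_utterances = [
    {"text": "Cancel the Contoso service", "intent": "Cancel",
     "entities": [{"category": "Subscription", "text": "Contoso service"}]},
    {"text": "stop charging me for the Fabrikam subscription", "intent": "Cancel",
     "entities": [{"category": "Subscription", "text": "Fabrikam subscription"}]},
    {"text": "Renew my Contoso service", "intent": "Renew",
     "entities": [{"category": "Subscription", "text": "Contoso service"}]},
]


def validate(schema, utterances):
    """Return the utterances whose labels are inconsistent with the schema."""
    problems = []
    for utterance in utterances:
        intent_ok = utterance["intent"] in schema["intents"]
        entities_ok = all(
            entity["category"] in schema["entities"]
            and entity["text"] in utterance["text"]
            for entity in utterance["entities"]
        )
        if not (intent_ok and entities_ok):
            problems.append(utterance)
    return problems


print(validate(schema, labeled_utterances))
```

Supporting a new product in this design only requires new examples for the existing Subscription entity; the list of intents stays the same.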

Another approach is to model the information as intents and actions as entities. Let's take the same example, allowing your customers to cancel subscriptions through your chatbot. You can create an intent for each subscription available, such as Contoso with utterances like "cancel Contoso," "stop charging me for contoso services," "Cancel the Contoso subscription." You would then create an entity to capture the action, cancel. You can define different entities for each action or consolidate actions as one entity with a list component to differentiate between actions with different keys.

This schema design makes it easy for you to extend new actions to existing targets by adding new action entities or entity components.

Avoid trying to funnel all the concepts into intents alone. For example, don't create a Cancel Contoso intent whose only purpose is that one specific action. Intents and entities should work together to capture all the required information from the customer.

You also want to avoid mixing different schema designs. Don't build half of your application with actions as intents and the other half with information as intents. Keep the design consistent to get the best possible results.

Balance training data

Try to keep your schema well balanced when it comes to training data. Including large quantities of utterances for one intent and very few for another results in a model that is heavily biased towards the overrepresented intents.
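One quick way to spot an imbalance is to count utterances per intent. The following is a minimal sketch that assumes your utterances are available as a list of dictionaries with an `intent` key; the helper names are illustrative, and the ratio is only a rough indicator, not a documented threshold.

```python
from collections import Counter


def intent_counts(utterances):
    """Count how many training utterances each intent has."""
    return Counter(utterance["intent"] for utterance in utterances)


def imbalance_ratio(counts):
    """Ratio between the largest and the smallest intent."""
    return max(counts.values()) / min(counts.values())


sample = (
    [{"text": f"cancel plan {i}", "intent": "Cancel"} for i in range(8)]
    + [{"text": f"renew plan {i}", "intent": "Renew"} for i in range(2)]
)
counts = intent_counts(sample)
print(counts, imbalance_ratio(counts))
```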

To address this, you might need to downsample your training set or add to it. You can downsample by either:

  • Removing a certain percentage of the training data at random.
  • Analyzing the dataset more systematically and removing overrepresented duplicate entries.
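Both downsampling options can be sketched in a few lines. This example assumes utterances are dictionaries with `text` and `intent` keys; `drop_duplicates` and `downsample` are hypothetical helpers, and the fixed seed is only there to make the result repeatable.

```python
import random


def drop_duplicates(utterances):
    """Remove exact duplicates (same intent and same text), keeping the first."""
    seen = set()
    unique = []
    for utterance in utterances:
        key = (utterance["intent"], utterance["text"].strip())
        if key not in seen:
            seen.add(key)
            unique.append(utterance)
    return unique


def downsample(utterances, intent, keep_fraction, seed=0):
    """Randomly keep only a fraction of the utterances of one intent."""
    target = [u for u in utterances if u["intent"] == intent]
    if not target:
        return list(utterances)
    keep_count = min(len(target), max(1, round(len(target) * keep_fraction)))
    kept_ids = {id(u) for u in random.Random(seed).sample(target, keep_count)}
    return [u for u in utterances if u["intent"] != intent or id(u) in kept_ids]


data = (
    [{"text": f"cancel plan {i}", "intent": "Cancel"} for i in range(10)]
    + [{"text": "renew my plan", "intent": "Renew"}] * 2
)
balanced = downsample(drop_duplicates(data), "Cancel", keep_fraction=0.5)
print(len(balanced))
```

Removing duplicates first is usually the safer step, because it discards no unique examples.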

You can also add to the training set by selecting Suggest Utterances in the Data labeling tab in Language Studio. Conversational language understanding sends a call to Azure OpenAI to generate similar utterances.

A screenshot showing utterance suggestion in Language Studio.

You should also look for unintended "patterns" in the training set, for example, if the training set for a particular intent is all lowercase or always starts with a particular phrase. In such cases, the model you train might learn these unintended biases instead of being able to generalize.

We recommend introducing casing and punctuation diversity in the training set. If your model is expected to handle variations, be sure to have a training set that also reflects that diversity. For example, include some utterances in proper casing, and some in all lowercase.
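A simple report can help surface such patterns before training. This sketch assumes the same list-of-dictionaries layout (`text`, `intent`) and reports, per intent, the share of all-lowercase utterances and the most common first word. The function name and the choice of signals are illustrative assumptions.

```python
from collections import Counter, defaultdict


def pattern_report(utterances):
    """Summarize casing and leading-word patterns for each intent."""
    texts_by_intent = defaultdict(list)
    for utterance in utterances:
        texts_by_intent[utterance["intent"]].append(utterance["text"])

    report = {}
    for intent, texts in texts_by_intent.items():
        first_words = Counter(t.split()[0].lower() for t in texts if t.split())
        top = first_words.most_common(1)
        word, count = top[0] if top else ("", 0)
        report[intent] = {
            "all_lowercase_share": sum(t == t.lower() for t in texts) / len(texts),
            "top_first_word": word,
            "top_first_word_share": count / len(texts),
        }
    return report


examples = [
    {"text": "cancel contoso", "intent": "Cancel"},
    {"text": "cancel fabrikam", "intent": "Cancel"},
    {"text": "Stop my plan", "intent": "Cancel"},
]
print(pattern_report(examples))
```

A share close to 1.0 for either signal suggests the intent needs more varied examples.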

Clearly label utterances

  • Ensure that the concepts that your entities refer to are well defined and separable. Check if you can easily determine the differences reliably. If you can't, this may be an indication that the learned component will also have difficulty.

  • If there's a similarity between entities, ensure that some aspect of your data provides a signal for the difference between them.

    For example, if you built a model to book flights, a user might use an utterance like "I want a flight from Boston to Seattle." The origin city and destination city for such utterances would be expected to be similar. A signal to differentiate "Origin city" might be that it's often preceded by the word "from."

  • Ensure that you label all instances of each entity in both your training and testing data. One approach is to use the search function to find all instances of a word or phrase in your data to check if they're correctly labeled.

  • Label test data for entities that have no learned component and also for those that do. This will help ensure that your evaluation metrics are accurate.
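The search-based check for unlabeled instances can also be scripted. The sketch below assumes each utterance carries its entity labels as dictionaries with `category`, `offset`, and `length` keys; that layout and the `find_unlabeled_mentions` helper are assumptions for illustration, so adapt the keys to your own data.

```python
def find_unlabeled_mentions(utterances, phrase, category):
    """Return (text, offset) pairs where the phrase occurs without a label."""
    phrase_lower = phrase.lower()
    missing = []
    for utterance in utterances:
        text_lower = utterance["text"].lower()
        start = text_lower.find(phrase_lower)
        while start != -1:
            end = start + len(phrase_lower)
            covered = any(
                label["category"] == category
                and label["offset"] <= start
                and end <= label["offset"] + label["length"]
                for label in utterance.get("entities", [])
            )
            if not covered:
                missing.append((utterance["text"], start))
            start = text_lower.find(phrase_lower, start + 1)
    return missing


labeled = [
    {"text": "Cancel Contoso",
     "entities": [{"category": "Subscription", "offset": 7, "length": 7}]},
    {"text": "stop charging me for contoso", "entities": []},
]
print(find_unlabeled_mentions(labeled, "Contoso", "Subscription"))
```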

Use standard training before advanced training

Standard training is free and faster than advanced training, which makes it useful for quickly understanding the effect of changing your training set or schema while building the model. Once you're satisfied with the schema, consider using advanced training to get the best AI quality out of your model.

Use the evaluation feature

When you build an app, it's often helpful to catch errors early. It’s usually a good practice to add a test set when building the app, as training and evaluation results are very useful in identifying errors or issues in your schema.

Machine-learning components and composition

See Component types.

Using the "none" score threshold

If you see too many false positives, such as out-of-context utterances being marked as valid intents, see confidence threshold for information on how it affects inference.

  • Non machine-learned entity components like lists and regex are by definition not contextual. If you see list or regex entities in unintended places, try labeling the list synonyms as the machine-learned component.

  • For entities, you can mark the learned component as required to restrict when a composed entity fires.

For example, suppose you have an entity called "ticket quantity" that attempts to extract the number of tickets you want to reserve for booking flights, for utterances such as "Book two tickets tomorrow to Cairo."

Typically, you would add a prebuilt component for Quantity.Number that already extracts all numbers in utterances. However, if your entity were defined with only the prebuilt component, it would also extract other numbers as part of the ticket quantity entity, such as the "3" in "Book two tickets tomorrow to Cairo at 3 PM."

To resolve this, you would label a learned component in your training data for all the numbers that are meant to be a ticket quantity. The entity now has two components:

  • The prebuilt component that can interpret all numbers, and
  • The learned component that predicts where the ticket quantity is in a sentence.

By requiring the learned component, you make sure that ticket quantity is only returned when the learned component predicts it in the right context. If you also require the prebuilt component, you can then guarantee that the returned ticket quantity entity is both a number and in the correct position.
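The effect of requiring both components can be pictured as a filter over spans. The following is a simplified model of that behavior, not the service's implementation: spans are hypothetical `(offset, length)` pairs, and the "predictions" are written by hand for the example utterance.

```python
def overlaps(a, b):
    """True if two (offset, length) spans share at least one character."""
    return a[0] < b[0] + b[1] and b[0] < a[0] + a[1]


def ticket_quantity_spans(prebuilt_spans, learned_spans):
    """Keep only number spans that the learned component also predicted."""
    return [
        span for span in prebuilt_spans
        if any(overlaps(span, learned) for learned in learned_spans)
    ]


text = "Book two tickets tomorrow to Cairo at 3 PM"
# Hand-written stand-ins for component output on this utterance.
prebuilt = [(text.index("two"), 3), (text.index("3"), 1)]  # every number
learned = [(text.index("two"), 3)]                         # quantity in context

print(ticket_quantity_spans(prebuilt, learned))
```

Only the span for "two" survives, because "3" is a number that the learned component did not predict as a ticket quantity.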

Addressing casing inconsistencies

If you have poor AI quality and determine the casing used in your training data is dissimilar to the testing data, you can use the normalizeCasing project setting. This normalizes the casing of utterances when training and testing the model. If you've migrated from LUIS, you might recognize that LUIS did this by default.

{
  "projectFileVersion": "2022-10-01-preview",
  ...
  "settings": {
    "confidenceThreshold": 0.5,
    "normalizeCasing": true
  }
  ...

Addressing model overconfidence

If the model is incorrectly overconfident, you can use the LoraNorm recipe version. In the following example, the model predicts the incorrect intent with 100% confidence, which makes the confidence threshold project setting unusable.

Text                               Predicted intent   Confidence score
"Who built the Eiffel Tower?"      Sports             1.00
"Do I look good to you today?"     QueryWeather       1.00
"I hope you have a good evening."  Alarm              1.00

To address this, use the 2023-04-15 configuration version that normalizes confidence scores. The confidence threshold project setting can then be adjusted to achieve the desired result.

curl --location 'https://<your-resource>.cognitiveservices.azure.com/language/authoring/analyze-conversations/projects/<your-project>/:train?api-version=2022-10-01-preview' \
--header 'Ocp-Apim-Subscription-Key: <your subscription key>' \
--header 'Content-Type: application/json' \
--data '{
      "modelLabel": "<modelLabel>",
      "trainingMode": "advanced",
      "trainingConfigVersion": "2023-04-15",
      "evaluationOptions": {
            "kind": "percentage",
            "testingSplitPercentage": 0,
            "trainingSplitPercentage": 100
      }
}'

Once the request is sent, you can track the progress of the training job in Language Studio as usual.

Note

You have to retrain your model after updating the confidenceThreshold project setting. Afterwards, you'll need to republish the app for the new threshold to take effect.

Normalization in model version 2023-04-15

With model version 2023-04-15, conversational language understanding provides normalization in the inference layer that doesn't affect training.

The normalization layer normalizes the classification confidence scores to a confined range. The range currently selected is [-a, a], where "a" is the square root of the number of intents. As a result, the normalization depends on the number of intents in the app. If there are very few intents, the normalization layer has a very small range to work with. With a fairly large number of intents, the normalization is more effective.
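To get a feel for how the range grows with the number of intents, the bound described above can be computed directly. This sketch only evaluates the square-root relationship stated in the text; it doesn't reproduce how the service maps scores into that range, and the function name is illustrative.

```python
import math


def normalization_bound(intent_count):
    """Return a, where the normalization range is [-a, a]."""
    return math.sqrt(intent_count)


for intent_count in (4, 25, 100):
    bound = normalization_bound(intent_count)
    print(f"{intent_count} intents -> range [-{bound:.1f}, {bound:.1f}]")
```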

If normalization doesn't lower the scores of out-of-scope utterances enough for the confidence threshold to filter them out, the cause might be the number of intents in the app. Consider adding more intents to the app or, if you're using an orchestrated architecture, merging apps that belong to the same domain.

Debugging composed entities

Entities are functions that emit spans in your input with an associated type. The function is defined by one or more components. You can mark components as required, and you can decide whether to enable the combine components setting. When you combine components, all spans that overlap are merged into a single span. If the setting isn't used, each individual component span is emitted.

To better understand how individual components are performing, you can disable the setting and set each component to "not required". This lets you inspect the individual spans that are emitted, and experiment with removing components so that only problematic components are generated.
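The merging behavior of the combine components setting can be approximated as follows. This is a simplified sketch using hypothetical `(start, end)` spans with an exclusive end; treating spans that merely touch as separate is an assumption, not documented behavior.

```python
def combine_spans(spans):
    """Merge overlapping (start, end) spans into single spans."""
    merged = []
    for start, end in sorted(spans):
        if merged and start < merged[-1][1]:
            # Overlaps the previous span: extend it instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


component_spans = [(0, 5), (3, 8), (10, 12)]
print(combine_spans(component_spans))
```

Comparing the raw component spans with the merged result shows which component contributed each part of a combined span.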

Evaluate a model using multiple test sets

Data in a conversational language understanding project can have two datasets: a "testing" set and a "training" set. If you want to use multiple test sets to evaluate your model, you can:

  • Give your test sets different names (for example, "test1" and "test2").
  • Export your project to get a JSON file with its parameters and configuration.
  • Use the JSON to import a new project, and rename your second desired test set to "test".
  • Train the model to run the evaluation using your second test set.
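The renaming step can be applied to the exported JSON with a short script. This sketch assumes the export stores utterances under `assets.utterances` and that each utterance has a `dataset` field holding the set name; both are assumptions about the file layout, so check your export and the exact set name your project expects before relying on it.

```python
import copy


def promote_test_set(project, chosen_set, evaluation_name="test"):
    """Return a copy of the project where chosen_set becomes the evaluation set."""
    updated = copy.deepcopy(project)
    for utterance in updated["assets"]["utterances"]:
        if utterance.get("dataset") == chosen_set:
            utterance["dataset"] = evaluation_name
    return updated


exported = {
    "assets": {
        "utterances": [
            {"text": "Cancel Contoso", "intent": "Cancel", "dataset": "test1"},
            {"text": "Renew Fabrikam", "intent": "Renew", "dataset": "test2"},
        ]
    }
}
second_run = promote_test_set(exported, "test2")
print([u["dataset"] for u in second_run["assets"]["utterances"]])
```

The original export is left unchanged, so you can produce one import file per test set from the same source.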

Custom parameters for target apps and child apps

If you're using orchestrated apps, you might want to send custom parameter overrides for various child apps. The targetProjectParameters field allows you to send a dictionary representing the parameters for each target project. For example, consider an orchestrator app named Orchestrator that orchestrates between a conversational language understanding app named CLU1 and a custom question answering app named CQA1. If you want to send a parameter named "top" to the question answering app, you can use targetProjectParameters as in the following request.

curl --request POST \
   --url 'https://<your-language-resource>.cognitiveservices.azure.com/language/:analyze-conversations?api-version=2022-10-01-preview' \
   --header 'ocp-apim-subscription-key: <your subscription key>' \
   --header 'Content-Type: application/json' \
   --data '{
     "kind": "Conversation",
     "analysisInput": {
         "conversationItem": {
             "id": "1",
             "text": "Turn down the volume",
             "modality": "text",
             "language": "en-us",
             "participantId": "1"
         }
     },
     "parameters": {
         "projectName": "Orchestrator",
         "verbose": true,
         "deploymentName": "std",
         "stringIndexType": "TextElement_V8",
         "targetProjectParameters": {
             "CQA1": {
                 "targetProjectKind": "QuestionAnswering",
                 "callingOptions": {
                     "top": 1
                 }
             }
         }
     }
 }'
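If you build the request in code rather than by hand, the same body can be assembled from a dictionary of per-project overrides. This is a minimal sketch that mirrors the fields of the request above and only constructs the JSON; `build_orchestration_request` is a hypothetical helper, and sending the request is left to the HTTP client of your choice.

```python
import json


def build_orchestration_request(text, project_name, deployment_name,
                                target_parameters):
    """Build the analyze-conversations body with per-target overrides."""
    return {
        "kind": "Conversation",
        "analysisInput": {
            "conversationItem": {
                "id": "1",
                "text": text,
                "modality": "text",
                "language": "en-us",
                "participantId": "1",
            }
        },
        "parameters": {
            "projectName": project_name,
            "verbose": True,
            "deploymentName": deployment_name,
            "stringIndexType": "TextElement_V8",
            "targetProjectParameters": target_parameters,
        },
    }


body = build_orchestration_request(
    text="Turn down the volume",
    project_name="Orchestrator",
    deployment_name="std",
    target_parameters={
        "CQA1": {
            "targetProjectKind": "QuestionAnswering",
            "callingOptions": {"top": 1},
        }
    },
)
print(json.dumps(body, indent=2))
```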

Copy projects across language resources

Often you can copy conversational language understanding projects from one resource to another using the copy button in Language Studio. However, in some cases, it might be easier to copy projects using the API.

First, identify the:

  • source project name
  • target project name
  • source language resource
  • target language resource, which is where you want to copy it to.

Call the API to authorize the copy action, and get the access token for the actual copy operation later.

curl --request POST \
  --url 'https://<target-language-resource>.cognitiveservices.azure.com/language/authoring/analyze-conversations/projects/<source-project-name>/:authorize-copy?api-version=2023-04-15-preview' \
  --header 'Content-Type: application/json' \
  --header 'Ocp-Apim-Subscription-Key: <Your-Subscription-Key>' \
  --data '{"projectKind":"Conversation","allowOverwrite":false}'

Call the API to complete the copy operation. Use the response you got earlier as the payload.

curl --request POST \
  --url 'https://<source-language-resource>.cognitiveservices.azure.com/language/authoring/analyze-conversations/projects/<source-project-name>/:copy?api-version=2023-04-15-preview' \
  --header 'Content-Type: application/json' \
  --header 'Ocp-Apim-Subscription-Key: <Your-Subscription-Key>' \
  --data '{
    "projectKind": "Conversation",
    "targetProjectName": "<target-project-name>",
    "accessToken": "<access-token>",
    "expiresAt": "<expiry-date>",
    "targetResourceId": "<target-resource-id>",
    "targetResourceRegion": "<target-region>"
  }'