Label your utterances in Language Studio
Once you have built a schema for your project, you should add training utterances to your project. The utterances should be similar to what your users will use when interacting with the project. When you add an utterance, you have to assign which intent it belongs to. After the utterance is added, label the words within your utterance that you want to extract as entities.
Data labeling is a crucial step in the development lifecycle. Labeled data tells the model how to interpret text, and it is used in the next step for both training and evaluation. If you already have labeled utterances, you can import them directly into your project, provided that your data follows the accepted data format. See create project to learn more about importing labeled data into your project.
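To make the idea of a labeled utterance concrete, here is a minimal sketch of one way to represent an utterance, its intent, and its entity labels as character spans. The field names (`text`, `language`, `intent`, `entities`, `category`, `offset`, `length`), the intent and entity names, and the helper `label_span` are assumptions made for illustration only; check the accepted data format for the exact schema required when importing.

```python
def label_span(text, phrase, category):
    """Build an entity label for the first occurrence of `phrase` in `text`.

    Hypothetical helper: the label is stored as a character offset and length.
    """
    offset = text.find(phrase)
    if offset == -1:
        raise ValueError(f"{phrase!r} not found in utterance")
    return {"category": category, "offset": offset, "length": len(phrase)}


text = "Set a meeting with Matt and Kevin tomorrow at 12 PM."

# One labeled utterance: a single intent plus zero or more entity labels.
utterance = {
    "text": text,
    "language": "en-us",        # assumed language code
    "intent": "CreateMeeting",  # hypothetical intent name
    "entities": [
        label_span(text, "Matt", "Attendee"),
        label_span(text, "Kevin", "Attendee"),
        label_span(text, "tomorrow at 12 PM", "DateTime"),
    ],
}

# Show which text each label covers.
for entity in utterance["entities"]:
    start = entity["offset"]
    end = start + entity["length"]
    print(entity["category"], "->", text[start:end])
```

The offsets here are Python string indices. A service may count characters differently (for example, in UTF-16 code units), so treat the numbers as illustrative when the text contains characters outside the basic ASCII range.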
Before you can label your data, you need:
- A successfully created project.
See the project development lifecycle for more information.
Data labeling guidelines
After building your schema and creating your project, you need to label your data. Labeling is important because it tells your model which words are associated with the entities you need to extract. Plan to spend time labeling your utterances, introducing and refining the data that will be used to train your models.
As you add utterances and label them, keep in mind:
The machine learning models generalize based on the labeled examples you provide; the more examples you provide, the more data points the model has to make better generalizations.
The precision, consistency, and completeness of your labeled data are key factors in determining model performance.
- Label precisely: Always label each entity with its correct type. Include only what you want extracted, and avoid unnecessary data in your labels.
- Label consistently: The same entity should have the same label across all the utterances.
- Label completely: Label all the instances of the entity in all your utterances.
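The consistency and completeness guidelines above can be spot-checked with a small script before you train. The following is a rough sketch, assuming each utterance is a dictionary with a `text` field and a list of `entities` stored as `category`, `offset`, and `length` (an illustrative shape, not necessarily the import schema). The function names are hypothetical, and the completeness check uses plain substring matching, so it can produce false positives.

```python
from collections import defaultdict


def labeled_texts(utterance):
    """Return (surface text, category) pairs for every label in an utterance."""
    text = utterance["text"]
    return [
        (text[e["offset"]:e["offset"] + e["length"]], e["category"])
        for e in utterance["entities"]
    ]


def find_inconsistent_labels(utterances):
    """Find surface texts that were labeled with more than one entity type."""
    seen = defaultdict(set)
    for utterance in utterances:
        for surface, category in labeled_texts(utterance):
            seen[surface.lower()].add(category)
    return {s: sorted(c) for s, c in seen.items() if len(c) > 1}


def find_possibly_missed_labels(utterances):
    """Find utterances containing text that is labeled elsewhere but not here."""
    known = {s.lower() for u in utterances for s, _ in labeled_texts(u)}
    missed = []
    for utterance in utterances:
        labeled_here = {s.lower() for s, _ in labeled_texts(utterance)}
        lowered = utterance["text"].lower()
        for surface in sorted(known - labeled_here):
            if surface in lowered:  # crude substring match
                missed.append((utterance["text"], surface))
    return missed


sample = [
    {"text": "Meet Matt at noon",
     "entities": [{"category": "Attendee", "offset": 5, "length": 4}]},
    {"text": "Call Matt now",
     "entities": [{"category": "Contact", "offset": 5, "length": 4}]},
    {"text": "Email Kevin and Matt",
     "entities": [{"category": "Attendee", "offset": 6, "length": 5}]},
]

print(find_inconsistent_labels(sample))
print(find_possibly_missed_labels(sample))
```

In this sample, "Matt" is labeled as two different entity types, and the third utterance mentions "Matt" without a label. Both findings are prompts for a human review in Language Studio rather than automatic fixes.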
For multilingual projects, adding utterances in other languages improves the model's performance in those languages, but avoid duplicating your data across every language you want to support. For example, to improve a calendar bot's performance with users, a developer might add examples mostly in English, and a few in Spanish or French as well. They might add utterances such as:
- "Set a meeting with Matt and Kevin tomorrow at 12 PM." (English)
- "Reply as tentative to the weekly update meeting." (English)
- "Cancelar mi próxima reunión." (Spanish)
How to label your utterances
Use the following steps to label your utterances:
Go to your project page in Language Studio.
From the left side menu, select Data labeling. On this page, you can start adding your utterances and labeling them. You can also upload your utterances directly by selecting Upload utterance file from the top menu; make sure the file follows the accepted format.
From the top pivots, you can switch the view between the training set and the testing set. Learn more about training and testing sets and how they're used for model training and evaluation.
If you plan to use the Automatically split the testing set from training data option, add all your utterances to the training set.
From the Select intent dropdown menu, select one of the intents. Then select the language of the utterance (for multilingual projects) and type the utterance itself. Press the Enter key in the utterance's text box to add the utterance.
You have two options to label entities in an utterance:
- Label using a brush: Select the brush icon next to an entity in the right pane, then highlight the text in the utterance you want to label.
- Label using inline menu: Highlight the word you want to label as an entity, and a menu will appear. Select the entity you want to label these words with.
In the right side pane, under the Labels pivot, you can find all the entity types in your project and the count of labeled instances for each.
Under the Distribution pivot, you can view the distribution across training and testing sets. You have three options for viewing:
- Total instances per labeled entity, where you can view the count of all labeled instances of a specific entity.
- Unique utterances per labeled entity, where each utterance is counted if it contains at least one labeled instance of this entity.
- Utterances per intent, where you can view the count of utterances per intent.
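If you keep your labeled utterances in a file, you can compute the same kinds of counts offline. The following is a minimal sketch, assuming each utterance is a dictionary with an `intent` and a list of `entities` that each carry a `category`. These field names, the sample intent and entity names, and the function `label_distribution` are illustrative assumptions; this is not how Language Studio computes its view.

```python
from collections import Counter


def label_distribution(utterances):
    """Compute three distribution views over a list of labeled utterances."""
    total_instances = Counter()        # every labeled instance per entity
    unique_utterances = Counter()      # utterances with at least one instance
    utterances_per_intent = Counter()  # utterances per intent
    for utterance in utterances:
        utterances_per_intent[utterance["intent"]] += 1
        categories = [e["category"] for e in utterance.get("entities", [])]
        total_instances.update(categories)
        # A set counts each entity type at most once per utterance.
        unique_utterances.update(set(categories))
    return total_instances, unique_utterances, utterances_per_intent


sample = [
    {"intent": "CreateMeeting",
     "entities": [{"category": "Attendee"}, {"category": "Attendee"},
                  {"category": "DateTime"}]},
    {"intent": "CreateMeeting", "entities": [{"category": "Attendee"}]},
    {"intent": "CancelMeeting", "entities": []},
]

totals, unique, per_intent = label_distribution(sample)
print(dict(totals))
print(dict(unique))
print(dict(per_intent))
```

For this sample, the hypothetical `Attendee` entity has three labeled instances spread over two unique utterances, which shows why the first two views can differ.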
List and prebuilt components are not shown on the data labeling page; all labels here apply only to the learned component.
To remove a label:
- From within your utterance, select the entity you want to remove a label from.
- Scroll through the menu that appears, and select Remove label.
To delete or rename an entity:
- Select the entity you want to edit in the right side pane.
- Click on the three dots next to the entity, and select the option you want from the drop-down menu.