Quickstart: custom summarization (preview)

Use this article to get started with creating a custom Summarization project where you can train custom models on top of Summarization. A model is artificial intelligence software that's trained to do a certain task. For this system, the models summarize text and are trained by learning from imported data.

In this article, we use Language Studio to demonstrate key concepts of custom summarization. As an example, we’ll build a custom summarization model to extract the Facility or treatment location from short discharge notes.

Prerequisites

Create a new Azure AI Language resource and Azure storage account

Before you can use custom Summarization, you'll need to create an Azure AI Language resource, which will give you the credentials that you need to create a project and start training a model. You'll also need an Azure storage account, where you can upload your dataset that will be used to build your model.

Important

To get started quickly, we recommend creating a new Azure AI Language resource using the steps provided in this article. Using the steps in this article will let you create the Language resource and storage account at the same time, which is easier than doing it later.

Create a new resource from the Azure portal

  1. Go to the Azure portal to create a new Azure AI Language resource.

  2. In the window that appears, select this service from the custom features. Select Continue to create your resource at the bottom of the screen.

    A screenshot showing custom text classification & custom named entity recognition in the Azure portal.

  3. Create a Language resource with the following details.

    Name              Description
    Subscription      Your Azure subscription.
    Resource group    A resource group that will contain your resource. You can use an existing one, or create a new one.
    Region            The region for your Language resource. For example, "West US 2".
    Name              A name for your resource.
    Pricing tier      The pricing tier for your Language resource. You can use the Free (F0) tier to try the service.

    Note

    If you get a message saying "your login account is not an owner of the selected storage account's resource group", your account needs to have an owner role assigned on the resource group before you can create a Language resource. Contact your Azure subscription owner for assistance.

  4. In this service's section, select an existing storage account or select New storage account. The values below are meant to help you get started, and aren't necessarily the storage account values you’ll want to use in production environments. To avoid latency while building your project, connect to a storage account in the same region as your Language resource.

    Storage account value    Recommended value
    Storage account name     Any name
    Storage account type     Standard LRS

  5. Make sure the Responsible AI Notice is checked. Select Review + create at the bottom of the page, then select Create.
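The table above recommends "Any name" for the storage account, but Azure does restrict what a storage account name can look like: 3 to 24 characters, lowercase letters and digits only. The following is a minimal sketch of a local check for that rule. The function name is hypothetical, and the check can't tell you whether the name is already taken, which the portal verifies separately.

```python
import re

# Azure storage account names: 3-24 characters, lowercase letters and digits only.
_STORAGE_NAME_PATTERN = re.compile(r"[a-z0-9]{3,24}")


def is_valid_storage_account_name(name):
    """Return True if `name` satisfies the length and character rules.

    Hypothetical helper for illustration; it does not check global uniqueness.
    """
    return _STORAGE_NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["examplestorage01", "Example-Storage", "ab"]:
        print(candidate, is_valid_storage_account_name(candidate))
```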

Download sample data

If you need sample data, we've provided some for document summarization and conversation summarization scenarios for the purpose of this quickstart.

Upload sample data to blob container

  1. Locate the files to upload to your storage account.

  2. In the Azure portal, navigate to the storage account you created, and select it.

  3. In your storage account, select Containers from the left menu, located below Data storage. On the screen that appears, select + Container. Give the container the name example-data and leave the default Public access level.

    A screenshot showing the main page for a storage account.

  4. After your container has been created, select it. Then select the Upload button to select the .txt and .json files you downloaded earlier.

    A screenshot showing the button for uploading files to the storage account.
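If you'd rather script the upload than use the portal, the steps above could be sketched roughly as follows with the azure-storage-blob package. This is an illustrative sketch, not part of the official quickstart: the helper names are hypothetical, the container is assumed to already exist under the name example-data, and you're assumed to have a connection string for the storage account.

```python
from pathlib import Path

# File types named in the upload step above.
SAMPLE_SUFFIXES = {".txt", ".json"}


def collect_sample_files(folder):
    """Return the .txt and .json files directly inside `folder`, sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in SAMPLE_SUFFIXES
    )


def upload_files(container_client, paths):
    """Upload each file under its own file name and return the blob names used.

    `container_client` is expected to behave like
    azure.storage.blob.ContainerClient (it needs an `upload_blob` method).
    """
    uploaded = []
    for path in paths:
        with open(path, "rb") as data:
            container_client.upload_blob(name=path.name, data=data, overwrite=True)
        uploaded.append(path.name)
    return uploaded


def make_container_client(connection_string, container_name="example-data"):
    """Build a ContainerClient; requires `pip install azure-storage-blob`."""
    # Imported lazily so the helpers above work without the Azure SDK installed.
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string(connection_string)
    return service.get_container_client(container_name)
```

A typical call would be `upload_files(make_container_client(conn_str), collect_sample_files("sample-data"))`, where `conn_str` and the `sample-data` folder are placeholders for your own values.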

Create a custom summarization project

Once your resource and storage account are configured, create a new custom Summarization project. A project is a work area for building your custom ML models based on your data. Your project can only be accessed by you and others who have access to the Language resource being used.

  1. Sign in to Language Studio. A window will appear to let you select your subscription and Language resource. Select the Language resource you created in the previous section.

  2. Select the feature you want to use in Language Studio.

  3. Select Create new project from the top menu in your projects page. Creating a project lets you label data, train, evaluate, improve, and deploy your models.

    A screenshot of the project creation page.

  4. Enter the project information, including a name, description, and the language of the files in your project. If you're using the example dataset, select English. You can't change the name of your project later. Select Next.

    Tip

    Your dataset doesn't have to be entirely in the same language. You can have multiple documents, each in a different supported language. If your dataset contains documents in different languages, or if you expect text in different languages at runtime, select the enable multi-lingual dataset option when you enter the basic information for your project. This option can also be enabled later from the Project settings page.

  5. After you select Create new project, a window will appear to let you connect your storage account. If you've already connected a storage account, you will see the connected storage account. If not, choose your storage account from the dropdown that appears and select Connect storage account; this will set the required roles for your storage account. This step might return an error if you aren't assigned as owner on the storage account.

    Note

    • You only need to do this step once for each new resource you use.
    • This process is irreversible. If you connect a storage account to your Language resource, you can't disconnect it later.
    • You can only connect your Language resource to one storage account.
  6. Select the container where you have uploaded your dataset.

  7. If you have already labeled data, make sure it follows the supported format. Then select Yes, my files are already labeled and I have formatted JSON labels file, and select the labels file from the drop-down menu. Select Next. If you are using the dataset from this quickstart, there is no need to review the formatting of the JSON labels file.

  8. Review the data you entered and select Create Project.

Train your model

After you create a project, you can start training your model.

To start training your model from within the Language Studio:

  1. Select Training jobs from the left side menu.

  2. Select Start a training job from the top menu.

  3. Select Train a new model and type in the model name in the text box. You can also overwrite an existing model by selecting this option and choosing the model you want to overwrite from the dropdown menu. Overwriting a trained model is irreversible, but it won't affect your deployed models until you deploy the new model.

    Create a new training job

  4. By default, the system will split your labeled data between the training and testing sets, according to specified percentages. If you have documents in your testing set, you can manually split the training and testing data.

  5. Select the Train button.

  6. If you select the Training Job ID from the list, a side pane will appear where you can check the Training progress, Job status, and other details for this job.

    Note

    • Only successfully completed training jobs will generate models.
    • Training can take anywhere from a couple of minutes to several hours, depending on the size of your labeled data.
    • You can only have one training job running at a time. You can't start another training job within the same project until the running job is completed.
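To make the automatic split in step 4 more concrete, here is a small sketch of what a percentage-based split of documents can look like. It illustrates the general idea only and is not the service's actual algorithm; the function name, the 80 percent default, and the fixed seed are all assumptions chosen for the example.

```python
import random


def split_documents(doc_names, train_percent=80, seed=0):
    """Shuffle document names and split them into (training, testing) lists.

    Illustrative only: `train_percent` and `seed` are hypothetical parameters,
    not settings exposed by Language Studio.
    """
    if not 0 < train_percent < 100:
        raise ValueError("train_percent must be between 0 and 100 (exclusive)")
    shuffled = list(doc_names)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_percent / 100)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    docs = [f"note_{i}.txt" for i in range(10)]
    train, test = split_documents(docs)
    print(len(train), "training documents,", len(test), "testing documents")
```

Every document lands in exactly one of the two sets, which is the property that matters when the testing set is later used to evaluate the trained model.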

Deploy your model

Generally, after training a model, you would review its evaluation details and make improvements if necessary. In this quickstart, you will just deploy your model and make it available for you to try in Language Studio.

To deploy your model from within the Language Studio:

  1. Select Deploying a model from the left side menu.

  2. Select Add deployment to start a new deployment job.

    A screenshot showing the deployment button

  3. Select Create new deployment to create a new deployment and assign a trained model from the dropdown below. You can also overwrite an existing deployment by selecting Overwrite an existing deployment and choosing the trained model you want to assign to it from the dropdown below.

    Note

    Overwriting an existing deployment doesn't require changes to your prediction API call but the results you get will be based on the newly assigned model.

    A screenshot showing the deployment screen

  4. Select Deploy to start the deployment job.

  5. After deployment is successful, an expiration date will appear next to it. The deployment expiration date is when your deployed model becomes unavailable for prediction, which typically happens twelve months after a training configuration expires.

Test your model

For this quickstart, you will use the Language Studio to submit the custom summarization task and visualize the results. In the sample dataset you downloaded earlier, you can find some test documents that you can use in this step.

To test your deployed models from within the Language Studio:

  1. Select Testing deployments from the left side menu.

  2. Select the deployment you want to test. You can only test models that are assigned to deployments.

  3. For multilingual projects, from the language dropdown, select the language of the text you are testing.

  4. Select the deployment you want to query/test from the dropdown.

  5. You can enter the text you want to submit to the request or upload a .txt file to use.

  6. Select Run the test from the top menu.

  7. In the Result tab, you can see the extracted entities from your text and their types. You can also view the JSON response under the JSON tab.

A screenshot showing the model test results.
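Outside Language Studio, a deployed model is queried through the prediction API mentioned in the deployment note. The sketch below only builds the URL and JSON body for such a request, and it rests on several assumptions you should verify against the REST API reference: that custom summarization uses the asynchronous analyze-text jobs route, that the task kind is "CustomAbstractiveSummarization", and that the task takes projectName and deploymentName parameters as other custom Language features do. The function names and the display and task names are hypothetical.

```python
import json

# ASSUMPTION: task kind for custom summarization; confirm in the REST reference.
TASK_KIND = "CustomAbstractiveSummarization"


def build_job_url(endpoint, api_version):
    """Return the analyze-text jobs URL for a Language resource endpoint.

    `api_version` is left to the caller because the version that supports
    this preview feature is not stated in this article.
    """
    return f"{endpoint.rstrip('/')}/language/analyze-text/jobs?api-version={api_version}"


def build_request_body(text, project_name, deployment_name, language="en"):
    """Build a request body for one document (assumed shape, see lead-in)."""
    return {
        "displayName": "Custom summarization test",  # hypothetical label
        "analysisInput": {
            "documents": [{"id": "1", "language": language, "text": text}]
        },
        "tasks": [
            {
                "kind": TASK_KIND,
                "taskName": "summarize-discharge-note",  # hypothetical label
                "parameters": {
                    "projectName": project_name,
                    "deploymentName": deployment_name,
                },
            }
        ],
    }


if __name__ == "__main__":
    body = build_request_body("Short discharge note text.", "my-project", "my-deployment")
    print(json.dumps(body, indent=2))
```

The request would be sent as a POST with the resource key in the Ocp-Apim-Subscription-Key header. Because overwriting a deployment doesn't change the deployment name, a body like this keeps working after you assign a newly trained model to the same deployment.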

Clean up resources

When you don't need your project anymore, you can delete it using Language Studio. Select the feature you're using at the top, and then select the project you want to delete. Select Delete from the top menu to delete the project.

Next steps