Quickstart: Analyze multimodal content (preview)
The Multimodal API analyzes materials containing both image content and text content to help make applications and services safer from harmful user-generated or AI-generated content. Analyzing an image and its associated text content together can preserve context and provide a more comprehensive understanding of the content.
For more information on the way content is filtered, see the Harm categories concept page. For API input limits, see the Input requirements section of the Overview.
Important
This feature is only available in certain Azure regions. See Region availability.
Prerequisites
- An Azure subscription - Create one for free
- Once you have your Azure subscription, create a Content Safety resource in the Azure portal to get your key and endpoint. Enter a unique name for your resource, select your subscription, and select a resource group, supported region, and supported pricing tier. Then select Create.
- The resource takes a few minutes to deploy. After it finishes, select Go to resource. In the left pane, under Resource Management, select Subscription Key and Endpoint. Copy the endpoint and either of the key values to a temporary location for later use.
- One of the following installed:
  - cURL for REST API calls.
  - Python 3.x.
Analyze image with text
The following section walks through a sample multimodal moderation request with cURL.
Prepare a sample image
Choose a sample image to analyze, and download it to your device.
See Input requirements for the image limitations. If the image is animated, the service extracts the first frame for analysis.
You can input your image by one of two methods: local filestream or blob storage URL.
- Local filestream (recommended): Encode your image to base64. You can use a website like codebeautify to do the encoding. Then save the encoded string to a temporary location.
- Blob storage URL: Upload your image to an Azure Blob Storage account. Follow the blob storage quickstart to learn how to do this. Then open Azure Storage Explorer and get the URL to your image. Save it to a temporary location.
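If you'd rather not paste your image into a website, you can do the base64 encoding locally. The following is a minimal sketch using only the Python standard library; encode_image_file is a hypothetical helper name for this example, not part of any Azure SDK.

```python
import base64
from pathlib import Path


def encode_image_file(path: str) -> str:
    """Read a local image file and return its bytes as a base64 string.

    The returned string is what goes into the "content" field of the
    request body.
    """
    raw = Path(path).read_bytes()
    # b64encode returns bytes; decode to ASCII so the value can be embedded in JSON.
    return base64.b64encode(raw).decode("ascii")
```

For example, encode_image_file("sample.jpg") returns the string to save to your temporary location. Encoding locally also keeps the image off third-party sites.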
Analyze image with text
Paste the following command into a text editor, and make these changes:
- Replace <endpoint> with your resource endpoint URL.
- Replace <your_subscription_key> with your key.
- Populate the "image" field in the body with either a "content" field or a "blobUrl" field. For example: {"image": {"content": "<base_64_string>"}} or {"image": {"blobUrl": "<your_storage_url>"}}.
- Optionally, replace the value of the "text" field with your own text to analyze.
curl --location '<endpoint>/contentsafety/imageWithText:analyze?api-version=2024-09-15-preview' \
--header 'Ocp-Apim-Subscription-Key: <your_subscription_key>' \
--header 'Content-Type: application/json' \
--data '{
  "image": {
    "content": "<base_64_string>"
  },
  "categories": ["Hate", "Sexual", "Violence", "SelfHarm"],
  "enableOcr": true,
  "text": "I want to kill you"
}'
Note
If you're using a blob storage URL, the "image" part of the request body should look like this instead:
{
  "image": {
    "blobUrl": "<your_storage_url>"
  }
}
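The request body can also be assembled programmatically. The following minimal sketch builds the same JSON body as the cURL sample and enforces the rule that exactly one of "content" or "blobUrl" is set. The function build_request_body and its parameter names are hypothetical; omitting "categories" when none are given relies on the documented default that all four categories are then used.

```python
import json
from typing import List, Optional


def build_request_body(
    text: Optional[str] = None,
    content: Optional[str] = None,
    blob_url: Optional[str] = None,
    categories: Optional[List[str]] = None,
    enable_ocr: bool = True,
) -> dict:
    """Assemble an imageWithText:analyze request body as a dict.

    Exactly one of `content` (a base64 string) or `blob_url` must be given,
    because the service refuses requests that set both.
    """
    if (content is None) == (blob_url is None):
        raise ValueError("Provide exactly one of content or blob_url")
    image = {"content": content} if content is not None else {"blobUrl": blob_url}
    body = {"image": image, "enableOcr": enable_ocr}
    # When categories are omitted, the service analyzes all four categories.
    if categories is not None:
        body["categories"] = list(categories)
    if text is not None:
        body["text"] = text
    return body
```

For example, print(json.dumps(build_request_body(text="I want to kill you", blob_url="<your_storage_url>"), indent=2)) prints a body equivalent to the blob storage variant, with "enableOcr" and "text" added.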
The following parameter must be included in the URL:
Name | Required? | Description | Type |
---|---|---|---|
API Version | Required | The API version to use. The current version is api-version=2024-09-15-preview. Example: <endpoint>/contentsafety/imageWithText:analyze?api-version=2024-09-15-preview | String |
The parameters in the request body are defined in this table:
Name | Description | Type |
---|---|---|
content or blobUrl | (Required) The content or blob URL of the image, set inside the "image" object. It can be either base64-encoded bytes or a blob URL. If both are given, the request is refused. The maximum allowed size of the image is 7,200 x 7,200 pixels, and the maximum file size is 4 MB. The minimum size of the image is 50 x 50 pixels. | String |
text | (Optional) The text attached to the image. At most 1,000 characters (Unicode code points) are supported per request. | String |
enableOcr | (Required) When set to true, the service performs OCR on the image and analyzes the detected text together with the image. At most 1,000 characters (Unicode code points) are recognized from the image; any additional characters are truncated. | Boolean |
categories | (Optional) An array of category names. See the Harm categories guide for the available category names. If no categories are specified, all four categories are used. Multiple categories are scored in a single request. | Array (Enum) |
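To avoid a refused request, you can check inputs against the limits in the table before sending them. The following is a minimal sketch: check_limits is a hypothetical helper, the caller is assumed to supply the image width and height (reading them requires an imaging library, which is out of scope here), and "4 MB" is assumed to mean 4 x 1024 x 1024 bytes.

```python
MAX_TEXT_CODE_POINTS = 1000
MIN_SIDE_PX = 50
MAX_SIDE_PX = 7200
MAX_FILE_BYTES = 4 * 1024 * 1024  # assumption: "4 MB" interpreted as 4 MiB


def check_limits(width: int, height: int, file_size: int, text: str = "") -> list:
    """Return a list of problems; an empty list means the input is within
    the documented limits."""
    problems = []
    sides_ok = (MIN_SIDE_PX <= width <= MAX_SIDE_PX
                and MIN_SIDE_PX <= height <= MAX_SIDE_PX)
    if not sides_ok:
        problems.append(f"image is {width}x{height} px; each side must be 50 to 7,200 px")
    if file_size > MAX_FILE_BYTES:
        problems.append(f"image file is {file_size} bytes; maximum is 4 MB")
    # len() on a Python str counts Unicode code points, the unit of the text limit.
    if len(text) > MAX_TEXT_CODE_POINTS:
        problems.append(f"text has {len(text)} code points; maximum is 1,000")
    return problems
```

Counting with len() matches the limit's unit directly, whereas counting UTF-8 bytes would reject valid non-ASCII text early.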
Open a command prompt window and run the cURL command.
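Because Python 3.x is also listed as a prerequisite, here is a minimal sketch that sends the same request with the standard library's urllib.request instead of cURL. The names build_analyze_request and analyze are hypothetical helpers, not part of an Azure SDK, and the body argument is assumed to be a dict shaped like the request body shown earlier.

```python
import json
import urllib.request

API_VERSION = "2024-09-15-preview"  # same version as the cURL sample


def build_analyze_request(endpoint: str, key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the cURL command."""
    url = (f"{endpoint.rstrip('/')}/contentsafety/imageWithText:analyze"
           f"?api-version={API_VERSION}")
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def analyze(endpoint: str, key: str, body: dict) -> dict:
    """Send the request and return the parsed JSON response."""
    req = build_analyze_request(endpoint, key, body)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Note that urlopen raises urllib.error.HTTPError for non-2xx responses (for example, when both "content" and "blobUrl" are set), so wrap the call in try/except if you want to inspect the error body.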
Output
You should see the image and text moderation results displayed as JSON data in the console. For example:
{
  "categoriesAnalysis": [
    {
      "category": "Hate",
      "severity": 2
    },
    {
      "category": "SelfHarm",
      "severity": 0
    },
    {
      "category": "Sexual",
      "severity": 0
    },
    {
      "category": "Violence",
      "severity": 0
    }
  ]
}
The JSON fields in the output are defined here:
Name | Description | Type |
---|---|---|
categoriesAnalysis | Each output class that the API predicts. Classification can be multi-labeled: for example, an image and its text could be classified as both sexual content and violence. See Harm categories. | Array |
severity | The severity level of the flag in each harm category. See Harm categories. | Integer |
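In an application, you typically turn this response into a decision. The following minimal sketch parses the output and lists the categories at or above a severity threshold. The helper names severities and flagged are hypothetical, and the default threshold of 2 is an arbitrary example; choose a threshold that fits your own content policy.

```python
import json


def severities(response_json: str) -> dict:
    """Map each analyzed category to its severity from an analyze response."""
    data = json.loads(response_json)
    return {item["category"]: item["severity"] for item in data["categoriesAnalysis"]}


def flagged(response_json: str, threshold: int = 2) -> list:
    """Return the categories whose severity is at or above `threshold`, sorted."""
    return sorted(c for c, s in severities(response_json).items() if s >= threshold)
```

Applied to the sample output above, flagged returns ["Hate"], because Hate is the only category with severity 2 or higher.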