How to return images in a chunked Azure AI Search index

Question

As in the title.

I used the "Import and vectorize data" wizard to create the index, and the content is chunked automatically.

The index schema looks like this:


 "value": [
    {
      "@search.score": 
      "chunk_id": "",
      "chunk": "",
      "title": "",
      "image": ""
    },

Referring to the official documentation, I used "/document/normalized_images/*/data" to retrieve the base64 data of the normalized images, and then processed it using a program to convert it into image files. However, my objective is to obtain the base64 data corresponding to each chunk. Therefore, I modified the skillset as follows, but it resulted in error messages:

"One or more index projection selectors are invalid. Details: There is no matching index field for input 'image' in index 'name'."

  "indexProjections": {
    "selectors": [
      {
        "targetIndexName": "name",
        "parentKeyFieldName": "parent_id",
        "sourceContext": "/document/pages/*",
        "mappings": [
          {
            "name": "chunk",
            "source": "/document/pages/*",
            "sourceContext": null,
            "inputs": []
          },
          {
            "name": "vector",
            "source": "/document/pages/*/vector",
            "sourceContext": null,
            "inputs": []
          },
          {
            "name": "title",
            "source": "/document/metadata_storage_name",
            "sourceContext": null,
            "inputs": []
          },
          {
            "name": "image",
            "sourceContext": "/document/pages/*",
            "inputs": [
              {
                "source": "/document/normalized_images/*/pages/data",
                "name": "imagedata"
              }
            ]
          }
        ]
      }
    ]
  }

How can I adjust this approach or explore alternative solutions?

Answer

@Alice Cheng To return images from a chunked Azure AI Search index, you can add a custom skill to the skillset that extracts the base64 data for each chunk and returns it as a field in the search index. Here's an example of how you can modify the skillset:

{
  "name": "my-skillset",
  "skills": [
    {
      "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
      "name": "my-custom-skill",
      "description": "Extracts base64 data for each chunk",
      "uri": "https://my-custom-skill.azurewebsites.net/api/extract-image-data",
      "httpMethod": "POST",
      "timeout": "PT30S",
      "batchSize": 1,
      "degreeOfParallelism": null,
      "context": "/document/pages/*",
      "inputs": [
        {
          "name": "chunk",
          "source": "/document/pages/*"
        }
      ],
      "outputs": [
        {
          "name": "imageData",
          "targetName": "imageData"
        }
      ]
    }
  ],
  "cognitiveServices": {
    "@odata.type": "#Microsoft.Azure.Search.CognitiveServicesByKey",
    "key": "my-azure-ai-services-key"
  },
  "knowledgeStore": {
    "storageConnectionString": "my-storage-connection-string"
  }
}

In this example, the skillset includes a custom skill called "my-custom-skill" that extracts the base64 data for each chunk and returns it as a field called "imageData". The custom skill is implemented as a web API that takes the chunk as input and returns the base64 data as output.

To use this skillset, you need to create a web API that implements the custom skill and deploy it to Azure. The web API should take the chunk as input, extract the base64 data for the image, and return it as output. You can then update the skillset to use the custom skill, so that the base64 data for each chunk is returned as a field in the search index.
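As a rough illustration of what such a web API has to do, here is a minimal Python sketch of the request handling only, without any web framework around it. It assumes the standard custom skill contract, where the request and response both carry a "values" array of records with "recordId", "data", "errors" and "warnings". The function name run_custom_skill and the find_image callback are hypothetical; how you actually locate the image that belongs to a chunk depends on your documents and is not shown here.

```python
from typing import Callable, Optional


def run_custom_skill(request_body: dict,
                     find_image: Callable[[str], Optional[str]]) -> dict:
    """Build a custom-skill response with one output record per input record.

    find_image is a hypothetical callback: it receives the chunk text and
    returns the base64 image data for that chunk, or None if there is none.
    """
    results = []
    for record in request_body.get("values", []):
        # Each output record must echo the recordId of its input record.
        result = {
            "recordId": record.get("recordId"),
            "data": {},
            "errors": [],
            "warnings": [],
        }
        chunk = record.get("data", {}).get("chunk")
        if not isinstance(chunk, str):
            result["errors"].append(
                {"message": "Input 'chunk' is missing or not a string."}
            )
        else:
            image_b64 = find_image(chunk)
            if image_b64 is None:
                result["warnings"].append(
                    {"message": "No image found for this chunk."}
                )
            # The output name matches "imageData" in the skillset above.
            result["data"]["imageData"] = image_b64
        results.append(result)
    return {"values": results}
```

You would call this from whatever HTTP handler you deploy (for example an Azure Function), passing the parsed JSON body and your own lookup function, and return the resulting dictionary as the JSON response.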

Once you have deployed the web API and updated the skillset, you can use the search API to query the search index and retrieve the base64 data for each chunk. You can then process the base64 data to convert it into image files or display it in your application.
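For the query side, a minimal Python sketch could look like the following. It assumes the base64 data ends up in a string field called "image" (as in the index schema in the question), that the data is standard base64 of a JPEG, and that you query with an API key against API version 2023-11-01. The service name, index name, key, and the helper names build_search_request and save_images are placeholders, not values from your setup.

```python
import base64
import json
import urllib.request
from pathlib import Path


def build_search_request(service: str, index: str, api_key: str, query: str,
                         api_version: str = "2023-11-01") -> urllib.request.Request:
    """Build a POST request for the Search Documents REST endpoint."""
    url = (f"https://{service}.search.windows.net/indexes/{index}"
           f"/docs/search?api-version={api_version}")
    # Only ask for the fields needed to pair each chunk with its image.
    body = {"search": query, "select": "chunk_id,title,image", "top": 5}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


def save_images(search_response: dict, out_dir: str) -> list:
    """Decode the 'image' field of each hit and write it to out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for i, hit in enumerate(search_response.get("value", [])):
        image_b64 = hit.get("image")
        if not image_b64:
            continue  # this chunk has no image data
        # Assumption: the stored data is a JPEG, hence the .jpg extension.
        path = out / f"chunk_{i}.jpg"
        path.write_bytes(base64.b64decode(image_b64))
        written.append(path)
    return written


# Example usage (requires a real service, index and key):
# request = build_search_request("my-service", "name", "my-query-key", "*")
# with urllib.request.urlopen(request) as response:
#     save_images(json.load(response), "images")
```

Files are named by result position rather than by chunk_id, since chunk IDs are not guaranteed to be safe as file names.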
