Custom Skill Set not Populating in final Index Document

Question

Dear Community, need your urgent support here.

Objective

I am trying to create a custom skill in Azure AI Search. The skill is an Azure Functions app written in Python. In the skill, we take metadata_storage_name and use it as the key (both partition key and row key) to look up dates in an Azure Storage table.

Issue

My function app generates the required skill output in the expected JSON format. Pasting it here for your reference:

{"values": [{"recordId": "a1", "data": {"from_date": "2023-07-01T00:00:00+00:00", "to_date": "9999-12-31T00:00:00+00:00"}, "errors": [], "warnings": []}]}

But this information is not getting populated in the final search index.

My Skillset definition and indexer output mapping are as follows:

Skillset Definition

{
      "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
      "name": "gettofromdate",
      "description": "custom skill to get to date and from date for each document",
      "context": "/document",
      "uri": "https://envestdates.azurewebsites.net/api/tofromdate?code=functionskey",
      "httpMethod": "POST",
      "timeout": "PT30S",
      "batchSize": 1,
      "degreeOfParallelism": 1,
      "authResourceId": null,
      "inputs": [
        {
          "name": "pds_name",
          "source": "/document/metadata_storage_name"
        }
      ],
      "outputs": [
        {
          "name": "from_date",
          "targetName": "fromdate"
        },
        {
          "name": "to_date",
          "targetName": "todate"
        }
      ],
      "httpHeaders": {},
      "authIdentity": null
    }

Indexer Configuration - Output Field Mapping

"outputFieldMappings": [
    {
      "sourceFieldName": "/document/from_date",
      "targetFieldName": "from_date"
    },
    {
      "sourceFieldName": "/document/to_date",
      "targetFieldName": "to_date"
    }
  ],

Please suggest what I am doing wrong here.

Also, when I run a debug session, I don't get any error or warning, and the skill populates the fields.

Thanks for the support

Ujjwal Dalmia

Answer

Hello Ujjwal Dalmia and Harini Chopperla,

Welcome to Microsoft Q&A, and thank you for posting your question here.

I understand that you are both seeing a similar issue: a custom skill in Azure AI Search, backed by a Python-based Azure Function, returns valid output JSON and the indexer completes without error, but the search index is not populated with the data (from_date, to_date).

The issue appears to be a combination of missing function invocations and incremental indexing behavior. The steps below should resolve it.

First, confirm that the function is actually being invoked:

Check the function's invocation logs in the Azure portal under Monitor > Logs > Invocations.
Make sure the skill configuration uses the correct function URL and key. The key can be passed in a request header, for example:

  "httpHeaders": {
    "x-functions-key": "your_function_key"
  }

If no invocations are logged, verify the uri in the skillset definition and test the function independently with a tool such as Postman to confirm that it is reachable and returns the expected output.
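When testing the function on its own, it helps to send the same envelope the indexer sends and then check the reply against it. The following is a minimal sketch of that check, not an official tool: the helper names build_skill_request and check_skill_response and the file name example.pdf are made up for illustration, and the reply is the sample output from the question rather than a live call.

```python
import json

def build_skill_request(record_id: str, pds_name: str) -> dict:
    """Build a body in the shape Azure AI Search POSTs to a custom Web API skill."""
    return {"values": [{"recordId": record_id, "data": {"pds_name": pds_name}}]}

def check_skill_response(request: dict, response: dict, expected_outputs: list) -> list:
    """Return a list of problems; an empty list means the reply looks usable."""
    problems = []
    sent_ids = {v["recordId"] for v in request["values"]}
    values = response.get("values", [])
    for value in values:
        record_id = value.get("recordId")
        if record_id not in sent_ids:
            problems.append(f"unexpected recordId: {record_id}")
        for name in expected_outputs:
            if name not in value.get("data", {}):
                problems.append(f"missing output '{name}' for {record_id}")
    # Every record that was sent must come back with the same recordId.
    returned_ids = {v.get("recordId") for v in values}
    for missing in sorted(sent_ids - returned_ids):
        problems.append(f"no result for recordId: {missing}")
    return problems

# Hypothetical input; the reply below is the sample output posted in the question.
request_body = build_skill_request("a1", "example.pdf")
reply = json.loads(
    '{"values": [{"recordId": "a1", "data": {"from_date": "2023-07-01T00:00:00+00:00", '
    '"to_date": "9999-12-31T00:00:00+00:00"}, "errors": [], "warnings": []}]}'
)
print(check_skill_response(request_body, reply, ["from_date", "to_date"]))
```

The names passed as expected_outputs are the skill's output "name" values (from_date, to_date), which must appear as keys under "data" in the function's reply.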

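Second, reset the indexer so that it reprocesses all documents. Documents that were already indexed are not picked up again on later runs unless they change, so a fix to the skillset may not show up until the indexer is reset.

Use the Reset Indexer API to clear the indexer state, then run the indexer again: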

 POST https://.search.windows.net/indexers//reset?api-version=2023-10-01-Preview
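As a rough illustration, the reset call and the follow-up run can be prepared from Python with the standard library. This is a sketch under assumptions: the service name my-service, the indexer name my-indexer, and the <admin-key> placeholder are hypothetical, the helper indexer_request is not part of any SDK, and the API version is simply the one quoted above. The requests are built but not sent.

```python
import urllib.request

def indexer_request(service: str, indexer: str, action: str, api_key: str,
                    api_version: str = "2023-10-01-Preview") -> urllib.request.Request:
    """Build (but do not send) a POST for an indexer action such as 'reset' or 'run'."""
    url = (f"https://{service}.search.windows.net/indexers/{indexer}/{action}"
           f"?api-version={api_version}")
    # An empty body is enough; the admin key goes in the api-key header.
    return urllib.request.Request(url, data=b"", method="POST",
                                  headers={"api-key": api_key})

# Hypothetical names; substitute your own service, indexer, and admin key.
reset = indexer_request("my-service", "my-indexer", "reset", "<admin-key>")
run = indexer_request("my-service", "my-indexer", "run", "<admin-key>")
print(reset.get_method(), reset.full_url)
print(run.get_method(), run.full_url)

# To actually send them, in this order:
#   urllib.request.urlopen(reset)
#   urllib.request.urlopen(run)
```

Resetting only clears the indexer's change-tracking state; the documents are reprocessed on the next run, which is why the sketch prepares both calls.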

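In addition, you can tighten the indexer parameters so that a failed document stops the run instead of being skipped silently. Note that these parameters control batching and failure tolerance; they do not by themselves force documents to be reprocessed, so the reset above is still required: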

  {
    "parameters": {
      "batchSize": 1,
      "maxFailedItems": 0,
      "maxFailedItemsPerBatch": 0,
      "base64EncodeKeys": false,
      "skipDocumentDeletion": false
    }
  }

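Third, make sure the skill outputs and the indexer output field mappings line up. In the skillset you posted, the outputs use the targetName values fromdate and todate, so the enriched values are written to /document/fromdate and /document/todate, while the output field mappings read from /document/from_date and /document/to_date. Either rename the targetName values or change the sourceFieldName paths so that they match, for example: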

Skill Output Mapping:

  "outputs": [
    {
      "name": "from_date",
      "targetName": "from_date"
    },
    {
      "name": "to_date",
      "targetName": "to_date"
    }
  ]

Indexer Output Field Mapping:

  "outputFieldMappings": [
    {
      "sourceFieldName": "/document/from_date",
      "targetFieldName": "from_date"
    },
    {
      "sourceFieldName": "/document/to_date",
      "targetFieldName": "to_date"
    }
  ]
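The alignment between a skill's outputs and the output field mappings can be checked mechanically before deploying. The sketch below is a simplified illustration: the helper unmatched_mappings is hypothetical, it looks at a single skill only, and it assumes each output lands at the skill's context path plus its targetName (falling back to name when targetName is absent). It is run here against the definitions posted in the question and against the corrected outputs.

```python
def unmatched_mappings(skill: dict, output_field_mappings: list) -> list:
    """Return mapping source paths that this one skill does not produce."""
    context = skill.get("context", "/document").rstrip("/")
    produced = {
        f"{context}/{out.get('targetName') or out['name']}"
        for out in skill["outputs"]
    }
    return [m["sourceFieldName"] for m in output_field_mappings
            if m["sourceFieldName"] not in produced]

# Outputs as posted in the question (targetName differs from name).
original_skill = {
    "context": "/document",
    "outputs": [
        {"name": "from_date", "targetName": "fromdate"},
        {"name": "to_date", "targetName": "todate"},
    ],
}
# Outputs after aligning targetName with the mapping paths.
fixed_skill = {
    "context": "/document",
    "outputs": [
        {"name": "from_date", "targetName": "from_date"},
        {"name": "to_date", "targetName": "to_date"},
    ],
}
mappings = [
    {"sourceFieldName": "/document/from_date", "targetFieldName": "from_date"},
    {"sourceFieldName": "/document/to_date", "targetFieldName": "to_date"},
]

print(unmatched_mappings(original_skill, mappings))
print(unmatched_mappings(fixed_skill, mappings))
```

With the original definition both mapping paths are reported as unmatched; with the aligned targetName values the list is empty.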

It is also worth verifying that these fields (from_date, to_date) are defined in your search index schema with the correct type (Edm.DateTimeOffset for dates).

Finally, add logging to your function so that you can capture invocation details and spot any discrepancies during execution:

  import json
  import logging

  import azure.functions as func

  def main(req: func.HttpRequest) -> func.HttpResponse:
      # Log the incoming skill request so each invocation is visible in the logs.
      logging.info("Received request: %s", req.get_json())
      response = {"values": []}  # Your processing logic fills this in...
      return func.HttpResponse(json.dumps(response), mimetype="application/json")
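For the processing logic itself, the part that matters to the indexer is that every input record comes back with the same recordId and with the outputs under "data". Below is a minimal sketch of that step kept separate from the Azure Functions handler so it can be tested locally. The names build_skill_response and fake_lookup are hypothetical, and fake_lookup stands in for the Azure Table query, which is not shown in the question.

```python
import json

def build_skill_response(body: dict, lookup) -> dict:
    """Return one result per input record, echoing each recordId."""
    results = []
    for record in body.get("values", []):
        result = {"recordId": record["recordId"], "data": {}, "errors": [], "warnings": []}
        try:
            pds_name = record["data"]["pds_name"]
            from_date, to_date = lookup(pds_name)
            result["data"] = {"from_date": from_date, "to_date": to_date}
        except Exception as exc:
            # Report the failure on this record instead of failing the whole batch.
            result["errors"].append({"message": str(exc)})
        results.append(result)
    return {"values": results}

def fake_lookup(pds_name: str):
    """Hypothetical stand-in for the Azure Table lookup keyed on the file name."""
    return "2023-07-01T00:00:00+00:00", "9999-12-31T00:00:00+00:00"

body = {"values": [{"recordId": "a1", "data": {"pds_name": "example.pdf"}}]}
print(json.dumps(build_skill_response(body, fake_lookup)))
```

Inside main, the same function could be called as build_skill_response(req.get_json(), your_lookup), with the result passed to json.dumps as in the handler above.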

I hope this is helpful! Do not hesitate to let me know if you have any other questions.

Please don't forget to close the thread by upvoting this answer and accepting it if it was helpful.
