Mapping custom skill output to complex type not working as expected using shaper skill

Das Gupta, Abhijeet 105 Reputation points
2024-08-03T08:56:29.26+00:00

problem

I have tried three approaches to map the output of my custom skill to an Edm.ComplexType field in my search index. None of them populates the field. Each document in the search index needs to contain the following chunk_object field.

index field definition


{
  "name": "chunk_object",
  "type": "Edm.ComplexType",
  "fields": [
    {
      "name": "chunk_content",
      "type": "Edm.String",
      "searchable": true,
      "filterable": true,
      "retrievable": true,
      "stored": true,
      "sortable": true,
      "facetable": true,
      "key": false,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "analyzer": "standard.lucene",
      "normalizer": null,
      "dimensions": null,
      "vectorSearchProfile": null,
      "vectorEncoding": null,
      "synonymMaps": []
    },
    {
      "name": "page_start",
      "type": "Edm.Int64",
      "searchable": false,
      "filterable": true,
      "retrievable": true,
      "stored": true,
      "sortable": true,
      "facetable": true,
      "key": false,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "analyzer": null,
      "normalizer": null,
      "dimensions": null,
      "vectorSearchProfile": null,
      "vectorEncoding": null,
      "synonymMaps": []
    },
    {
      "name": "page_end",
      "type": "Edm.Int64",
      "searchable": false,
      "filterable": true,
      "retrievable": true,
      "stored": true,
      "sortable": true,
      "facetable": true,
      "key": false,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "analyzer": null,
      "normalizer": null,
      "dimensions": null,
      "vectorSearchProfile": null,
      "vectorEncoding": null,
      "synonymMaps": []
    },
    {
      "name": "chunk_idx",
      "type": "Edm.String",
      "searchable": true,
      "filterable": true,
      "retrievable": true,
      "stored": true,
      "sortable": true,
      "facetable": true,
      "key": false,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "analyzer": "standard.lucene",
      "normalizer": null,
      "dimensions": null,
      "vectorSearchProfile": null,
      "vectorEncoding": null,
      "synonymMaps": []
    }
  ]
}

custom skill output

The output of the custom skill is mapped to /document/jsonChunks/*. jsonChunks contains 239 objects.


{
    "values": [
        {
            "recordId": "1",
            "data": {
                "jsonChunks": [
                    {
                        "chunk": "this is chunk 1",
                        "page_start": 1,
                        "page_end": 1,
                        "chunk_idx": "#1-file.pdf'"
                    },
                    {
                        "chunk": "this is chunk 2",
                        "page_start": 1,
                        "page_end": 1,
                        "chunk_idx": "#1-file.pdf'"
                    }
                ]
            }
        }
    ]
}

in-memory output of custom skill

The in-memory enriched data structure for /document/jsonChunks/* is as follows:

-/jsonChunks Object[239]
  -/*
    -/chunk_content
    -/page_start
    -/page_end
    -/chunk_idx

my approach

I will share the shaper skill definition and the in-memory enriched structure for each approach.

approach 1

skill definition


{
    "@odata.type": "#Microsoft.Skills.Util.ShaperSkill",
    "name": "#2",
    "description": "",
    "context": "/document",
    "inputs": [
        {
            "name": "chunk_content",
            "source": "/document/jsonChunks/*/chunk"
        },
        {
            "name": "page_start",
            "source": "/document/jsonChunks/*/page_start"
        },
        {
            "name": "page_end",
            "source": "/document/jsonChunks/*/page_end"
        },
        {
            "name": "chunk_idx",
            "source": "/document/jsonChunks/*/chunk_idx"
        }
    ],
    "outputs": [
        {
            "name": "output",
            "targetName": "chunk_object"
        }
    ]
}

in-memory output


/document/chunk_object Object
  -/chunk_content Object[239]
    -/*
  -/page_start Object[239]
    -/*
  -/page_end Object[239]
    -/*
  -/chunk_idx Object[239]
    -/*

approach 2

skill definition


{
  "@odata.type": "#Microsoft.Skills.Util.ShaperSkill",
  "name": "#2",
  "description": "",
  "context": "/document",
  "inputs": [
    {
      "name": "jsonChunk",
      "source": "/document/jsonChunks/*"
    }
  ],
  "outputs": [
    {
      "name": "output",
      "targetName": "chunk_object"
    }
  ]
}

in-memory output


/document/chunk_object Object
  -/jsonChunk Object[239]
    -/*
      -/chunk_content
      -/page_start
      -/page_end
      -/chunk_idx

approach 3

skill definition


{
  "@odata.type": "#Microsoft.Skills.Util.ShaperSkill",
  "name": "#2",
  "description": "",
  "context": "/document",
  "inputs": [
    {
      "name": "jsonChunk",
      "sourceContext": "/document/jsonChunks/*",
      "inputs": [
        {
          "name": "chunk_object",
          "source": "/document/jsonChunks/*/chunk"
        },
        {
          "name": "page_start",
          "source": "/document/jsonChunks/*/page_start"
        },
        {
          "name": "page_end",
          "source": "/document/jsonChunks/*/page_end"
        },
        {
          "name": "chunk_idx",
          "source": "/document/jsonChunks/*/chunk_idx"
        }
      ]
    }
  ],
  "outputs": [
    {
      "name": "output",
      "targetName": "chunk_object"
    }
  ]
}

in-memory output


/document/chunk_object Object
  -/jsonChunk Object[239]
    -/*
      -/chunk_content
      -/page_start
      -/page_end
      -/chunk_idx

None of my approaches above are working and the index field remains unpopulated. Can anyone please suggest any pointers or the right approach here? TIA

Azure AI Search

3 answers

  1. Konstantinos Passadis 19,586 Reputation points MVP
    2024-08-04T15:10:04.57+00:00

    Hello @Das Gupta, Abhijeet

    The ShaperSkill must shape its output to match the structure expected by the index. From your description, the main issue appears to be how the ShaperSkill structures the output.

    You need to structure the ShaperSkill so that it maps the fields from jsonChunks to the chunk_object array. Here's how you can define it:

    {
        "@odata.type": "#Microsoft.Skills.Util.ShaperSkill",
        "name": "#2",
        "description": "Shapes the output into chunk_object format",
        "context": "/document",
        "inputs": [
            {
                "name": "chunks",
                "source": "/document/jsonChunks/*"
            }
        ],
        "outputs": [
            {
                "name": "output",
                "targetName": "chunk_object"
            }
        ]
    }

    Expected In-Memory Output

    The in-memory enriched data structure after applying this ShaperSkill should look like this:

    /document/chunk_object Array[239]
      -/*
        -/chunk_content
        -/page_start
        -/page_end
        -/chunk_idx

    Important notes:

    1. Ensure that the output of your custom skill (jsonChunks) has all the necessary fields (chunk, page_start, page_end, chunk_idx) correctly populated.
    2. If the custom skill's output JSON is nested differently, you may need to adjust the source paths in the ShaperSkill accordingly.
    3. The Edm.ComplexType field in the index must be defined to match the exact structure and field names produced by the ShaperSkill.
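
Notes 1 and 3 can be checked before indexing. The sketch below is a rough, hypothetical helper (the name `missing_subfields` and the sample data are illustrative and not part of any Azure SDK). It compares the keys of each chunk emitted by the custom skill with the subfield names of chunk_object. With data shaped like the sample output in the question, it reports that the skill emits `chunk` while the index expects `chunk_content`.

```python
# Subfield names of chunk_object, taken from the index definition in the question.
EXPECTED_SUBFIELDS = {"chunk_content", "page_start", "page_end", "chunk_idx"}


def missing_subfields(chunks, expected=EXPECTED_SUBFIELDS):
    """Return {position: sorted missing names} for chunks lacking expected keys."""
    problems = {}
    for position, chunk in enumerate(chunks):
        missing = sorted(set(expected) - set(chunk))
        if missing:
            problems[position] = missing
    return problems


# Sample shaped like the custom skill output in the question (the key is "chunk").
sample_chunks = [
    {"chunk": "this is chunk 1", "page_start": 1, "page_end": 1, "chunk_idx": "#1-file.pdf'"},
    {"chunk": "this is chunk 2", "page_start": 1, "page_end": 1, "chunk_idx": "#1-file.pdf'"},
]

print(missing_subfields(sample_chunks))
```

An empty result means every chunk carries all four subfield names; any other result lists, per chunk position, the names that still need to be renamed or added.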

    Kindly let us know how it went!

    --

    I hope this helps!

    The answer or portions of it may have been assisted by AI Source: ChatGPT Subscription

    Kindly mark the answer as Accepted and Upvote in case it helped!

    Regards


  2. Konstantinos Passadis 19,586 Reputation points MVP
    2024-08-05T15:57:29.6333333+00:00

    Hello @Das Gupta, Abhijeet

    Let's try two approaches:

    Approach 1: Adjust the ShaperSkill to Create a Single Complex Object

    If the chunk_object field in your index is meant to hold a single object:

    {
    

    This configuration assumes that all chunk_content, page_start, page_end, and chunk_idx values from the jsonChunks array should be aggregated into a single object. If multiple jsonChunks should be aggregated, additional logic will be required.
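
To make the aggregation concrete, here is a small Python illustration of the shape that results when every input of a /document-level Shaper uses a /* path. Plain dictionaries stand in for the enrichment tree, `aggregate_columns` is a hypothetical name, and the sample values are made up. It only mimics the shape reported for approach 1 in the question and is not the service itself.

```python
def aggregate_columns(chunks, name_to_key):
    """Collect each source key across all chunks into one array per input name."""
    return {
        name: [chunk[key] for chunk in chunks]
        for name, key in name_to_key.items()
    }


# Illustrative stand-in for /document/jsonChunks (two chunks instead of 239).
json_chunks = [
    {"chunk": "this is chunk 1", "page_start": 1, "page_end": 1, "chunk_idx": "a"},
    {"chunk": "this is chunk 2", "page_start": 1, "page_end": 2, "chunk_idx": "b"},
]

# Shaper input name -> key under /document/jsonChunks/*.
inputs = {
    "chunk_content": "chunk",
    "page_start": "page_start",
    "page_end": "page_end",
    "chunk_idx": "chunk_idx",
}

chunk_object = aggregate_columns(json_chunks, inputs)
print(chunk_object["page_end"])
```

Each subfield of the single object ends up holding an array with one entry per chunk, whereas an Edm.String or Edm.Int64 subfield holds a single value. That mismatch may be one reason the field stays empty with this configuration.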

    Approach 2: Update the Index Field to Accept a Collection of Complex Types

    If your chunk_object field is meant to hold multiple chunks (i.e., an array):

    1. Update the Index Definition:
      • Change the chunk_object field type to Collection(Edm.ComplexType) instead of Edm.ComplexType.
      Example:
         {
      
    2. Use ShaperSkill to Map Each Object Correctly:
      • Map each object from jsonChunks directly to the corresponding complex object field in the collection.
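
Step 2 amounts to producing one object per chunk, with keys that match the subfield names of the collection. As a rough sketch in Python (for example, something the custom skill could do before it returns jsonChunks), the reshaping could look like the following. The name `to_chunk_objects`, the rename table, and the sample values are hypothetical.

```python
# Rename table: key emitted by the custom skill -> subfield name in the index.
RENAMES = {"chunk": "chunk_content"}


def to_chunk_objects(chunks, renames=RENAMES):
    """Return one dict per chunk, with keys renamed to the index subfield names."""
    return [
        {renames.get(key, key): value for key, value in chunk.items()}
        for chunk in chunks
    ]


# Illustrative stand-in for the jsonChunks array (two chunks instead of 239).
json_chunks = [
    {"chunk": "this is chunk 1", "page_start": 1, "page_end": 1, "chunk_idx": "a"},
    {"chunk": "this is chunk 2", "page_start": 1, "page_end": 1, "chunk_idx": "b"},
]

chunk_objects = to_chunk_objects(json_chunks)
print(chunk_objects[0])
```

The result is a list of objects, one per chunk, which is the shape a Collection(Edm.ComplexType) field holds.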

    --

    The answer or portions of it may have been assisted by AI Source: ChatGPT Subscription

    Kindly mark the answer as Accepted and Upvote in case it helped or post your feedback to help !

    Regards


  3. Konstantinos Passadis 19,586 Reputation points MVP
    2024-08-05T17:09:03.04+00:00

    Hello @Das Gupta, Abhijeet

    Yes, I am sorry.

    Approach 1. This configuration assumes that all chunk_content, page_start, page_end, and chunk_idx values from the jsonChunks array should be aggregated into a single object. If multiple jsonChunks should be aggregated, additional logic will be required.

    {
        "@odata.type": "#Microsoft.Skills.Util.ShaperSkill",
        "name": "ShaperSkill",
        "description": "Shapes the output into a single chunk_object",
        "context": "/document",
        "inputs": [
            {
                "name": "chunk_content",
                "source": "/document/jsonChunks/*/chunk"
            },
            {
                "name": "page_start",
                "source": "/document/jsonChunks/*/page_start"
            },
            {
                "name": "page_end",
                "source": "/document/jsonChunks/*/page_end"
            },
            {
                "name": "chunk_idx",
                "source": "/document/jsonChunks/*/chunk_idx"
            }
        ],
        "outputs": [
            {
                "name": "output",
                "targetName": "chunk_object"
            }
        ]
    }

    Approach 2. Example index definition, if your chunk_object field is meant to hold multiple chunks (i.e., an array):

    {
        "name": "chunk_object",
        "type": "Collection(Edm.ComplexType)",
        "fields": [
            {
                "name": "chunk_content",
                "type": "Edm.String",
                "searchable": true,
                "filterable": true,
                "retrievable": true,
                "stored": true,
                "sortable": true,
                "facetable": true,
                "key": false
            },
            {
                "name": "page_start",
                "type": "Edm.Int64",
                "searchable": false,
                "filterable": true,
                "retrievable": true,
                "stored": true,
                "sortable": true,
                "facetable": true,
                "key": false
            },
            {
                "name": "page_end",
                "type": "Edm.Int64",
                "searchable": false,
                "filterable": true,
                "retrievable": true,
                "stored": true,
                "sortable": true,
                "facetable": true,
                "key": false
            },
            {
                "name": "chunk_idx",
                "type": "Edm.String",
                "searchable": true,
                "filterable": true,
                "retrievable": true,
                "stored": true,
                "sortable": true,
                "facetable": true,
                "key": false
            }
        ]
    }

    --

    The answer or portions of it may have been assisted by AI Source: ChatGPT Subscription

    Kindly mark the answer as Accepted and Upvote in case it helped or post your feedback to help !

    Regards
