Mapping custom skill output to complex type not working as expected using shaper skill

Question

Das Gupta, Abhijeet 105

problem

I have tried three approaches to map the output of my custom skill to an Edm.ComplexType field in my search index. None of them populates the field. Each document in the search index needs to contain the following chunk_object field.

index field definition


{"name": "chunk_object",

      "type": "Edm.ComplexType",

      "fields": [

        {

          "name": "chunk_content",

          "type": "Edm.String",

          "searchable": true,

          "filterable": true,

          "retrievable": true,

          "stored": true,

          "sortable": true,

          "facetable": true,

          "key": false,

          "indexAnalyzer": null,

          "searchAnalyzer": null,

          "analyzer": "standard.lucene",

          "normalizer": null,

          "dimensions": null,

          "vectorSearchProfile": null,

          "vectorEncoding": null,

          "synonymMaps": []

        },

        {

          "name": "page_start",

          "type": "Edm.Int64",

          "searchable": false,

          "filterable": true,

          "retrievable": true,

          "stored": true,

          "sortable": true,

          "facetable": true,

          "key": false,

          "indexAnalyzer": null,

          "searchAnalyzer": null,

          "analyzer": null,

          "normalizer": null,

          "dimensions": null,

          "vectorSearchProfile": null,

          "vectorEncoding": null,

          "synonymMaps": []

        },

        {

          "name": "page_end",

          "type": "Edm.Int64",

          "searchable": false,

          "filterable": true,

          "retrievable": true,

          "stored": true,

          "sortable": true,

          "facetable": true,

          "key": false,

          "indexAnalyzer": null,

          "searchAnalyzer": null,

          "analyzer": null,

          "normalizer": null,

          "dimensions": null,

          "vectorSearchProfile": null,

          "vectorEncoding": null,

          "synonymMaps": []

        },

        {

          "name": "chunk_idx",

          "type": "Edm.String",

          "searchable": true,

          "filterable": true,

          "retrievable": true,

          "stored": true,

          "sortable": true,

          "facetable": true,

          "key": false,

          "indexAnalyzer": null,

          "searchAnalyzer": null,

          "analyzer": "standard.lucene",

          "normalizer": null,

          "dimensions": null,

          "vectorSearchProfile": null,

          "vectorEncoding": null,

          "synonymMaps": []

        }

      ]

}

custom skill output

The output of the custom skill is mapped to /document/jsonChunks/*. jsonChunks contains 239 objects.


{

    "values": [

        {

            "recordId": "1",

            "data": {

                "jsonChunks": [

                    {

                        "chunk": "this is chunk 1",

                        "page_start": 1,

                        "page_end": 1,

                        "chunk_idx": "#1-file.pdf'"

                    },

                    {

                        "chunk": "this is chunk 2",

                        "page_start": 1,

                        "page_end": 1,

                        "chunk_idx": "#1-file.pdf'"

                    }

                ]

            }

        }

    ]

}
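For reference, a response with this envelope could be assembled as in the minimal Python sketch below. The `values` / `recordId` / `data` layout follows the JSON above; the function name `build_skill_response` and the sample values are hypothetical. The text key is written as `chunk` to match the output shown, whereas the index subfield is named `chunk_content`.

```python
import json


def build_skill_response(record_id, chunks):
    """Wrap a list of chunk dicts in the custom-skill response envelope."""
    return {
        "values": [
            {
                "recordId": record_id,
                # Everything under "data" is skill output; here a single
                # array-valued output named "jsonChunks".
                "data": {"jsonChunks": chunks},
            }
        ]
    }


# Hypothetical sample data modelled on the two chunks shown above.
sample_chunks = [
    {"chunk": "this is chunk 1", "page_start": 1, "page_end": 1, "chunk_idx": "#1-file.pdf"},
    {"chunk": "this is chunk 2", "page_start": 1, "page_end": 1, "chunk_idx": "#2-file.pdf"},
]

response = build_skill_response("1", sample_chunks)
print(json.dumps(response, indent=2))
```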

in-memory output of custom skill

The in-memory enriched data structure for /document/jsonChunks/* is as follows:


-/jsonChunks Object[239]

  -/*

    -/chunk_content

    -/page_start

    -/page_end

    -/chunk_idx

my approach

I will share the shaper skill definition and the in-memory enriched structure for each approach.

approach 1

skill definition


{

    "@odata.type": "#Microsoft.Skills.Util.ShaperSkill",

    "name": "#2",

    "description": "",

    "context": "/document",

    "inputs": [

        {

            "name": "chunk_content",

            "source": "/document/jsonChunks/*/chunk"

        },

        {

            "name": "page_start",

            "source": "/document/jsonChunks/*/page_start"

        },

        {

            "name": "page_end",

            "source": "/document/jsonChunks/*/page_end"

        },

        {

            "name": "chunk_idx",

            "source": "/document/jsonChunks/*/chunk_idx"

        }

    ],

    "outputs": [

        {

            "name": "output",

            "targetName": "chunk_object"

        }

    ]

}

in-memory output


/document/chunk_object Object

  -/chunk_content Object[239]

    -/*

  -/page_start Object[239]

    -/*

  -/page_end Object[239]

    -/*

  -/chunk_idx Object[239]

    -/*
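One way to read this result: with `context` set to `/document`, each input path containing `*` is collected across all chunks, so the shaper emits one object whose members are parallel arrays rather than an array of objects. The toy Python model below mimics that shape on plain dictionaries. It is only an illustration, not how the service is implemented, and `project` and `shape_at_document` are hypothetical helper names.

```python
def project(document, path):
    """Collect the values at a path such as 'jsonChunks/*/chunk' into a list."""
    array_name, star, leaf = path.split("/")
    assert star == "*", "toy model: only '<array>/*/<leaf>' paths are supported"
    return [item[leaf] for item in document[array_name]]


def shape_at_document(document, inputs):
    """Build one object whose members are the projected (array-valued) inputs."""
    return {name: project(document, path) for name, path in inputs.items()}


# Hypothetical two-chunk document standing in for the 239 real chunks.
document = {
    "jsonChunks": [
        {"chunk": "this is chunk 1", "page_start": 1, "page_end": 1, "chunk_idx": "a"},
        {"chunk": "this is chunk 2", "page_start": 1, "page_end": 1, "chunk_idx": "b"},
    ]
}

chunk_object = shape_at_document(
    document,
    {
        "chunk_content": "jsonChunks/*/chunk",
        "page_start": "jsonChunks/*/page_start",
        "page_end": "jsonChunks/*/page_end",
        "chunk_idx": "jsonChunks/*/chunk_idx",
    },
)
print(chunk_object)
```

Every member of the result is an array, while the `chunk_object` subfields in the index are scalar (`Edm.String`, `Edm.Int64`), which may be why the field stays empty with this approach.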

approach 2

skill definition


{

  "@odata.type": "#Microsoft.Skills.Util.ShaperSkill",

  "name": "#2",

  "description": "",

  "context": "/document",

  "inputs": [

    {

      "name": "jsonChunk",

      "source": "/document/jsonChunks/*"

    }

  ],

  "outputs": [

    {

      "name": "output",

      "targetName": "chunk_object"

    }

  ]

}

in-memory output


/document/chunk_object Object

  -/jsonChunk Object[239]

    -/*

      -/chunk_content

      -/page_start

      -/page_end

      -/chunk_idx

approach 3

skill definition


{

  "@odata.type": "#Microsoft.Skills.Util.ShaperSkill",

  "name": "#2",

  "description": "",

  "context": "/document",

  "inputs": [

    {

      "name": "jsonChunk",

      "sourceContext": "/document/jsonChunks/*",

      "inputs": [

        {

          "name": "chunk_object",

          "source": "/document/jsonChunks/*/chunk"

        },

        {

          "name": "page_start",

          "source": "/document/jsonChunks/*/page_start"

        },

        {

          "name": "page_end",

          "source": "/document/jsonChunks/*/page_end"

        },

        {

          "name": "chunk_idx",

          "source": "/document/jsonChunks/*/chunk_idx"

        }

      ]

    }

  ],

  "outputs": [

    {

      "name": "output",

      "targetName": "chunk_object"

    }

  ]

}

in-memory output


/document/chunk_object Object

  -/jsonChunk Object[239]

    -/*

      -/chunk_content

      -/page_start

      -/page_end

      -/chunk_idx
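In both approach 2 and approach 3 the array ends up one level down, under the input name `jsonChunk`, so `/document/chunk_object` itself is an object with a single array member. A toy Python model of the nested form follows. The helper `shape_nested` is hypothetical and only illustrates the resulting shape; it is not the service's implementation.

```python
def shape_nested(document, input_name, array_name, field_map):
    """Emit {input_name: [one object per array element]} from a source array."""
    rows = [
        {target: item[source] for target, source in field_map.items()}
        for item in document[array_name]
    ]
    return {input_name: rows}


# Hypothetical two-chunk document standing in for the 239 real chunks.
document = {
    "jsonChunks": [
        {"chunk": "this is chunk 1", "page_start": 1, "page_end": 1, "chunk_idx": "a"},
        {"chunk": "this is chunk 2", "page_start": 1, "page_end": 1, "chunk_idx": "b"},
    ]
}

chunk_object = shape_nested(
    document,
    input_name="jsonChunk",
    array_name="jsonChunks",
    field_map={
        "chunk_content": "chunk",
        "page_start": "page_start",
        "page_end": "page_end",
        "chunk_idx": "chunk_idx",
    },
)
print(chunk_object["jsonChunk"][0])
```

In this model the array of objects lives at `chunk_object["jsonChunk"]`, which corresponds to `/document/chunk_object/jsonChunk` in the tree above, not at `/document/chunk_object` itself.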

None of my approaches above are working and the index field remains unpopulated. Can anyone please suggest any pointers or the right approach here? TIA

3 answers

Answer 1

Konstantinos Passadis 19,586 MVP

Hello @Das Gupta, Abhijeet

The ShaperSkill must shape its output to match the structure expected by the index. From your description, the main issue appears to be how the ShaperSkill structures the output.

You need to structure the ShaperSkill so that it maps the fields from jsonChunks to the chunk_object array. Here's how you can define it:

{

"@odata.type": "#Microsoft.Skills.Util.ShaperSkill",

"name": "#2",

"description": "Shapes the output into chunk_object format",

"context": "/document",

"inputs": [

    {

        "name": "chunks",

        "source": "/document/jsonChunks/*"

    }

],

"outputs": [

    {

        "name": "output",

        "targetName": "chunk_object"

    }

]

}

Expected In-Memory Output

The in-memory enriched data structure after applying this ShaperSkill should look like this:

/document/chunk_object Array[239]

  -/*

    -/chunk_content

    -/page_start

    -/page_end

    -/chunk_idx

Important notes:

- Ensure that the output of your custom skill (jsonChunks) has all the necessary fields (chunk, page_start, page_end, chunk_idx) correctly populated.
- If the custom skill's output JSON is nested differently, you may need to adjust the source paths in the ShaperSkill accordingly.
- The Edm.ComplexType field in the index must be defined to match the exact structure and field names produced by the ShaperSkill.

Kindly let us know how it went!

--

I hope this helps!

The answer or portions of it may have been assisted by AI Source: ChatGPT Subscription

Regards

Das Gupta, Abhijeet 105 Reputation points

2024-08-05T14:49:05.6666667+00:00

@Konstantinos Passadis thanks for the reply. Can you please correct the shaper skill definition? It looks incomplete. TIA

Konstantinos Passadis 19,586 Reputation points MVP

2024-08-05T15:03:50.3033333+00:00

Hello @Das Gupta, Abhijeet

Done!

I hope it will help you. If not, come back and we will see what we can do!

Regards

Das Gupta, Abhijeet 105 Reputation points

2024-08-05T15:41:09.3366667+00:00

@Konstantinos Passadis thank you for updating your answer. However, the chunk_object index field still shows null. The in-memory enriched structure looks like this:

This is the same issue I faced with approaches 2 and 3 in my post.

Since the in-memory structure is not as expected, I tried using outputFieldMappings to map the shaper skill output to the chunk_object index field, which resulted in an error during indexing.

Can you please help with this?

Error

Could not index document because some of the data in the document was not valid. The data field 'chunk_object' in the document with key 'aHR0cHM6Ly9ibG9iNGluZGV4aW5nLmJsb2IuY29yZS53aW5kb3dzLm5ldC9jbS1jaHVua2luZy1ibG9iL1Byb2NlZHVyZSUyMDcwNTAlMjAtJTIwQ3VzdG9tZXIlMjBJZGVudGlmaWNhdGlvbiUyMFByb2dyYW0ucGRm0' has an invalid value of type 'Collection(Edm.ComplexType)' ('JSON arrays with element type 'Object' map to Collection(Edm.ComplexType)'). The expected type was 'Edm.ComplexType'.
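The error message says that the value reaching `chunk_object` is a JSON array of objects, which maps to `Collection(Edm.ComplexType)`, while the index declares a single `Edm.ComplexType`. A small Python sketch of that check, covering only the two cases quoted in the error, is shown below. The functions `json_to_edm` and `check_field` are hypothetical illustrations and not part of any SDK.

```python
def json_to_edm(value):
    """Return the complex EDM type a parsed JSON value would map to (toy version)."""
    if isinstance(value, dict):
        return "Edm.ComplexType"
    if isinstance(value, list) and value and all(isinstance(v, dict) for v in value):
        return "Collection(Edm.ComplexType)"
    raise ValueError("toy model: only objects and non-empty arrays of objects are handled")


def check_field(value, declared_type):
    """Raise TypeError if the inferred type differs from the declared index type."""
    actual = json_to_edm(value)
    if actual != declared_type:
        raise TypeError(f"invalid value of type '{actual}'; expected '{declared_type}'")


# Hypothetical shaped output: an array with one object per chunk.
chunks = [{"chunk_content": "this is chunk 1"}, {"chunk_content": "this is chunk 2"}]

check_field(chunks, "Collection(Edm.ComplexType)")  # accepted
try:
    check_field(chunks, "Edm.ComplexType")  # mirrors the indexing error above
except TypeError as err:
    print(err)
```

Under this reading, either the mapped value has to become a single object or the index field has to be declared as a collection.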

Answer 2

Hello @Das Gupta, Abhijeet

Let's try two approaches:

Approach 1: Adjust the ShaperSkill to Create a Single Complex Object

If the chunk_object field in your index is meant to hold a single object:


This configuration assumes that all chunk_content, page_start, page_end, and chunk_idx values from the jsonChunks array should be aggregated into a single object. If multiple jsonChunks should be aggregated, additional logic will be required.

Approach 2: Update the Index Field to Accept a Collection of Complex Types

If your chunk_object field is meant to hold multiple chunks (i.e., an array):

Update the index definition:
- Change the chunk_object field type to Collection(Edm.ComplexType) instead of Edm.ComplexType.

Use the ShaperSkill to map each object correctly:
- Map each object from jsonChunks directly to the corresponding complex object field in the collection.

--

The answer or portions of it may have been assisted by AI Source: ChatGPT Subscription

Regards

Das Gupta, Abhijeet 105 Reputation points

2024-08-05T16:02:42.5066667+00:00

@Konstantinos Passadis there seems to be some issue with the shaper skill definitions; they look incomplete. Can you please update the definitions? I appreciate your assistance with all this!

Answer 3

Hello @Das Gupta, Abhijeet

Yes, I am sorry.

Approach 1. This configuration assumes that all chunk_content, page_start, page_end, and chunk_idx values from the jsonChunks array should be aggregated into a single object. If multiple jsonChunks should be aggregated, additional logic will be required.

{

"@odata.type": "#Microsoft.Skills.Util.ShaperSkill",

"name": "ShaperSkill",

"description": "Shapes the output into a single chunk_object",

"context": "/document",

"inputs": [

{

"name": "chunk_content",

"source": "/document/jsonChunks/*/chunk"

},

{

"name": "page_start",

"source": "/document/jsonChunks/*/page_start"

},

{

"name": "page_end",

"source": "/document/jsonChunks/*/page_end"

},

{

"name": "chunk_idx",

"source": "/document/jsonChunks/*/chunk_idx"

}

],

"outputs": [

{

"name": "output",

"targetName": "chunk_object"

}

]

}
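The "additional logic" mentioned in this answer could, for instance, turn the parallel arrays produced by such a configuration into one object per chunk. The Python sketch below assumes that this reshaping happens outside the shaper skill (for example inside the custom skill itself); `columns_to_rows` is a hypothetical helper name.

```python
def columns_to_rows(columns):
    """Turn {'a': [1, 2], 'b': [3, 4]} into [{'a': 1, 'b': 3}, {'a': 2, 'b': 4}]."""
    names = list(columns)
    lengths = {len(columns[name]) for name in names}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same length")
    return [dict(zip(names, row)) for row in zip(*(columns[name] for name in names))]


# Hypothetical aggregated object: one array per field, as in approach 1.
aggregated = {
    "chunk_content": ["this is chunk 1", "this is chunk 2"],
    "page_start": [1, 1],
    "page_end": [1, 1],
    "chunk_idx": ["a", "b"],
}

rows = columns_to_rows(aggregated)
print(rows[0])
```

Each element of `rows` then has the scalar subfields that the `chunk_object` definition expects.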

Approach 2. Example, if your chunk_object field is meant to hold multiple chunks (i.e., an array):

{

"name": "chunk_object",

"type": "Collection(Edm.ComplexType)",

"fields": [

{

"name": "chunk_content",

"type": "Edm.String",

"searchable": true,

"filterable": true,

"retrievable": true,

"stored": true,

"sortable": false,

"facetable": true,

"key": false

},

{

"name": "page_start",

"type": "Edm.Int64",

"searchable": false,

"filterable": true,

"retrievable": true,

"stored": true,

"sortable": false,

"facetable": true,

"key": false

},

{

"name": "page_end",

"type": "Edm.Int64",

"searchable": false,

"filterable": true,

"retrievable": true,

"stored": true,

"sortable": false,

"facetable": true,

"key": false

},

{

"name": "chunk_idx",

"type": "Edm.String",

"searchable": true,

"filterable": true,

"retrievable": true,

"stored": true,

"sortable": false,

"facetable": true,

"key": false

}

]

}
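A sketch of deriving such a definition from the original single-object field, written in Python over plain dictionaries. Two points in it are assumptions drawn from general Azure AI Search behaviour rather than from this thread: the type of an existing field generally cannot be changed in place, so the index would likely need to be recreated, and simple subfields of a complex collection are multi-valued and cannot be `sortable`, so the helper switches that attribute off. The name `to_collection_field` is hypothetical.

```python
import copy


def to_collection_field(field):
    """Return a copy of a complex field definition retyped as a collection."""
    if field["type"] != "Edm.ComplexType":
        raise ValueError("expected an Edm.ComplexType field")
    result = copy.deepcopy(field)
    result["type"] = "Collection(Edm.ComplexType)"
    for sub in result["fields"]:
        # Assumption: subfields of complex collections cannot be sortable.
        sub["sortable"] = False
    return result


# Abbreviated version of the original field definition from the question.
single = {
    "name": "chunk_object",
    "type": "Edm.ComplexType",
    "fields": [
        {"name": "chunk_content", "type": "Edm.String", "sortable": True},
        {"name": "page_start", "type": "Edm.Int64", "sortable": True},
        {"name": "page_end", "type": "Edm.Int64", "sortable": True},
        {"name": "chunk_idx", "type": "Edm.String", "sortable": True},
    ],
}

collection = to_collection_field(single)
print(collection["type"])
```

The original dictionary is left untouched, so both definitions can be compared side by side before the index is rebuilt.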

--

The answer or portions of it may have been assisted by AI Source: ChatGPT Subscription

Regards
