AzureOpenAIEmbeddingSkill is not executing

Question

Nishanth S 0

I'm trying to extract the content from a PDF using the Document Extraction skill.
This is working as expected.
I'm then trying to split that content with the Text Split skill and pass the output to the Azure OpenAI Embedding skill, but I'm getting the error below.
User's image

Skill Set JSON:

{
    "name": "ccc-bsldata-poc-skillset-ss",
    "description": "Skillset for extracting text from documents",
    "skills": [
        {
            "@odata.type": "#Microsoft.Skills.Util.DocumentExtractionSkill",
            "context": "/document",
            "inputs": [
                {
                    "name": "file_data",
                    "source": "/document/file_data"
                }
            ],
            "outputs": [
                {
                    "name": "content",
                    "targetName": "/document/content"
                }
            ]
        },
        {
            "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
            "textSplitMode": "pages",
            "maximumPageLength": 2000,
            "defaultLanguageCode": "en",
            "inputs": [
                {
                    "name": "text",
                    "source": "/document/content"
                }
            ],
            "outputs": [
                {
                    "name": "textItems",
                    "targetName": "pages"
                }
            ]
        },
        {
            "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
            "description": "Connects a deployed embedding model.",
            "resourceUri": "https://openai-anticipation-dev-01.openai.azure.com/",
            "deploymentId": "text-embedding-ada-002",
            "apiKey": "bd92e38f9b7c483a82dd26738189491c",
            "inputs": [
                {
                    "name": "text",
                    "source": "/document/pages"
                }
            ],
            "outputs": [
                {
                    "name": "embedding",
                    "targetName": "Contentvector"
                }
            ]
        }
    ]
}

In the first skill, I'm trying to extract the document text.
In the second skill, I'm trying to take the output of the first skill and split it.
In the third skill, I'm trying to take the output of the second skill and create embeddings.

Indexer JSON:

{
    "name": "
    "dataSourceName": "
    "skillsetName": "
    "targetIndexName": "
    "parameters": {
        "batchSize": null,
        "maxFailedItems": null,
        "maxFailedItemsPerBatch": null,
        "base64EncodeKeys": null,
        "configuration": {
            "indexedFileNameExtensions": ".pdf,.docx",
            "excludedFileNameExtensions": ".png,.jpeg",
            "dataToExtract": "contentAndMetadata",
            "parsingMode": "default"
        }
    },
    "fieldMappings": [
        {
            "sourceFieldName": "metadata_storage_path",
            "targetFieldName": "metadata_storage_path",
            "mappingFunction": {
                "name": "base64Encode"
            }
        }
    ],
    "outputFieldMappings": [
        {
            "sourceFieldName": "/document/content",
            "targetFieldName": "Content"
        },
        {
            "sourceFieldName": "/document/pages",
            "targetFieldName": "chunks"
        }
    ]
}

Index JSON:

{
    "name": "
    "fields": [
        {
            "name": "Id",
            "type": "Edm.String",
            "filterable": true,
            "sortable": false,
            "facetable": false,
            "key": true
        },
        {
            "name": "Content",
            "type": "Edm.String",
            "filterable": false,
            "sortable": false,
            "facetable": false
        },
        {
            "name": "chunks",
            "type": "Edm.String",
            "filterable": false,
            "sortable": false,
            "facetable": false
        },
        {
            "name": "Contentvector",
            "type": "Collection(Edm.Single)",
            "searchable": true,
            "retrievable": true,
            "dimensions": 1536,
            "vectorSearchProfile": "my-vector-profile"
        }
    ],
    "vectorSearch": {
        "algorithms": [
            {
                "name": "my-vector-config",
                "kind": "hnsw",
                "hnswParameters": {
                    "m": 4,
                    "efConstruction": 400,
                    "efSearch": 500,
                    "metric": "cosine"
                }
            }
        ],
        "profiles": [
            {
                "name": "my-vector-profile",
                "algorithm": "my-vector-config"
            }
        ]
    },
    "semantic": {
        "configurations": [
            {
                "name": "my-vector-config",
                "prioritizedFields": {
                    "prioritizedContentFields": [
                        {
                            "fieldName": "Content"
                        }
                    ]
                }
            }
        ]
    }
}

Error in executing the skillset:
User's image

Warning: User's image

Ajay Kumar N 28,261 Reputation points Microsoft Employee Moderator

2024-01-29T20:25:55.6166667+00:00

@Nishanth S, just checking in to see if you had a chance to see the previous response. If the answer helped (pointed you in the right direction), please click Accept Answer, or share the requested information so we can help you better.

Answer 1

Thanks for posting this question.

To isolate the issue, please verify the URL of the document in the Document Key field, since the error indicates a problem parsing the document. Try with different documents/URLs.

Here is an example of how to set a placeholder value when nothing exists.
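
The linked example is not reproduced in this thread. As a minimal sketch of one way to supply a placeholder, assuming the Conditional skill (Microsoft.Skills.Util.ConditionalSkill) is the mechanism meant here, the skill below writes a default string when /document/content is null. The target name contentWithDefault and the placeholder text are hypothetical; downstream skills would then read /document/contentWithDefault instead of /document/content.

```json
{
    "@odata.type": "#Microsoft.Skills.Util.ConditionalSkill",
    "context": "/document",
    "inputs": [
        {
            "name": "condition",
            "source": "= $(/document/content) == null"
        },
        {
            "name": "whenTrue",
            "source": "= 'placeholder'"
        },
        {
            "name": "whenFalse",
            "source": "= $(/document/content)"
        }
    ],
    "outputs": [
        {
            "name": "output",
            "targetName": "contentWithDefault"
        }
    ]
}
```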

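The error message suggests that the input to the AzureOpenAIEmbeddingSkill is not of the expected type ‘string’. In the skillset above, the source of the ‘text’ input is /document/pages, which is the collection of chunks produced by the Split skill rather than a single string. To pass one chunk at a time, the skill's context and the ‘text’ source would both need to point to /document/pages/*.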
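
Since the embedding skill expects a single string for ‘text’, one way to address the type mismatch is to run the skill once per chunk. The following is a sketch of the third skill from the question with a context of /document/pages/* added and the ‘text’ source changed to match. The resourceUri and deploymentId are copied from the question; the apiKey value is a placeholder, and this has not been run against the poster's service.

```json
{
    "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
    "description": "Connects a deployed embedding model.",
    "context": "/document/pages/*",
    "resourceUri": "https://openai-anticipation-dev-01.openai.azure.com/",
    "deploymentId": "text-embedding-ada-002",
    "apiKey": "<your-api-key>",
    "inputs": [
        {
            "name": "text",
            "source": "/document/pages/*"
        }
    ],
    "outputs": [
        {
            "name": "embedding",
            "targetName": "Contentvector"
        }
    ]
}
```

With this context, the skill produces one vector per chunk at /document/pages/*/Contentvector, so the output no longer corresponds directly to a single document-level vector field and the index mapping may need to be adjusted accordingly.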

As outlined in the documentation, the Azure OpenAI Embedding skill is currently supported in API version 2023-10-01-Preview. Please make sure you are using that API version.

Please let us know how it goes, and I'll follow up with you further.
