
AzureOpenAIEmbeddingSkill is not executing

Nishanth S 0 Reputation points
2024-01-24T07:33:13.1633333+00:00

I'm extracting the content of a PDF with the Document Extraction skill, and that part works as expected.
I then try to split the extracted text with the Split skill and pass the output to the Azure OpenAI Embedding skill, but I get the error below.
User's image

Skillset JSON:

{
    "name": "ccc-bsldata-poc-skillset-ss",
    "description": "Skillset for extracting text from documents",
    "skills": [
        {
            "@odata.type": "#Microsoft.Skills.Util.DocumentExtractionSkill",
            "context": "/document",
            "inputs": [
                {
                    "name": "file_data",
                    "source": "/document/file_data"
                }
            ],
            "outputs": [
                {
                    "name": "content",
                    "targetName": "/document/content"
                }
            ]
        },
        {
            "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
            "textSplitMode": "pages",
            "maximumPageLength": 2000,
            "defaultLanguageCode": "en",
            "inputs": [
                {
                    "name": "text",
                    "source": "/document/content"
                }
            ],
            "outputs": [
                {
                    "name": "textItems",
                    "targetName": "pages"
                }
            ]
        },
        {
            "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
            "description": "Connects a deployed embedding model.",
            "resourceUri": "https://openai-anticipation-dev-01.openai.azure.com/",
            "deploymentId": "text-embedding-ada-002",
            "apiKey": "bd92e38f9b7c483a82dd26738189491c",
            "inputs": [
                {
                    "name": "text",
                    "source": "/document/pages"
                }
            ],
            "outputs": [
                {
                    "name": "embedding",
                    "targetName": "Contentvector"
                }
            ]
        }
    ]
}

In the first skill, I extract the document text.
In the second skill, I take the output of the first skill and split it.
In the third skill, I take the output of the second skill and create the embeddings.

Indexer JSON:

{
    "name": "
    "dataSourceName": "
    "skillsetName": "
    "targetIndexName": "
    "parameters": {
        "batchSize": null,
        "maxFailedItems": null,
        "maxFailedItemsPerBatch": null,
        "base64EncodeKeys": null,
        "configuration": {
            "indexedFileNameExtensions": ".pdf,.docx",
            "excludedFileNameExtensions": ".png,.jpeg",
            "dataToExtract": "contentAndMetadata",
            "parsingMode": "default"
        }
    },
    "fieldMappings": [
        {
            "sourceFieldName": "metadata_storage_path",
            "targetFieldName": "metadata_storage_path",
            "mappingFunction": {
                "name": "base64Encode"
            }
        }
    ],
    "outputFieldMappings": [
        {
            "sourceFieldName": "/document/content",
            "targetFieldName": "Content"
        },
        {
            "sourceFieldName": "/document/pages",
            "targetFieldName": "chunks"
        }
    ]
}

Index JSON:

{
    "name": "
    "fields": [
        {
            "name": "Id",
            "type": "Edm.String",
            "filterable": true,
            "sortable": false,
            "facetable": false,
            "key": true
        },
        {
            "name": "Content",
            "type": "Edm.String",
            "filterable": false,
            "sortable": false,
            "facetable": false
        },
        {
            "name": "chunks",
            "type": "Edm.String",
            "filterable": false,
            "sortable": false,
            "facetable": false
        },
        {
            "name": "Contentvector",
            "type": "Collection(Edm.Single)",
            "searchable": true,
            "retrievable": true,
            "dimensions": 1536,
            "vectorSearchProfile": "my-vector-profile"
        }
    ],
    "vectorSearch": {
        "algorithms": [
            {
                "name": "my-vector-config",
                "kind": "hnsw",
                "hnswParameters": {
                    "m": 4,
                    "efConstruction": 400,
                    "efSearch": 500,
                    "metric": "cosine"
                }
            }
        ],
        "profiles": [
            {
                "name": "my-vector-profile",
                "algorithm": "my-vector-config"
            }
        ]
    },
    "semantic": {
        "configurations": [
            {
                "name": "my-vector-config",
                "prioritizedFields": {
                    "prioritizedContentFields": [
                        {
                            "fieldName": "Content"
                        }
                    ]
                }
            }
        ]
    }
}

Error in executing the skillset:
User's image

Warning:
User's image


1 answer

  1. Ajay Kumar N 28,261 Reputation points Microsoft Employee Moderator
    2024-01-25T11:26:14.52+00:00

    Thanks for posting this question.

    To isolate the issue, please verify the URL of the document used in the document key field, since the error indicates a problem parsing the document. Try different documents or URLs as well.

    Here is an example of how to set a placeholder value when nothing exists.

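    The error message suggests that the input to the AzureOpenAIEmbeddingSkill is not of the expected type 'string'. In the posted skillset, the source of the 'text' input is /document/pages, which is the collection of chunks produced by the Split skill rather than a single string. Pointing the skill at each element of that collection, /document/pages/*, gives it one string per call.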
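
    As a minimal sketch of the type mismatch described above, the embedding skill could be set to run once per chunk by giving it a context of /document/pages/* and reading its 'text' input from the same path. The resourceUri and apiKey values are placeholders, not real settings, and the rest mirrors the skill definition from the question. This is an assumption about the likely fix, not a confirmed resolution of the error shown in the screenshot.

    ```json
    {
        "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
        "description": "Connects a deployed embedding model.",
        "context": "/document/pages/*",
        "resourceUri": "https://<your-resource>.openai.azure.com/",
        "deploymentId": "text-embedding-ada-002",
        "apiKey": "<your-api-key>",
        "inputs": [
            {
                "name": "text",
                "source": "/document/pages/*"
            }
        ],
        "outputs": [
            {
                "name": "embedding",
                "targetName": "Contentvector"
            }
        ]
    }
    ```

    With this context, the output lands at /document/pages/*/Contentvector, one vector per chunk. The indexer in the question has no output field mapping for the vector, so getting these values into the index would need a further mapping or projection step.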

    As outlined in this documentation, the Azure OpenAI Embedding skill is currently supported in API version 2023-10-01-Preview. Please make sure the skillset is created or updated with that API version.

    Please let us know how it goes, and I'll follow up with you further.
