AzureOpenAIEmbeddingSkill is not executing

Nishanth S 0 Reputation points
2024-01-24T07:33:13.1633333+00:00

I'm extracting the content of a PDF with the Document Extraction skill, and that part works as expected.
I then try to split the extracted text with the Text Split skill and pass the output to the Azure OpenAI Embedding skill, but I get the error below.
[User's image: screenshot of the error]

Skill Set JSON:

{
    "name": "ccc-bsldata-poc-skillset-ss",
    "description": "Skillset for extracting text from documents",
    "skills": [
        {
            "@odata.type": "#Microsoft.Skills.Util.DocumentExtractionSkill",
            "context": "/document",
            "inputs": [
                {
                    "name": "file_data",
                    "source": "/document/file_data"
                }
            ],
            "outputs": [
                {
                    "name": "content",
                    "targetName": "/document/content"
                }
            ]
        },
        {
            "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
            "textSplitMode": "pages",
            "maximumPageLength": 2000,
            "defaultLanguageCode": "en",
            "inputs": [
                {
                    "name": "text",
                    "source": "/document/content"
                }
            ],
            "outputs": [
                {
                    "name": "textItems",
                    "targetName": "pages"
                }
            ]
        },
        {
            "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
            "description": "Connects a deployed embedding model.",
            "resourceUri": "https://openai-anticipation-dev-01.openai.azure.com/",
            "deploymentId": "text-embedding-ada-002",
            "apiKey": "<redacted>",
            "inputs": [
                {
                    "name": "text",
                    "source": "/document/pages"
                }
            ],
            "outputs": [
                {
                    "name": "embedding",
                    "targetName": "Contentvector"
                }
            ]
        }
    ]
}

In the first skill, I extract the document text.
In the second skill, I take the output of the first skill and split it.
In the third skill, I take the output of the second skill and create the embeddings.

Indexer JSON:

{
    "name": "
    "dataSourceName": "
    "skillsetName": "
    "targetIndexName": "
    "parameters": {
        "batchSize": null,
        "maxFailedItems": null,
        "maxFailedItemsPerBatch": null,
        "base64EncodeKeys": null,
        "configuration": {
            "indexedFileNameExtensions": ".pdf,.docx",
            "excludedFileNameExtensions": ".png,.jpeg",
            "dataToExtract": "contentAndMetadata",
            "parsingMode": "default"
        }
    },
    "fieldMappings": [
        {
            "sourceFieldName": "metadata_storage_path",
            "targetFieldName": "metadata_storage_path",
            "mappingFunction": {
                "name": "base64Encode"
            }
        }
    ],
    "outputFieldMappings": [
        {
            "sourceFieldName": "/document/content",
            "targetFieldName": "Content"
        },
        {
            "sourceFieldName": "/document/pages",
            "targetFieldName": "chunks"
        }
    ]
}

Index JSON:

{
    "name": "
    "fields": [
        {
            "name": "Id",
            "type": "Edm.String",
            "filterable": true,
            "sortable": false,
            "facetable": false,
            "key": true
        },
        {
            "name": "Content",
            "type": "Edm.String",
            "filterable": false,
            "sortable": false,
            "facetable": false
        },
        {
            "name": "chunks",
            "type": "Edm.String",
            "filterable": false,
            "sortable": false,
            "facetable": false
        },
        {
            "name": "Contentvector",
            "type": "Collection(Edm.Single)",
            "searchable": true,
            "retrievable": true,
            "dimensions": 1536,
            "vectorSearchProfile": "my-vector-profile"
        }
    ],
    "vectorSearch": {
        "algorithms": [
            {
                "name": "my-vector-config",
                "kind": "hnsw",
                "hnswParameters": {
                    "m": 4,
                    "efConstruction": 400,
                    "efSearch": 500,
                    "metric": "cosine"
                }
            }
        ],
        "profiles": [
            {
                "name": "my-vector-profile",
                "algorithm": "my-vector-config"
            }
        ]
    },
    "semantic": {
        "configurations": [
            {
                "name": "my-vector-config",
                "prioritizedFields": {
                    "prioritizedContentFields": [
                        {
                            "fieldName": "Content"
                        }
                    ]
                }
            }
        ]
    }
}

Error in executing the skillset:
[User's image: screenshot of the error]

Warning:
[User's image: screenshot of the warning]

1 answer

  1. ajkuma 27,946 Reputation points Microsoft Employee
    2024-01-25T11:26:14.52+00:00

    Thanks for posting this question.

    To isolate the issue, please verify the URL of the document used in the document key field, since the error indicates a problem parsing the document. Try a different document or URL as well.

    Here is an example of how to set a placeholder value when nothing exists.
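
    The linked example is not reproduced in this thread. As a rough sketch of the idea, a Conditional skill can substitute a default when a value is missing. The placeholder string `'EMPTY'` and the target name `contentWithDefault` are hypothetical names chosen for illustration, and checking `/document/content` is an assumption about which value may be missing here.

    ```json
    {
        "@odata.type": "#Microsoft.Skills.Util.ConditionalSkill",
        "context": "/document",
        "inputs": [
            { "name": "condition", "source": "= $(/document/content) == null" },
            { "name": "whenTrue", "source": "= 'EMPTY'" },
            { "name": "whenFalse", "source": "/document/content" }
        ],
        "outputs": [
            { "name": "output", "targetName": "contentWithDefault" }
        ]
    }
    ```

    Downstream skills would then read `/document/contentWithDefault` instead of `/document/content`.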

    The error message suggests that the input to the AzureOpenAIEmbeddingSkill is not of the expected type ‘string’. The source of the ‘text’ input is specified as $(/document/pages/*).
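
    In the skillset posted in the question, the `text` input points at `/document/pages`, which is the array of chunks produced by the Split skill rather than a single string. A minimal sketch of one way to address this, assuming the rest of the skillset stays as posted, is to run the embedding skill once per chunk by setting its `context` to `/document/pages/*` and using the same path as the input source. The `resourceUri` and `apiKey` values below are placeholders, not real values.

    ```json
    {
        "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
        "description": "Connects a deployed embedding model.",
        "context": "/document/pages/*",
        "resourceUri": "https://<your-resource>.openai.azure.com/",
        "deploymentId": "text-embedding-ada-002",
        "apiKey": "<your-api-key>",
        "inputs": [
            {
                "name": "text",
                "source": "/document/pages/*"
            }
        ],
        "outputs": [
            {
                "name": "embedding",
                "targetName": "Contentvector"
            }
        ]
    }
    ```

    With this context, each chunk gets its own vector at `/document/pages/*/Contentvector`. Note that a single `Collection(Edm.Single)` field holds one vector per search document, so storing one vector per chunk would likely also need index projections or a one-document-per-chunk index design.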

    As outlined in the documentation, the Azure OpenAI Embedding skill is currently supported in API version 2023-10-01-Preview. Please make sure you are using that API version.

    Please let us know how it goes, and I'll follow up with you further.
