Azure AI Search Indexer + Embedding and SplitSkill not working

Andrew Satanovsky 0 Reputation points
2024-05-20T20:58:31.3666667+00:00

Hello!

When I run my indexer, I get the following error:

"Skill input 'text' was '25281' tokens, which is greater then the maximum allowed '8000' tokens. Consider chunking the text with the SplitSkill in order to be able to generate embeddings for it."

So I added the SplitSkill to my skillset, but now I get this additional error:

"Optional skill input is missing or empty. Name: 'languageCode', Source: '$(/document/language)'.

Expression language parsing issues: Missing or empty value '/document/language'."

My data source is my SharePoint site, and this is my skillset JSON:

{
  "@odata.context": "https://ch1search.search.windows.net/$metadata#skillsets/$entity",
  "@odata.etag": "\"0x8DC790E2EA903A9\"",
  "name": "skillset1715404472716",
  "description": "",
  "skills": [
    {
      "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
      "name": "#2",
      "description": null,
      "context": "/document",
      "defaultLanguageCode": "en",
      "textSplitMode": "pages",
      "maximumPageLength": 1000,
      "pageOverlapLength": 0,
      "maximumPagesToTake": 0,
      "inputs": [
        {
          "name": "text",
          "source": "/document/content"
        },
        {
          "name": "languageCode",
          "source": "/document/language"
        }
      ],
      "outputs": [
        {
          "name": "textItems",
          "targetName": "pages"
        }
      ]
    },
    {
      "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
      "name": "EmbeddingSkill",
      "description": "Connects a deployed embedding model.",
      "context": "/document",
      "resourceUri": "https://ch1-chatbot-1025.openai.azure.com",
      "apiKey": "<redacted>",
      "deploymentId": "text-ada-002",
      "inputs": [
        {
          "name": "text",
          "source": "/document/content"
        }
      ],
      "outputs": [
        {
          "name": "embedding",
          "targetName": "vector"
        }
      ],
      "authIdentity": null
    }
  ],
  "cognitiveServices": {
    "@odata.type": "#Microsoft.Azure.Search.DefaultCognitiveServices",
    "description": null
  },
  "knowledgeStore": null,
  "indexProjections": null,
  "encryptionKey": null
}

These are the fields I have in my Index:

[Screenshot of the index fields]

I'm not exactly sure how to proceed with troubleshooting. Any help would be appreciated!

  1. Grmacjon-MSFT 16,956 Reputation points
    2024-05-23T00:37:10.53+00:00

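    Hi @Andrew Satanovsky, looking at your skillset JSON, both the SplitSkill and the AzureOpenAIEmbeddingSkill are defined, but the embedding skill is not using the SplitSkill's output. Its context is "/document" and its "text" input is "/document/content", so it still receives the whole unsplit document, which is why the 8000-token error still occurs. The embedding skill needs to run once per chunk and read from "/document/pages/*", the output of the SplitSkill. More generally, make sure the source paths in your inputs arrays correspond to fields that actually exist in your documents. For instance, if your documents do not have a content field, "/document/content" would not return any value, leading to errors.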
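    As a sketch of that change, here is a partial skillset showing only the parts that differ: the embedding skill now runs in the "/document/pages/*" context, and index projections write one search document per chunk. The SplitSkill stays as it is. The index name placeholder and the field names parent_id, chunk and vector are assumptions, since your index screenshot isn't reproduced here. The target index would typically need a string key field using the keyword analyzer, a filterable parent_id field, a searchable chunk field, and a vector field whose dimensions match the model (1536 if your text-ada-002 deployment is text-embedding-ada-002).

    ```json
    {
      "skills": [
        {
          "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
          "name": "EmbeddingSkill",
          "description": "Embeds each chunk produced by the SplitSkill.",
          "context": "/document/pages/*",
          "resourceUri": "https://ch1-chatbot-1025.openai.azure.com",
          "apiKey": "<redacted>",
          "deploymentId": "text-ada-002",
          "inputs": [
            { "name": "text", "source": "/document/pages/*" }
          ],
          "outputs": [
            { "name": "embedding", "targetName": "vector" }
          ]
        }
      ],
      "indexProjections": {
        "selectors": [
          {
            "targetIndexName": "<your-index-name>",
            "parentKeyFieldName": "parent_id",
            "sourceContext": "/document/pages/*",
            "mappings": [
              { "name": "chunk", "source": "/document/pages/*" },
              { "name": "vector", "source": "/document/pages/*/vector" }
            ]
          }
        ],
        "parameters": {
          "projectionMode": "skipIndexingParentDocuments"
        }
      }
    }
    ```

    With projections in place, the chunk text and vectors reach the index through the selector mappings, so any indexer outputFieldMappings that pointed at "/document/vector" should be removed. After updating the skillset, reset and rerun the indexer so every document is reprocessed.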

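    Also, your documents do not have a language field, which is what this message is reporting: "Optional skill input is missing or empty. Name: 'languageCode', Source: '$(/document/language)'. Expression language parsing issues: Missing or empty value '/document/language'." Nothing in your skillset populates "/document/language" (that value would normally come from a Language Detection skill), so the SplitSkill's optional languageCode input is empty. Because the SplitSkill already sets "defaultLanguageCode": "en", you can remove the languageCode input and the skill will use the default. If your content is not in English, set defaultLanguageCode to the appropriate language code instead.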
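    For reference, this is your SplitSkill with only the languageCode input removed; every other setting is unchanged from your skillset. It assumes your SharePoint content is English, which is what "defaultLanguageCode": "en" declares.

    ```json
    {
      "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
      "name": "#2",
      "description": "Splits content into pages; no languageCode input, so defaultLanguageCode applies.",
      "context": "/document",
      "defaultLanguageCode": "en",
      "textSplitMode": "pages",
      "maximumPageLength": 1000,
      "pageOverlapLength": 0,
      "maximumPagesToTake": 0,
      "inputs": [
        { "name": "text", "source": "/document/content" }
      ],
      "outputs": [
        { "name": "textItems", "targetName": "pages" }
      ]
    }
    ```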

    For debugging, Azure AI Search provides Debug Sessions in the Azure portal, which can help you identify and resolve errors in your skillset. It visualizes your skillset and lets you drill down into individual skills to inspect their inputs and outputs and see where a step is failing.

    Hope that helps. Please let us know if you have further questions.

    -Grace
