base64Encode and base64EncodeKeys don't work for the key field?

Yi Zeng (CAFE) 25 Reputation points Microsoft Employee
2024-08-12T11:42:03.21+00:00

I'm using a template that requires the value of the key field to be a base64 string. However, with the text chunking step in the skillset, the key value is generated automatically by combining the ID of the parent (the original file before chunking) with the chunk ID. The result has the following format, which is not a base64 string:

"Id": "baa70f183116_aHR0cHM6Ly9taXJhZG9jLmJsb2IuY29yZS53aW5kb3dzLm5ldC9kb2NzL1JvdXRpbmctQWNjdXJhY3ktcmVncmVzc2lvbi1kZWJ1Z2dpbmcuZG9jeA2_pages_1"
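To see why the generated value is not "a base64 string": only its middle segment is base64. The Python sketch below splits the sample key and decodes that segment. Assumptions: the `<prefix>_<parentKey>_pages_<n>` layout is inferred from this single sample, and the middle segment is assumed to follow the .NET `HttpServerUtility.UrlTokenEncode` convention (URL-safe alphabet, `=` padding replaced by a trailing digit giving the padding count), which the trailing `2` suggests. `split_chunk_key` and `url_token_decode` are hypothetical helpers, not part of any Azure SDK, and the meaning of the `baa70f183116` prefix is not explained here.

```python
import base64


def url_token_decode(token: str) -> bytes:
    """Reverse of the assumed UrlTokenEncode convention: the last character is
    the number of '=' padding characters that were stripped."""
    if not token:
        return b""
    padding = int(token[-1])
    return base64.urlsafe_b64decode(token[:-1] + "=" * padding)


def split_chunk_key(key: str) -> tuple[str, str, str]:
    """Split a projection-generated key into (prefix, parent key, chunk suffix).
    Layout inferred from one sample: <prefix>_<parentKey>_pages_<n>."""
    prefix, rest = key.split("_", 1)
    parent_key, index = rest.rsplit("_pages_", 1)
    return prefix, parent_key, "pages_" + index


key = ("baa70f183116_aHR0cHM6Ly9taXJhZG9jLmJsb2IuY29yZS53aW5kb3dzLm5ldC9kb2NzL1"
       "JvdXRpbmctQWNjdXJhY3ktcmVncmVzc2lvbi1kZWJ1Z2dpbmcuZG9jeA2_pages_1")
prefix, parent_key, suffix = split_chunk_key(key)
print(prefix, suffix)  # baa70f183116 pages_1
print(url_token_decode(parent_key).decode("utf-8"))
# https://miradoc.blob.core.windows.net/docs/Routing-Accuracy-regression-debugging.docx
```

In other words, the parent part is already a base64-encoded storage path; the projection then wraps it with a prefix and a `_pages_<n>` suffix.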

I tried setting base64EncodeKeys to true and adding a base64Encode mapping function to the indexer's fieldMappings and outputFieldMappings, but neither worked.

Also, the format the portal accepts (sourceFieldName must start with /document, which refers to the data source) looks different from the official doc (https://learn.microsoft.com/en-us/azure/search/search-indexer-field-mappings?tabs=rest) and video (https://www.youtube.com/watch?v=41DSwTYRENs). In the official doc and video, sourceFieldName doesn't need to start with /document and can be a column name in the index schema. I also tried "/document/Id", but it didn't work. What should I do to make my "Id" base64? Thanks for your help!

[Screenshots omitted]

My index schema (the key field is "Id"):

[Screenshot of the index schema omitted]

My Skillset:

```json
{
  "name": "my-skillset",
  "description": "xxx",
  "skills": [
    {
      "@odata.type": "#Microsoft.Skills.Text.LanguageDetectionSkill",
      "name": "#1",
      "description": null,
      "context": "/document",
      "defaultCountryHint": null,
      "modelVersion": null,
      "inputs": [
        { "name": "text", "source": "/document/content" }
      ],
      "outputs": [
        { "name": "languageCode", "targetName": "languageCode" }
      ]
    },
    {
      "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
      "name": "#2",
      "description": "Split skill to chunk documents",
      "context": "/document",
      "defaultLanguageCode": "en",
      "textSplitMode": "pages",
      "maximumPageLength": 2000,
      "pageOverlapLength": 500,
      "maximumPagesToTake": 0,
      "inputs": [
        { "name": "text", "source": "/document/content" }
      ],
      "outputs": [
        { "name": "textItems", "targetName": "pages" }
      ]
    },
    {
      "@odata.type": "#Microsoft.Skills.Text.KeyPhraseExtractionSkill",
      "name": "#3",
      "description": "",
      "context": "/document/pages/*",
      "defaultLanguageCode": "en",
      "maxKeyPhraseCount": null,
      "modelVersion": "",
      "inputs": [
        { "name": "text", "source": "/document/pages/*" },
        { "name": "languageCode", "source": "/document/languageCode" }
      ],
      "outputs": [
        { "name": "keyPhrases", "targetName": "keyPhrases" }
      ]
    },
    {
      "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
      "name": "#4",
      "description": null,
      "context": "/document/pages/*",
      "resourceUri": "https://miraonboardingopenai.openai.azure.com",
      "apiKey": null,
      "deploymentId": "text-embedding-ada-002",
      "dimensions": 1536,
      "modelName": "text-embedding-ada-002",
      "inputs": [
        { "name": "text", "source": "/document/pages/*" }
      ],
      "outputs": [
        { "name": "embedding", "targetName": "Embedding" }
      ],
      "authIdentity": null
    }
  ],
  "cognitiveServices": {
    "@odata.type": "#Microsoft.Azure.Search.CognitiveServicesByKey",
    "description": null,
    "key": null
  },
  "knowledgeStore": null,
  "indexProjections": {
    "selectors": [
      {
        "targetIndexName": "mirabotnew",
        "parentKeyFieldName": "ExternalSourceName",
        "sourceContext": "/document/pages/*",
        "mappings": [
          {
            "name": "Embedding",
            "source": "/document/pages/*/Embedding",
            "sourceContext": null,
            "inputs": []
          },
          {
            "name": "Text",
            "source": "/document/pages/*",
            "sourceContext": null,
            "inputs": []
          },
          {
            "name": "AdditionalMetadata",
            "source": "/document/metadata_storage_path",
            "sourceContext": null,
            "inputs": []
          },
          {
            "name": "Description",
            "source": "/document/pages/*/keyPhrases",
            "sourceContext": null,
            "inputs": []
          }
        ]
      }
    ],
    "parameters": {
      "projectionMode": "skipIndexingParentDocuments"
    }
  },
  "encryptionKey": null
}
```

Azure AI Search

Accepted answer
  1. ajkuma 28,036 Reputation points Microsoft Employee Moderator
    2024-09-02T18:10:39.75+00:00

    To benefit the community, posting a summary of our offline discussion:

    Ask: I want a base64 identity for each child item (each chunk produced by text chunking) rather than for the parent item (the original text). Is it correct that storage_name64 here is unique for each parent item but not for each child item?

    That is correct. Currently, your scenario can only be achieved by implementing it in a custom skill: Custom Web API skill in skillsets - Azure AI Search | Microsoft Learn.

    You can add custom code in that skill to produce the identifier you need: pass each chunk produced by text chunking to the custom skill as an input, perform the transformation there, and send the skill's output to the index through index projections.
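    A minimal Python sketch of the logic such a custom skill could run. The request/response envelope (`{"values": [{"recordId": ..., "data": {...}}]}` in, the same records with `data`, `errors` and `warnings` out) follows the custom Web API skill contract; everything else is an assumption for illustration: the input names `text` and `parentKey`, the output name `chunkId`, and the ID scheme (parent key plus a SHA-256 of the chunk text, encoded in the assumed `UrlTokenEncode` style). Hosting (for example an Azure Function or any HTTP endpoint) is left out.

```python
import base64
import hashlib
from typing import Any


def url_token_encode(data: bytes) -> str:
    """Emulates the assumed .NET HttpServerUtility.UrlTokenEncode convention:
    URL-safe base64 whose '=' padding is replaced by one digit giving its count."""
    if not data:
        return ""
    encoded = base64.urlsafe_b64encode(data).decode("ascii")
    stripped = encoded.rstrip("=")
    return stripped + str(len(encoded) - len(stripped))


def make_chunk_id(parent_key: str, chunk_text: str) -> str:
    """Hypothetical per-chunk ID: parent key plus a SHA-256 of the chunk text.
    Deterministic across re-runs; identical chunks of one file would collide."""
    digest = hashlib.sha256(chunk_text.encode("utf-8")).hexdigest()
    return url_token_encode(f"{parent_key}|{digest}".encode("utf-8"))


def handle_skill_request(body: dict[str, Any]) -> dict[str, Any]:
    """Turns a custom skill request into a response with one output record per
    input record, echoing each recordId and reporting missing inputs as errors."""
    results = []
    for record in body.get("values", []):
        data = record.get("data", {})
        try:
            chunk_id = make_chunk_id(data["parentKey"], data["text"])
        except KeyError as exc:
            results.append({
                "recordId": record.get("recordId"),
                "data": {},
                "errors": [{"message": f"Missing input: {exc}"}],
                "warnings": [],
            })
            continue
        results.append({
            "recordId": record.get("recordId"),
            "data": {"chunkId": chunk_id},
            "errors": [],
            "warnings": [],
        })
    return {"values": results}


if __name__ == "__main__":
    request = {"values": [
        {"recordId": "0", "data": {"parentKey": "https://example/doc.docx", "text": "first chunk"}},
        {"recordId": "1", "data": {"parentKey": "https://example/doc.docx", "text": "second chunk"}},
    ]}
    for value in handle_skill_request(request)["values"]:
        print(value["recordId"], value["data"]["chunkId"])
```

    In the skillset, a skill like this would run with context /document/pages/*, take text from /document/pages/* and parentKey from /document/metadata_storage_path, and its chunkId output could then be projected with a mapping such as { "name": "ChunkId64", "source": "/document/pages/*/chunkId" }, where ChunkId64 is a hypothetical index field. Hashing the chunk text keeps IDs stable across indexer runs without needing the chunk's position.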



1 additional answer

  1. ajkuma 28,036 Reputation points Microsoft Employee Moderator
    2024-08-23T13:07:20.3566667+00:00

    edit:

    Yi Zeng (CAFE), summarizing our offline discussion here to benefit the community.

    The key steps involve using the indexer to map and transform the field, and then the index projection to assign the transformed value to the final field in the index.

    If a field in the parent document requires a transformation (using the mapping functions such as encoding) and needs to be mapped to the parent and/or "child" documents:

    • Apply the transformation using a field mapping function (such as base64Encode) in the indexer.
    • Use index projections in the skillset to map the transformed field to the "child" documents.

    To achieve the base64 encoding of a field from a parent document and send it to the index:

    1. In the indexer, map the original field to a temporary field and apply the base64Encode mapping function.
    2. In the index projection, map the temporary field to the final field in the index.
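    The two steps above, sketched as the REST payload fragments they would add, built as Python dicts so the shape can be checked. Assumptions: the indexer name my-indexer and the final index field ParentPath64 are hypothetical; storage_name64 is the temporary field name mentioned in the accepted answer and is assumed to also exist in the index schema; and, as this answer describes, the field-mapping output is assumed to be visible to the skillset at /document/storage_name64. Because the encoded value comes from the parent, every chunk of the same file receives the same value, so this alone does not give a per-chunk key.

```python
import json
from typing import Any


def wire_parent_base64_field(indexer: dict[str, Any], selector: dict[str, Any],
                             source_field: str, temp_field: str,
                             final_field: str) -> None:
    """Step 1: an indexer field mapping base64-encodes source_field into temp_field.
    Step 2: the index projection selector copies /document/<temp_field> into
    final_field of every child (chunk) document."""
    indexer.setdefault("fieldMappings", []).append({
        "sourceFieldName": source_field,
        "targetFieldName": temp_field,
        "mappingFunction": {"name": "base64Encode"},
    })
    selector.setdefault("mappings", []).append({
        "name": final_field,
        "source": f"/document/{temp_field}",
    })


# Hypothetical indexer; the selector mirrors the one in the question's skillset.
indexer = {"name": "my-indexer", "skillsetName": "my-skillset"}
selector = {
    "targetIndexName": "mirabotnew",
    "parentKeyFieldName": "ExternalSourceName",
    "sourceContext": "/document/pages/*",
    "mappings": [],
}
wire_parent_base64_field(indexer, selector, "metadata_storage_path",
                         "storage_name64", "ParentPath64")
print(json.dumps({"indexer": indexer, "selector": selector}, indent=2))
```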

    Reference: Please review Map fields in indexers - Azure AI Search | Microsoft Learn

    --Thanks for the follow-up and sharing the workaround. I'm also following up on this internally.


