base64Encode and base64EncodeKeys doesn't work for key field?

Question

Yi Zeng (CAFE) 25 Microsoft Employee

I'm using a template that needs the value of the key field to be a base64 string. However, with the text chunking step in the skillset, the key field value is generated automatically by splicing together the id of the parent (that is, the id of the original file before chunking) and the chunk id. The result has the following format, which is not a base64 string:

"Id": "baa70f183116_aHR0cHM6Ly9taXJhZG9jLmJsb2IuY29yZS53aW5kb3dzLm5ldC9kb2NzL1JvdXRpbmctQWNjdXJhY3ktcmVncmVzc2lvbi1kZWJ1Z2dpbmcuZG9jeA2_pages_1"

I tried setting base64EncodeKeys to true and adding base64Encode in fieldMappings/outputFieldMappings in the indexer, but neither worked.
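For illustration, the generated key above can be picked apart. The sketch below assumes (this is not confirmed anywhere in the thread) that the middle segment is the parent key in the URL-token variant of base64, where the trailing `=` padding is replaced by a single digit giving the padding count. The helper name `decode_url_token` is hypothetical, and the meaning of the leading `baa70f183116` prefix is not known here.

```python
import base64

def decode_url_token(token: str) -> str:
    """Decode a URL-token base64 string whose last character is the padding count."""
    body, pad_count = token[:-1], int(token[-1])
    return base64.urlsafe_b64decode(body + "=" * pad_count).decode("utf-8")

generated_id = (
    "baa70f183116_"
    "aHR0cHM6Ly9taXJhZG9jLmJsb2IuY29yZS53aW5kb3dzLm5ldC9kb2NzL1JvdXRpbmctQWNjdXJhY3ktcmVncmVzc2lvbi1kZWJ1Z2dpbmcuZG9jeA2"
    "_pages_1"
)

# Split off the leading prefix and the trailing "_pages_<n>" chunk suffix.
prefix, rest = generated_id.split("_", 1)
encoded_parent, chunk_suffix = rest.rsplit("_pages_", 1)

parent_key = decode_url_token(encoded_parent)
print(prefix, parent_key, chunk_suffix)
```

If that assumption holds, only the parent portion is encoded. The prefix and the `_pages_1` suffix are appended as plain text, so the key as a whole is not a single base64 string.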

The format allowed in the portal (sourceFieldName must start with /document, which comes from the data source) also looks different from the official doc (https://learn.microsoft.com/en-us/azure/search/search-indexer-field-mappings?tabs=rest) and the video (https://www.youtube.com/watch?v=41DSwTYRENs). In the official doc and the video, the sourceFieldName doesn't need to start with /document and can be a column name in the index schema. I also tried "/document/Id", but that didn't work either. What should I do to make my "Id" base64? Thanks for your help!

image (94)

image (95)

My index schema is (Key is "Id"):

image (93)

My Skillset:

{
  "name": "my-skillset",
  "description": "xxx",
  "skills": [
    {
      "@odata.type": "#Microsoft.Skills.Text.LanguageDetectionSkill",
      "name": "#1",
      "description": null,
      "context": "/document",
      "defaultCountryHint": null,
      "modelVersion": null,
      "inputs": [
        {
          "name": "text",
          "source": "/document/content"
        }
      ],
      "outputs": [
        {
          "name": "languageCode",
          "targetName": "languageCode"
        }
      ]
    },
    {
      "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
      "name": "#2",
      "description": "Split skill to chunk documents",
      "context": "/document",
      "defaultLanguageCode": "en",
      "textSplitMode": "pages",
      "maximumPageLength": 2000,
      "pageOverlapLength": 500,
      "maximumPagesToTake": 0,
      "inputs": [
        {
          "name": "text",
          "source": "/document/content"
        }
      ],
      "outputs": [
        {
          "name": "textItems",
          "targetName": "pages"
        }
      ]
    },
    {
      "@odata.type": "#Microsoft.Skills.Text.KeyPhraseExtractionSkill",
      "name": "#3",
      "description": "",
      "context": "/document/pages/*",
      "defaultLanguageCode": "en",
      "maxKeyPhraseCount": null,
      "modelVersion": "",
      "inputs": [
        {
          "name": "text",
          "source": "/document/pages/*"
        },
        {
          "name": "languageCode",
          "source": "/document/languageCode"
        }
      ],
      "outputs": [
        {
          "name": "keyPhrases",
          "targetName": "keyPhrases"
        }
      ]
    },
    {
      "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
      "name": "#4",
      "description": null,
      "context": "/document/pages/*",
      "resourceUri": "https://miraonboardingopenai.openai.azure.com",
      "apiKey": null,
      "deploymentId": "text-embedding-ada-002",
      "dimensions": 1536,
      "modelName": "text-embedding-ada-002",
      "inputs": [
        {
          "name": "text",
          "source": "/document/pages/*"
        }
      ],
      "outputs": [
        {
          "name": "embedding",
          "targetName": "Embedding"
        }
      ],
      "authIdentity": null
    }
  ],
  "cognitiveServices": {
    "@odata.type": "#Microsoft.Azure.Search.CognitiveServicesByKey",
    "description": null,
    "key": null
  },
  "knowledgeStore": null,
  "indexProjections": {
    "selectors": [
      {
        "targetIndexName": "mirabotnew",
        "parentKeyFieldName": "ExternalSourceName",
        "sourceContext": "/document/pages/*",
        "mappings": [
          {
            "name": "Embedding",
            "source": "/document/pages/*/Embedding",
            "sourceContext": null,
            "inputs": []
          },
          {
            "name": "Text",
            "source": "/document/pages/*",
            "sourceContext": null,
            "inputs": []
          },
          {
            "name": "AdditionalMetadata",
            "source": "/document/metadata_storage_path",
            "sourceContext": null,
            "inputs": []
          },
          {
            "name": "Description",
            "source": "/document/pages/*/keyPhrases",
            "sourceContext": null,
            "inputs": []
          }
        ]
      }
    ],
    "parameters": {
      "projectionMode": "skipIndexingParentDocuments"
    }
  },
  "encryptionKey": null
}

ajkuma 28,036 Reputation points Microsoft Employee Moderator

2024-08-22T18:07:20.8233333+00:00

Yi Zeng (CAFE), apologies for the delayed response.

While I'm checking on this, please confirm whether the issue still persists.

Yes, as outlined in this doc: "Base64EncodeKeys property is obsolete. Please create a field mapping using 'FieldMapping.Base64Encode' instead."
Yi Zeng (CAFE) 25 Reputation points Microsoft Employee

2024-08-23T09:42:37.0733333+00:00

I tried 'FieldMapping.Base64Encode'; the screenshot shows the error report. I cannot use the data source field as "source" here, because that attribute exists before text chunking and does not carry the chunk's own identity.
Yi Zeng (CAFE) 25 Reputation points Microsoft Employee

2024-08-23T09:44:35.5766667+00:00

I solved this problem with a workaround: a custom Web API skill in the skillset generates a base64 string for each chunk, and each base64 string is stored together with its chunk in a JSON file. That file is then deserialized by another indexer.
ajkuma 28,036 Reputation points Microsoft Employee Moderator

2024-08-30T09:08:42.16+00:00

Thanks for the follow-up and the update. We are discussing this further offline.

Accepted answer

Answer 1

To benefit the community, I'm posting our offline discussion:

Ask: I want a base64 identity for each child item (a chunk after text chunking) instead of a base64 identity for the parent item (the original text). Is storage_name64 here unique for each parent item but not for each child item?

That is correct. Your scenario is only available today if you implement this functionality via a custom skill: Custom Web API skill in skillsets - Azure AI Search | Microsoft Learn.

You can add custom code in the skill so that it produces the output you need: pass the chunk produced by text chunking as the skill's input, apply the transformation, and send the output to the index through the index projections.
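As a minimal sketch of the transformation such a custom skill could perform: the handler below follows the custom Web API skill envelope (`values`, `recordId`, `data`, `errors`, `warnings`), but the input names `parentPath` and `text`, the output name `chunkKey`, and the idea of deriving the key from the parent path plus a hash of the chunk text are all assumptions made for illustration, not something stated in the discussion.

```python
import base64
import hashlib

def make_chunk_key(parent_path: str, chunk_text: str) -> str:
    """Build a URL-safe base64 key from the parent path plus a hash of the chunk text."""
    digest = hashlib.sha256(chunk_text.encode("utf-8")).hexdigest()[:16]
    raw = f"{parent_path}#{digest}".encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")

def handle_skill_request(payload: dict) -> dict:
    """Process one custom-skill request body and return the response body."""
    results = []
    for record in payload.get("values", []):
        data = record.get("data", {})
        out = {"recordId": record["recordId"], "data": {}, "errors": [], "warnings": []}
        try:
            # "parentPath" and "text" are hypothetical input names.
            out["data"]["chunkKey"] = make_chunk_key(data["parentPath"], data["text"])
        except KeyError as missing:
            out["errors"].append({"message": f"Missing input: {missing}"})
        results.append(out)
    return {"values": results}

request = {"values": [
    {"recordId": "0", "data": {"parentPath": "https://example.invalid/docs/a.docx", "text": "first chunk"}},
    {"recordId": "1", "data": {"parentPath": "https://example.invalid/docs/a.docx", "text": "second chunk"}},
]}
response = handle_skill_request(request)
for value in response["values"]:
    print(value["recordId"], value["data"]["chunkKey"])
```

With the skill's context set to the chunk level (for example `/document/pages/*`), each chunk would get its own record and therefore its own key. Two chunks of the same parent with identical text would collide under this scheme, so a real implementation might prefer a chunk index if one is available.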

If my answer helped (pointed you in the right direction), please click Accept Answer. It helps the community and other users find the answer quickly.

Answer 2

edit:

Yi Zeng (CAFE), I'm summarizing our offline discussion and posting it as an answer to benefit the community.

The key steps involve using the indexer to map and transform the field, and then the index projection to assign the transformed value to the final field in the index.

If a field in the parent document requires a transformation (using the mapping functions such as encoding) and needs to be mapped to the parent and/or "child" documents:

Apply the transformation using field mappings' functions in the indexer.
Use index projections in the skillset to map the transformed field to the "child" documents.

To achieve the base64 encoding of a field from a parent document and send it to the index:

Use the Indexer to map the original field to a temporary field and apply the base64Encode transformation.
In the Index Projection, map the temporary field to the final field in the index.
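The two steps above might look roughly like the following fragments, written here as Python dictionaries that serialize to the JSON the indexer and skillset definitions use. This is a sketch under assumptions: `storage_name64` is the temporary field name mentioned in the discussion, `ParentKey64` is a made-up index field used only for illustration, and whether a field mapping can target a temporary field in this way is taken from the answer above rather than verified here.

```python
import json

# Step 1: indexer field mapping that base64-encodes the parent's storage path
# into a temporary field ("storage_name64", the name used in the thread).
indexer_fragment = {
    "fieldMappings": [
        {
            "sourceFieldName": "metadata_storage_path",
            "targetFieldName": "storage_name64",
            "mappingFunction": {"name": "base64Encode"},
        }
    ]
}

# Step 2: index projection mapping that copies the temporary field onto each
# child document. "ParentKey64" is a hypothetical index field name.
projection_mapping = {
    "name": "ParentKey64",
    "source": "/document/storage_name64",
}

print(json.dumps(indexer_fragment, indent=2))
print(json.dumps(projection_mapping, indent=2))
```

Because the encoded value comes from the parent document, every chunk of the same file would receive the same value, which matches the confirmation in Answer 1 that storage_name64 is unique per parent item and not per child item.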

Reference: Please review Map fields in indexers - Azure AI Search | Microsoft Learn.

--Thanks for the follow-up and sharing the workaround. I'm also following up on this internally.

If my answer helped (pointed you in the right direction), please click Accept Answer. It helps the community and other users find the answer quickly.
