In my case:
While using the Data Import Wizard in Azure AI Search to create an indexer from Azure Blob Storage, I encountered the following error:
statusCode: 400
name: DocumentExtraction.azureblob.mg-demo-ds
errorMessage: Could not parse document. Invalid document key: 'https://mgdemostorage.blob.core.windows.net/mgdemocontainer/201801.pdf'.
Keys can only contain letters, digits, underscore (_), dash (-), or equal sign (=).
documentationLink: https://docs.microsoft.com/azure/search/search-howto-indexing-azure-blob-storage#DocumentKeys
details: Target field 'metadata_storage_path' is either not present, doesn't have a value set, or no data could be extracted from the document for it.
This occurred because the wizard-generated indexer did not apply a base64Encode transformation to the metadata_storage_path field, which is required to convert the document path into a valid key.
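To illustrate why the raw path is rejected and the encoded one is accepted, here is a minimal Python sketch. The helper names (`is_valid_key`, `url_safe_base64`) are my own, and the allowed-character rule is taken from the error message above. It uses the standard library's URL-safe Base64, which may not match the exact output of Azure's base64Encode function (padding can be handled differently depending on the function's parameters), so treat it only as a demonstration of the idea.

```python
import base64
import re

# Allowed key characters, as listed in the error message:
# letters, digits, underscore (_), dash (-), or equal sign (=).
KEY_PATTERN = re.compile(r"[A-Za-z0-9_\-=]+")


def is_valid_key(value: str) -> bool:
    """Return True if every character in value is allowed in a document key."""
    return KEY_PATTERN.fullmatch(value) is not None


def url_safe_base64(value: str) -> str:
    """URL-safe Base64 encoding (illustrative, not Azure's exact implementation)."""
    return base64.urlsafe_b64encode(value.encode("utf-8")).decode("ascii")


path = "https://mgdemostorage.blob.core.windows.net/mgdemocontainer/201801.pdf"
encoded = url_safe_base64(path)

print(is_valid_key(path))     # the raw path contains ':', '/' and '.'
print(is_valid_key(encoded))  # the encoded form uses only allowed characters
```

The raw blob URL fails the check because of the colon, slashes, and dots, while the encoded string passes, which is why the mapping function is needed on the key field.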
Expected Behavior:
The import wizard should include the following field mapping in the generated indexer:
"fieldMappings": [
  {
    "sourceFieldName": "metadata_storage_path",
    "targetFieldName": "metadata_storage_path",
    "mappingFunction": {
      "name": "base64Encode",
      "parameters": null
    }
  }
]
Workaround / How I Solved It:
To resolve the issue, I manually updated the indexer definition (this can be done with the Azure CLI or the REST API) and added the missing field mapping. After I added the base64Encode function for the metadata_storage_path field, the documents were indexed without errors.
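For anyone who wants to script the workaround, the following is a rough Python sketch of the REST approach, not the exact commands I ran. It assumes the Create or Update Indexer endpoint (`PUT /indexers/{name}`) with an admin `api-key` header; the API version `2023-11-01` and the function names `with_key_mapping` and `put_indexer` are my assumptions, so adjust them to your service.

```python
import json
import urllib.request

# The mapping the wizard should have generated (same as the JSON above).
KEY_MAPPING = {
    "sourceFieldName": "metadata_storage_path",
    "targetFieldName": "metadata_storage_path",
    "mappingFunction": {"name": "base64Encode", "parameters": None},
}


def with_key_mapping(indexer: dict) -> dict:
    """Return a copy of an indexer definition that includes the key mapping.

    Any existing mapping from metadata_storage_path to itself is replaced,
    so calling this twice does not create duplicates.
    """
    def is_key_mapping(mapping: dict) -> bool:
        source = mapping.get("sourceFieldName")
        target = mapping.get("targetFieldName") or source
        return (source == KEY_MAPPING["sourceFieldName"]
                and target == KEY_MAPPING["targetFieldName"])

    updated = dict(indexer)
    existing = indexer.get("fieldMappings") or []
    updated["fieldMappings"] = [m for m in existing if not is_key_mapping(m)]
    updated["fieldMappings"].append(KEY_MAPPING)
    return updated


def put_indexer(service: str, name: str, api_key: str, indexer: dict,
                api_version: str = "2023-11-01") -> int:
    """Send the updated definition to the search service (assumed endpoint)."""
    url = (f"https://{service}.search.windows.net/indexers/{name}"
           f"?api-version={api_version}")
    request = urllib.request.Request(
        url,
        data=json.dumps(indexer).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

In practice you would fetch the current indexer definition, pass it through `with_key_mapping`, send the result with `put_indexer`, and then run the indexer again so the previously failed blobs are reprocessed.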
If this answer is helpful, please click Accept Answer so that other people who face a similar issue can benefit from it.