Index File content using Azure Cognitive Search

teja_98666 36 Reputation points
2022-01-28T16:36:53.44+00:00

Hi,

I'm exploring the capability of Azure Cognitive Search to index file content.
Looking at the documentation, I can see that Azure Cognitive Search can index files stored in Azure Files and Azure Blob Storage, but it can index only a predefined set of metadata properties.

search-file-storage-integration
search-howto-indexing-azure-blob-storage
search-blob-metadata-properties

What I'm looking for is an API that would index a file given its location (in Azure Blob Storage or Azure Files) along with additional properties that I'm interested in. That way I can query the file content and get back the custom metadata properties, which I would use to fetch related information from other applications.
Do we have any such APIs?

Also, at the moment ACS supports indexing only a specific set of file formats. How do we index a custom file format that is not in the list?

Azure AI Search

1 answer

  1. ajkuma 28,036 Reputation points Microsoft Employee Moderator
    2022-02-01T11:24:28.953+00:00

    @teja_98666 ,

    1) Pull indexers for predefined data sources are a tool you can use to start indexing right away with the existing capabilities.

    When using a pull indexer, you can customize what data goes into the index by attaching a skillset with custom skills. Also, please check whether any of the predefined skills would be useful for your requirement.

    cognitive-search-predefined-skills
    cognitive-search-defining-skillset
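
To illustrate the custom skill option, here is a minimal sketch of the logic behind a custom Web API skill. It assumes the documented request/response envelope (a `values` array of records, each with a `recordId` and a `data` object). The input name `text`, the output name `wordCount`, and the function name `run_custom_skill` are hypothetical and would need to match your own skillset definition.

```python
def run_custom_skill(request_body: dict) -> dict:
    """Process one skill invocation: return one output record per input record."""
    results = []
    for record in request_body.get("values", []):
        record_id = record["recordId"]
        text = record.get("data", {}).get("text")  # "text" is an assumed input name

        if not isinstance(text, str):
            # Report the problem on this record instead of failing the whole batch.
            results.append({
                "recordId": record_id,
                "data": {},
                "errors": [{"message": "Missing 'text' input."}],
                "warnings": [],
            })
            continue

        # Hypothetical enrichment: a word count stands in for your own logic.
        results.append({
            "recordId": record_id,
            "data": {"wordCount": len(text.split())},
            "errors": [],
            "warnings": [],
        })
    return {"values": results}
```

The function would sit behind an HTTP endpoint of your choice; the hosting side is left out here.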

    Additionally, you have the option of building your own indexing process from scratch, using the push API or an SDK to push data into the index as needed, as described here:
    search-what-is-data-import
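
As a rough sketch of the push approach, the snippet below builds a request for the REST "index documents" endpoint that uploads file content together with custom metadata. The push API accepts JSON documents only, so the text must be extracted from the file by your own code first. The service name, index name, and the field names `id`, `content`, `blobUrl` and `department` are assumptions for illustration and must match your own index schema; `build_push_request` is a hypothetical helper, not part of any SDK.

```python
import json
import urllib.request

API_VERSION = "2020-06-30"  # assumed API version; use the one your service targets


def build_push_request(service: str, index: str, api_key: str,
                       documents: list) -> urllib.request.Request:
    """Build (but do not send) a POST that uploads documents to a search index."""
    url = (f"https://{service}.search.windows.net/indexes/{index}"
           f"/docs/index?api-version={API_VERSION}")
    # Each document carries an action; mergeOrUpload inserts or updates by key.
    body = {"value": [{"@search.action": "mergeOrUpload", **doc}
                      for doc in documents]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


# Hypothetical document: extracted file text plus custom metadata fields.
doc = {
    "id": "doc-001",
    "content": "Text extracted from the file by your own code.",
    "blobUrl": "https://example.blob.core.windows.net/files/report.pdf",
    "department": "finance",
}
request = build_push_request("my-service", "my-index", "<admin-key>", [doc])
```

Passing `request` to `urllib.request.urlopen` would send it. A query against the index can then return `blobUrl` and `department` alongside the matched content.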

    2) For file types that the Search service does not support for indexing, you may check the possibility of converting the file into a supported format such as JSON, since currently only the listed file types are supported.
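
The conversion step could look something like the sketch below. The input format here is entirely made up for illustration (a few `key: value` header lines, a blank line, then the body text), since the actual custom format is not specified in this thread; `custom_to_json` is a hypothetical helper.

```python
import json


def custom_to_json(raw: str, doc_id: str) -> str:
    """Convert a made-up 'header lines + blank line + body' format to JSON."""
    header, _, body = raw.partition("\n\n")

    doc = {}
    for line in header.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip header lines that are not key: value pairs
            doc[key.strip()] = value.strip()

    # Set these last so header keys cannot overwrite them.
    doc["id"] = doc_id
    doc["content"] = body.strip()
    return json.dumps(doc)
```

The resulting JSON can be stored as a blob for a pull indexer to pick up, or pushed into the index directly.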

    Thanks for your feedback and cooperation!

