Azure search word document hyperlinks

Daniel Jonker 1 Reputation point
2022-08-03T11:50:51.417+00:00

Hello everyone,
this post is regarding the Azure search functions.

I have been experimenting with searching through hyperlinks contained in Word documents, which are all stored in the Azure blob storage.
For this to work, Azure would have to parse the structure of each Word document and extract its hyperlink objects, which does not seem to happen right now (at least not with simple search).

I have read through the online documentation, but I have not found a way to do this so far. I assume it must be possible, though, since Azure can already search the metadata of Word documents: for example, it can return the page count, which is stored in the Word document.

My question is:
Is it possible, at all, to search through hyperlink objects in Word using Azure search?
And if so, how is it possible?

Thanks for reading!
Greets, Daniel

Azure AI Search

1 answer

  1. ajkuma 26,136 Reputation points Microsoft Employee
    2022-08-05T08:45:12.087+00:00

    @Daniel Jonker, following up on this:

    With BM25 ranking, any field you mark as searchable in your index definition will yield results. So if your hyperlinks are indexed in a searchable field, they will be searchable. You may have to do some pre-processing to extract the hyperlinks from the Word documents, add them to your index as a separate field, and mark that field as searchable.

    Check out these docs: Indexes in Azure Cognitive Search
    Configure index
    Field mappings and transformations using Azure Cognitive Search indexers
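
The pre-processing step mentioned in the answer could look roughly like the sketch below. It is only an illustration, not an official Azure feature: it relies on the fact that a .docx file is a ZIP package in which external hyperlink targets of the main document are listed in `word/_rels/document.xml.rels`. The function names `extract_hyperlinks` and `build_search_document` and the field names `id` and `hyperlinks` are hypothetical; the `hyperlinks` field is assumed to exist in your index as a searchable `Collection(Edm.String)` field.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by relationship parts (.rels files) inside a .docx package.
RELS_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"
HYPERLINK_TYPE_SUFFIX = "/hyperlink"


def extract_hyperlinks(docx_bytes: bytes) -> list[str]:
    """Return the external hyperlink targets of the main document part."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        try:
            rels_xml = package.read("word/_rels/document.xml.rels")
        except KeyError:
            # No relationship part means no external hyperlinks in the body.
            return []

    links: list[str] = []
    for rel in ET.fromstring(rels_xml).iter(f"{RELS_NS}Relationship"):
        is_hyperlink = rel.get("Type", "").endswith(HYPERLINK_TYPE_SUFFIX)
        is_external = rel.get("TargetMode") == "External"
        target = rel.get("Target")
        if is_hyperlink and is_external and target and target not in links:
            links.append(target)  # keep first-seen order, skip duplicates
    return links


def build_search_document(doc_id: str, docx_bytes: bytes) -> dict:
    """Build a document for upload; 'id' and 'hyperlinks' are assumed field names."""
    return {"id": doc_id, "hyperlinks": extract_hyperlinks(docx_bytes)}
```

You would run this over each blob (for example in a custom skill or a separate ingestion script) and push the resulting documents to the index. Note the limits of the sketch: it only reads the relationships of the main document body, so links in headers, footers, or footnotes, internal bookmark links, and links stored purely as field codes are not covered.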

