Azure Cognitive Search indexing and external website

Ireland, Robert (IT Services) 15 Reputation points
2023-05-30T22:38:39.7066667+00:00

Is it possible to index an external website using Azure Cognitive Search?

I'm looking for guides on how this should be set up, but most of the ones I've found cover indexing data held in Azure services.

Any pointers welcome.

Azure AI Search
An Azure search service with built-in artificial intelligence capabilities that enrich information to help identify and explore relevant content at scale.

2 answers

  1. Grmacjon-MSFT 19,151 Reputation points Moderator
    2023-05-31T00:07:58+00:00

    Hi @Ireland, Robert (IT Services), yes, I believe it is possible to index an external website using Azure Cognitive Search. You can use the custom skill feature of Cognitive Search to extract data from the site's pages and add it to your search index.

    Here's how you can set it up:

    1. Create a Cognitive Search service in the Azure portal if you haven't already done so.
    2. Create a search index and define the schema for the data you want to index.
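    3. Create a data source for the website content you want to index. As far as I know, the built-in indexers do not offer a generic "Web" data source type that crawls a URL, so the pages first need to be fetched and staged in a supported store such as Azure Blob Storage.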
    4. Create a skillset that defines the custom skill you want to use to extract data from the website. You can use an open-source library like BeautifulSoup or Scrapy to extract data from the HTML of the website.
    5. Add the custom skill to the skillset and configure it to extract the data you want to index.
    6. Create an indexer that uses the data source, skillset, and search index you created earlier. The indexer will automatically extract data from the website and add it to your search index.
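
To make steps 4 and 5 a bit more concrete, here is a minimal sketch of the logic that could sit behind a custom skill. A custom skill calls a web endpoint with a batch of records under `values` and expects a response with the same `recordId` values back. The input name `html`, the output name `text`, and the function names are assumptions made for illustration, and the standard-library `html.parser` stands in for BeautifulSoup so that the example runs without extra packages.

```python
import json
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def html_to_text(html):
    """Return the visible text of an HTML string, joined with single spaces."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def run_skill(request_body):
    """Handle one custom skill call: JSON string in, JSON string out."""
    request = json.loads(request_body)
    results = []
    for record in request.get("values", []):
        result = {"recordId": record["recordId"], "data": {}, "errors": [], "warnings": []}
        html = record.get("data", {}).get("html")  # "html" is an assumed input name
        if html is None:
            result["errors"].append({"message": "Missing 'html' input."})
        else:
            result["data"]["text"] = html_to_text(html)  # "text" is an assumed output name
        results.append(result)
    return json.dumps({"values": results})
```

In practice `run_skill` would be wrapped in an HTTP handler (an Azure Function, for example) and referenced from the skillset by its URL; that wiring is left out here.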

    Here are some resources that can help you get started:

    Another way to achieve this is to combine Azure Cognitive Search with Azure Functions and Azure Blob Storage. See this similar Stack Overflow thread for more details.
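
As a rough illustration of the Functions plus Blob Storage approach, the crawl step could look something like the sketch below: fetch each page, derive a blob name, and hand the HTML to an uploader so that a blob indexer can pick it up later. The naming scheme, the `source_url` metadata key, and the `fetch` and `upload` callables are all hypothetical; they are injected so the sketch needs neither network access nor the Azure SDK.

```python
import hashlib
from urllib.parse import urlparse


def blob_name_for(url):
    """Derive a stable blob name from a page URL (an illustrative scheme only)."""
    parsed = urlparse(url)
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    return f"{parsed.netloc}/{digest}.html"


def stage_pages(urls, fetch, upload):
    """Fetch each URL and pass the HTML to upload(blob_name, html, metadata).

    fetch(url) returns the page HTML as a string; upload is whatever writes
    to the storage container. Both are supplied by the caller.
    """
    staged = []
    for url in urls:
        html = fetch(url)
        name = blob_name_for(url)
        # Keep the original URL alongside the blob so search results can link back.
        upload(name, html, {"source_url": url})
        staged.append(name)
    return staged
```

A timer-triggered function could call `stage_pages` on a schedule, with `upload` backed by the Blob Storage client of your choice.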

    Hope that helps.

    1 person found this answer helpful.

  2. Rafael Fernández Domínguez 0 Reputation points
    2023-10-05T07:47:21.5466667+00:00

    Hello @Ireland, Robert (IT Services),

    You can't do this directly. These are the supported data sources for now:

    Indexers crawl data stores on Azure and outside of Azure.

    On the other hand, there are data sources from partners: https://learn.microsoft.com/en-us/azure/search/search-data-sources-gallery

    This project might also be interesting to you: https://github.com/thomas11/AzureSearchCrawler
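
For context on what a do-it-yourself crawler has to do at the end: indexers only pull from supported stores, but the service also accepts documents pushed to an index over its REST API. The sketch below builds such a request without sending it. The field names `id`, `url`, and `content`, the service and index names, and the `api_version` default are assumptions; they would need to match your own index schema and service.

```python
import base64
import json
import urllib.request


def make_key(url):
    """URL-safe Base64 of the URL.

    Document keys may contain only letters, digits, '_', '-', and '=',
    so a raw URL cannot be used as the key.
    """
    return base64.urlsafe_b64encode(url.encode("utf-8")).decode("ascii")


def build_push_request(service, index, api_key, pages, api_version="2023-11-01"):
    """Build (but do not send) a POST that uploads crawled pages to an index.

    pages is an iterable of (url, text) pairs produced by your crawler.
    """
    actions = [
        {"@search.action": "mergeOrUpload", "id": make_key(url), "url": url, "content": text}
        for url, text in pages
    ]
    endpoint = (
        f"https://{service}.search.windows.net/indexes/{index}"
        f"/docs/index?api-version={api_version}"
    )
    return urllib.request.Request(
        endpoint,
        data=json.dumps({"value": actions}).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )
```

Passing the returned request to `urllib.request.urlopen` would perform the upload; `mergeOrUpload` is used so that re-crawling a page updates its existing document instead of failing.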

    Good Luck!!
