Use an external URL as a data provider for Azure AI Search

Mansi Gusain 105 Reputation points
2024-07-05T04:32:14.4433333+00:00

Hi all, my use case is that my data is available through a URL that gives access to the content. I want to use this URL as a data provider for my Azure AI Search service. How can I do that? Which type of storage account and provider would be the best fit for this use case? I do not want to scrape the data from the URL first; instead, I want the URL itself to act as the data provider so that I can access the data through it. Please suggest a method. Thanks.

Tags: Azure Storage Explorer, Azure Storage Accounts

2 answers

  1. Sina Salam 12,011 Reputation points
    2024-07-06T21:22:36.14+00:00

    Hello Mansi Gusain,

    Welcome to the Microsoft Q&A and thank you for posting your questions here.

    Problem

    I understand that you would like to use an external URL as a data provider for Azure AI Search.

    Solution

    You want to use a URL as a data provider for Azure AI Search without scraping the data first. I will answer each of your questions in turn:

    My use case is that my data is available through a URL that gives access to the content. I want to use this URL as a data provider for my Azure AI Search service. How can I do that?

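    Azure AI Search (formerly Azure Cognitive Search) can ingest data from various sources, but indexing content directly from an arbitrary URL, without fetching it first, is not a built-in feature: indexers pull only from supported data sources such as Azure Blob Storage, Azure SQL Database, and Azure Cosmos DB. However, you can set up a process that fetches the content from the URL and then indexes it, either by storing it where an indexer can read it or by pushing it into the index.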
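    As a minimal sketch of that process (not an official connector), the first step is simply to download what the URL returns. The helper name `fetch_url` is hypothetical and uses only Python's standard library; the usage line passes a `data:` URL so the example runs without network access.

```python
import urllib.request


def fetch_url(url: str, timeout: float = 30.0) -> tuple[bytes, str]:
    """Download the resource behind `url`; return (body, content_type).

    Hypothetical helper: Azure AI Search does not call this. Your own job
    would run it before storing the content or pushing it to an index.
    """
    req = urllib.request.Request(url, headers={"User-Agent": "search-ingest-sketch"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = resp.read()
        # get_content_type() drops parameters such as charset and
        # falls back to "text/plain" if no Content-Type is sent.
        content_type = resp.headers.get_content_type()
    return body, content_type


# A data: URL keeps this example runnable offline;
# in practice you would pass the real http(s) URL.
body, ctype = fetch_url("data:text/plain;charset=utf-8,hello%20world")
print(ctype, body)
```

    Whatever this returns still has to land somewhere Azure AI Search can read, which is what the storage and push options are for.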

    https://docs.microsoft.com/en-us/azure/search/search-create-index-portal and https://docs.microsoft.com/en-us/azure/search/search-indexer-overview

    Which type of storage account and provider would be the best fit for this use case?

    For storing the data fetched from the URL, Azure Blob Storage is the best fit. It is scalable, cost-effective, and integrates directly with Azure AI Search through the built-in blob indexer.

    • Storage Account Type: Use a General-purpose v2 (GPv2) storage account.
    • Blob Container: Create a container within the storage account to store the fetched data.
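    A minimal sketch of writing fetched content into that container with the Blob service's Put Blob REST operation, using only the standard library. It assumes a container-level SAS URL with create/write permission; the helper names, the SHA-256 blob naming scheme, and the `source_url` metadata key are illustrative choices, not part of the original answer. In practice the `azure-storage-blob` SDK is the more common route.

```python
import hashlib
import urllib.parse
import urllib.request


def blob_name_for(url: str) -> str:
    """Stable blob name derived from the source URL (hypothetical scheme)."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


def build_put_blob_request(container_sas_url: str, source_url: str,
                           body: bytes, content_type: str) -> urllib.request.Request:
    """Build (but do not send) a Put Blob request.

    `container_sas_url` is assumed to look like
    https://<account>.blob.core.windows.net/<container>?<sas-token>
    """
    parts = urllib.parse.urlsplit(container_sas_url)
    path = parts.path.rstrip("/") + "/" + blob_name_for(source_url)
    blob_url = urllib.parse.urlunsplit(
        (parts.scheme, parts.netloc, path, parts.query, ""))
    headers = {
        "x-ms-blob-type": "BlockBlob",   # required by Put Blob
        "x-ms-version": "2021-08-06",    # Blob service REST version
        "Content-Type": content_type,
        # Custom metadata; header values must be ASCII, so percent-encode
        # non-ASCII URLs first. The blob indexer can expose it as a field.
        "x-ms-meta-source_url": source_url,
    }
    return urllib.request.Request(blob_url, data=body, headers=headers, method="PUT")


req = build_put_blob_request(
    "https://myaccount.blob.core.windows.net/web-pages?sv=2022-11-02&sp=cw&sig=REDACTED",
    "https://example.com/page.html",
    b"<html>...</html>",
    "text/html",
)
# urllib.request.urlopen(req) would perform the upload.
```

    Naming blobs by a hash of the URL means re-fetching the same page overwrites the same blob instead of creating duplicates.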

    https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction and https://docs.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-portal

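    Please suggest a method.

    Take the following into consideration:

    1. Data ingestion: how and how often the content behind the URL is fetched and written to Blob Storage (or pushed to the index).
    2. Blob Storage organization: a container and a consistent blob naming scheme, for example one blob per source URL.
    3. Indexing configuration: the index schema, the data source, and the indexer, including its schedule.
    4. Security: how access to the storage account and the search service is granted, for example SAS tokens or managed identities instead of hard-coded keys.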
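    For the indexing configuration item, here is a hedged sketch of an index schema, written as the JSON body of the Create Index REST call. The index name and the field names `id` and `source_url` are illustrative assumptions; `content` and `metadata_storage_name` match fields that the blob indexer produces.

```python
import json


def build_index_definition(index_name: str) -> dict:
    """Hypothetical index schema for page content copied into Blob Storage."""
    return {
        "name": index_name,
        "fields": [
            # Document key: exactly one Edm.String field must have key=True.
            {"name": "id", "type": "Edm.String", "key": True},
            # Text extracted from each blob by the blob indexer.
            {"name": "content", "type": "Edm.String", "searchable": True},
            # Standard blob metadata exposed by the blob indexer.
            {"name": "metadata_storage_name", "type": "Edm.String",
             "searchable": False, "filterable": True},
            # Custom blob metadata set at upload time (assumed name).
            {"name": "source_url", "type": "Edm.String",
             "searchable": False, "filterable": True},
        ],
    }


# Body for PUT https://<service>.search.windows.net/indexes/web-pages?api-version=2023-11-01
print(json.dumps(build_index_definition("web-pages"), indent=2))
```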


    I hope this is helpful! Do not hesitate to let me know if you have any other questions.

    Please don't forget to close the thread by upvoting and accepting this as an answer if it is helpful, so that others in the community facing similar issues can easily find the solution.

    Best Regards,

    Sina Salam


  2. Nehruji R 8,146 Reputation points Microsoft Vendor
    2024-07-08T06:34:02.5933333+00:00

    Hello Mansi Gusain,

    Greetings! Welcome to Microsoft Q&A Platform.

    I understand that you would like to use a URL as a data provider for Azure AI Search without scraping the data and leverage Azure’s capabilities to directly index and search the content.

    Azure AI Search is a vector and full text information retrieval solution for the enterprise, and for traditional and generative AI scenarios. The easiest way to create a service is using the Azure portal, which is covered in this article.

    In Azure AI Search, define a data source that points to your Blob Storage container (refer to article 1 for detailed steps), then configure an indexer to pull data from it. The indexer reads and indexes whatever content is stored in the blobs; it does not follow or download URLs written inside them.
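    A sketch of the two REST request bodies involved, assuming a search service named `my-search`, a container named `web-pages`, and an existing index `web-pages` with a string key field `id`; all of these names are hypothetical. The `azureblob` type, the `base64Encode` field mapping on `metadata_storage_path`, and the `schedule` are standard indexer settings. The requests are only built here, not sent.

```python
import json
import urllib.request

API_VERSION = "2023-11-01"  # a GA Azure AI Search REST API version


def blob_data_source(name: str, connection_string: str, container: str) -> dict:
    return {
        "name": name,
        "type": "azureblob",
        "credentials": {"connectionString": connection_string},
        "container": {"name": container},
    }


def blob_indexer(name: str, data_source: str, index: str) -> dict:
    return {
        "name": name,
        "dataSourceName": data_source,
        "targetIndexName": index,
        "schedule": {"interval": "PT2H"},  # re-run every 2 hours
        "parameters": {"configuration": {"dataToExtract": "contentAndMetadata"}},
        # Blob paths contain characters not allowed in keys, so encode them.
        "fieldMappings": [{
            "sourceFieldName": "metadata_storage_path",
            "targetFieldName": "id",
            "mappingFunction": {"name": "base64Encode"},
        }],
    }


def search_post(service: str, collection: str, body: dict,
                api_key: str) -> urllib.request.Request:
    """Build (not send) a POST to a collection such as /datasources or /indexers."""
    url = f"https://{service}.search.windows.net/{collection}?api-version={API_VERSION}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


ds = blob_data_source("web-pages-ds", "<storage-connection-string>", "web-pages")
ix = blob_indexer("web-pages-indexer", "web-pages-ds", "web-pages")
requests_to_send = [
    search_post("my-search", "datasources", ds, "<admin-key>"),
    search_post("my-search", "indexers", ix, "<admin-key>"),
]
```

    The data source must be created before the indexer that references it, hence the order of the list.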

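    Store the URLs in Azure Blob Storage: create a container and upload a file containing the URLs, which makes the list easy to manage and update. Azure AI Search can index content from Azure Blob Storage directly, but a blob indexer treats that file as text, so it indexes the URL strings themselves, not the pages they point to. To make the page content searchable, a job must read the list, fetch each URL, and store or push the fetched content.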
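    Because the indexer would only see the text of such a URL list, something has to read the list and fetch each page. A sketch of the first half, assuming a hypothetical file format of one URL per line with `#` comments; `parse_url_list` is an illustrative name.

```python
import urllib.parse


def parse_url_list(text: str) -> list[str]:
    """Return the http(s) URLs from a newline-separated list.

    Assumed format: one URL per line; blank lines and lines starting
    with '#' are ignored, and duplicates are dropped (first one wins).
    """
    urls: list[str] = []
    seen: set[str] = set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = urllib.parse.urlsplit(line)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"not an http(s) URL: {line!r}")
        if line not in seen:
            seen.add(line)
            urls.append(line)
    return urls


sample = """# pages to index
https://example.com/a
https://example.com/b

https://example.com/a
"""
print(parse_url_list(sample))  # ['https://example.com/a', 'https://example.com/b']
```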

    Alternatively, you can use the Azure AI Search push API to programmatically send the URLs and their content into the search index. This method gives you more control over the indexing process (refer to article 2).
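    A hedged sketch of a push batch for the documents Index operation (POST https://<service>.search.windows.net/indexes/<index>/docs/index?api-version=2023-11-01, sent with an `api-key` header). The field names `id`, `content`, and `source_url` are assumptions about your index. Raw URLs contain characters that are not allowed in document keys, so the sketch uses URL-safe base64 of the URL as the key.

```python
import base64
import json


def document_key(url: str) -> str:
    # Keys may contain only letters, digits, '_', '-' and '=';
    # URL-safe base64 of the URL stays within that set.
    return base64.urlsafe_b64encode(url.encode("utf-8")).decode("ascii")


def build_push_batch(pages: dict[str, str]) -> dict:
    """Body for the documents Index operation.

    `pages` maps source URL -> text you already fetched and extracted.
    """
    return {
        "value": [
            {
                "@search.action": "mergeOrUpload",
                "id": document_key(url),
                "content": text,
                "source_url": url,
            }
            for url, text in pages.items()
        ]
    }


batch = build_push_batch({"https://example.com/a": "Hello from page A"})
body = json.dumps(batch)
```

    With `mergeOrUpload`, re-pushing a page updates the existing document rather than failing. A single batch can hold up to 1,000 documents (or about 16 MB), so larger sets should be split.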

    Recommended Storage Account and Provider:

    Azure Blob Storage: This is the most suitable storage account for your use case. It supports large-scale data storage and integrates seamlessly with Azure AI Search.

    Data Source Configuration: Use an Azure AI Search data source to connect to your Blob Storage container. The indexer then indexes the content stored in the blobs, so to search the pages behind the URLs, the blobs must contain the fetched page content rather than only the URLs.

    Hope this answer helps! Please let us know if you have any further queries. I’m happy to assist you further.


    Please “Accept the answer” and “up-vote” wherever the information provided helps you; this can be beneficial to other community members.

