Ingesting webpage URLs for the Azure OpenAI web app

Jalali, Hadi 40 Reputation points
2024-02-26T19:15:09.8766667+00:00

Hi there. In Azure OpenAI Studio, there is an option to specify a webpage URL when you add data for the app. However, according to the requirements on the Microsoft website, it can only extract text from up to 20 sublinks, and I can only enter one URL. After deploying the OpenAI web app, what is the best way to ingest or define the webpages for the app? I do not want to extract the text myself and put the extracted data in Blob Storage. Is there another way for the app or the search service to extract information from URLs automatically, so that I only need to list the URLs I need somewhere?

Azure AI Speech
An Azure service that integrates speech processing into apps and services.
1,768 questions
Azure OpenAI Service
An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.
3,219 questions

Accepted answer
  1. Saurabh Sharma 23,816 Reputation points Microsoft Employee
    2024-02-27T22:41:38.5633333+00:00

    Hi @Jalali, Hadi, I have received confirmation that increasing the 20 URL/web address limit when using "on your data" with a URL/web address is not supported. This feature is in the backlog, but there is no ETA for it. You may have to create a custom solution for your scenario. You can try implementing a solution that uses an HTML parsing library such as Beautiful Soup to extract the content of each web page. You can then upload the extracted content to a Blob Storage container and use the Azure Search Documents SDK to index your documents programmatically. Please refer to the below documents for additional details -

    Please let me know if you have any other questions.

    Thanks

    Saurabh

    Please 'Accept as answer' and Upvote if it helped so that it can help others in the community looking for help on similar topics.
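
The custom approach suggested in the accepted answer could look roughly like the sketch below. It is only an illustration under several assumptions: it uses Python's standard-library `html.parser` in place of Beautiful Soup so that the extraction part runs without extra packages, it pushes documents straight into a search index rather than staging them in Blob Storage first, and it assumes an index already exists with the hypothetical fields `id` (key), `url`, and `content`. The helper names `extract_text`, `make_document`, and `index_urls` are made up for this example. The `azure-search-documents` package is needed only for the final upload step.

```python
import hashlib
import urllib.request
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text and ignores <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    """Return the page text with whitespace collapsed to single spaces."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def make_document(url, html):
    """Build one search document; field names are assumptions about the index."""
    return {
        # A hex digest keeps the key within the characters a document key allows.
        "id": hashlib.sha256(url.encode("utf-8")).hexdigest(),
        "url": url,
        "content": extract_text(html),
    }


def index_urls(urls, endpoint, index_name, api_key):
    """Fetch each URL and push the extracted text into an existing index."""
    # Imported here so the helpers above work without the Azure SDK installed.
    from azure.core.credentials import AzureKeyCredential
    from azure.search.documents import SearchClient

    client = SearchClient(endpoint, index_name, AzureKeyCredential(api_key))
    documents = []
    for url in urls:
        with urllib.request.urlopen(url, timeout=30) as response:
            html = response.read().decode("utf-8", errors="replace")
        documents.append(make_document(url, html))
    return client.upload_documents(documents=documents)
```

With this shape, the list of URLs can live wherever is convenient (a text file, a config entry), and calling `index_urls(urls, endpoint, index_name, api_key)` with your own search service values refreshes the index that the deployed web app queries. Long pages would likely need to be split into smaller chunks before upload, which this sketch does not do.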

