How to ingest a web URL based on user input, via code, for a RAG scenario?

Bosko Kalinic 0 Reputation points
2024-02-20T12:30:36.5466667+00:00

Hello,

This question is based on the documentation found here: https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/use-your-data?tabs=web-pages#ingesting-your-data

I am trying to understand how to ingest a web URL via code, more specifically from an Angular application. From what I have read, ingesting a URL is possible via Azure AI Studio, but I need to be able to do this via code.

Here is the workflow: the user inputs a URL, it is processed and indexed by Azure AI Search, and the user is then prompted to converse with an AI (RAG with Cognitive Search under the hood) about the URL's content. Is this possible, and if so, which library would I use? Can you point me to an example?

Another related question: how can I upload a chunked PDF (for example) for the same purpose, also via code? I want to avoid heavy chunking computations; is there an out-of-the-box solution that does this for us?

Thank you very much.
Boško

Azure OpenAI Service
An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.

1 answer

  1. AshokPeddakotla-MSFT 32,946 Reputation points
    2024-02-20T14:40:16.0333333+00:00

    Bosko Kalinic, greetings and welcome to the Microsoft Q&A forum!

    I haven't found a direct way to achieve this in Angular.

    This scenario needs a custom solution.

    Have you checked the "Custom RAG pattern for Azure AI Search"?

    Also, see "How does RAG work" and the Azure Cognitive Search REST API, which you can use to index the content of the web page.
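As a rough illustration of the REST route mentioned above, the sketch below turns the HTML of a fetched page into plain text and builds a push request for the Azure AI Search "Index Documents" endpoint. Several things here are assumptions rather than anything stated in this thread: the helper names (`htmlToText`, `buildIndexRequest`), the index fields `id`, `url`, and `content` (they must match your own index schema), and the `2023-11-01` API version. `htmlToText` is a crude regex-based tag stripper, not a real HTML parser. Because the request carries an admin key, this sketch assumes the code runs in a small backend that the Angular app calls, not in the browser bundle.

```typescript
// Hypothetical document shape; the field names must match your index schema.
interface PageDocument {
  id: string; // key field: letters, digits, '_', '-' and '=' only
  url: string;
  content: string;
}

interface IndexRequest {
  endpoint: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Crude HTML-to-text step: drop script/style blocks, strip tags, collapse whitespace.
function htmlToText(html: string): string {
  return html
    .replace(/<(script|style)[\s\S]*?<\/\1>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// Build (but do not send) a push request for the "Index Documents" REST operation.
function buildIndexRequest(
  serviceName: string,
  indexName: string,
  adminKey: string,
  docs: PageDocument[],
  apiVersion = "2023-11-01", // assumed API version; adjust to the one you target
): IndexRequest {
  const endpoint =
    `https://${serviceName}.search.windows.net/indexes/` +
    `${encodeURIComponent(indexName)}/docs/index?api-version=${apiVersion}`;
  // Each document carries an action; "mergeOrUpload" inserts or updates by key.
  const body = JSON.stringify({
    value: docs.map((d) => ({ "@search.action": "mergeOrUpload", ...d })),
  });
  return {
    endpoint,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json", "api-key": adminKey },
      body,
    },
  };
}
```

A caller would fetch the page the user entered, pass the HTML through `htmlToText`, build the request, and send it with something like `await fetch(req.endpoint, req.init)`. Once the documents are in the index, the chat side can point at that index as its data source.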

    "Another related question: how to upload a chunked PDF (for example) for the same purpose, also via code. I want to avoid heavy chunking computations, is there an out of the box solution which does this for us?"

    AFAIK, by default most blobs are indexed as a single search document in the index, including blobs with structured content. See "Search over Azure Blob Storage content" and "Index data from Azure Blob Storage" to understand more.
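Since a blob is indexed as one search document by default, one possible workaround (not a built-in feature, and not something this answer prescribes) is to split the extracted text yourself before uploading it. The sketch below is a plain fixed-size character splitter with overlap. It assumes the PDF text has already been extracted by some other means, and the names `chunkText`, `toChunkDocuments`, and the `id`/`content` fields are hypothetical.

```typescript
interface TextChunk {
  index: number;
  text: string;
}

// Split text into windows of `chunkSize` characters; consecutive windows
// share `overlap` characters so sentences cut at a boundary appear in both.
function chunkText(text: string, chunkSize = 1000, overlap = 100): TextChunk[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("expected chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  const chunks: TextChunk[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push({ index: chunks.length, text: text.slice(start, start + chunkSize) });
    // Stop once a window has reached the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

// Turn chunks into one search document each, keyed as "<sourceId>-<index>".
function toChunkDocuments(
  sourceId: string,
  chunks: TextChunk[],
): { id: string; content: string }[] {
  return chunks.map((c) => ({ id: `${sourceId}-${c.index}`, content: c.text }));
}
```

Character windows are cheap to compute but ignore sentence and paragraph boundaries, so the chunk size and overlap usually need tuning against the model's context limits.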

    Do let me know if that helps.

