Could a large language model be used to query data from a data lake?

Anonymous
2023-10-16T17:26:22.46+00:00

Data is stored in a data lake and needs to be extracted when a customer queries it via chat. Could this be done using the Spark NLP library on Azure Synapse?

Azure AI Bot Service
An Azure service that provides an integrated environment for bot development.
Azure Data Lake Storage
An Azure service that provides an enterprise-wide hyper-scale repository for big data analytic workloads and is integrated with Azure Blob Storage.
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
Azure OpenAI Service
An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.
Azure AI services
A group of Azure services, SDKs, and APIs designed to make apps more intelligent, engaging, and discoverable.

Accepted answer
  1. Vahid Ghafarpour 23,385 Reputation points Volunteer Moderator
    2023-10-16T20:13:47.74+00:00

    Yes, you can do that.

    I hope this article helps you find the steps:

    https://learn.microsoft.com/en-us/azure/architecture/ai-ml/idea/large-scale-custom-natural-language-processing

    1 person found this answer helpful.

1 additional answer

  1. Pramod Valavala 20,656 Reputation points Microsoft Employee Moderator
    2023-10-17T14:52:30.7133333+00:00

    @Nara Kanga While Spark NLP can be included in your pipelines to extract semantic information from data, I don't think it alone can power an LLM-based chatbot.

    These chatbots use a pattern called Retrieval-Augmented Generation (RAG), which typically involves using a vector database to perform semantic search over an indexed dataset and feeding the results into a generative AI model for the final response.
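    The retrieval half of that pattern can be sketched in a few lines. This is a toy, self-contained illustration, not a real Azure or Spark NLP API: `embed()` is a deterministic bag-of-words stand-in for a real embedding model, and the in-memory list stands in for a vector database.

```python
import math
import re

# Tiny fixed vocabulary for the toy embedding; a real embedding model
# produces dense vectors instead of sparse word counts.
VOCAB = ["refunds", "invoices", "parquet", "support", "days", "weekdays"]

def embed(text):
    # Bag-of-words vector over the fixed vocabulary (stand-in for a model).
    tokens = re.findall(r"[a-z]+", text.lower())
    return [float(tokens.count(word)) for word in VOCAB]

def cosine(a, b):
    # Cosine similarity between two vectors; 0.0 if either is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, index, top_k=2):
    # Rank indexed passages by embedding similarity to the query.
    q = embed(query)
    ranked = sorted(index, key=lambda doc: cosine(q, doc["vector"]), reverse=True)
    return [doc["text"] for doc in ranked[:top_k]]

# Index built offline from documents extracted from the data lake.
passages = [
    "Invoices are stored as parquet files.",
    "Refunds are processed within five days.",
    "The support line is open on weekdays.",
]
index = [{"text": p, "vector": embed(p)} for p in passages]

hits = retrieve("How long do refunds take?", index, top_k=1)
# hits[0] is the refunds passage, which would be fed to the LLM.
```

    In a production system the offline indexing step (embedding the data lake documents) and the online query step would run against the same embedding model and a real vector store.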

    The Azure OpenAI on your Data feature builds this on top of Azure Cognitive Search, which can ingest and index data using OpenAI-generated embeddings, but you could technically use any vector database for this approach.

    There are a few possible architectures you could explore, like the Embedding Approach mentioned there, which uses Redis as the vector database; instead of an Azure Function, you could use Spark NLP to generate the embeddings on your data in the data lake.

    Your chatbot would then retrieve semantically similar information from Redis and generate responses by feeding that information to an LLM.
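    That last step, feeding retrieved information to the model, amounts to assembling a grounded prompt before the chat completion call. A minimal sketch (the function name and prompt wording are illustrative, not part of any Azure SDK):

```python
def build_grounded_prompt(question, passages):
    # Combine retrieved passages into a context block the model must use.
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

prompt = build_grounded_prompt(
    "How long do refunds take?",
    ["Refunds are processed within five days."],
)
# `prompt` would then be sent to your chat model (for example, an Azure
# OpenAI chat deployment) to produce the final grounded response.
```

    Constraining the model to the retrieved context is what keeps the answers tied to the data in your data lake rather than to the model's training data.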

