Could a large language model be used to query data from a data lake?

Anonymous
2023-10-16T17:26:22.46+00:00

Data is stored in a data lake and needs to be extracted when a customer queries it via chat. Could this be done using the Spark NLP library on Azure Synapse?

Azure AI Bot Service
An Azure service that provides an integrated environment for bot development.
Azure Data Lake Storage
An Azure service that provides an enterprise-wide hyper-scale repository for big data analytic workloads and is integrated with Azure Blob Storage.
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
Azure OpenAI Service
An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.
Azure AI services
A group of Azure services, SDKs, and APIs designed to make apps more intelligent, engaging, and discoverable.

Accepted answer
  1. Vahid Ghafarpour 23,385 Reputation points Volunteer Moderator
    2023-10-16T20:13:47.74+00:00

    Yes, you can do that.

    I hope this article helps you find the steps:

    https://learn.microsoft.com/en-us/azure/architecture/ai-ml/idea/large-scale-custom-natural-language-processing

    1 person found this answer helpful.

1 additional answer

  1. Pramod Valavala 20,656 Reputation points Microsoft Employee Moderator
    2023-10-17T14:52:30.7133333+00:00

    @Nara Kanga While Spark NLP can be included in your pipelines to extract semantic information from data, I don't think it alone can power an LLM-based chatbot.

    These chatbots use a pattern called Retrieval-Augmented Generation (RAG), which typically involves using a vector database to perform semantic search over an indexed dataset and feeding the results into a generative AI model for the final response.
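    The retrieval half of that pattern can be sketched in a few lines. This is a toy, self-contained illustration, not a real Azure or Spark NLP API: `embed()` is a deterministic bag-of-words stand-in for a real embedding model, and the in-memory list stands in for a vector database.

```python
import math
import re

# Tiny fixed vocabulary for the toy embedding; a real embedding model
# produces dense vectors instead of sparse word counts.
VOCAB = ["refunds", "invoices", "parquet", "support", "days", "weekdays"]

def embed(text):
    # Bag-of-words vector over the fixed vocabulary (stand-in for a model).
    tokens = re.findall(r"[a-z]+", text.lower())
    return [float(tokens.count(word)) for word in VOCAB]

def cosine(a, b):
    # Cosine similarity between two vectors; 0.0 if either is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, index, top_k=2):
    # Rank indexed passages by embedding similarity to the query.
    q = embed(query)
    ranked = sorted(index, key=lambda doc: cosine(q, doc["vector"]), reverse=True)
    return [doc["text"] for doc in ranked[:top_k]]

# Index built offline from documents extracted from the data lake.
passages = [
    "Invoices are stored as parquet files.",
    "Refunds are processed within five days.",
    "The support line is open on weekdays.",
]
index = [{"text": p, "vector": embed(p)} for p in passages]

hits = retrieve("How long do refunds take?", index, top_k=1)
# hits[0] is the refunds passage, which would be fed to the LLM.
```

    In a production system the offline indexing step (embedding the data lake documents) and the online query step would run against the same embedding model and a real vector store.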

    The Azure OpenAI on your Data feature builds this on top of Azure Cognitive Search, which can ingest and index data using OpenAI-generated embeddings, but you could technically use any vector database for this approach.

    There are a few possible architectures you could explore, like the Embedding Approach mentioned there, which uses Redis as the vector database; instead of an Azure Function, you could use Spark NLP to generate the embeddings on your data in the data lake.

    Your chatbot would then retrieve semantically similar information from Redis and generate responses by feeding that information to an LLM.
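    That last step, feeding retrieved information to the model, amounts to assembling a grounded prompt before the chat completion call. A minimal sketch (the function name and prompt wording are illustrative, not part of any Azure SDK):

```python
def build_grounded_prompt(question, passages):
    # Combine retrieved passages into a context block the model must use.
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

prompt = build_grounded_prompt(
    "How long do refunds take?",
    ["Refunds are processed within five days."],
)
# `prompt` would then be sent to your chat model (for example, an Azure
# OpenAI chat deployment) to produce the final grounded response.
```

    Constraining the model to the retrieved context is what keeps the answers tied to the data in your data lake rather than to the model's training data.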

