How to create a chatbot that dynamically selects between database and PDF search based on user input?


Paritosh Raval 5

I have developed two separate chatbots: one that searches data in my database and another that searches PDF documents. Now I want to create a unified chatbot that intelligently decides where to search based on the user's input. Could someone suggest approaches or algorithms for implementing this decision-making in the chatbot? For example:

User Question --> Do some analysis --> (Search from PDF OR Search from DB) --> Provide output.
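The flow above can be sketched as a small pipeline in which the "analysis" step is a pluggable routing function. This is a minimal sketch, not the asker's code: `answer`, `toy_route`, and the `searchers` mapping are hypothetical names, and the keyword router is only a toy stand-in for the function-calling approach discussed in the replies.

```python
from typing import Callable, Dict


def answer(
    question: str,
    route: Callable[[str], str],
    searchers: Dict[str, Callable[[str], str]],
) -> str:
    """Route a question to one data source and return that source's result.

    route: hypothetical "analysis" step that returns a source name ("db" or "pdf").
    searchers: hypothetical mapping from source name to a search function.
    """
    source = route(question)  # "Do some analysis"
    search = searchers.get(source)
    if search is None:
        raise ValueError(f"No searcher registered for source {source!r}")
    return search(question)  # "Search from PDF OR Search from DB"


def search_db_stub(question: str) -> str:
    # Stand-in for a real database query
    return f"[db result for: {question}]"


def search_pdf_stub(question: str) -> str:
    # Stand-in for a real PDF search
    return f"[pdf result for: {question}]"


def toy_route(question: str) -> str:
    # Toy keyword router; a real router would be smarter (e.g. the model itself)
    return "db" if "order" in question.lower() else "pdf"


searchers = {"db": search_db_stub, "pdf": search_pdf_stub}
print(answer("Who requested order number 42?", toy_route, searchers))
```

Keeping the router separate from the search functions means the keyword rule can later be swapped for a model-based decision without touching the search code.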

Divakarkumar-3696 375 Reputation points

2024-02-19T19:35:41.75+00:00

Hi, for your case you would need function calling to support both fetching from the database and searching the PDF documents.

Please refer to https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/function-calling

Sample example: https://gist.github.com/pamelafox/a3fdea186b687509c02cb186ca203328

Please 'Accept as answer' if it helped so that it can help others in the community looking for help on similar topics.
Paritosh Raval 5 Reputation points

2024-02-20T04:48:13.4433333+00:00

Thanks, @Divakarkumar-3696. However, how will it decide when to call the database function and when to call the PDF function?
Divakarkumar-3696 375 Reputation points

2024-02-20T05:20:16.53+00:00
Hi,

We should define the functions with good descriptions so that the model can determine which function to call.

Note: Not all models support function calling. Please refer to the documentation for the list of models (latest versions of gpt-35-turbo and gpt-4) that support it.

First, define the list of functions. In your case you need one function for fetching records from the database and another for searching the PDFs. Describe what each function does, along with the parameters to be passed to it.

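tools = [
    {
        "type": "function",
        "function": {
            "name": "search_database",
            "description": "Retrieve data from database",
            "parameters": {"type": "object", "properties": {}, "required": []},
        },
    },
    {
        "type": "function",
        "function": {
            "name": "search_pdf",
            "description": "Search from PDF files",
            "parameters": {"type": "object", "properties": {}, "required": []},
        },
    },
]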
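Because the nesting is easy to get wrong (each tool is wrapped in `{"type": "function", "function": {...}}`, and both argument definitions and the `required` list live under `parameters`), a small helper can build the entries consistently. This is a sketch; `make_function_tool` is a hypothetical helper, not part of the openai SDK.

```python
def make_function_tool(name, description, properties=None, required=None):
    """Build one entry of the `tools` list in the Chat Completions format.

    Argument definitions go under function.parameters.properties and the
    names of mandatory arguments go under function.parameters.required.
    """
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": properties or {},
                "required": required or [],
            },
        },
    }


tools = [
    make_function_tool("search_database", "Retrieve data from database"),
    make_function_tool(
        "search_pdf",
        "Search from PDF files",
        properties={
            "search_query": {
                "type": "string",
                "description": "Text to search for in the PDF files",
            }
        },
        required=["search_query"],
    ),
]
```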

Then, in the completion call, pass these functions in the tools parameter and set tool_choice to "auto" so that the model decides which function to call.

response = client.chat.completions.create(
    model="<REPLACE_WITH_YOUR_MODEL_DEPLOYMENT_NAME>",
    messages=messages,
    tools=tools,  # YOUR LIST OF FUNCTIONS
    tool_choice="auto",  # DEFAULT - YOU CAN ALSO BE EXPLICIT HERE
)
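The reply in `response.choices[0].message` is either a normal text answer or a request to call one of the tools. In the openai Python SDK, `tool_calls` is `None` when the model answers directly, and each call's `function.arguments` is a JSON-encoded string. A minimal sketch of reading it (the helper name `first_tool_call` is hypothetical; `SimpleNamespace` objects stand in for the SDK response so the example runs offline):

```python
import json
from types import SimpleNamespace


def first_tool_call(message):
    """Return (function_name, arguments_dict) for the first tool call, or None.

    `message` is expected to be shaped like response.choices[0].message.
    """
    if not message.tool_calls:
        return None  # the model answered in plain text instead
    call = message.tool_calls[0]
    args = json.loads(call.function.arguments or "{}")
    return call.function.name, args


# Offline check with stand-in objects shaped like the SDK response
fake = SimpleNamespace(tool_calls=[SimpleNamespace(
    function=SimpleNamespace(name="search_pdf",
                             arguments='{"search_query": "leave policy"}'))])
print(first_tool_call(fake))
print(first_tool_call(SimpleNamespace(tool_calls=None)))
```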

Paritosh Raval 5 Reputation points

2024-02-20T10:31:05.3166667+00:00
Thanks again @Divakarkumar-3696 for the detailed answer.

I understand how the code works, but I'm wondering how the model knows which function to choose in its JSON response.

Here's an example:

I keep info about products in a database.

But rules, regulations, and policies are in PDFs.

The database is in one place, such as a SQL server, while the PDFs are somewhere else, such as blob storage or my computer. So if I ask something like "How many days off can an employee take in a month?", how does it know to use the "search_pdf" function? Do I have to set a description or properties for it? If I do, adding lots of details might get complicated.
Divakarkumar-3696 375 Reputation points

2024-02-20T12:01:48.48+00:00

The latest models (gpt-3.5-turbo-0125 and gpt-4-turbo-preview) have been trained both to detect when a function should be called (depending on the input) and to respond with JSON that adheres to the function signature more closely than previous models.

Reference: https://platform.openai.com/docs/guides/function-calling

As you stated, it is important to provide meaningful descriptions of the functions and their properties so that the model can decide better. Not sure if you got a chance to look at this sample: https://gist.github.com/pamelafox/a3fdea186b687509c02cb186ca203328. It defines two functions: one retrieves sources from an Azure Cognitive Search index, and the other retrieves Azure SDK related issues from GitHub.

PS: The model does not execute the function itself; it only determines which function should be called based on the input. Making the actual function call is our responsibility.
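Since the model only requests the call, the application runs the function and usually sends the result back so the model can write the final answer. A minimal sketch of that step, assuming the Chat Completions message format (an assistant message carrying `tool_calls`, followed by one `"tool"` message per call with the matching `tool_call_id`); `append_tool_results` and the `run_tool` callback are hypothetical names:

```python
import json
from types import SimpleNamespace


def append_tool_results(messages, assistant_message, run_tool):
    """Append the assistant's tool calls and our own tool outputs to `messages`.

    run_tool(name, args) is a hypothetical callback that actually queries the
    database or the PDFs; the model never executes it itself.
    """
    messages.append({
        "role": "assistant",
        "content": assistant_message.content,
        "tool_calls": [
            {
                "id": call.id,
                "type": "function",
                "function": {"name": call.function.name,
                             "arguments": call.function.arguments},
            }
            for call in assistant_message.tool_calls
        ],
    })
    for call in assistant_message.tool_calls:
        args = json.loads(call.function.arguments or "{}")
        result = run_tool(call.function.name, args)
        # Each tool message must reference the id of the call it answers
        messages.append({"role": "tool", "tool_call_id": call.id, "content": str(result)})
    return messages


# Offline example; afterwards you would call client.chat.completions.create(...)
# again with these messages so the model can turn the tool output into an answer.
assistant = SimpleNamespace(content=None, tool_calls=[SimpleNamespace(
    id="call_1",
    function=SimpleNamespace(name="search_pdf",
                             arguments='{"search_query": "leave policy"}'))])
msgs = append_tool_results(
    [{"role": "user", "content": "How many leaves can an employee take in one month?"}],
    assistant,
    lambda name, args: f"{name}:{args['search_query']}",
)
```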

Paritosh Raval 5 Reputation points

2024-02-21T12:17:45.14+00:00

Thanks @Divakarkumar-3696. The problem is that I have 20+ PDFs and a large DB, so I cannot put everything in the description or the properties.

Following the example, I added a few sample questions to the description, but when I ask a new question it predicts the wrong function.

tools = [
  {
    "type": "function",
    "function": {
        "name": "search_database",
        "description": "Retrieve data from database",
        "parameters": {"type": "object", "properties": {}, "required": []},
    }
  },
  {
    "type": "function",
    "function": {
        "name": "search_pdf",
        "description": """Retrieve data from PDF files
        Query string to retrieve documents from pdf eg:
            'question1', 'question2'
        """,
        # Argument definitions belong under "parameters" -> "properties"
        "parameters": {
            "type": "object",
            "properties": {
                "search_query": {
                    "type": "string",
                    "description": "Query string to retrieve documents from azure search eg: 'How to debug compute issues'",
                },
            },
            "required": ["search_query"],
        },
    }
  }
]
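One approach to the "new questions get routed wrongly" problem (an assumption, not something confirmed in this thread) is to describe what kind of data each source holds instead of listing sample questions, so unseen questions in the same domain still match. The categories below come from this thread (products and orders in the DB; rules, regulations, and policies in the PDFs); the exact wording and the system message are hypothetical.

```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "search_database",
            "description": (
                "Look up structured business records in the SQL database: "
                "products, orders, who placed an order, shipped items, "
                "carriers and tracking numbers."
            ),
            "parameters": {"type": "object", "properties": {}, "required": []},
        },
    },
    {
        "type": "function",
        "function": {
            "name": "search_pdf",
            "description": (
                "Search the company's PDF documents: rules, regulations and "
                "policies, such as leave entitlement or the consequences of "
                "violating a policy."
            ),
            "parameters": {
                "type": "object",
                "properties": {
                    "search_query": {
                        "type": "string",
                        "description": "The user's question rephrased as a search query",
                    }
                },
                "required": ["search_query"],
            },
        },
    },
]

# A system message can restate the routing rule in one sentence
messages = [
    {"role": "system", "content": (
        "Use search_database for product and order data, and search_pdf "
        "for company rules, regulations and policies."
    )},
    {"role": "user", "content": "How many leaves can an employee take in one month?"},
]
```

Describing domains keeps the descriptions short even with many PDFs, because they name the category of content rather than enumerating it.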

Divakarkumar-3696 375 Reputation points

2024-02-23T06:49:17.98+00:00

Hi, sorry for the delay in responding. As you mentioned, you don't need to put everything in the description; the function itself does the actual work. When you say "if I add any new question it is predicting wrong", do you mean the model picks the wrong function? Could you share the code you defined for search_pdf, along with a sample question and the response you got from the model, so we can help you better?
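To illustrate that the function, not the description, does the searching: a minimal stand-in for `search_pdf`, assuming the PDF text has already been extracted into `(source_name, text)` chunks by some earlier step. Word-overlap scoring is a toy technique swapped in for illustration; a real setup would more likely use a search index (the linked gist uses Azure Cognitive Search). The chunk contents are made-up sample text.

```python
import re


def tokenize(text):
    # Lowercase alphanumeric words as a set
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search_pdf(search_query, chunks, top_k=2):
    """Rank pre-extracted PDF text chunks by word overlap with the query.

    `chunks` is a hypothetical list of (source_name, text) pairs; chunks that
    share no words with the query are dropped.
    """
    query_words = tokenize(search_query)
    scored = []
    for source, text in chunks:
        score = len(query_words & tokenize(text))
        if score > 0:
            scored.append((score, source, text))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(source, text) for _, source, text in scored[:top_k]]


chunks = [
    ("leave_policy.pdf", "An employee can take up to 2 days of paid leave per month."),
    ("conduct_policy.pdf", "Violating the conduct policy may lead to disciplinary action."),
]
print(search_pdf("How many days of leave can an employee take per month?", chunks))
```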

Paritosh Raval 5

import json
from openai import AzureOpenAI


def searchFromDatabase():
    # Placeholder: query the SQL database here
    print("Searching from database...")


def searchFromDocuments(search_query):
    # Placeholder: search the PDF files (e.g. in blob storage) here
    print(f"Searching from documents for: {search_query}")


client = AzureOpenAI(
  api_key="my_key",
  api_version="version",
  azure_endpoint="my_endpoint"
)

messages = [
    {"role": "user", "content": "How many leaves can an employee take in one month?"}
]

tools = [
  {
    "type": "function",
    "function": {
        "name": "search_database",
        "description": """Retrieve data from database
        eg. who requested order number xyz?
        Which products are shipped in Order Number xyz?
        give me the carrier & tracking number for order number related information
        """,
        "parameters": {"type": "object", "properties": {}, "required": []},
    }
  },
  {
    "type": "function",
    "function": {
        "name": "search_pdf",
        "description": """Retrieve data from PDF files
        Query string to retrieve documents from pdf eg:
            'What are the potential consequences of violating the xyz Policy?',
            'What actions can the company take if an employee refuses to cooperate with xyz?'
            'What is xyz Policy within the company?'
        """,
        # Arguments must be declared under "parameters" -> "properties"
        "parameters": {
            "type": "object",
            "properties": {
                "search_query": {
                    "type": "string",
                    "description": "Query string to retrieve documents",
                }
            },
            "required": ["search_query"],
        },
    }
  }
]

response = client.chat.completions.create(
    model="my-model-name",
    messages=messages,
    tools=tools,
    tool_choice="auto",
)

message = response.choices[0].message
print(message.model_dump_json(indent=2))

# The model may answer directly without calling a tool; then tool_calls is None
if not message.tool_calls:
    print("No tool call:", message.content)
else:
    tool_call = message.tool_calls[0]
    print(tool_call)
    if tool_call.type == 'function':
        # Arguments arrive as a JSON string
        args = json.loads(tool_call.function.arguments or "{}")
        # Check the name of the function
        if tool_call.function.name == 'search_database':
            searchFromDatabase()
        elif tool_call.function.name == 'search_pdf':
            searchFromDocuments(args["search_query"])
        else:
            print("Unknown function")
    else:
        print("Unknown tool call type")

@Divakarkumar-3696, I've added the code above.

So now my logic for searching the PDFs in blob storage will go into the search_pdf function, and the logic for retrieving information from the DB will go into the search_database function.
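Since `tool_calls` is a list, a response can contain more than one call, and the code above only handles the first. One way to organize the per-source logic is a registry that maps each tool name to the Python function implementing it and runs every call. This is a sketch; `TOOL_REGISTRY`, `run_tool_calls`, and the placeholder bodies are hypothetical, and `SimpleNamespace` objects stand in for the SDK's tool-call objects.

```python
import json
from types import SimpleNamespace


def search_database():
    # Hypothetical placeholder for the SQL server lookup
    return "db rows"


def search_pdf(search_query):
    # Hypothetical placeholder for the blob-storage PDF search
    return f"pdf passages for {search_query!r}"


# Map each tool name the model may return to the function that implements it
TOOL_REGISTRY = {
    "search_database": search_database,
    "search_pdf": search_pdf,
}


def run_tool_calls(tool_calls):
    """Execute every tool call (not just the first) and collect results by call id."""
    results = {}
    for call in tool_calls or []:
        func = TOOL_REGISTRY.get(call.function.name)
        if func is None:
            results[call.id] = f"Unknown function: {call.function.name}"
            continue
        args = json.loads(call.function.arguments or "{}")
        results[call.id] = func(**args)
    return results


calls = [SimpleNamespace(id="call_1", function=SimpleNamespace(
    name="search_pdf", arguments='{"search_query": "leave policy"}'))]
print(run_tool_calls(calls))
```

Adding a new data source then means writing one function and one registry entry, plus its tool definition.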
