Create embeddings and search in Azure OpenAI

Question

sai 5

@YutongTie-MSFT

How do I create embeddings and perform document search on a PDF document in Azure OpenAI?

I have a PDF input (just one document saved locally, as I'm only testing for now).

How can I create embeddings for it and perform a search so I can do Q&A on that data file?

I'm using an Azure API key and OpenAI endpoints.

https://github.com/openai/openai-cookbook/blob/main/examples/How_to_format_inputs_to_ChatGPT_models.ipynb

https://github.com/openai/openai-cookbook/blob/main/examples/vector_databases/pinecone/Gen_QA.ipynb

I have seen the two examples above, but I'm not sure how to apply them to a PDF. Please help; I'm still new and learning.

Thanks

sai 5 Reputation points

2023-03-21T19:47:55.5133333+00:00

@Ramr-msft @YutongTie-MSFT
Sai Yannakula 0 Reputation points

2023-03-29T05:19:38.2733333+00:00

Hey @sai, were you able to find a solution to the question you asked? If so, please share your findings with me.

1 answer

Answer 1

YutongTie-MSFT 53,971 Moderator

Hello @sai

Thanks for reaching out to us. Regarding your question about how to create embeddings and search a PDF document: unfortunately, there is no straightforward solution. Azure OpenAI does not provide built-in support for reading PDF documents.

As a workaround, you can use an OCR (Optical Character Recognition) tool to extract the text from the PDF, and then send that text to the Azure OpenAI API to generate embeddings.

Here's a high-level overview of the steps you can follow:

1. Use an OCR tool to extract the text from the PDF document. Several are available, such as the Azure Cognitive Services Computer Vision Read API, or Azure Form Recognizer if your PDF contains form-style data.
2. Once you have the text, use the OpenAI API to generate an embedding for each sentence or paragraph in the document, along the lines of the code sample you shared.
3. Store the embeddings in a vector database such as Pinecone, where you can search for similar passages based on their embeddings.
4. To answer a question about the document, embed the question, retrieve the most similar passages from the vector database, and pass the question together with the text of those passages to a model such as OpenAI's GPT-3 / GPT-3.5 to generate an answer.
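For a single test document, steps 2 to 4 above can be tried without a vector database at all. The following is only a minimal sketch under stated assumptions, not an official sample: the in-memory list stands in for Pinecone, and `chunk_text`, `build_index`, `search`, `build_prompt`, and `azure_embed` are hypothetical helper names. `azure_embed` assumes the `openai` Python package (version 1.0 or later), and its environment variable names, API version, and deployment name are placeholders to replace with your own.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
EmbedFn = Callable[[List[str]], List[Vector]]


def chunk_text(text: str, max_chars: int = 1000) -> List[str]:
    """Group paragraphs (separated by blank lines) into chunks of roughly max_chars.

    A single paragraph longer than max_chars is kept whole.
    """
    chunks: List[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        if current and len(current) + len(para) + 1 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_index(chunks: List[str], embed: EmbedFn) -> List[Tuple[str, Vector]]:
    """Pair every chunk with its embedding (in-memory stand-in for a vector DB)."""
    return list(zip(chunks, embed(chunks)))


def search(index: List[Tuple[str, Vector]], question: str, embed: EmbedFn,
           top_k: int = 3) -> List[Tuple[float, str]]:
    """Embed the question and return the top_k (score, chunk) pairs."""
    q_vec = embed([question])[0]
    scored = [(cosine(q_vec, vec), chunk) for chunk, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def build_prompt(question: str, hits: List[Tuple[float, str]]) -> str:
    """Put the retrieved chunk text (not the vectors) into the prompt."""
    context = "\n---\n".join(chunk for _, chunk in hits)
    return ("Answer the question using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")


def azure_embed(texts: List[str]) -> List[Vector]:
    """Assumption: openai>=1.0 is installed; all names below are placeholders."""
    import os
    from openai import AzureOpenAI

    client = AzureOpenAI(
        api_key=os.environ["AZURE_OPENAI_API_KEY"],          # placeholder env var
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],  # placeholder env var
        api_version="2023-05-15",                            # use the version you deployed against
    )
    # "my-embedding-deployment" is a hypothetical deployment name.
    resp = client.embeddings.create(model="my-embedding-deployment", input=texts)
    return [item.embedding for item in resp.data]
```

With real credentials, usage would look like `index = build_index(chunk_text(pdf_text), azure_embed)` followed by `build_prompt(question, search(index, question, azure_embed))`; the resulting prompt is what gets sent to the completion or chat model.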

I hope this helps. Let me know if you have any questions about the above. We are also looking forward to an OCR feature in Azure OpenAI, but it will take some time.

Regards,

Yutong

- Please accept the answer and vote 'Yes' if you found it helpful, to support the community. Thanks a lot.

sai 5 Reputation points

2023-03-22T00:02:17.7233333+00:00

Is there a sample codebase for the steps you mentioned?
Newstart 20 Reputation points

2023-03-22T14:09:53.72+00:00

Hi @YutongTie-MSFT, can I have your email? I want to ask something about commercial use.
YutongTie-MSFT 53,971 Reputation points Moderator

2023-03-29T05:53:42.7933333+00:00

I am working with the product team and will let you know once we have some working examples. Sorry for the delay.
Patrick Ng 0 Reputation points

2023-05-16T18:15:25.5466667+00:00

An alternative for step 1: how about PyPDF?
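
For a PDF that already has a text layer, the PyPDF route might look roughly like the sketch below. It assumes the third-party `pypdf` package is installed; `pages_to_text` and `read_pdf_text` are hypothetical helper names. Scanned, image-only PDFs have no text layer to extract, so they would still need OCR.

```python
from typing import Iterable, List


def pages_to_text(pages: Iterable) -> str:
    """Join the text of each page, skipping pages that yield no text.

    `pages` is any iterable of objects exposing extract_text(),
    for example the `pages` attribute of a pypdf PdfReader.
    """
    texts: List[str] = []
    for page in pages:
        text = (page.extract_text() or "").strip()
        if text:
            texts.append(text)
    # A blank line between pages keeps them separable for later chunking.
    return "\n\n".join(texts)


def read_pdf_text(path: str) -> str:
    """Assumption: `pip install pypdf` has been run."""
    from pypdf import PdfReader

    return pages_to_text(PdfReader(path).pages)
```

`read_pdf_text("my_document.pdf")` (the file name is a placeholder) would then return one string that can be split into chunks and embedded.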
PG-Khasriya, Mandeep 0 Reputation points

2023-08-13T13:30:20.5033333+00:00
Hi, is there a guide on how to:

Read PDF documents using Azure Form Recognizer

Embed this text using OpenAI

I do not know how to access the text output of the reading step in order to send it to OpenAI for embeddings.
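
As a rough sketch of reaching the recognized text, assuming the `azure-ai-formrecognizer` Python package (version 3.2 or later) and its prebuilt read model: the analysis result exposes the full text as `result.content` and the per-page lines as `result.pages[i].lines[j].content`. The helper names `result_to_page_texts` and `analyze_pdf` are hypothetical, and the endpoint and key are placeholders for your own resource.

```python
from typing import List


def result_to_page_texts(result) -> List[str]:
    """Return one string per page, built from the lines the service recognized."""
    return ["\n".join(line.content for line in page.lines) for page in result.pages]


def analyze_pdf(path: str, endpoint: str, key: str):
    """Assumption: azure-ai-formrecognizer>=3.2 is installed."""
    from azure.ai.formrecognizer import DocumentAnalysisClient
    from azure.core.credentials import AzureKeyCredential

    client = DocumentAnalysisClient(endpoint, AzureKeyCredential(key))
    with open(path, "rb") as f:
        # "prebuilt-read" is the general text-extraction model.
        poller = client.begin_analyze_document("prebuilt-read", document=f)
        return poller.result()
```

The strings returned by `result_to_page_texts(analyze_pdf(path, endpoint, key))` are plain text, so they can be passed as the `input` of an embeddings request like any other text.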
