Hello @sai
Thanks for reaching out to us. Regarding your question about how to create embeddings for a PDF document and search over them: unfortunately, there is no straightforward solution. Azure OpenAI does not provide built-in support for reading PDF documents.
As a workaround, you can use an OCR (Optical Character Recognition) tool to extract the text from the PDF, and then feed it to the Azure OpenAI API to generate embeddings.
Here's a high-level overview of the steps you can follow:
- Use an OCR tool to extract the text from the PDF document. Several are available, such as the Read API in Azure Cognitive Services Computer Vision, or Azure Form Recognizer if your PDF contains form-style data.
- Once you have the text, you can use the Azure OpenAI API to generate embeddings for each sentence or paragraph in the document, similar to the code sample you shared.
- Store the embeddings in a vector database like Pinecone, where you can search for similar documents based on their embeddings.
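- To answer a question about the document, generate an embedding for the question, retrieve the most similar passages from the vector database, and pass the question together with the text of those passages to a model such as OpenAI's GPT-3 or GPT-3.5 to generate an answer. The model takes text as input, so you send it the retrieved passages rather than the embeddings themselves.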
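The chunk, embed, and search steps above can be sketched roughly as follows. This is only a minimal illustration, not a definitive implementation: it assumes the text has already been extracted from the PDF, it uses a plain in-memory list in place of a vector database like Pinecone, and `embed` is a hypothetical callable that you would implement yourself by calling the Azure OpenAI embeddings endpoint. All function names here are made up for the example.

```python
import math
import re
from typing import Callable, List, Tuple

Vector = List[float]
# Hypothetical: any function that turns a string into an embedding vector,
# e.g. a thin wrapper you write around the Azure OpenAI embeddings call.
EmbedFn = Callable[[str], Vector]


def split_into_chunks(text: str, max_chars: int = 1000) -> List[str]:
    """Group blank-line-separated paragraphs into chunks of roughly max_chars."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: List[str] = []
    current = ""
    for paragraph in paragraphs:
        if current and len(current) + len(paragraph) + 1 > max_chars:
            chunks.append(current)
            current = paragraph
        else:
            current = f"{current}\n{paragraph}" if current else paragraph
    if current:
        chunks.append(current)
    return chunks


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks: List[str], embed: EmbedFn) -> List[Tuple[str, Vector]]:
    """Embed every chunk once. A vector database would store these pairs."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def search(question: str, index: List[Tuple[str, Vector]],
           embed: EmbedFn, top_k: int = 3) -> List[Tuple[float, str]]:
    """Embed the question and return the top_k (score, chunk) pairs."""
    question_vector = embed(question)
    scored = [(cosine_similarity(question_vector, vector), chunk)
              for chunk, vector in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def build_prompt(question: str, hits: List[Tuple[float, str]]) -> str:
    """Put the retrieved chunk text (not the vectors) into the model prompt."""
    context = "\n---\n".join(chunk for _, chunk in hits)
    return ("Answer the question using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")
```

With a real `embed` function in place, you would call `split_into_chunks` on the OCR output, `build_index` once per document, and then `search` followed by `build_prompt` for each question, sending the resulting prompt to the GPT model.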
I hope this helps. Let me know if you have any questions about the above. We are also looking forward to OCR support in Azure OpenAI, but it will take some time.
Regards,
Yutong
- Please accept the answer and vote 'Yes' if you found it helpful, to support the community. Thanks a lot.