Dear Team,
I am currently using Copilot Studio with the GPT-4 model to build an agent that connects to a SharePoint knowledge base. The knowledge base contains scanned documents that users have uploaded as JPEG files. However, the agent cannot read these files or answer queries based on their content; it appears that OCR (Optical Character Recognition) is not enabled or supported by default.
I would like to know:
Your guidance on how to proceed would be greatly appreciated.
Hi Microsoft team, is it still necessary to use an external OCR tool like Poppler? I am also trying to create an agent that runs OCR automatically on scanned files, but it fails completely.
Thanks
Hello @W C H Bagya Perera,
Currently, Copilot Studio (GPT-4) does not natively support OCR (Optical Character Recognition) for extracting text from JPEG or other image files in SharePoint knowledge bases. The agent can only process text-based content by default.
You must extract text from images using an external OCR process and make that text available to the agent, either as metadata or via a custom extension.
Reference document: https://learn.microsoft.com/en-us/ai-builder/prebuilt-text-recognition
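As a rough illustration of the external OCR step described above, the following Python sketch reads every JPEG in a folder, runs it through an OCR backend, and writes the recognized text to a `.txt` file with the same base name, which could then be added to the knowledge base alongside the image. Everything here is an assumption for illustration: the function names `export_text_sidecars` and `tesseract_ocr` are hypothetical, and Tesseract (through the `pytesseract` and `Pillow` packages) is only one possible backend, not something Copilot Studio or AI Builder requires. Uploading the resulting files to SharePoint is a separate step that is not shown.

```python
from pathlib import Path
from typing import Callable

# File extensions treated as scanned images in this sketch.
IMAGE_SUFFIXES = {".jpg", ".jpeg"}


def tesseract_ocr(image_path: Path) -> str:
    """One possible OCR backend (an assumption): Tesseract via pytesseract."""
    # Imported lazily so the rest of the sketch works with any other backend.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image)


def export_text_sidecars(
    source_dir: Path,
    output_dir: Path,
    ocr: Callable[[Path], str] = tesseract_ocr,
) -> list[Path]:
    """Run OCR on every JPEG in source_dir and write one .txt file per image."""
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for image_path in sorted(source_dir.iterdir()):
        if image_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # not a JPEG; leave it to the normal indexing path
        text = ocr(image_path).strip()
        if not text:
            continue  # nothing recognized, so no sidecar file is written
        target = output_dir / f"{image_path.stem}.txt"
        target.write_text(text, encoding="utf-8")
        written.append(target)
    return written
```

The OCR backend is passed in as a plain function so that a different service can be swapped in without changing the loop, for example `export_text_sidecars(Path("scans"), Path("scans_text"))` with the default backend, or with `ocr=my_function` for another engine.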
Thanks,
Sayali