Need to analyze DOCX files using the Azure AI Document Intelligence client library for Python, version 1.0.0b4

S,A
2024-10-04T15:27:05.8+00:00

Hi Team,

I need to analyze DOCX files using the Azure AI Document Intelligence client library for Python, version 1.0.0b4.

I have observed that when I convert a DOCX file to PDF, the analysis returns better metadata for the PDF, including page numbers.
The attached image below shows the result for a PDF file.

User's image
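The following is a minimal sketch of the per-page analysis described above. It uses the `prebuilt-layout` model from the `azure-ai-documentintelligence` package and assumes hypothetical `DI_ENDPOINT` and `DI_KEY` environment variables. The names `analyze_layout` and `summarize_pages` are illustrative helpers, not part of the SDK. The request body is passed positionally because its keyword name has changed between SDK releases.

```python
import os
from typing import Any, Iterable, List, Tuple


def summarize_pages(pages: Iterable[Any]) -> List[Tuple[int, int]]:
    """Return (page_number, line_count) for each page of an analyze result.

    Works on the SDK's DocumentPage objects or on any object that exposes
    `page_number` and an optional `lines` list.
    """
    summary = []
    for page in pages:
        lines = getattr(page, "lines", None) or []
        summary.append((page.page_number, len(lines)))
    return summary


def analyze_layout(path: str):
    """Run the prebuilt-layout model on a local file (PDF or DOCX).

    Assumes the hypothetical environment variables DI_ENDPOINT and DI_KEY
    hold the Document Intelligence resource endpoint and key.
    """
    # Imported lazily so summarize_pages() can be used without the SDK installed.
    from azure.ai.documentintelligence import DocumentIntelligenceClient
    from azure.core.credentials import AzureKeyCredential

    client = DocumentIntelligenceClient(
        endpoint=os.environ["DI_ENDPOINT"],
        credential=AzureKeyCredential(os.environ["DI_KEY"]),
    )
    with open(path, "rb") as f:
        # Body passed positionally: named analyze_request in earlier betas
        # and body in later releases.
        poller = client.begin_analyze_document(
            "prebuilt-layout", f, content_type="application/octet-stream"
        )
        return poller.result()
```

For example, `summarize_pages(analyze_layout("report.pdf").pages)` gives one `(page_number, line_count)` pair per page. Running the same call on the original DOCX makes it straightforward to compare the `pages` returned for each format.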

This is not the case for DOCX files: I get only one document element, which contains the information for the entire DOCX file.
I am looking for an API that can convert MS-DOCX files to PDF on Linux machines (Azure App Service).
I tried LibreOffice (soffice), but the page format of the LibreOffice-converted documents does not match that of the original MS-DOCX files.
In Azure AI | Document Intelligence Studio there is a "print to PDF" option for non-PDF files. Is there an Azure API service I could use to convert DOCX files to PDF?
User's image
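The LibreOffice route mentioned above can be scripted on Linux roughly as follows. This sketch assumes `soffice` (or `libreoffice`) is installed and on `PATH` in the App Service environment. The function names are hypothetical. It uses LibreOffice's standard `--headless --convert-to pdf --outdir` options, and it does not address the page-format mismatch described in the question.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_soffice_command(src: str, outdir: str, binary: str = "soffice") -> List[str]:
    """Build the headless LibreOffice command that converts `src` to PDF in `outdir`."""
    return [binary, "--headless", "--convert-to", "pdf", "--outdir", outdir, src]


def expected_pdf_path(src: str, outdir: str) -> Path:
    """LibreOffice names the output after the input's stem: report.docx -> report.pdf."""
    return Path(outdir) / (Path(src).stem + ".pdf")


def convert_docx_to_pdf(src: str, outdir: str, timeout: int = 120) -> Path:
    """Convert a DOCX file to PDF with LibreOffice and return the PDF path."""
    binary = shutil.which("soffice") or shutil.which("libreoffice")
    if binary is None:
        raise RuntimeError("LibreOffice (soffice) is not installed or not on PATH")
    # check=True raises CalledProcessError if LibreOffice exits with an error.
    subprocess.run(
        build_soffice_command(src, outdir, binary),
        check=True,
        capture_output=True,
        timeout=timeout,
    )
    pdf = expected_pdf_path(src, outdir)
    if not pdf.exists():
        raise RuntimeError(f"conversion finished but {pdf} was not created")
    return pdf
```

The command construction is kept separate from the subprocess call. This makes it possible to check or log the exact `soffice` invocation before running it on the server.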

Please suggest any possible solution.

Tags: Word, Office, Azure AI Document Intelligence, Azure AI services

Accepted answer
Vahid Ghafarpour
2024-10-04T15:41:12.0066667+00:00

0 additional answers
