How to get word-by-word geometry from Document Intelligence?

Arman 45 Reputation points
2024-04-14T10:43:32.33+00:00

We are currently evaluating Azure Document Intelligence (DI) against AWS Textract. One feature our project relies on is the ability to outline individual words within a document so that users can select them one at a time. With Textract we are able to get bounding boxes for every recognized word. With DI it seems bounding boxes are only provided per line and not for individual words.

Is this actually a limitation of DI, or is there a way to ask the API to include bounding boxes for individual words?

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.
Azure AI services
A group of Azure services, SDKs, and APIs designed to make apps more intelligent, engaging, and discoverable.

Accepted answer
  1. navba-MSFT 17,120 Reputation points Microsoft Employee
    2024-04-15T06:20:07.5233333+00:00

    @Arman Welcome to the Microsoft Q&A Forum, and thank you for posting your query here!

    Azure Document Intelligence (DI) does indeed provide the capability to extract word-by-word geometry. The Document Intelligence layout model extracts print and handwritten style text as lines and words. The styles collection includes any handwritten style for lines if detected along with the spans pointing to the associated text.

    More info here. In the response below, you can see that the word "While" has been detected as a separate entry in the words collection.

    "words": [
        {
            "content": "While",
            "polygon": [],
            "confidence": 0.997,
            "span": {}
        },
    ],
    "lines": [
        {
            "content": "While healthcare is still in the early stages of its Al journey, we",
            "polygon": [],
            "spans": [],
        }
    ]

    In Document Intelligence, a word is defined as a sequence of adjacent characters, with whitespace separating words from one another. For languages that don’t use space separators between words, each character is returned as a separate word, even if it doesn’t represent a semantic word unit.
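
If the words need to be grouped under the line they belong to, one possible approach is to compare spans: a word belongs to a line when its span falls inside one of the line's spans. The sketch below assumes each word has a `span` with `offset` and `length`, and each line has a `spans` list of the same shape; the function name and the sample offsets are hypothetical.

```python
from typing import List


def words_in_line(line: dict, words: List[dict]) -> List[dict]:
    """Return the words whose span lies fully inside one of the line's spans."""
    matched = []
    for word in words:
        start = word["span"]["offset"]
        end = start + word["span"]["length"]
        for span in line.get("spans", []):
            if start >= span["offset"] and end <= span["offset"] + span["length"]:
                matched.append(word)
                break  # a word is counted once even if several spans match
    return matched


# Hypothetical page: the offsets index into the document's full text content.
page = {
    "words": [
        {"content": "While", "span": {"offset": 0, "length": 5}},
        {"content": "healthcare", "span": {"offset": 6, "length": 10}},
        {"content": "is", "span": {"offset": 17, "length": 2}},
    ],
    "lines": [
        {"content": "While healthcare", "spans": [{"offset": 0, "length": 16}]},
        {"content": "is", "spans": [{"offset": 17, "length": 2}]},
    ],
}

for line in page["lines"]:
    names = [w["content"] for w in words_in_line(line, page["words"])]
    print(line["content"], "->", names)
```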

    More info here.

    Please test this with the layout model in Document Intelligence Studio and check whether that helps.

    If you have any follow-up questions, please let me know. I would be happy to help.
