Document Intelligence API returning incomplete KeyValuePairs (incorrectly detected)

Sergio Ricardo de Freitas Oliveira 20 Reputation points
2025-02-24T17:41:47.99+00:00

Hello all.

I have a scanned PDF whose text is extracted correctly by AI Document Intelligence. However, a string that appears on both page 1 and page 2 (and is correctly extracted as text from both pages) shows up only once in the KeyValuePair data structure: it is reported for page 2 only, even though it is the very same string on both pages.

The document is a generic PDF, not an instance of a specific form.

Any hints?

Thank you.

-SR

Azure Document Intelligence in Foundry Tools

Answer accepted by question author

  1. Vikram Singh 2,590 Reputation points Microsoft Employee Moderator
    2025-03-17T10:31:33.6666667+00:00

    Hello @Sergio Ricardo de Freitas Oliveira,

    Thanks for sharing and getting back!

    I noticed that the text on page 2 contains a colon (':'), which allows the model to identify it as a key and extract a key-value pair. On page 1, however, the same text is a plain string without a colon or any other delimiter, so the model does not recognize it as a key-value pair and treats it as regular paragraph text instead. You can verify this by comparing the extracted content for both pages.

    Since this is a prebuilt model, customization options are limited. If you need this string extracted consistently, I recommend building a custom model tailored to your data. Custom models can be trained to recognize and extract key-value pairs more accurately based on your document structure.

    I hope this clarifies the behavior. If you have any further questions, do let us know.

    Thanks
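The page-by-page check suggested in the answer can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it takes the analyze result as a plain dictionary and assumes the field names `keyValuePairs`, `paragraphs`, `boundingRegions`, `pageNumber`, and `content` match the JSON your API version returns. The helpers `pages_of` and `locate_string` are hypothetical names, and the sample data is invented for the demonstration, not taken from the original document.

```python
from collections import defaultdict


def pages_of(element):
    """Return the set of page numbers covered by an element's bounding regions."""
    return {region["pageNumber"] for region in element.get("boundingRegions", [])}


def locate_string(analyze_result, target):
    """Map each page to the structures (key-value pair key, paragraph) containing target."""
    found = defaultdict(set)
    needle = target.casefold()

    # Keys of detected key-value pairs (assumed field name: "keyValuePairs").
    for pair in analyze_result.get("keyValuePairs", []):
        key = pair.get("key") or {}
        if needle in key.get("content", "").casefold():
            for page in pages_of(key):
                found[page].add("keyValuePair")

    # Plain paragraph text (assumed field name: "paragraphs").
    for paragraph in analyze_result.get("paragraphs", []):
        if needle in paragraph.get("content", "").casefold():
            for page in pages_of(paragraph):
                found[page].add("paragraph")

    return {page: sorted(kinds) for page, kinds in sorted(found.items())}


# Invented sample: the same label appears on both pages, with a colon only on page 2.
sample = {
    "paragraphs": [
        {"content": "Contract number 12345", "boundingRegions": [{"pageNumber": 1}]},
        {"content": "Contract number: 12345", "boundingRegions": [{"pageNumber": 2}]},
    ],
    "keyValuePairs": [
        {
            "key": {"content": "Contract number:", "boundingRegions": [{"pageNumber": 2}]},
            "value": {"content": "12345", "boundingRegions": [{"pageNumber": 2}]},
            "confidence": 0.9,
        }
    ],
}

report = locate_string(sample, "Contract number")
print(report)  # {1: ['paragraph'], 2: ['keyValuePair', 'paragraph']}
```

A page that is listed with `paragraph` but without `keyValuePair` is one where the string was read as text but not detected as a key, which is the situation described for page 1.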

