OpenAI o1 unable to understand documents with lines connecting data (leader lines, elbow connectors, etc.)

Tyler Suard 155 Reputation points
2025-03-14T18:03:24.61+00:00

See the attached image. Upload it to o1 and try asking it, "What are the possible options for the rightmost empty box?" This is easy for a human to do, but o1 can't do it, even with very detailed prompting.

Azure OpenAI Service

2 answers

  1. Prashanth Veeragoni 5,245 Reputation points Microsoft External Staff Moderator
    2025-03-17T13:22:12.33+00:00

    Hi Tyler Suard,

    I understand that the issue with the OpenAI o1 model in Azure OpenAI not interpreting leader lines and elbow connectors in documents like the one you uploaded is primarily due to how the model processes images and text. Here's a step-by-step approach to solving it:

    Issue Breakdown:

    1. Complex Layouts with Leader Lines

    The document has connecting lines linking data points instead of a straightforward table format.

    OCR (Optical Character Recognition) extracts text but may not capture relationships properly.

    2. GPT-Based Models Struggle with Structure

    While GPT models can process OCR text, they may not infer relationships based on leader lines.

    Without explicit annotations or structured text, it is hard for the model to determine the correct connections.

    Solutions:

    1. Preprocess the Image Before Feeding It into OpenAI's Model

    Convert Image to Text + Structured Format

    Use OCR tools like Tesseract, Azure AI Document Intelligence, or OpenAI's Vision models to extract text.

    Parse the extracted text into a structured table.

    Example Approach using Python + Tesseract

    import pytesseract
    from PIL import Image

    # Load the uploaded document image
    image_path = "/mnt/data/image.png"
    img = Image.open(image_path)

    # Extract text with Tesseract OCR
    extracted_text = pytesseract.image_to_string(img)
    print(extracted_text)  # Review the extracted text
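
    To get from the raw OCR text to the structured table mentioned above, the extracted lines still need to be parsed. The sketch below is purely illustrative: it assumes each OCR line comes out roughly as an option code followed by its description (e.g. "K1 Without connector"), so the regex is a placeholder you would adapt to the real Tesseract output for your datasheet.

    import re

    # Assumed line format: "<code> <description>", e.g. "K1 Without connector".
    # The regex is a placeholder; adjust it to the actual OCR output.
    options = {}
    for line in extracted_text.splitlines():
        match = re.match(r"^\s*([A-Z]\d*|\d+(?:\.\d+)?)\s+(.+)$", line)
        if match:
            code, description = match.groups()
            options[code] = description.strip()

    print(options)  # e.g. {"K1": "Without connector", "K2": "Connector without rectifier with LED", ...}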
    

    Enhancing OCR with Layout Parsing (LayoutLMv3 / Azure Document Intelligence)

    Use Azure AI Document Intelligence (previously Form Recognizer) to detect structured elements.

    Convert results into a JSON or tabular format.
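
    A minimal sketch of that step with the azure-ai-formrecognizer Python SDK is shown below; the endpoint, key, and file path are placeholders for your own Document Intelligence resource, and the layout model ("prebuilt-layout") returns words, lines, and tables rather than the leader-line relationships themselves.

    from azure.ai.formrecognizer import DocumentAnalysisClient
    from azure.core.credentials import AzureKeyCredential

    # Placeholder endpoint and key for your Document Intelligence resource
    client = DocumentAnalysisClient(
        "https://<your-resource>.cognitiveservices.azure.com/",
        AzureKeyCredential("<your-key>"),
    )

    # Analyze the uploaded page with the layout model
    with open("/mnt/data/image.png", "rb") as f:
        poller = client.begin_analyze_document("prebuilt-layout", document=f)
    result = poller.result()

    # Flatten any detected tables into simple row lists for downstream use
    for table in result.tables:
        rows = [["" for _ in range(table.column_count)] for _ in range(table.row_count)]
        for cell in table.cells:
            rows[cell.row_index][cell.column_index] = cell.content
        print(rows)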

    2. Reformat the Data for OpenAI o1

    Transform extracted text into structured input for GPT

    Create a JSON or key-value format to represent relationships.

    Example JSON Format for the Document

    {
      "Ordering Code": "RPE3-04",
      "Solenoid Operated Directional Control Valve": {
          "Nominal size": "04(D02)",
          "Number of valve positions": ["2 positions", "3 positions"],
          "Seals": ["NBR", "FPM (Viton)"],
          "Orifice in P-Port": ["No orifice", "Ø0.8 mm", "Ø1.2 mm", "Ø1.5 mm", "Ø2.1 mm", "Ø2.7 mm"],
          "Manual override": ["Standard", "Covered with rubber protective boot"]
      },
      "Electrical Connector": {
          "K1": "Without connector",
          "K2": "Connector without rectifier with LED",
          "K3": "Connector with rectifier",
          "K4": "Connector with rectifier with LED and quenching diode",
          "K5": "Connector with integrated rectifier and LED"
      }
    }
    
    

    Feed this structured data to OpenAI for better understanding.

    Now, o1 can process the information without struggling with leader lines.
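
    As a rough sketch of that last step, the JSON can be sent as context in a chat completion call using the openai Python SDK's Azure client. The endpoint, key, API version, and deployment name below are placeholders for your own Azure OpenAI resource.

    import json
    from openai import AzureOpenAI

    client = AzureOpenAI(
        azure_endpoint="https://<your-resource>.openai.azure.com/",  # placeholder
        api_key="<your-key>",                                        # placeholder
        api_version="2024-12-01-preview",                            # use a version your resource supports
    )

    # Abbreviated version of the structured document shown above
    structured_doc = {
        "Ordering Code": "RPE3-04",
        "Electrical Connector": {
            "K1": "Without connector",
            "K2": "Connector without rectifier with LED",
            "K3": "Connector with rectifier",
            "K4": "Connector with rectifier with LED and quenching diode",
            "K5": "Connector with integrated rectifier and LED",
        },
        # ...remaining sections from the JSON above
    }

    response = client.chat.completions.create(
        model="<your-o1-deployment>",  # placeholder deployment name
        messages=[
            {
                "role": "user",
                "content": "Here is a structured representation of the ordering-code diagram:\n"
                           + json.dumps(structured_doc, indent=2)
                           + "\n\nWhat are the possible options for the rightmost empty box?",
            }
        ],
    )
    print(response.choices[0].message.content)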

    3. Use Vision Models with Spatial Awareness

    Try GPT-4V (Vision) or Azure Document Intelligence

    These models can interpret relationships in structured documents with leader lines.

    Use Azure Form Recognizer’s "Key-Value Pair Extraction" or "Table Extraction" to create a structured dataset.
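
    For the key-value pair route specifically, a small sketch with the prebuilt-document model looks like the following (same placeholder endpoint and key as before); how well the extracted pairs reflect the leader-line connections still depends on the layout of the page.

    from azure.ai.formrecognizer import DocumentAnalysisClient
    from azure.core.credentials import AzureKeyCredential

    client = DocumentAnalysisClient(
        "https://<your-resource>.cognitiveservices.azure.com/",  # placeholder
        AzureKeyCredential("<your-key>"),                        # placeholder
    )

    with open("/mnt/data/image.png", "rb") as f:
        poller = client.begin_analyze_document("prebuilt-document", document=f)
    result = poller.result()

    # Print each detected key-value pair; pairs without a detected value are skipped
    for kv in result.key_value_pairs:
        if kv.key and kv.value:
            print(f"{kv.key.content} -> {kv.value.content}")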

    Hope this helps. Do let us know if you have any further queries.

    ------------- 

    If this answers your query, please click "Accept Answer" and "Yes" for "Was this answer helpful".

    Thank you. 


  2. JAYA SHANKAR G S 3,960 Reputation points Microsoft External Staff Moderator
    2025-03-18T12:04:23.85+00:00

    Hi @Tyler Suard,

    I have tried this from my end with your sample image and got the same response as you.

    I can confirm that the model reads the ordering code correctly up to the slash symbol (/), but after the slash it returns the options in reverse order. That is why you get the solenoid coil labels when you ask about the rightmost options.

    So I tried the prompt below, which extracts the information in JSON format and answers your query correctly.

    Extract and map the correct option or label corresponding to an empty box in an image where a line connects the box to a label. Ignore (/) and go forward.
    
    # Problem Statement
    
    You are tasked to map and associate an empty box in an image with corresponding options indicated by labels. Each box in the image has a line that connects it to one or more labeled options. Your goal is to correctly identify and map these connections. Focus on precisely interpreting the information visually presented and avoid ambiguities.
    
    # Steps
    
    1. **Input Description**:
       - Parse or describe the given image with empty boxes connected to labeled options.
       - Identify all empty boxes present in the image.
       - Detect connecting lines between the boxes and their corresponding labeled options.
    
    2. **Mapping Process**:
       - For each empty box, follow the line that connects it to one or more labeled options.
       - Ensure the correct association, especially in cases of overlapping or intersecting lines.
       - Capture all mappings in a structured format for ease of interpretation.
    
    3. **Output Mapping**:
       - Present mappings in an easy-to-read format such as a table, list, or structured JSON.
       - Each mapping should clearly indicate the box identifier (if any) or position and its respective option label(s).
    
    4. **Edge Cases**:
       - Consider scenarios where a line is broken, faint, or unclear.
       - Handle multiple connections (multi-label options) or ambiguous associations by describing the most likely mapping based on input patterns.
    
    # Output Format
    
    The mapping should be provided in the following format:
    
    json
    {
      "mappings": [
        {
          "box": "Box 1", // A unique identifier or relative position
          "options": ["Option A", "Option C"] // List of connected options
        },
        {
          "box": "Box 2",
          "options": ["Option B"]
        }
      ]
    }
    - Replace placeholders like `Box 1` or `Option A` with appropriate identifiers from the provided input.
    
    # Notes
    
    - Ensure mappings are accurate and exhaustive; no box should be left unaccounted for.
    - Pay attention to cases where multiple lines intersect or overlap to avoid misinterpretation.
    - If the image or data includes additional visual features (e.g., color, thickness of lines), consider leveraging this information to refine mappings.
    - If you encounter any ambiguity in identifying connections, provide reasoning for the determined association and clarify any assumptions made.
    
    # Examples
    
    ### Example Mapping:
    
    *Image Description*:  
    - You have three boxes labeled "Box 1," "Box 2," and "Box 3."  
    - Lines connect these boxes to options labeled "Option A," "Option B," and "Option C."
    
    **Generated Mapping**:
    
    json
    {
      "mappings": [
        {
          "box": "Box 1",
          "options": ["Option A"]
        },
        {
          "box": "Box 2",
          "options": ["Option B"]
        },
        {
          "box": "Box 3",
          "options": ["Option C"]
        }
      ]
    }
    *(Note: Real-world examples may include placeholders like [Box Label] or [Option Label] to illustrate mappings specific to the input image.)*
    
    Sample: RPE3-04 2 ? 01200 E1 K1 N2 D1 V
    
    ---
    
    If clarification or initial analysis is being requested for line detection, ensure that proper visual-recognition tools (e.g., OCR tools, image processing libraries) are incorporated into your workflow.
    

    Here, JSON is not the final output; it just stores the extracted info in this format so that the model is able to answer your query properly.

    Output:

    (Screenshot of the model's response showing the correct mapping.)

    You don't need to use this exact prompt; adjust it to your requirements with a more specific expected outcome.
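
    If you are calling the model through the API rather than the playground, one possible way to wire this up is sketched below: the prompt text goes into the message together with the image as a base64 data URL. The endpoint, key, API version, and deployment name are placeholders, and whether image input is accepted depends on the deployment (e.g. a vision-capable model such as GPT-4o or o1) and the API version you use, so treat this as illustrative only.

    import base64
    from openai import AzureOpenAI

    client = AzureOpenAI(
        azure_endpoint="https://<your-resource>.openai.azure.com/",  # placeholder
        api_key="<your-key>",                                        # placeholder
        api_version="2024-12-01-preview",                            # placeholder
    )

    mapping_prompt = "..."  # the mapping prompt shown above, stored as a plain string

    # Encode the uploaded page as a base64 data URL
    with open("/mnt/data/image.png", "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()

    response = client.chat.completions.create(
        model="<your-vision-capable-deployment>",  # placeholder deployment name
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": mapping_prompt
                        + "\n\nWhat are the possible options for the rightmost empty box?"},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
                ],
            }
        ],
    )
    print(response.choices[0].message.content)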

    Please do let me know if you have any further queries.

    Thank you

