Hi Tyler Suard,
I understand that the issue with OpenAI's o1 model (likely GPT-4-turbo or a similar model in Azure OpenAI) not understanding leader lines and elbow connectors in documents like the one you uploaded comes down to how the model processes images and text. Here is a step-by-step approach to solving it:
Issue Breakdown:
1. Complex Layouts with Leader Lines
The document links data points with connecting lines instead of a straightforward table format.
OCR (Optical Character Recognition) extracts the text but may not capture the relationships properly.
2. GPT-Based Models Struggle with Structure
While GPT models can process OCR text, they may not infer relationships that are conveyed only by leader lines.
Without explicit annotations or structured text, it is hard for the model to determine the correct connections.
Solutions:
1. Preprocess the Image Before Feeding It into OpenAI's Model
Convert Image to Text + Structured Format
Use OCR tools like Tesseract, Azure AI Document Intelligence, or OpenAI's Vision models to extract text.
Parse the extracted text into a structured table (a rough parsing sketch follows the Tesseract example below).
Example approach using Python + Tesseract:
import pytesseract
from PIL import Image

# Load the uploaded image
image_path = "/mnt/data/image.png"
img = Image.open(image_path)

# Extract the raw text; note that leader-line relationships are lost here
extracted_text = pytesseract.image_to_string(img)
print(extracted_text)  # Review the extracted text
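As a rough illustration, here is a minimal sketch of turning the raw OCR lines into key-value pairs. The parse_ocr_lines helper and the "CODE description" line pattern are assumptions about what Tesseract returns for this document, so adjust the regex to your actual output:

import re

def parse_ocr_lines(text):
    # Hypothetical helper: pair a short code (e.g. "K2") with its description,
    # assuming each OCR line comes out as "CODE description".
    pairs = {}
    for line in text.splitlines():
        match = re.match(r"^([A-Z]\d+)\s+(.+)$", line.strip())
        if match:
            pairs[match.group(1)] = match.group(2)
    return pairs

structured = parse_ocr_lines(extracted_text)
print(structured)  # e.g. {"K1": "Without connector", ...}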
Enhancing OCR with Layout Parsing (LayoutLMv3 / Azure Document Intelligence)
Use Azure AI Document Intelligence (previously Form Recognizer) to detect structured elements.
Convert results into a JSON or tabular format.
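For reference, here is a minimal sketch using the azure-ai-formrecognizer Python SDK with the prebuilt layout model; the endpoint and key values are placeholders you would replace with your own resource details:

from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential

# Placeholder credentials for your Azure resource
endpoint = "https://<your-resource>.cognitiveservices.azure.com/"
key = "<your-key>"

client = DocumentAnalysisClient(endpoint=endpoint, credential=AzureKeyCredential(key))

# Run the prebuilt layout model over the uploaded image
with open("/mnt/data/image.png", "rb") as f:
    poller = client.begin_analyze_document("prebuilt-layout", document=f)
result = poller.result()

# Inspect detected table cells with their row/column positions
for table in result.tables:
    for cell in table.cells:
        print(cell.row_index, cell.column_index, cell.content)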
2. Reformat the Data for OpenAI o1
Transform the extracted text into structured input for GPT.
Create a JSON or key-value format to represent relationships.
Example JSON Format for the Document
{
"Ordering Code": "RPE3-04",
"Solenoid Operated Directional Control Valve": {
"Nominal size": "04(D02)",
"Number of valve positions": ["2 positions", "3 positions"],
"Seals": ["NBR", "FPM (Viton)"],
"Orifice in P-Port": ["No orifice", "Ø0.8 mm", "Ø1.2 mm", "Ø1.5 mm", "Ø2.1 mm", "Ø2.7 mm"],
"Manual override": ["Standard", "Covered with rubber protective boot"]
},
"Electrical Connector": {
"K1": "Without connector",
"K2": "Connector without rectifier with LED",
"K3": "Connector with rectifier",
"K4": "Connector with rectifier with LED and quenching diode",
"K5": "Connector with integrated rectifier and LED"
}
}
Feed this structured data to OpenAI for better understanding; a sketch of the call follows below.
Now o1 can process the information without struggling with leader lines.
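A minimal sketch of that call with the openai Python SDK, assuming the structured JSON was saved to a file named ordering_code.json (a hypothetical name) and that "o1" is the deployed model name:

import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Load the structured JSON built in the previous step
with open("ordering_code.json") as f:
    structured_data = json.load(f)

response = client.chat.completions.create(
    model="o1",  # assumption: substitute your Azure OpenAI deployment name
    messages=[
        {"role": "user",
         "content": "List the valid ordering-code combinations in this data:\n"
                    + json.dumps(structured_data, indent=2)},
    ],
)
print(response.choices[0].message.content)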
3. Use Vision Models with Spatial Awareness
Try GPT-4V (Vision) or Azure AI Document Intelligence.
These models can interpret relationships in structured documents with leader lines; a sketch of a vision call follows below.
Use Azure Document Intelligence's key-value pair extraction or table extraction to create a structured dataset.
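For illustration, a minimal sketch that sends the original image to a vision-capable chat model using the standard base64 data-URL input format; the model name here is an assumption, so substitute whichever vision deployment you have:

import base64
from openai import OpenAI

client = OpenAI()

# Encode the image as a base64 data URL, the standard vision-input format
with open("/mnt/data/image.png", "rb") as f:
    b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",  # assumption: any vision-capable deployment works here
    messages=[
        {"role": "user",
         "content": [
             {"type": "text",
              "text": "Follow the leader lines and list each ordering-code "
                      "position with its possible values."},
             {"type": "image_url",
              "image_url": {"url": f"data:image/png;base64,{b64}"}},
         ]},
    ],
)
print(response.choices[0].message.content)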
Hope this helps. Do let us know if you have any further queries.
-------------
If this answers your query, do click Accept Answer and Yes for "Was this answer helpful".
Thank you.