App Fails to Extract Complete Content and ABAP Code from Documents Despite Successful Retrieval

Amaan Syed 20 Reputation points
2025-05-26T15:31:58.07+00:00

I'm developing a Streamlit app using LangChain and Pinecone to process and query SAP documents. The app retrieves relevant documents, but fails to extract the complete content and ABAP code snippets, returning only partial text and code despite the full document being retrieved. Why is this happening, and how can I ensure the full content and code are extracted as they appear in the documents?

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

Accepted answer
  1. santoshkc 15,245 Reputation points Microsoft External Staff Moderator
    2025-05-28T10:31:22.64+00:00

    Hi @Amaan Syed,

    Your app retrieves the right SAP documents but returns only partial content and ABAP code. This is most likely caused by two things: limitations in the document parsing step (possibly Azure AI Document Intelligence, if that’s the tool you’re using) and the way LangChain splits text. ABAP code blocks, especially those embedded in tables or with special formatting, may be misinterpreted or truncated during parsing. In addition, LangChain’s default text splitters can break a code block across chunks, so each retrieved chunk holds only a fragment of the code. If the extracted content isn’t grouped into logical sections before it is stored in Pinecone, retrieval accuracy can also suffer.

    In LangChain, apply custom chunking logic that never splits inside a code block, and store each full logical section (text plus code) in Pinecone with appropriate metadata. This helps ensure that complete content and ABAP code are extracted and retrieved as they appear in the original documents.
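
    As a rough illustration of the chunking idea, here is a minimal sketch in plain Python. It assumes the parsed document is Markdown-style text in which ABAP code sits inside fenced code blocks; that is an assumption about your parser output, not something confirmed in the question. The function names `split_segments` and `chunk_preserving_code`, the `has_code` metadata key, and the `max_chars` default are all hypothetical, and the sketch does not call LangChain or Pinecone itself.

```python
from typing import Dict, List, Tuple

# Markdown code fence, built this way so no literal fence appears in this example.
FENCE = "`" * 3


def split_segments(text: str) -> List[Tuple[str, str]]:
    """Split text into ("prose" | "code", content) segments on fence lines."""
    segments: List[Tuple[str, str]] = []
    buffer: List[str] = []
    in_code = False
    for line in text.splitlines():
        if line.strip().startswith(FENCE):
            if in_code:
                # Closing fence: emit the whole code block as one segment.
                buffer.append(line)
                segments.append(("code", "\n".join(buffer)))
                buffer = []
            else:
                # Opening fence: emit any prose collected so far.
                if any(l.strip() for l in buffer):
                    segments.append(("prose", "\n".join(buffer).strip()))
                buffer = [line]
            in_code = not in_code
        else:
            buffer.append(line)
    if any(l.strip() for l in buffer):
        # An unclosed fence is kept whole as code rather than dropped.
        kind = "code" if in_code else "prose"
        segments.append((kind, "\n".join(buffer).strip()))
    return segments


def chunk_preserving_code(text: str, max_chars: int = 1000) -> List[Dict[str, object]]:
    """Pack segments into chunks; prose splits on blank lines, code never splits."""
    chunks: List[Dict[str, object]] = []
    current: List[str] = []
    has_code = False

    def flush() -> None:
        nonlocal current, has_code
        if current:
            chunks.append({"text": "\n\n".join(current), "has_code": has_code})
        current, has_code = [], False

    for kind, content in split_segments(text):
        if kind == "code":
            pieces = [content]  # atomic, even if longer than max_chars
        else:
            pieces = [p.strip() for p in content.split("\n\n") if p.strip()]
        for piece in pieces:
            size = sum(len(c) for c in current) + len(piece)
            if current and size > max_chars:
                flush()
            current.append(piece)
            has_code = has_code or kind == "code"
    flush()
    return chunks
```

    Each returned dict could then be wrapped in a LangChain `Document` (`page_content` from `text`, the rest as `metadata`) before it is upserted to Pinecone. Note that a code block longer than `max_chars` becomes one oversized chunk instead of being cut, so the embedding model's input limit still needs to be checked separately.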

    I hope this helps. If you have any further questions, please let us know.


    If this answers your question, please click Accept Answer and select Yes for "Was this answer helpful".

