Using Java how do I split the pages in a document for sending it through a custom classification model in documentintelligencestudio using SplitMode class in documentintelligencestudio sdk


Krishna K 0

Hi everyone! I’m working on a Spring Boot application that integrates with a Custom Classification model from Azure Document Intelligence Studio. I’m using version 1.0.0-beta.4 of the azure.ai.documentintelligence package in my Java code.

My goal is to submit a multi-page PDF (or other multi-page formats) for classification but have each page classified individually (essentially a “split by page” approach). I’ve seen references to “splitMode” or “pages” parameters in other languages (like Python) but I’m not finding a clear way to do this in the current Java library.

Has anyone done page-by-page classification using the Java v1.0.0-beta.4 APIs? If so:

Which methods or parameters did you use?
Are there any workarounds or best practices (maybe uploading page-by-page) you’d recommend?
Is there official documentation or a sample project that demonstrates this “page splitting” behavior?

Any code snippets, references, or insights would be greatly appreciated! Thank you in advance.

Nikhil Jha (Accenture International Limited) 4,335 Reputation points Microsoft External Staff Moderator

2025-10-01T07:10:15.6066667+00:00

Hi Krishna K

Good day.
I'm looking into your query and will provide a resolution soon.


Answer 1

Hi Krishna K,

For your goal of classifying each page of a document individually with the azure.ai.documentintelligence library (version 1.0.0-beta.4), I have identified two approaches you can try.
Approach 1:
Iterate through the document's pages and submit a separate classification request for each one, restricting every request to a single page through the classify operation's pages option. In other words, loop over the page numbers of your document and make a distinct API call per page. Each page is then treated as a standalone document for the purpose of classification.

Code Snippet:

import com.azure.ai.documentintelligence.DocumentIntelligenceClient;
import com.azure.ai.documentintelligence.DocumentIntelligenceClientBuilder;
import com.azure.ai.documentintelligence.models.AnalyzeResult;
import com.azure.ai.documentintelligence.models.AnalyzeResultOperation;
import com.azure.ai.documentintelligence.models.ClassifyDocumentRequest;
import com.azure.core.credential.AzureKeyCredential;
import com.azure.core.util.polling.SyncPoller;
import java.util.LinkedHashMap;
import java.util.Map;
public class ClassifyPageByPage {
    public static void main(String[] args) {
        // Configure your client
        String endpoint = "YOUR_DOCUMENT_INTELLIGENCE_ENDPOINT";
        String key = "YOUR_DOCUMENT_INTELLIGENCE_KEY";
        String modelId = "YOUR_CUSTOM_CLASSIFIER_ID";
        String documentUrl = "URL_TO_YOUR_MULTIPAGE_DOCUMENT";
        
        // Assuming you know the page count beforehand.
        // For a dynamic approach, you could use a library like Apache PDFBox to get the page count from the PDF.
        int totalPages = 5; 
        DocumentIntelligenceClient client = new DocumentIntelligenceClientBuilder()
            .endpoint(endpoint)
            .credential(new AzureKeyCredential(key))
            .buildClient();
        // A map to store the classification result for each page (insertion-ordered for the summary)
        Map<Integer, String> pageClassificationResults = new LinkedHashMap<>();
        System.out.printf("Classifying document from URL: %s%n", documentUrl);
        // Loop from page 1 to totalPages
        for (int i = 1; i <= totalPages; i++) {
            // Create a new request for each page; the request body only carries the source
            ClassifyDocumentRequest request = new ClassifyDocumentRequest()
                .setUrlSource(documentUrl);
            System.out.printf("Sending classification request for page %d...%n", i);
            
            // Restrict this call to a single page via the pages argument. This is the key step!
            // Assumed 1.0.0-beta.4 overload: (classifierId, request, stringIndexType, split, pages).
            // Verify it against your installed SDK, since the signature changed between releases.
            SyncPoller<AnalyzeResultOperation, AnalyzeResult> poller =
                client.beginClassifyDocument(modelId, request, null, null, String.valueOf(i));
            AnalyzeResult result = poller.getFinalResult();
            if (result.getDocuments() != null && !result.getDocuments().isEmpty()) {
                // Get the top document type (classification) for this page
                String docType = result.getDocuments().get(0).getDocType();
                double confidence = result.getDocuments().get(0).getConfidence();
                pageClassificationResults.put(i, docType);
                System.out.printf("  - Page %d classified as: %s (Confidence: %.2f)%n", i, docType, confidence);
            } else {
                System.out.printf("  - Page %d could not be classified.%n", i);
            }
        }
        
        System.out.println("\n--- Classification Summary ---");
        pageClassificationResults.forEach((page, docType) -> 
            System.out.printf("Page %d: %s%n", page, docType)
        );
    }
}

Key considerations with this approach:

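API Calls and Cost: This approach makes one API call per page, so the number of requests and the total latency grow linearly with the page count.
Determining Page Count: The code sample above uses a hardcoded totalPages value. In practice, read the page count from the file itself, for example with Apache PDFBox for PDFs.
Error Handling: Implement robust error handling within your loop. A failure to classify one page shouldn't necessarily stop the entire process for the remaining pages.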
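
To illustrate the error-handling point, here is a minimal sketch of a loop that keeps going when a single page fails. It uses only the Java standard library: classifyPage is a hypothetical hook (not part of the SDK) that you would implement by wrapping the per-page classify call, and the class and method names are my own. It assumes failures surface as unchecked exceptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntFunction;

final class PerPageLoop {

    /**
     * Calls classifyPage for pages 1..totalPages. Successful results are returned
     * keyed by page number; failed pages are recorded in failures instead of
     * aborting the whole run.
     */
    static Map<Integer, String> classifyAllPages(int totalPages,
                                                 IntFunction<String> classifyPage,
                                                 Map<Integer, String> failures) {
        Map<Integer, String> results = new LinkedHashMap<>();
        for (int page = 1; page <= totalPages; page++) {
            try {
                // Hypothetical hook: wrap your SDK call for a single page here
                results.put(page, classifyPage.apply(page));
            } catch (RuntimeException e) {
                // Record the failure and continue with the next page
                failures.put(page, e.getMessage());
            }
        }
        return results;
    }

    public static void main(String[] args) {
        Map<Integer, String> failures = new LinkedHashMap<>();
        // Stand-in classifier that fails on page 2 to simulate a service error
        Map<Integer, String> results = classifyAllPages(3, page -> {
            if (page == 2) {
                throw new IllegalStateException("simulated service error");
            }
            return "invoice";
        }, failures);
        System.out.println("Classified: " + results);
        System.out.println("Failed: " + failures);
    }
}
```

Collecting the failed pages separately lets you retry just those pages later instead of resubmitting the whole document.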

Approach 2:
Use SplitMode.PER_PAGE. The Java SDK supports page-level splitting through the SplitMode enumeration, so a single classification request can treat every page as its own document.
Code Snippet:

import com.azure.ai.documentintelligence.DocumentIntelligenceClient;
import com.azure.ai.documentintelligence.DocumentIntelligenceClientBuilder;
import com.azure.ai.documentintelligence.models.AnalyzeResult;
import com.azure.ai.documentintelligence.models.AnalyzeResultOperation;
import com.azure.ai.documentintelligence.models.ClassifyDocumentRequest;
import com.azure.ai.documentintelligence.models.Document;
import com.azure.ai.documentintelligence.models.DocumentSpan;
import com.azure.ai.documentintelligence.models.SplitMode;
import com.azure.core.credential.AzureKeyCredential;
import com.azure.core.util.polling.SyncPoller;
import org.springframework.stereotype.Service;
import java.util.ArrayList;
import java.util.List;

@Service
public class DocumentClassificationService {
    
    private final DocumentIntelligenceClient client;
    
    public DocumentClassificationService() {
        this.client = new DocumentIntelligenceClientBuilder()
            .endpoint("your-endpoint")
            .credential(new AzureKeyCredential("your-key"))
            .buildClient();
    }

    public AnalyzeResult classifyDocumentByPages(byte[] documentBytes, String classifierModelId) {
        
        // Pass the raw bytes; the SDK takes care of base64-encoding the request body
        ClassifyDocumentRequest classifyRequest = new ClassifyDocumentRequest()
            .setBase64Source(documentBytes);
        
        // SplitMode is an option of the classify operation, not of beginAnalyzeDocument.
        // Assumed 1.0.0-beta.4 overload: (classifierId, request, stringIndexType, split, pages).
        // Verify it against your installed SDK, since the signature changed between releases.
        SyncPoller<AnalyzeResultOperation, AnalyzeResult> poller = client.beginClassifyDocument(
            classifierModelId,
            classifyRequest,
            null,               // string index type - null uses the service default
            SplitMode.PER_PAGE, // This is the key parameter for page-level classification
            null                // pages - null means classify all pages
        );
        
        // Wait for completion and get results
        return poller.getFinalResult();
    }

    public List<PageClassificationResult> processClassificationResults(AnalyzeResult result) {
        List<PageClassificationResult> pageResults = new ArrayList<>();
        
        if (result.getDocuments() != null) {
            for (int i = 0; i < result.getDocuments().size(); i++) {
                Document doc = result.getDocuments().get(i);
                
                // Prefer the page number reported by the service; fall back to the list position
                int pageNumber = (doc.getBoundingRegions() != null && !doc.getBoundingRegions().isEmpty())
                    ? doc.getBoundingRegions().get(0).getPageNumber()
                    : i + 1;
                
                PageClassificationResult pageResult = new PageClassificationResult();
                pageResult.setPageNumber(pageNumber);
                pageResult.setDocumentType(doc.getDocType());
                pageResult.setConfidence(doc.getConfidence());
                
                // Spans are content offsets covered by this classification, not page numbers
                if (doc.getSpans() != null && !doc.getSpans().isEmpty()) {
                    pageResult.setSpans(doc.getSpans());
                }
                
                pageResults.add(pageResult);
            }
        }
        
        return pageResults;
    }
}

// Helper class for structured results (place it in its own file, PageClassificationResult.java)
public class PageClassificationResult {
    private int pageNumber;
    private String documentType;
    private double confidence;
    private List<DocumentSpan> spans;
    
    public void setPageNumber(int pageNumber) { this.pageNumber = pageNumber; }
    public void setDocumentType(String documentType) { this.documentType = documentType; }
    public void setConfidence(double confidence) { this.confidence = confidence; }
    public void setSpans(List<DocumentSpan> spans) { this.spans = spans; }
    
    // Getters omitted for brevity...
}
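
Once you have a document type per page (from either approach), you may want to stitch consecutive pages of the same type back into logical documents. The following is only a sketch of that post-processing step using the Java standard library; PageRangeGrouper and its output format are my own illustration and not something the SDK provides. It assumes every page has a non-null type.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class PageRangeGrouper {

    /** Merges runs of adjacent pages that share a document type into readable segments. */
    static List<String> groupConsecutive(Map<Integer, String> pageTypes) {
        List<String> segments = new ArrayList<>();
        int start = 0;
        int prev = 0;
        String current = null;
        // TreeMap guarantees we walk the pages in ascending page-number order
        for (Map.Entry<Integer, String> entry : new TreeMap<>(pageTypes).entrySet()) {
            int page = entry.getKey();
            String type = entry.getValue();
            boolean continuesRun = current != null && current.equals(type) && page == prev + 1;
            if (!continuesRun) {
                if (current != null) {
                    segments.add(describe(current, start, prev));
                }
                start = page;
                current = type;
            }
            prev = page;
        }
        if (current != null) {
            segments.add(describe(current, start, prev));
        }
        return segments;
    }

    private static String describe(String type, int start, int end) {
        return start == end
            ? type + ": page " + start
            : type + ": pages " + start + "-" + end;
    }

    public static void main(String[] args) {
        Map<Integer, String> pageTypes = new LinkedHashMap<>();
        pageTypes.put(1, "invoice");
        pageTypes.put(2, "invoice");
        pageTypes.put(3, "receipt");
        groupConsecutive(pageTypes).forEach(System.out::println);
    }
}
```

A gap in the page numbers (for example a page that failed to classify) deliberately starts a new segment, so missing pages are never silently merged into a neighbouring document.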

I hope this helps. If it does, kindly accept and upvote the answer so that other community members can benefit from it.
😊
