Azure Custom Classification Model - How to set "split mode" with Python?

Roberto Araujo Filho 115 Reputation points
2024-01-05T23:07:14.8266667+00:00

I have created a Custom Classification Model using Azure Document Intelligence Studio, and it works fine. However, it classifies each page of a document separately, and I would like a single classification for the whole document.

The Document Intelligence Studio interface provides a button (Analyze options) where I can set this condition when classifying through the Studio.

But I need to set this option to "none" inside my Python code. I've tried some solutions like this:

poller = document_analysis_client.begin_classify_document(classifier_id, document=f, split="none")

Unfortunately, the begin_classify_document method doesn't accept this argument (split), and I couldn't find a way to configure it in the Azure SDK for Python documentation.

I'll be happy if anyone can help.

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

Accepted answer
  1. VasaviLankipalle-MSFT 18,676 Reputation points Moderator
    2024-01-07T03:55:55.8433333+00:00

    Hello @Roberto Araujo Filho, I'm glad that you were able to resolve your issue, and thank you for posting your solution so that others experiencing the same thing can easily reference it. Since the Microsoft Q&A community has a policy that "The question author cannot accept their own answer. They can only accept answers by others", I'll repost your solution in case you'd like to "Accept" the answer.

    Issue: Azure Custom Classification Model - How to set "split mode" with Python?

    Solution:

    The public preview versions of the Document Intelligence client libraries default to REST API version 2023-10-31-preview. Starting with that API version, analyzing documents with a custom classification model no longer splits them by default: the default for splitMode is none, so the whole file is classified as a single document. To preserve the behavior of previous releases, or if your input file contains multiple documents, explicitly set splitMode to auto.

    Reference: https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/concept-custom-classifier?view=doc-intel-4.0.0

    Make sure you are using the latest SDK version and Python 3.8 or later.

    python -m pip install azure-ai-documentintelligence
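
To confirm which version of the package actually got installed, a minimal sketch using only the standard library could look like the following. The helper name installed_version is hypothetical and is not part of the Azure SDK; it only reads the package metadata that pip records.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "azure-ai-documentintelligence") -> Optional[str]:
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        # The distribution is not installed in this environment.
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found or "azure-ai-documentintelligence is not installed")
```

If this prints a version older than the preview release mentioned in this answer, the split keyword is likely not available in your environment.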
    

    Here is the sample code: https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/documentintelligence/azure-ai-documentintelligence/samples/sample_classify_document.py

    This table shows the relationship between SDK versions and supported API service versions:

    [Image: table of SDK versions and supported API service versions]

    Sample code that sets the split mode with the new v4.0 SDK (1.0.0b1, preview):

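    # New code for azure-ai-documentintelligence 1.0.0b1 (preview).
    # endpoint, key, classifier_id, and each_file must be defined by the caller.
    from azure.ai.documentintelligence import DocumentIntelligenceClient
    from azure.core.credentials import AzureKeyCredential

    document_analysis_client = DocumentIntelligenceClient(
        endpoint=endpoint, credential=AzureKeyCredential(key)
    )

    with open(each_file, "rb") as f:
        # split="none" asks for one classification for the whole file.
        poller = document_analysis_client.begin_classify_document(
            classifier_id, classify_request=f, split="none", content_type="application/octet-stream"
        )
    result = poller.result()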
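
Once the poller returns, the classification can be read from the result. The sketch below is a hedged illustration rather than the SDK's own sample: summarize_classification is a hypothetical helper, and it assumes the result exposes a documents list whose items carry doc_type, confidence, and bounding_regions (each with a page_number). A stand-in object with made-up values is used so the snippet runs without an Azure resource.

```python
from types import SimpleNamespace
from typing import Any, List, Tuple


def summarize_classification(result: Any) -> List[Tuple[str, float, List[int]]]:
    """Return (doc_type, confidence, page_numbers) for each classified document."""
    summary = []
    for doc in getattr(result, "documents", None) or []:
        regions = getattr(doc, "bounding_regions", None) or []
        # Collect the distinct pages this classified document spans.
        pages = sorted({region.page_number for region in regions})
        summary.append((doc.doc_type, doc.confidence, pages))
    return summary


# Stand-in for poller.result(); the values are invented for illustration only.
fake_result = SimpleNamespace(
    documents=[
        SimpleNamespace(
            doc_type="invoice",
            confidence=0.93,
            bounding_regions=[
                SimpleNamespace(page_number=1),
                SimpleNamespace(page_number=2),
            ],
        )
    ]
)

for doc_type, confidence, pages in summarize_classification(fake_result):
    print(f"{doc_type} (confidence {confidence}) on pages {pages}")
```

With split set to "none", the list would be expected to hold a single entry covering all pages; with "auto", it may hold one entry per detected document.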
    

    If you have any other questions or are still running into more issues, please let me know.

    Thank you again for your time and patience throughout this issue.

    Regards,
    Vasavi

    Please remember to "Accept Answer" if any answer/reply helped, so that others in the community facing similar issues can easily find the solution.

