Azure Custom Classification Model - How to set "split mode" with Python?

Roberto Araujo Filho 115 Reputation points
2024-01-05T23:07:14.8266667+00:00

I have created a Custom Classification Model using Azure Document Intelligence Studio, and it works fine. However, it classifies each page of a document separately, and I would like to get a single classification for the whole document.

The Document Intelligence Studio interface provides a button (Analyze options) where I can set this option when classifying through the Studio.

But I need to set this option to "none" inside my Python code. I've tried some solutions like this:

poller = document_analysis_client.begin_classify_document(classifier_id, document=f, split="none")

Unfortunately, the begin_classify_document method doesn't accept this argument (split), and I couldn't find a way to configure it in the Azure SDK for Python documentation.

I'll be happy if anyone can help.

Azure AI Document Intelligence

Answer accepted by question author
  1. VasaviLankipalle-MSFT 18,716 Reputation points Moderator
    2024-01-07T03:55:55.8433333+00:00

    Hello @Roberto Araujo Filho, I'm glad that you were able to resolve your issue, and thank you for posting your solution so that others experiencing the same thing can easily reference it! Since the Microsoft Q&A community has a policy that "The question author cannot accept their own answer. They can only accept answers by others", I'll repost your solution in case you'd like to "Accept" the answer.

    Issue: Azure Custom Classification Model - How to set "split mode" with Python?

    Solution:

    The public preview version of the Document Intelligence client libraries defaults to REST API version 2023-10-31-preview. Starting with that API version, analyzing a document with a custom classification model no longer splits the input by default: the default for splitMode is none. To preserve the behavior of previous releases, or if your input file contains multiple documents, you need to enable splitting explicitly by setting splitMode to auto.

    Reference: https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/concept-custom-classifier?view=doc-intel-4.0.0

    Make sure you are using the latest SDK version and Python 3.8 or later.

    python -m pip install azure-ai-documentintelligence
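
To confirm which SDK version ended up in your environment after the install, a small check along these lines may help. This is only a sketch: `installed_version` is a hypothetical helper name, and it relies solely on the standard library's `importlib.metadata`, not on anything from the Azure SDK itself.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="azure-ai-documentintelligence"):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


# Prints a version string, or None if the package is not installed.
print(installed_version())
```

If this prints None, the install command above was most likely run against a different Python environment than the one executing your script.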
    

    This table shows the relationship between SDK versions and supported API service versions: (image not reproduced)

    Here is the sample code: https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/documentintelligence/azure-ai-documentintelligence/samples/sample_classify_document.py

    Sample code that sets the 'split' mode using the new SDK V4.0 (1.0.0b1, preview):

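    # new code
    from azure.core.credentials import AzureKeyCredential
    from azure.ai.documentintelligence import DocumentIntelligenceClient
    
    # endpoint, key, classifier_id and each_file are assumed to be defined earlier
    document_analysis_client = DocumentIntelligenceClient(
        endpoint=endpoint, credential=AzureKeyCredential(key)
    )
    
    with open(each_file, "rb") as f:
        # split="none" asks the service not to split the file into separate documents
        poller = document_analysis_client.begin_classify_document(
            classifier_id, classify_request=f, split="none", content_type="application/octet-stream"
        )
    result = poller.result()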
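
To check that the split setting had the intended effect, you can inspect how many documents come back and which pages each one covers. The helper below is a minimal sketch, not part of the SDK: `summarize_classification` is a hypothetical name, and it assumes the result exposes `documents`, each with `doc_type`, `confidence`, and `bounding_regions` entries carrying a `page_number`, as the linked sample prints them.

```python
def summarize_classification(result):
    """Return one (doc_type, confidence, pages) tuple per classified document.

    `result` is assumed to look like the object returned by poller.result():
    a `documents` list whose items carry `doc_type`, `confidence`, and
    `bounding_regions` (each region having a `page_number`).
    """
    summary = []
    for doc in result.documents or []:
        # Collect the page numbers this classified document spans.
        pages = [region.page_number for region in (doc.bounding_regions or [])]
        summary.append((doc.doc_type, doc.confidence, pages))
    return summary
```

Calling `summarize_classification(result)` after the code above should, with split="none", yield a single entry covering all pages of the file, whereas page-by-page splitting would yield one entry per page.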
    

    If you have any other questions or are still running into more issues, please let me know.

    Thank you again for your time and patience throughout this issue.

    Regards,
    Vasavi

    Please remember to "Accept Answer" if any answer/reply helped, so that others in the community facing similar issues can easily find the solution.
