BuildDocumentClassifierRequest from the Python SDK results in TrainingContentMissing: Training data is missing: Could not find any training data at the given path.

Rony Tayoun 20 Reputation points
2025-10-26T13:53:02.18+00:00

I'm trying to train a new classifier from the Python SDK. My doc_types are as follows:
{'0201': {'azureBlobSource': {'containerUrl': '...', 'prefix': "examples-de-chaque-document/examples-class1/"}}, ....

I have checked that the containerUrl works (sp and sr are included).

I get an output:
Model training failure

TrainingContentMissing: Training data is missing: Could not find any training data at the given path.

Help!

from azure.ai.documentintelligence.models import BuildDocumentClassifierRequest
import uuid

# Generate a unique classifier ID
classifier_id = f"top-level-classifier-{uuid.uuid4()}"

build_request = BuildDocumentClassifierRequest(
    classifier_id=classifier_id,  # mandatory
    description="Top-level classifier for Excel codes",
    doc_types=doc_types,
    allow_overwrite=True
)

# Start training
poller = admin_client.begin_build_classifier(build_request)
print(f"Training started asynchronously! Classifier ID: {classifier_id}")

Azure AI Document Intelligence

1 answer

  1. Nikhil Jha (Accenture International Limited) 4,150 Reputation points Microsoft External Staff Moderator
    2025-10-28T08:05:05.3266667+00:00

    Hello Rony Tayoun,

    Thank you for the detailed follow-up, including your new code and the persistent error message. Your thorough testing helps pinpoint the exact problem. Based on the code you've shared, your issue stems from a specific breaking change in the new v4.0 SDK (azure-ai-documentintelligence) that you are using.

    The root cause of the TrainingContentMissing error is that the v4.0 AzureBlobContentSource model does not have a prefix parameter. I see you are still passing a prefix parameter, just like in your original dictionary.

    The Python SDK is simply ignoring this unknown prefix argument. As a result, it is using your container_sas_url (which points to the root of your container) and looking for your training files there. Since your files are not at the root—they are in the "folder" specified by your prefix—the service correctly reports that it cannot find any training data at the given path.
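To make the "folder" idea concrete: a prefix selects the blobs whose names start with it, so a path that matches no blob names yields no training data. The sketch below uses plain Python and a hypothetical list of blob names (not taken from the question's container); it does not call the Azure SDK.

```python
from typing import List

# Hypothetical blob names for illustration only.
blob_names = [
    "examples-de-chaque-document/examples-class1/doc-a.pdf",
    "examples-de-chaque-document/examples-class1/doc-b.pdf",
    "examples-de-chaque-document/examples-class2/doc-c.pdf",
    "readme.txt",
]

def blobs_under(prefix: str, names: List[str]) -> List[str]:
    """Return the blob names that start with the given prefix."""
    return [name for name in names if name.startswith(prefix)]

# A prefix that matches the virtual folder selects only that class's files.
class1_files = blobs_under("examples-de-chaque-document/examples-class1/", blob_names)

# A prefix with a stray leading slash matches nothing.
no_files = blobs_under("/examples-de-chaque-document/examples-class1/", blob_names)
```

Comparing the path you pass against the actual blob names in this way is a quick sanity check before starting a training run.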

    The solution is to remove the prefix parameter and instead append the folder path directly to the container_url string before the SAS token.

    Recommended Steps:

    1. Your SAS URL must point directly to the specific "folder" containing the files for that class.

    
    # Start from your container-level SAS URL,
    # e.g., "https://myaccount.blob.core.windows.net/mycontainer?sv=..."
    container_sas_url = "[YOUR_BASE_SAS_URL_WITH_TOKEN]"

    # Append the prefix (folder path) after the container name, before the "?"
    url_2058a = "https://[STORAGE_NAME].blob.core.windows.net/[CONTAINER_NAME]/examples-de-chaque-document/2058-a/?[SAS_TOKEN]"
    url_2058b = "https://[STORAGE_NAME].blob.core.windows.net/[CONTAINER_NAME]/examples-de-chaque-document/2058-b/?[SAS_TOKEN]"
    

    Note: You must generate the SAS token at the container level, not the blob level, for this to work. Also make sure there is a trailing slash '/' after the folder name, before the '?'.

    2. Now, build your doc_types object using these new, complete URLs and no prefix parameter.

    from azure.ai.documentintelligence.models import AzureBlobContentSource
    
    doc_types = {
        '2058a': {
            'azureBlobSource': AzureBlobContentSource(
                container_url=url_2058a # Use the full path with the folder
                # NO 'prefix' parameter here
            )
        },
        '2058b': {
            'azureBlobSource': AzureBlobContentSource(
                container_url=url_2058b
            )
        }
    }
    
    3. Run your existing training code

    Your BuildDocumentClassifierRequest and begin_build_classifier code is already correct. You do not need to change it. Simply run it again using the corrected doc_types object from Step 2, and the service will now find your files.
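The URL construction from the first step can be sketched with the standard library alone, which avoids hand-editing the string and losing the trailing slash. The helper names `folder_url_from_container_sas` and `sas_summary` are hypothetical, and the account, container, and token values are placeholders; this only illustrates the string manipulation and a quick look at the SAS fields (`sr=c` denotes a container-level SAS, and `sp` lists permissions such as `r` for read and `l` for list). It does not verify that the service accepts the resulting URL.

```python
from urllib.parse import parse_qs, urlsplit, urlunsplit

def folder_url_from_container_sas(container_sas_url: str, folder: str) -> str:
    """Insert a folder path between the container path and the SAS query string."""
    parts = urlsplit(container_sas_url)
    # Normalise slashes so there is exactly one "/" between segments
    # and a trailing "/" before the "?".
    path = parts.path.rstrip("/") + "/" + folder.strip("/") + "/"
    # The query (the SAS token) is passed through unchanged.
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))

def sas_summary(sas_url: str) -> dict:
    """Return the sr (resource type) and sp (permissions) fields of a SAS URL, if present."""
    query = parse_qs(urlsplit(sas_url).query)
    return {key: query.get(key, [None])[0] for key in ("sr", "sp")}

# Placeholder values for illustration only.
base = (
    "https://myaccount.blob.core.windows.net/mycontainer"
    "?sv=2024-01-01&sr=c&sp=rl&sig=PLACEHOLDER"
)
url_2058a = folder_url_from_container_sas(base, "examples-de-chaque-document/2058-a")
summary = sas_summary(base)
```

If `summary["sr"]` is not `"c"`, or `summary["sp"]` lacks `r` or `l`, the token is worth regenerating before retrying the build.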

    For more information, please refer to the official Microsoft documentation:

    AzureBlobContentSource (v4.0 SDK): Note this model only has container_url.

    ClassifierDocumentTypeDetails (v3 SDK - for comparison): This is the old model that used container_url and prefix separately. This shows the change.


    Please let us know if this helps. If yes, kindly "Accept the answer" and/or upvote, so it will be beneficial to others in the community as well.

