BuildDocumentClassifierRequest from python SDK resulting in TrainingContentMissing: Training data is missing: Could not find any training data at the given path

Question

Rony Tayoun 20

I am trying to create a custom classification model.
I have 2 classes '2058-a' and '2058-b'.

I made sure that the containerURL works by using it to print the existing files.



container_client = ContainerClient.from_container_url(container_sas_url)
blobs = container_client.list_blobs(name_starts_with="examples-de-chaque-document/2058-a/")
print(f"Found {len(list(blobs))} files")

container_client = ContainerClient.from_container_url(container_sas_url)
blobs = container_client.list_blobs(name_starts_with="examples-de-chaque-document/2058-b/")
print(f"Found {len(list(blobs))} files")

The files I am using here are the same files I use in Document Intelligence Studio (6 PDFs for each class).
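The two per-class checks above could be folded into one helper. This is only a sketch: `count_blobs_by_prefix` is a hypothetical name, and it works on a plain list of blob names (for example, the `name` attribute of each item returned by `list_blobs()`), so it can be tried without a storage account.

```python
from typing import Dict, Iterable


def count_blobs_by_prefix(blob_names: Iterable[str], prefixes: Iterable[str]) -> Dict[str, int]:
    """Count how many blob names start with each class prefix."""
    prefix_list = list(prefixes)
    counts = {prefix: 0 for prefix in prefix_list}
    for name in blob_names:
        for prefix in prefix_list:
            if name.startswith(prefix):
                counts[prefix] += 1
    return counts


# Hypothetical blob names, used only to illustrate the call.
example_names = [
    "examples-de-chaque-document/2058-a/doc1.pdf",
    "examples-de-chaque-document/2058-a/doc2.pdf",
    "examples-de-chaque-document/2058-b/doc1.pdf",
]
example_counts = count_blobs_by_prefix(
    example_names,
    ["examples-de-chaque-document/2058-a/", "examples-de-chaque-document/2058-b/"],
)
for prefix, count in example_counts.items():
    print(f"{prefix}: found {count} files")
```

A prefix that reports zero files here would point to a path typo rather than a service-side problem.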

doc_types

In the next section, I call begin_build_classifier. Note that in the latest SDK, BuildDocumentClassifierRequest does not accept build_mode as a parameter, as was suggested, and begin_build_classifier accepts a single parameter, body. So the code is as follows:

import uuid

from azure.ai.documentintelligence.models import BuildDocumentClassifierRequest

# admin_client is assumed to be a DocumentIntelligenceAdministrationClient created earlier
classifier_id = f"new-test-{uuid.uuid4()}"

build_request = BuildDocumentClassifierRequest(
    classifier_id=classifier_id,
    description="Example classifier",
    doc_types=doc_types,  # Now using AzureBlobContentSource objects
    allow_overwrite=True,
)

try:
    poller = admin_client.begin_build_classifier(build_request)
    print(f"Training started! Classifier ID: {classifier_id}")

    # Block until the build finishes; a failed build raises an exception here
    result = poller.result()
    print(f"Classifier built successfully: {result.classifier_id}")
except Exception as e:
    print(f"Build failed: {e}")

Training started! Classifier ID: new-test-62c4dfb9-2765-46f2-86fc-609cc8603672
Build failed: (InvalidRequest) Invalid request.
Code: InvalidRequest
Message: Invalid request.
Exception Details: (TrainingContentMissing) Training data is missing: Could not find any training data at the given path.
Code: TrainingContentMissing
Message: Training data is missing: Could not find any training data at the given path.

I tried different things, like adding ClassifierDocumentTypeDetails to the doc_types and removing the trailing "/" from the prefix, but still no luck.

It is still failing.

Manas Mohanty 13,255 Reputation points Moderator

2025-10-27T18:35:54.3833333+00:00

Hi Rony Tayoun

You have to pass the container SAS URL together with a prefix that points to each class's folder. Please make sure the SAS URL grants read permission before using it.

Sample code

import os
import uuid

from azure.ai.documentintelligence import DocumentIntelligenceAdministrationClient
from azure.ai.documentintelligence.models import (
    AzureBlobContentSource,
    BuildDocumentClassifierRequest,
    ClassifierDocumentTypeDetails,
)
from azure.core.credentials import AzureKeyCredential

endpoint = os.environ["DOCUMENTINTELLIGENCE_ENDPOINT"]
key = os.environ["DOCUMENTINTELLIGENCE_API_KEY"]
blob_container_sas_url = os.environ["DOCUMENTINTELLIGENCE_TRAINING_DATA_CLASSIFIER_SAS_URL"]

document_intelligence_admin_client = DocumentIntelligenceAdministrationClient(
    endpoint=endpoint, credential=AzureKeyCredential(key)
)

# One entry per document type, each pointing at its own folder in the container
poller = document_intelligence_admin_client.begin_build_classifier(
    BuildDocumentClassifierRequest(
        classifier_id=str(uuid.uuid4()),
        doc_types={
            "IRS-1040-A": ClassifierDocumentTypeDetails(
                azure_blob_source=AzureBlobContentSource(
                    container_url=blob_container_sas_url, prefix="IRS-1040-A/train"
                )
            ),
            "IRS-1040-B": ClassifierDocumentTypeDetails(
                azure_blob_source=AzureBlobContentSource(
                    container_url=blob_container_sas_url, prefix="IRS-1040-B/train"
                )
            ),
        },
    )
)
classifier = poller.result()
classifier_id = classifier.classifier_id

Reference- https://github.com/Azure-Samples/document-intelligence-code-samples/blob/main/Python(v4.0)/Custom_model/sample_classify_document.py

Thank you.
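The read-permission check mentioned in this reply can be done locally before calling the service. The following is a minimal sketch, not an official validation: it reads the `sp` (signed permissions) query parameter of the SAS URL and reports which required letters are absent. The helper names are hypothetical, and requiring list (`l`) in addition to read (`r`) is an assumption. A SAS that relies on a stored access policy may carry no `sp` parameter at all, in which case this check cannot tell you anything.

```python
from urllib.parse import parse_qs, urlsplit


def sas_permissions(sas_url: str) -> set:
    """Return the permission letters found in the SAS URL's 'sp' parameter."""
    query = parse_qs(urlsplit(sas_url).query)
    signed_permissions = query.get("sp", [""])[0]
    return set(signed_permissions)


def missing_permissions(sas_url: str, required: str = "rl") -> set:
    """Return the required permission letters that the SAS URL does not grant."""
    return set(required) - sas_permissions(sas_url)


# Hypothetical URL with a placeholder signature, for illustration only.
example_url = "https://exampleaccount.blob.core.windows.net/training?sv=2024-11-04&sp=r&sig=placeholder"
example_missing = missing_permissions(example_url)
if example_missing:
    print(f"SAS URL is missing permissions: {sorted(example_missing)}")
```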

Rony Tayoun 20 Reputation points

2025-10-27T21:01:44.3833333+00:00

Of course, I already construct doc_types as follows:

(I also tried without the trailing / at the end.)

doc_types = {
    "2058a": ClassifierDocumentTypeDetails(
        azure_blob_source=AzureBlobContentSource(
            container_url=container_sas_url,
            prefix="examples-de-chaque-document/2058-a/" 
        )
    ),
    "2058b": ClassifierDocumentTypeDetails(
        azure_blob_source=AzureBlobContentSource(
            container_url=container_sas_url,
            prefix="examples-de-chaque-document/2058-b/"
        )
    )
}

Manas Mohanty 13,255 Reputation points Moderator

2025-11-05T13:21:00.7266667+00:00

Hi Rony Tayoun

Thank you for testing out all suggested approaches.

I was able to reproduce this issue and am reviewing it with the product group, as it is no longer working as intended.

I will keep you posted as we progress.

Thank you.
Manas Mohanty 13,255 Reputation points Moderator

2025-11-11T13:46:06.0333333+00:00

Hi Rony Tayoun

After reproducing the issue, I raised an engineering ticket to bring it to the product team's attention (we could not find a solution on the support side after many attempts).

Thank you for staying patient.
Manas Mohanty 13,255 Reputation points Moderator

2025-11-13T12:41:21.22+00:00

Hi Rony Tayoun

I have received some updates from the team concerned and am trying to route this to the right channel for resolution.

Thank you for staying patient.
Manas Mohanty 13,255 Reputation points Moderator

2025-11-24T11:19:39.7766667+00:00

Hi Rony Tayoun

Good day. I have received some updates from the product group on this.

I will get back to you once I have the working procedure.

Thank you.

Answer 1

Hello Rony Tayoun,

Welcome to the Microsoft Q&A and thank you for posting your questions here.

I understand that your BuildDocumentClassifierRequest call from the Python SDK is failing with TrainingContentMissing.

Nikhil Jha (on Oct 28) identified a breaking change in v4.0: removing the prefix and embedding the folder path in container_url fixed the issue (https://learn.microsoft.com/en-gb/answers/questions/5598410/builddocumentclassifierrequest-from-python-sdk-res). You confirmed that the workaround of uploading the files into a separate folder and generating a .jsonl file was effective, but reported continuing issues when using the SDK directly. Both approaches were attempted. The final accepted answer clarifies that removing the prefix and embedding the folder path in the SAS URL solved it.

The final advice is to remove the prefix and include the folder path directly in the container_url. This step is crucial for SDK version 4.0. Previous instructions that used prefix led to confusion. While the official documentation still needs updates, the solution provided has been tested and works. It addresses all major issues such as SAS permissions, file format requirements, and SDK changes, so you can successfully train your classifiers.
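As an illustration of that advice, here is a minimal sketch of how the folder path could be embedded in the SAS URL using only the standard library. The function name `embed_folder_in_sas_url` and the example account URL are hypothetical, and this has not been checked against the service; it only shows the string manipulation, leaving the SAS query string untouched.

```python
from urllib.parse import urlsplit, urlunsplit


def embed_folder_in_sas_url(container_sas_url: str, folder: str) -> str:
    """Append a folder path to the container path, keeping the SAS token as-is."""
    parts = urlsplit(container_sas_url)
    base_path = parts.path.rstrip("/")
    folder_path = folder.strip("/")
    new_path = f"{base_path}/{folder_path}" if folder_path else base_path
    return urlunsplit((parts.scheme, parts.netloc, new_path, parts.query, parts.fragment))


# Hypothetical container SAS URL with a placeholder signature.
example_sas_url = "https://exampleaccount.blob.core.windows.net/training?sv=2024-11-04&sp=rl&sig=placeholder"
example_embedded = embed_folder_in_sas_url(example_sas_url, "examples-de-chaque-document/2058-a/")
print(example_embedded)
```

Following the advice above, the resulting URL would then be passed as `container_url` to `AzureBlobContentSource` with no `prefix` argument, one URL per document type.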

I hope this is helpful! Do not hesitate to let me know if you have any other questions or clarifications.

Please don't forget to close out the thread by upvoting this and accepting it as the answer if it was helpful.
