BuildDocumentClassifierRequest from the Python SDK results in TrainingContentMissing: Training data is missing: Could not find any training data at the given path
I am trying to create a custom classification model.
I have 2 classes: '2058-a' and '2058-b'.
I made sure that the container SAS URL works by using it to list the existing files under each class prefix.
container_client = ContainerClient.from_container_url(container_sas_url)
blobs = container_client.list_blobs(name_starts_with="examples-de-chaque-document/2058-a/")
print(f"Found {len(list(blobs))} files")
container_client = ContainerClient.from_container_url(container_sas_url)
blobs = container_client.list_blobs(name_starts_with="examples-de-chaque-document/2058-b/")
print(f"Found {len(list(blobs))} files")
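The check above only counts blobs. A slightly more detailed inventory can be sketched as below: it groups blob names by prefix and by file suffix, which makes it visible whether anything besides the PDFs sits in each folder (for example `.ocr.json` layout results, if Studio wrote any next to the documents; whether the build needs them is an assumption here, not something confirmed in this post). The helper name `summarize_blobs` and the sample blob names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def summarize_blobs(blob_names: Iterable[str], prefixes: Iterable[str]) -> Dict[str, Counter]:
    """Count blobs under each prefix, grouped by file suffix."""
    prefixes = list(prefixes)
    summary = {prefix: Counter() for prefix in prefixes}
    for name in blob_names:
        for prefix in prefixes:
            if name.startswith(prefix):
                lower = name.lower()
                # Keep ".ocr.json" as its own suffix so it is not lumped in with plain ".json".
                if lower.endswith(".ocr.json"):
                    suffix = ".ocr.json"
                else:
                    base = lower.rsplit("/", 1)[-1]
                    suffix = "." + base.rsplit(".", 1)[-1] if "." in base else "(none)"
                summary[prefix][suffix] += 1
                break
    return summary


# Hypothetical blob names; in practice they could come from
# [b.name for b in container_client.list_blobs(name_starts_with="examples-de-chaque-document/")]
sample_names = [
    "examples-de-chaque-document/2058-a/doc1.pdf",
    "examples-de-chaque-document/2058-a/doc1.pdf.ocr.json",
    "examples-de-chaque-document/2058-b/doc2.PDF",
]
sample_prefixes = [
    "examples-de-chaque-document/2058-a/",
    "examples-de-chaque-document/2058-b/",
]
for prefix, counts in summarize_blobs(sample_names, sample_prefixes).items():
    print(prefix, dict(counts))
```

Because the function works on plain strings, it can be run against the real container listing or against a saved list of names.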
The files I am using here are the same files I use in Document Intelligence Studio (6 PDFs for each class).
[Image: doc_types]
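The `doc_types` mapping is not reproduced as text in this post, so the following is only a sketch of the shape it is expected to have, not necessarily what was used here. It is written as a plain dict that mirrors the REST request body (camelCase field names), on the assumption that `begin_build_classifier` also accepts a JSON-like dict as `body`. The helper name `build_classifier_body` and the placeholder container URL are hypothetical.

```python
import json
from typing import Any, Dict, Iterable


def build_classifier_body(
    container_url: str,
    root_prefix: str,
    class_names: Iterable[str],
    classifier_id: str,
) -> Dict[str, Any]:
    """Build a request body with one Azure Blob source per document type."""
    root = root_prefix.strip("/")
    doc_types = {
        name: {
            "azureBlobSource": {
                "containerUrl": container_url,
                # Each class is assumed to live in its own folder under the root prefix.
                "prefix": f"{root}/{name}/",
            }
        }
        for name in class_names
    }
    return {
        "classifierId": classifier_id,
        "description": "Example classifier",
        "docTypes": doc_types,
        "allowOverwrite": True,
    }


# Placeholder values for illustration only.
body = build_classifier_body(
    container_url="https://<account>.blob.core.windows.net/<container>?<sas-token>",
    root_prefix="examples-de-chaque-document",
    class_names=["2058-a", "2058-b"],
    classifier_id="new-test-example",
)
print(json.dumps(body, indent=2))
```

Printing the body before sending it makes it easy to compare the prefixes character by character with the ones used in the `list_blobs` check.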
In the next section, I try to call begin_build_classifier. Note that in the latest SDK, BuildDocumentClassifierRequest does not accept build_mode as a parameter (as was suggested), and begin_build_classifier accepts a single parameter, body. So the code is as follows:
import uuid

from azure.ai.documentintelligence.models import BuildDocumentClassifierRequest

# Optional: Add model_id if building on a prebuilt classifier
classifier_id = f"new-test-{uuid.uuid4()}"
build_request = BuildDocumentClassifierRequest(
    classifier_id=classifier_id,
    description="Example classifier",
    doc_types=doc_types,  # Now using AzureBlobContentSource objects
    allow_overwrite=True,
)
try:
    poller = admin_client.begin_build_classifier(build_request)
    print(f"Training started! Classifier ID: {classifier_id}")
    # Wait for the build to finish; a failed build raises an exception here,
    # so reaching the next line means the classifier was built.
    result = poller.result()
    print(f"Classifier built successfully: {result.classifier_id}")
except Exception as e:
    print(f"Build failed: {e}")
Training started! Classifier ID: new-test-62c4dfb9-2765-46f2-86fc-609cc8603672
Build failed: (InvalidRequest) Invalid request.
Code: InvalidRequest
Message: Invalid request.
Exception Details: (TrainingContentMissing) Training data is missing: Could not find any training data at the given path.
Code: TrainingContentMissing
Message: Training data is missing: Could not find any training data at the given path.
I tried several things, such as adding ClassifierDocumentTypeDetails to doc_types and removing the trailing "/" from the prefix, but the build still fails with the same error.