How Do I Create a ModelDirectory Type FileDataset

Lee Harper
2021-05-18T00:51:15.643+00:00

I am trying to build a solution that automates part of model deployment in the Azure ML designer. I can build a model with the designer and then execute a Python script block to extract the trained_model_outputs folder from the model training block. I have precisely matched the folder structure that the Azure ML designer assigns to the model's FileDataset.
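One way to confirm that a copied folder matches the original layout file for file is to compare the relative paths under each root. The sketch below uses only the Python standard library. The helper names and any paths you pass in are hypothetical, and it assumes nothing about which files the designer actually writes into trained_model_outputs.

```python
import os
import shutil


def relative_files(root):
    """Return the set of file paths under root, relative to root, with '/' separators."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            rel = os.path.relpath(os.path.join(dirpath, filename), root)
            found.add(rel.replace(os.sep, "/"))
    return found


def copy_model_folder(src, dst):
    """Hypothetical helper: copy src into dst and check that no file was left behind.

    dirs_exist_ok requires Python 3.8 or later.
    """
    shutil.copytree(src, dst, dirs_exist_ok=True)
    missing = relative_files(src) - relative_files(dst)
    if missing:
        raise RuntimeError(f"files missing after copy: {sorted(missing)}")
    return sorted(relative_files(dst))
```

For example, `copy_model_folder("./trained_model_outputs", "./outputs/trained_model_outputs")` returns the sorted list of relative file paths in the copy, so it can be compared with the layout of a designer-produced model folder.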

When I register the trained_model_outputs folder as a FileDataset, it is assigned the type AnyDirectory. This is a problem: when I try to build it into the inference pipeline, the designer rejects it, saying it must be a ModelDirectory, even though there shouldn't be any functional difference between the two.

I have seen that I can expose the ModelDirectory class as shown below. However, I cannot find any API documentation for this class online, and I can't review its source code because it isn't in the standard SDK:

from azureml.studio.core.io.model_directory import ModelDirectory
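Since the class can be imported in the designer's Python environment, the standard inspect module can list its public methods and, when the package ships its .py files, print its source. This is a generic sketch: describe_class is a hypothetical helper, and it assumes nothing about what ModelDirectory's methods actually are.

```python
import importlib
import inspect


def describe_class(module_name, class_name, show_source=False):
    """List where a class is defined and the signatures of its public callables."""
    module = importlib.import_module(module_name)
    cls = getattr(module, class_name)
    try:
        location = inspect.getsourcefile(cls) or "<unknown>"
    except TypeError:  # built-in classes have no Python source file
        location = "<built-in>"
    lines = [f"{module_name}.{class_name} defined in {location}"]
    for name, member in inspect.getmembers(cls):
        if name.startswith("_") or not callable(member):
            continue
        try:
            signature = str(inspect.signature(member))
        except (TypeError, ValueError):  # some callables expose no signature
            signature = "(...)"
        lines.append(f"  {name}{signature}")
    if show_source:
        lines.append(inspect.getsource(cls))
    return "\n".join(lines)
```

Inside an Execute Python Script block where the import above works, `print(describe_class("azureml.studio.core.io.model_directory", "ModelDirectory", show_source=True))` would write the class's members and source to the step's log, assuming the package includes its Python source files.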

Can you provide a code snippet or similar that I can use to leverage this class when creating the FileDataset so that the model dataset gains the ModelDirectory type attribute?

Azure Machine Learning

1 answer

  1. Ramr-msft
    2021-05-18T13:56:26.373+00:00

    @Lee Harper Thanks for the question. Can you please add more details about your use case?

    OutputFileDatasetConfig is a control-plane concept for passing data between pipeline steps.

    PipelineData was intended to represent "transient" data passed from one step to the next, while OutputFileDatasetConfig was intended to capture the final state of a dataset. PipelineData always writes its output in a folder structure like {run_id}/{output_name}. OutputFileDatasetConfig decouples the data from the run, so you control where the data lands (although by default it produces a similar folder structure). It even lets you register the output as a Dataset, in which case dropping that folder structure makes sense.

    From the docs: "Represent how to copy the output of a run and be promoted as a FileDataset. The OutputFileDatasetConfig allows you to specify how you want a particular local path on the compute target to be uploaded to the specified destination".
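A minimal sketch of an OutputFileDatasetConfig that is uploaded to a chosen datastore path and registered as a FileDataset when the run completes, assuming azureml-core is installed. The output name, the destination path, and the helper itself are hypothetical; config_cls is a parameter only so the function can be exercised without a workspace.

```python
def make_model_output(datastore, relative_path, dataset_name, config_cls=None):
    """Hypothetical helper: build a step output that is copied to
    (datastore, relative_path) at the end of the step and registered as a
    FileDataset named dataset_name when the run completes.

    config_cls defaults to azureml.data.OutputFileDatasetConfig.
    """
    if config_cls is None:
        from azureml.data import OutputFileDatasetConfig  # requires azureml-core
        config_cls = OutputFileDatasetConfig
    output = config_cls(name="trained_model_outputs", destination=(datastore, relative_path))
    # as_upload: copy the local output folder to the destination when the step ends.
    # register_on_complete: promote the uploaded data to a registered FileDataset.
    return output.as_upload(overwrite=True).register_on_complete(name=dataset_name)
```

The returned object would then be passed to a pipeline step, for example in a PythonScriptStep's `arguments`, where it resolves to a local folder the script writes the model files into. Unlike PipelineData, the destination path here is chosen by you rather than derived from the run ID.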

    To upload a local folder as a FileDataset, see the FileDatasetFactory.upload_directory API reference:
    https://learn.microsoft.com/en-us/python/api/azureml-core/azureml.data.dataset_factory.filedatasetfactory?view=azure-ml-py#upload-directory-src-dir--target--pattern-none--overwrite-false--show-progress-true-
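A hedged sketch of the upload API linked above: upload a local folder to a datastore path and register the result as a new FileDataset version, assuming azureml-core is installed and that Dataset.File (a FileDatasetFactory) provides upload_directory in your SDK version. The helper name and argument values are hypothetical; file_factory is injectable only so the logic can be tested without a workspace.

```python
def upload_and_register(workspace, datastore, src_dir, target_path, dataset_name,
                        file_factory=None):
    """Hypothetical helper: upload src_dir to (datastore, target_path) and register
    the uploaded files as a new version of the FileDataset dataset_name.

    file_factory defaults to azureml.core.Dataset.File (a FileDatasetFactory).
    """
    if file_factory is None:
        from azureml.core import Dataset  # requires azureml-core
        file_factory = Dataset.File
    dataset = file_factory.upload_directory(
        src_dir=src_dir,
        target=(datastore, target_path),
        overwrite=True,
        show_progress=False,
    )
    return dataset.register(workspace=workspace, name=dataset_name, create_new_version=True)
```

The question reports that registering a FileDataset produces the AnyDirectory type in the designer, and the answer does not say whether a dataset created with upload_directory is treated differently.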