Recommended way to get to know the location of various folders within a Docker image

Alexander Pakakis 46 Reputation points
2022-03-08T15:28:59.29+00:00

I am creating a pipeline in Azure Machine Learning Studio. The pipeline consists of various steps of the type "PythonScriptStep". In each step I need to read from the input data and write data to the defined output folder of the type "PipelineData".

Until yesterday, I used the environment variables of the built Docker image to find various locations, e.g. the location of the "wd" directory.
The path of the "wd" directory was stored in the environment variable 'AZ_BATCHAI_JOB_TEMP'. Now it seems that the environment variable name has changed: the path of the "wd" directory can now be found in the environment variable 'AZUREML_CR_DATA_CAPABILITY_PATH'.

The environment variable 'AZ_BATCHAI_JOB_MOUNT_ROOT' has been removed completely.

Since environment variable names can change from one day to the next, I would like to ask for the recommended way to find the location of various folders within the Docker image.

With best regards
Alexander Pakakis

Azure Machine Learning

1 answer

  1. Ramr-msft 17,616 Reputation points
    2022-03-09T09:25:59.46+00:00

    @AlexanderPakakis-0994 Thanks for the question. The most basic way to achieve this is to use PipelineData and specify the output as a directory.

    from azureml.pipeline.core import PipelineData

    # pipeline_datastore is assumed to be a datastore object defined earlier.
    output_dir = PipelineData(
        name="output_dir",
        datastore=pipeline_datastore,
        pipeline_output_name="output_dir",
        is_directory=True,
    )

    OutputFileDatasetConfig is also very powerful. Here is how it can be used to pass an output location to a script:

    from azureml.core import Experiment, ScriptRunConfig, Workspace
    from azureml.data import OutputFileDatasetConfig

    # Assumption: a workspace config.json is available locally.
    ws = Workspace.from_config()
    def_data_store = ws.get_default_datastore()

    output_port = OutputFileDatasetConfig(
        destination=(def_data_store, "outputs/test_diroutputFileDatasetConfig/"),
        name="dir_test",
    )

    experiment = Experiment(ws, "MyExperiment")
    config = ScriptRunConfig(
        source_directory="modules/test_output_dir/",
        script="copy.py",
        # The output object is resolved to a path before the script receives it.
        arguments=["--output", output_port],
        compute_target="local",
    )
    script_run = experiment.submit(config)
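
Both examples above hand the output location to the script as a command-line argument, so the script does not need to read any environment variable. The following is a minimal sketch of what such a step script (for instance the `copy.py` referenced above) might look like. It only assumes that the resolved folder path arrives as the value of `--output`. The `--input` argument, the file name `result.txt`, and the function names are hypothetical and chosen purely for illustration.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Read folder locations from the command line instead of env variables."""
    parser = argparse.ArgumentParser()
    # Hypothetical optional input folder; the path is passed in by the caller.
    parser.add_argument("--input", type=Path, default=None)
    # Output folder; receives the path that the output object resolves to.
    parser.add_argument("--output", type=Path, required=True)
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # The output folder may not exist yet, so create it before writing.
    args.output.mkdir(parents=True, exist_ok=True)
    result_file = args.output / "result.txt"  # hypothetical file name
    source = str(args.input) if args.input is not None else "no input given"
    result_file.write_text(f"input location: {source}\n")
    return result_file
```

In a real step script, the file would end with a call to `main()` so that it runs when the step starts. Because the script depends only on its arguments, it keeps working if the names of internal environment variables change.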