AlexanderPakakis asked · ramr-msft answered

Recommended way to find out the location of various folders within a Docker image

I am creating a pipeline in Azure Machine Learning Studio. The pipeline consists of several steps of type "PythonScriptStep". In each step I need to read from the input data and write data to the defined output folder of type "PipelineData".

Until yesterday, I used the environment variables of the built Docker image to find various locations, e.g. the location of the "wd" directory.
The path of the "wd" directory was stored in the environment variable 'AZ_BATCHAI_JOB_TEMP'. Now, as it seems to me, the environment variable name has changed: the path of the "wd" directory can now be found in the environment variable 'AZUREML_CR_DATA_CAPABILITY_PATH'.

The environment variable 'AZ_BATCHAI_JOB_MOUNT_ROOT' has been removed completely.

Since environment variable names change from one day to the next and are not stable, I would like to ask for the recommended way to find out the location of various folders within the Docker image.
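
In the meantime I fall back to a defensive lookup that simply tries the new name first and then the old one (only a sketch; it assumes at least one of the two variables is set):

 import os

 # Try the new variable name first, then the old one; None if neither is set.
 wd_path = next(
     (os.environ[name]
      for name in ("AZUREML_CR_DATA_CAPABILITY_PATH", "AZ_BATCHAI_JOB_TEMP")
      if name in os.environ),
     None,
 )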


With best regards
Alexander Pakakis

azure-machine-learning

@AlexanderPakakis-0994 Thanks. Can you please share the code/sample that you are trying?


1 Answer

ramr-msft answered

@AlexanderPakakis-0994 Thanks for the question. The most basic way to achieve this is to use PipelineData and specify the output as a directory:
from azureml.pipeline.core import PipelineData

output_dir = PipelineData(
    name="output_dir",
    datastore=pipeline_datastore,
    pipeline_output_name="output_dir",
    is_directory=True,
)
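
Wiring this into a PythonScriptStep (the step type the question uses) could look roughly like the following sketch; the names process_step, process.py, scripts/ and compute_target are placeholders, not values from this thread:

from azureml.pipeline.steps import PythonScriptStep

step = PythonScriptStep(
    name="process_step",                     # placeholder name
    script_name="process.py",                # placeholder script
    source_directory="scripts/",             # placeholder folder
    arguments=["--output_dir", output_dir],  # the path is handed to the script
    outputs=[output_dir],
    compute_target=compute_target,           # placeholder compute target
)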

OutputFileDatasetConfig is also very powerful. Here is how it can be used for pipelines:

from azureml.core import ScriptRunConfig, Experiment
from azureml.data import OutputFileDatasetConfig

output_port = OutputFileDatasetConfig(
    destination=(def_data_store, "outputs/test_diroutputFileDatasetConfig/"),
    name="dir_test",
)

experiment = Experiment(ws, 'MyExperiment')
config = ScriptRunConfig(source_directory='modules/test_output_dir/',
                         script='copy.py',
                         arguments=['--output', output_port],
                         compute_target="local")
script_run = experiment.submit(config)
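
Either way, the resolved output location arrives in the script as an ordinary command-line argument, so the script itself never has to consult environment variables. A minimal sketch of what copy.py could look like (the --output name matches the arguments above; the file written is just an example):

import argparse
import os

parser = argparse.ArgumentParser()
parser.add_argument("--output")
args = parser.parse_args()

# args.output is the folder path inside the container; write results there.
os.makedirs(args.output, exist_ok=True)
with open(os.path.join(args.output, "result.txt"), "w") as f:
    f.write("done")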




Thank you for your reply!

My question is not about PipelineData!

My question is about the recommended way to find out the location of specific folders, like the "wd" directory.
Until a few days ago, I used the environment variables for that.
This list helped me for this purpose.
Here is a code sample which worked until yesterday afternoon:

 import os
 dict_environment_variables = os.environ
 mount_path = dict_environment_variables['AZ_BATCHAI_JOB_MOUNT_ROOT']

Unfortunately, the code does not work anymore because the environment variable "AZ_BATCHAI_JOB_MOUNT_ROOT" no longer exists.
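
To see what is actually set now, I print all Azure-related variables (plain os.environ, nothing version-specific assumed):

 import os

 # List every variable whose name starts with an Azure-related prefix.
 for name in sorted(os.environ):
     if name.startswith(("AZ_", "AZUREML_")):
         print(name, "=", os.environ[name])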

Did the Azure ML team decide to delete or rename some environment variables?

Since environment variable names change from one day to the next and are not stable, I would like to ask for the recommended way to find out the location of various folders within the Docker image.

ramr-msft replied to AlexanderPakakis:

@AlexanderPakakis-0994 Thanks for the details. Azure Batch AI is being retired. Can you please add more details about the steps that you performed?
https://docs.microsoft.com/en-us/previous-versions/azure/batch-ai/overview-what-happened-batch-ai


Sure! I need to read from the input data and write data to the defined output folder of type "PipelineData".
I wonder how I can find out the folder location of my input data and my output folder.
