Azure ML Pipeline fails using attached data science ubuntu VM

Question

Ilias Paschalis 0

I'm trying to use an attached Data Science VM (DSVM) as the compute target for an Azure ML pipeline. The job fails on the DSVM, although it runs normally on a compute instance. The same error is reproduced by running this sample with the Data Science VM as the compute for the training step and a compute instance as the CPU compute.

azureml_logs/70_driver_log.txt:

[2024-06-28T11:45:58.539965] Entering context manager injector.
Cannot provide tracer without any exporter configured.
[2024-06-28T11:45:58.990435] context_manager_injector.py Command line Options: Namespace(inject=['ProjectPythonPath:context_managers.ProjectPythonPath', 'Dataset:context_managers.Datasets', 'RunHistory:context_managers.RunHistory', 'TrackUserError:context_managers.TrackUserError'], invocation=["mldesigner execute --source train_component.py --name train_image_classification_keras --inputs input_data='' epochs='10' --outputs output_model='DatasetOutputConfig:output_model'"])
[2024-06-28T11:45:59.288] Initialize DatasetContextManager.
Script type = COMMAND
[2024-06-28T11:45:59.318625] Command=mldesigner execute --source train_component.py --name train_image_classification_keras --inputs input_data='' epochs='10' --outputs output_model='DatasetOutputConfig:output_model'
[2024-06-28T11:45:59.319] Enter __enter__ of DatasetContextManager
[2024-06-28T11:45:59.319] SDK version: azureml-core==1.56.0 azureml-dataprep==5.1.6. Session id: e2933bd3-4f39-4c97-bac9-6c6b46edb6b6. Run id: 601231c8-72d4-487c-aaf8-681a627dd511.
[2024-06-28T11:45:59.319] Processing 'output_model'.
[2024-06-28T11:45:59.319] Mode: 'upload'.
[2024-06-28T11:45:59.319] Path on compute is specified: '/tmp/output_model_601231c8-72d4-487c-aaf8-681a627dd511_None'.
[2024-06-28T11:45:59.326] Exit __enter__ of DatasetContextManager
[2024-06-28T11:45:59.327026] Entering Run History Context Manager.
[2024-06-28T11:46:02.520854] Command Working Directory=/azureml-run
[2024-06-28T11:46:02.520897] Starting Linux command : mldesigner execute --source train_component.py --name train_image_classification_keras --inputs input_data='' epochs='10' --outputs output_model='/tmp/output_model_601231c8-72d4-487c-aaf8-681a627dd511_None'
Traceback (most recent call last):
  File "/azureml-envs/azureml_f06c980c516e8eaf5a296ba42ca2c38c/lib/python3.8/site-packages/mldesigner/_cli/mldesigner_commands.py", line 362, in list_2_dict
    raise UserErrorException(f"parameter name or value missed, got {item}")
mldesigner._exceptions.UserErrorException: parameter name or value missed, got input_data=

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/azureml-envs/azureml_f06c980c516e8eaf5a296ba42ca2c38c/bin/mldesigner", line 8, in <module>
    sys.exit(main())
  File "/azureml-envs/azureml_f06c980c516e8eaf5a296ba42ca2c38c/lib/python3.8/site-packages/mldesigner/_cli/mldesigner_commands.py", line 347, in main
    _entry(command_args)
  File "/azureml-envs/azureml_f06c980c516e8eaf5a296ba42ca2c38c/lib/python3.8/site-packages/mldesigner/_cli/mldesigner_commands.py", line 44, in _entry
    processed_inputs = list_2_dict(args.inputs)
  File "/azureml-envs/azureml_f06c980c516e8eaf5a296ba42ca2c38c/lib/python3.8/site-packages/mldesigner/_cli/mldesigner_commands.py", line 367, in list_2_dict
    raise UserErrorException(
mldesigner._exceptions.UserErrorException: Incorrect parameter format: parameter name or value missed, got input_data=. Please make sure command arguments are like '--inputs a=1 b=2' or '--outputs a=path0 b=path1'
[2024-06-28T11:46:02.724375] Command finished with return code 1


[2024-06-28T11:46:02.724503] The experiment failed with exit code: 1. Finalizing run...
[2024-06-28T11:46:02.724524] Start FinalizingInRunHistory
[2024-06-28T11:46:02.724578] Logging experiment finalizing status in history service.
Starting the daemon thread to refresh tokens in background for process with pid = 8
Cleaning up all outstanding Run operations, waiting 300.0 seconds
1 items cleaning up...
Cleanup took 0.3887791633605957 seconds
[2024-06-28T11:46:05.501] Enter __exit__ of DatasetContextManager
[2024-06-28T11:46:05.501] Uploading output 'output_model'.
[2024-06-28T11:46:05.501] trying to parse datastore uri for asset output with type UriFolder
[2024-06-28T11:46:06.279] Output source path /tmp/output_model_601231c8-72d4-487c-aaf8-681a627dd511_None/ does not exist, ignoring and moving on
[2024-06-28T11:46:07.656] Exit __exit__ of DatasetContextManager
Traceback (most recent call last):
  File "azureml-setup/context_manager_injector.py", line 452, in <module>
    execute_with_context(cm_objects, options.invocation)
  File "azureml-setup/context_manager_injector.py", line 236, in execute_with_context
    process_return_code(signedReturnCode)
  File "azureml-setup/context_manager_injector.py", line 353, in process_return_code
    sys.exit(returnCode)
SystemExit: 1

[2024-06-28T11:46:07.693072] Finished context manager injector with SystemExit exception.

It seems to me that the component inputs are not propagated properly to this step: 60_control_log.txt shows --inputs input_data='$AZUREML_DATAREFERENCE_input_data', whereas 70_driver_log.txt shows --inputs input_data=''.
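
To illustrate why an empty value produces this particular error, here is a minimal sketch in Python. It is not mldesigner's actual code: `parse_key_value_args` and `build_inputs_arg` are hypothetical helpers that only mimic the behaviour visible in the traceback (a `name=value` item with an empty value is rejected), and the assumption that the data reference variable is unset or empty on the DSVM is a guess based on the two logs, not a confirmed cause. The path `/mnt/data/input` is made up for the example.

```python
def parse_key_value_args(items):
    """Split 'name=value' strings into a dict.

    Hypothetical stand-in for the parsing step seen in the traceback:
    an item with a missing name or an empty value is rejected.
    """
    parsed = {}
    for item in items:
        name, sep, value = item.partition("=")
        if not sep or not name or not value:
            raise ValueError(f"parameter name or value missed, got {item}")
        parsed[name] = value
    return parsed


def build_inputs_arg(env):
    """Mimic shell expansion of '$AZUREML_DATAREFERENCE_input_data'.

    An unset variable expands to an empty string, which yields 'input_data='.
    """
    return "input_data=" + env.get("AZUREML_DATAREFERENCE_input_data", "")


# Variable set (hypothetical path): the item parses normally.
ok_env = {"AZUREML_DATAREFERENCE_input_data": "/mnt/data/input"}
ok_args = parse_key_value_args([build_inputs_arg(ok_env), "epochs=10"])

# Variable missing: the item collapses to 'input_data=' and is rejected.
try:
    parse_key_value_args([build_inputs_arg({}), "epochs=10"])
    failure_message = None
except ValueError as exc:
    failure_message = str(exc)
```

If this assumption holds, the parsing error is only a symptom: the thing to check would be why the data reference for input_data is not resolved on the attached VM before the command is launched.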

I have followed the documentation detailed here, but the problem persists. Is some additional environment setup missing?

romungi-MSFT 48,911 Reputation points Microsoft Employee Moderator

2024-07-01T08:43:56.25+00:00

@Ilias Paschalis As far as I know, a DSVM is supported as a compute target only on Ubuntu, and it may have limited capabilities when used with the designer; the error above appears to come from the UI when the DSVM is used as compute from the ML designer. I would recommend using the DSVM only for development activities and creating an Azure ML managed compute instance for training and inference in pipeline jobs. Thanks!
Ilias Paschalis 0 Reputation points

2024-07-01T09:35:01.41+00:00

@romungi-MSFT Unfortunately, the lack of spot GPU support in Azure ML compute instances makes them a poor fit for our use case, which is why I have been looking into using attached VMs for the pipelines.
