Azure ML Pipeline fails using attached data science ubuntu VM

Ilias Paschalis
2024-06-28T12:00:05.23+00:00

I'm trying to use an attached Data Science VM as the compute target for an Azure ML pipeline. The job fails on the VM, although the same pipeline runs normally on a compute instance. The error can be reproduced by running this sample with the Data Science VM as the compute for the training step and a compute instance as the CPU compute.

Error screenshot: [image not reproduced]

azureml_logs/70_driver_log.txt:

[2024-06-28T11:45:58.539965] Entering context manager injector.
Cannot provide tracer without any exporter configured.
[2024-06-28T11:45:58.990435] context_manager_injector.py Command line Options: Namespace(inject=['ProjectPythonPath:context_managers.ProjectPythonPath', 'Dataset:context_managers.Datasets', 'RunHistory:context_managers.RunHistory', 'TrackUserError:context_managers.TrackUserError'], invocation=["mldesigner execute --source train_component.py --name train_image_classification_keras --inputs input_data='' epochs='10' --outputs output_model='DatasetOutputConfig:output_model'"])
[2024-06-28T11:45:59.288] Initialize DatasetContextManager.
Script type = COMMAND
[2024-06-28T11:45:59.318625] Command=mldesigner execute --source train_component.py --name train_image_classification_keras --inputs input_data='' epochs='10' --outputs output_model='DatasetOutputConfig:output_model'
[2024-06-28T11:45:59.319] Enter __enter__ of DatasetContextManager
[2024-06-28T11:45:59.319] SDK version: azureml-core==1.56.0 azureml-dataprep==5.1.6. Session id: e2933bd3-4f39-4c97-bac9-6c6b46edb6b6. Run id: 601231c8-72d4-487c-aaf8-681a627dd511.
[2024-06-28T11:45:59.319] Processing 'output_model'.
[2024-06-28T11:45:59.319] Mode: 'upload'.
[2024-06-28T11:45:59.319] Path on compute is specified: '/tmp/output_model_601231c8-72d4-487c-aaf8-681a627dd511_None'.
[2024-06-28T11:45:59.326] Exit __enter__ of DatasetContextManager
[2024-06-28T11:45:59.327026] Entering Run History Context Manager.
[2024-06-28T11:46:02.520854] Command Working Directory=/azureml-run
[2024-06-28T11:46:02.520897] Starting Linux command : mldesigner execute --source train_component.py --name train_image_classification_keras --inputs input_data='' epochs='10' --outputs output_model='/tmp/output_model_601231c8-72d4-487c-aaf8-681a627dd511_None'
Traceback (most recent call last):
  File "/azureml-envs/azureml_f06c980c516e8eaf5a296ba42ca2c38c/lib/python3.8/site-packages/mldesigner/_cli/mldesigner_commands.py", line 362, in list_2_dict
    raise UserErrorException(f"parameter name or value missed, got {item}")
mldesigner._exceptions.UserErrorException: parameter name or value missed, got input_data=

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/azureml-envs/azureml_f06c980c516e8eaf5a296ba42ca2c38c/bin/mldesigner", line 8, in <module>
    sys.exit(main())
  File "/azureml-envs/azureml_f06c980c516e8eaf5a296ba42ca2c38c/lib/python3.8/site-packages/mldesigner/_cli/mldesigner_commands.py", line 347, in main
    _entry(command_args)
  File "/azureml-envs/azureml_f06c980c516e8eaf5a296ba42ca2c38c/lib/python3.8/site-packages/mldesigner/_cli/mldesigner_commands.py", line 44, in _entry
    processed_inputs = list_2_dict(args.inputs)
  File "/azureml-envs/azureml_f06c980c516e8eaf5a296ba42ca2c38c/lib/python3.8/site-packages/mldesigner/_cli/mldesigner_commands.py", line 367, in list_2_dict
    raise UserErrorException(
mldesigner._exceptions.UserErrorException: Incorrect parameter format: parameter name or value missed, got input_data=. Please make sure command arguments are like '--inputs a=1 b=2' or '--outputs a=path0 b=path1'
[2024-06-28T11:46:02.724375] Command finished with return code 1


[2024-06-28T11:46:02.724503] The experiment failed with exit code: 1. Finalizing run...
[2024-06-28T11:46:02.724524] Start FinalizingInRunHistory
[2024-06-28T11:46:02.724578] Logging experiment finalizing status in history service.
Starting the daemon thread to refresh tokens in background for process with pid = 8
Cleaning up all outstanding Run operations, waiting 300.0 seconds
1 items cleaning up...
Cleanup took 0.3887791633605957 seconds
[2024-06-28T11:46:05.501] Enter __exit__ of DatasetContextManager
[2024-06-28T11:46:05.501] Uploading output 'output_model'.
[2024-06-28T11:46:05.501] trying to parse datastore uri for asset output with type UriFolder
[2024-06-28T11:46:06.279] Output source path /tmp/output_model_601231c8-72d4-487c-aaf8-681a627dd511_None/ does not exist, ignoring and moving on
[2024-06-28T11:46:07.656] Exit __exit__ of DatasetContextManager
Traceback (most recent call last):
  File "azureml-setup/context_manager_injector.py", line 452, in <module>
    execute_with_context(cm_objects, options.invocation)
  File "azureml-setup/context_manager_injector.py", line 236, in execute_with_context
    process_return_code(signedReturnCode)
  File "azureml-setup/context_manager_injector.py", line 353, in process_return_code
    sys.exit(returnCode)
SystemExit: 1

[2024-06-28T11:46:07.693072] Finished context manager injector with SystemExit exception.
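
A minimal sketch of why the empty value is rejected, assuming the check simply splits each name=value pair on "=" and refuses an empty side. `parse_pairs` is a hypothetical stand-in, not mldesigner's actual `list_2_dict`; only the error wording is taken from the log above. The shell strips the quotes, so `input_data=''` reaches the CLI as `input_data=`.

```python
import shlex


def parse_pairs(items):
    """Turn ['a=1', 'b=2'] into {'a': '1', 'b': '2'}.

    Hypothetical re-creation of the check seen in the traceback:
    a pair with an empty name or an empty value is rejected.
    """
    result = {}
    for item in items:
        name, sep, value = item.partition("=")
        if not sep or not name or not value:
            raise ValueError(f"parameter name or value missed, got {item}")
        result[name] = value
    return result


# shlex.split mimics POSIX shell quoting: the empty quotes vanish.
tokens = shlex.split("input_data='' epochs='10'")

try:
    parse_pairs(tokens)
except ValueError as err:
    print(err)
```

If this assumption holds, the component code itself is never reached: the failure happens while the command line is parsed, before `train_component.py` runs.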

It looks like the component inputs are not propagated properly to this step: 60_control_log.txt shows --inputs input_data='$AZUREML_DATAREFERENCE_input_data', but 70_driver_log.txt shows --inputs input_data=''.
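
To make that comparison repeatable, here is a rough sketch that extracts the `--inputs` pairs from two command lines and lists the inputs that had a value in the first but are empty in the second. The helper names `extract_inputs` and `dropped_inputs` are hypothetical, and the control-log command shown is an assumption: only its `input_data` fragment is quoted from 60_control_log.txt, the rest is copied from the driver log.

```python
import shlex


def extract_inputs(command):
    """Return the name=value pairs that follow --inputs in a command line."""
    inputs = {}
    in_inputs = False
    for token in shlex.split(command):
        if token == "--inputs":
            in_inputs = True
        elif token.startswith("--"):
            in_inputs = False  # another option such as --outputs begins
        elif in_inputs and "=" in token:
            name, _, value = token.partition("=")
            inputs[name] = value
    return inputs


def dropped_inputs(control_cmd, driver_cmd):
    """Names that carry a value in control_cmd but are empty or absent in driver_cmd."""
    before = extract_inputs(control_cmd)
    after = extract_inputs(driver_cmd)
    return sorted(n for n, v in before.items() if v and not after.get(n))


# Assumed shape of the 60_control_log.txt command (only input_data is quoted from it).
control = (
    "mldesigner execute --source train_component.py "
    "--inputs input_data='$AZUREML_DATAREFERENCE_input_data' epochs='10' "
    "--outputs output_model='DatasetOutputConfig:output_model'"
)
# Command as it appears in 70_driver_log.txt.
driver = (
    "mldesigner execute --source train_component.py "
    "--inputs input_data='' epochs='10' "
    "--outputs output_model='DatasetOutputConfig:output_model'"
)

print(dropped_inputs(control, driver))
```

With these two strings, only `input_data` is reported, which matches the name in the mldesigner error.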

I followed the documentation detailed here, but the problem persists. Is some additional environment setup missing?

Azure Machine Learning
Azure Data Science Virtual Machines