Submitted script failed with a non-zero exit code; see the driver log file for details. Reason: Job failed with non-zero exit code
Suresh Guntapalli
Hi all,
I am trying to set up batch inference for my pretrained churn classification model. I was following the iris batch inference notebook on GitHub: https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/parallel-run/tabular-dataset-inference-iris.ipynb
The pipeline run fails with the error below. Please help me understand how to fix it.
Here is my code:
Here are my errors:
========================================================================================================================
. Please ignore this if the GPUs don't utilize NVIDIA® NVLink® switches.
2021-04-28T12:53:39Z Starting output-watcher...
2021-04-28T12:53:39Z IsDedicatedCompute == False, starting polling for Low-Pri Preemption
2021-04-28T12:53:39Z Executing 'Copy ACR Details file' on 10.0.0.4
2021-04-28T12:53:39Z Copy ACR Details file succeeded on 10.0.0.4. Output:
>>>
>>>
Login Succeeded
Using default tag: latest
latest: Pulling from azureml/azureml_af590fdfaae8ba3ead1eba5ea12b0fb3
4007a89234b4: Pulling fs layer
5dfa26c6b9c9: Pulling fs layer
0ba7bf18aa40: Pulling fs layer
4c6ec688ebe3: Pulling fs layer
574f361512d6: Pulling fs layer
db4d1e2d7079: Pulling fs layer
e544ee0f522d: Pulling fs layer
c655136086be: Pulling fs layer
2ec37f44090c: Pulling fs layer
5fba3bd4a2c4: Pulling fs layer
7e0ea9d0a1ab: Pulling fs layer
da005f826951: Pulling fs layer
6e842608b724: Pulling fs layer
6b1a4187f1d0: Pulling fs layer
db4d1e2d7079: Waiting
c763bae43813: Pulling fs layer
490d7c37a7d7: Pulling fs layer
791bb1082f38: Pulling fs layer
e544ee0f522d: Waiting
e863af755720: Pulling fs layer
c655136086be: Waiting
4c6ec688ebe3: Waiting
0cb6e30b3f1c: Pulling fs layer
88468e3f4c2c: Pulling fs layer
77d6ac8c0bf7: Pulling fs layer
574f361512d6: Waiting
2ec37f44090c: Waiting
da005f826951: Waiting
5fba3bd4a2c4: Waiting
6e842608b724: Waiting
6b1a4187f1d0: Waiting
c763bae43813: Waiting
490d7c37a7d7: Waiting
791bb1082f38: Waiting
e863af755720: Waiting
0cb6e30b3f1c: Waiting
88468e3f4c2c: Waiting
77d6ac8c0bf7: Waiting
7e0ea9d0a1ab: Waiting
0ba7bf18aa40: Verifying Checksum
0ba7bf18aa40: Download complete
5dfa26c6b9c9: Verifying Checksum
5dfa26c6b9c9: Download complete
4c6ec688ebe3: Verifying Checksum
4c6ec688ebe3: Download complete
4007a89234b4: Download complete
db4d1e2d7079: Verifying Checksum
db4d1e2d7079: Download complete
e544ee0f522d: Verifying Checksum
e544ee0f522d: Download complete
574f361512d6: Verifying Checksum
574f361512d6: Download complete
4007a89234b4: Pull complete
5dfa26c6b9c9: Pull complete
0ba7bf18aa40: Pull complete
4c6ec688ebe3: Pull complete
5fba3bd4a2c4: Download complete
c655136086be: Verifying Checksum
c655136086be: Download complete
7e0ea9d0a1ab: Verifying Checksum
7e0ea9d0a1ab: Download complete
da005f826951: Verifying Checksum
da005f826951: Download complete
6e842608b724: Download complete
6b1a4187f1d0: Download complete
c763bae43813: Verifying Checksum
c763bae43813: Download complete
2ec37f44090c: Verifying Checksum
2ec37f44090c: Download complete
490d7c37a7d7: Verifying Checksum
490d7c37a7d7: Download complete
0cb6e30b3f1c: Verifying Checksum
0cb6e30b3f1c: Download complete
e863af755720: Verifying Checksum
e863af755720: Download complete
77d6ac8c0bf7: Verifying Checksum
77d6ac8c0bf7: Download complete
88468e3f4c2c: Verifying Checksum
88468e3f4c2c: Download complete
574f361512d6: Pull complete
db4d1e2d7079: Pull complete
e544ee0f522d: Pull complete
791bb1082f38: Verifying Checksum
791bb1082f38: Download complete
c655136086be: Pull complete
2ec37f44090c: Pull complete
5fba3bd4a2c4: Pull complete
7e0ea9d0a1ab: Pull complete
da005f826951: Pull complete
6e842608b724: Pull complete
6b1a4187f1d0: Pull complete
c763bae43813: Pull complete
490d7c37a7d7: Pull complete
Streaming azureml-logs/65_job_prep-tvmps_287cfab3497943a39d90c089311555c3223ca350d504acc72af6aceb3d957ba3_p.txt
===============================================================================================================
[2021-04-28T12:54:05.020376] Entering job preparation.
[2021-04-28T12:54:08.337333] Starting job preparation.
[2021-04-28T12:54:08.337375] Extracting the control code.
[2021-04-28T12:54:08.365360] fetching and extracting the control code on master node.
[2021-04-28T12:54:08.365417] Starting extract_project.
[2021-04-28T12:54:08.365467] Starting to extract zip file.
[2021-04-28T12:54:09.302078] Finished extracting zip file.
[2021-04-28T12:54:09.804262] Using urllib.request Python 3.0 or later
[2021-04-28T12:54:09.804327] Start fetching snapshots.
[2021-04-28T12:54:09.804373] Start fetching snapshot.
[2021-04-28T12:54:09.804391] Retrieving project from snapshot: f4a38de4-3230-4038-ac4b-cde33bdd63e5
Starting the daemon thread to refresh tokens in background for process with pid = 51
[2021-04-28T12:54:10.714200] Finished fetching snapshot.
[2021-04-28T12:54:10.714233] Start fetching snapshot.
[2021-04-28T12:54:10.714251] Retrieving project from snapshot: b71de588-0f3c-44ae-b144-ea24a905546e
[2021-04-28T12:54:24.343681] Finished fetching snapshot.
[2021-04-28T12:54:24.343714] Finished fetching snapshots.
[2021-04-28T12:54:24.343728] Finished extract_project.
[2021-04-28T12:54:24.360941] Finished fetching and extracting the control code.
[2021-04-28T12:54:24.364330] downloadDataStore - Download from datastores if requested.
[2021-04-28T12:54:24.365371] Start run_history_prep.
[2021-04-28T12:54:24.436823] Entering context manager injector.
Acquired lockfile /tmp/a1c4fded-7336-4024-8c9e-fed19f5d1b37-datastore.lock to downloading input data references
[2021-04-28T12:54:24.903804] downloadDataStore completed
[2021-04-28T12:54:24.906597] Job preparation is complete.
Streaming azureml-logs/70_driver_log.txt
========================================
2021/04/28 12:54:26 Starting App Insight Logger for task: runTaskLet
2021/04/28 12:54:26 Attempt 1 of http call to http://10.0.0.4:16384/sendlogstoartifacts/info
2021/04/28 12:54:26 Attempt 1 of http call to http://10.0.0.4:16384/sendlogstoartifacts/status
[2021-04-28T12:54:27.564276] Entering context manager injector.
[context_manager_injector.py] Command line Options: Namespace(inject=['ProjectPythonPath:context_managers.ProjectPythonPath', '
2021/04/28 12:54:31 Not exporting to RunHistory as the exporter is either stopped or there is no data.
Stopped: false
OriginalData: 1
FilteredData: 0.
Streaming azureml-logs/75_job_post-tvmps_287cfab3497943a39d90c089311555c3223ca350d504acc72af6aceb3d957ba3_p.txt
===============================================================================================================
[2021-04-28T13:02:20.275818] Entering job release
[2021-04-28T13:02:21.348190] Starting job release
[2021-04-28T13:02:21.348739] Logging experiment finalizing status in history service.
Starting the daemon thread to refresh tokens in background for process with pid = 1369
[2021-04-28T13:02:21.349418] job release stage : upload_datastore starting...
[2021-04-28T13:02:21.349812] job release stage : start importing azureml.history._tracking in run_history_release.
[2021-04-28T13:02:21.352029] job release stage : copy_batchai_cached_logs starting...
[2021-04-28T13:02:21.352142] job release stage : execute_job_release starting...
[2021-04-28T13:02:21.357651] job release stage : copy_batchai_cached_logs completed...
[2021-04-28T13:02:21.358513] Entering context manager injector.
[2021-04-28T13:02:21.372410] job release stage : upload_datastore completed...
[2021-04-28T13:02:21.595288] job release stage : execute_job_release completed...
[2021-04-28T13:02:21.628735] job release stage : send_run_telemetry starting...
[2021-04-28T13:02:21.849387] get vm size and vm region successfully.
[2021-04-28T13:02:22.175695] get compute meta data successfully.
[2021-04-28T13:02:22.444070] post artifact meta request successfully.
[2021-04-28T13:02:22.471466] upload compute record artifact successfully.
[2021-04-28T13:02:22.471531] job release stage : send_run_telemetry completed...
[2021-04-28T13:02:22.471747] Job release is complete
StepRun(batch-score) Execution Summary
=======================================
StepRun( batch-score ) Status: Failed
---------------------------------------------------------------------------
ActivityFailedException Traceback (most recent call last)
<ipython-input-30-49d7d34a142d> in <module>
3 # Run the pipeline as an experiment
4 pipeline_run = Experiment(ws, 'batc-prediction_pipeline').submit(pipeline)
----> 5 pipeline_run.wait_for_completion(show_output=True)
/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/pipeline/core/run.py in wait_for_completion(self, show_output, timeout_seconds, raise_on_error)
293 try:
294 step_run.wait_for_completion(timeout_seconds=timeout_seconds - time_elapsed,
--> 295 raise_on_error=raise_on_error)
296 except TypeError as e:
297 # If there are package conflicts in the user's environment, the run rehydration
/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/pipeline/core/run.py in wait_for_completion(self, show_output, timeout_seconds, raise_on_error)
735 try:
736 return self._stream_run_output(timeout_seconds=timeout_seconds,
--> 737 raise_on_error=raise_on_error)
738 except KeyboardInterrupt:
739 error_message = "The output streaming for the run interrupted.\n" \
/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/pipeline/core/run.py in _stream_run_output(self, timeout_seconds, raise_on_error)
823 print(json.dumps(error, indent=4))
824 if error and raise_on_error:
--> 825 raise ActivityFailedException(error_details=json.dumps(error, indent=4))
826
827 print(final_details)
ActivityFailedException: ActivityFailedException:
Message: Activity Failed:
{
"error": {
"code": "UserError",
"message": "AzureMLCompute job failed.\nJobFailed: Submitted script failed with a non-zero exit code; see the driver log file for details.\n\tReason: Job failed with non-zero exit Code",
"messageFormat": "{Message}",
"messageParameters": {
"Message": "AzureMLCompute job failed.\nJobFailed: Submitted script failed with a non-zero exit code; see the driver log file for details.\n\tReason: Job failed with non-zero exit Code"
},
"details": [],
"innerError": {
"code": "UserTrainingScriptFailed"
}
},
"correlation": {
"operation": null,
"request": "6833f86b6a0c0af1"
},
"environment": "eastus",
"location": "eastus",
"time": "2021-04-28T13:02:41.490064Z",
"componentName": "execution-worker"
}
InnerException None
ErrorResponse
{
"error": {
"message": "Activity Failed:\n{\n \"error\": {\n \"code\": \"UserError\",\n \"message\": \"AzureMLCompute job failed.\\nJobFailed: Submitted script failed with a non-zero exit code; see the driver log file for details.\\n\\tReason: Job failed with non-zero exit Code\",\n \"messageFormat\": \"{Message}\",\n \"messageParameters\": {\n \"Message\": \"AzureMLCompute job failed.\\nJobFailed: Submitted script failed with a non-zero exit code; see the driver log file for details.\\n\\tReason: Job failed with non-zero exit Code\"\n },\n \"details\": [],\n \"innerError\": {\n \"code\": \"UserTrainingScriptFailed\"\n }\n },\n \"correlation\": {\n \"operation\": null,\n \"request\": \"6833f86b6a0c0af1\"\n },\n \"environment\": \"eastus\",\n \"location\": \"eastus\",\n \"time\": \"2021-04-28T13:02:41.490064Z\",\n \"componentName\": \"execution-worker\"\n}"
}
}
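
For reference, the error payload above can be reduced to the few fields that matter with plain Python. This is only a minimal sketch: the helper name `summarize_activity_error` and the trimmed `sample` string are illustrative assumptions and not part of the Azure ML SDK. It assumes the payload is the "Activity Failed:" text followed by a single JSON object, as in the traceback.

```python
import json


def summarize_activity_error(payload: str) -> dict:
    """Extract the key fields from an 'Activity Failed' error payload.

    Hypothetical helper for illustration; not an Azure ML SDK function.
    """
    # The payload starts with "Activity Failed:"; keep only the JSON object.
    start = payload.find("{")
    if start == -1:
        raise ValueError("no JSON object found in payload")
    body = json.loads(payload[start:])
    error = body.get("error", {})
    return {
        "code": error.get("code"),
        "inner_code": (error.get("innerError") or {}).get("code"),
        "message": error.get("message"),
        "request_id": (body.get("correlation") or {}).get("request"),
    }


# Trimmed copy of the payload from the traceback above (assumed shape).
sample = """Activity Failed:
{
    "error": {
        "code": "UserError",
        "message": "AzureMLCompute job failed.\\nJobFailed: Submitted script failed with a non-zero exit code; see the driver log file for details.",
        "details": [],
        "innerError": {"code": "UserTrainingScriptFailed"}
    },
    "correlation": {"operation": null, "request": "6833f86b6a0c0af1"},
    "componentName": "execution-worker"
}"""

summary = summarize_activity_error(sample)
print(summary["code"], summary["inner_code"])
```

Both codes only say that the submitted script exited with a non-zero code. The exception raised by the script is not in this payload, so the step's log files are still the place to look, as the message itself says.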