Submitted script failed with a non-zero exit code; see the driver log file for details. Reason: Job failed with non-zero exit Code

Suresh Guntapalli 1 Reputation point
2021-04-28T14:50:54.417+00:00

Hi All,
I am trying to create batch inference for my pretrained churn classification model. I was following this GitHub notebook for Iris batch inference: https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/machine-learning-pipelines/parallel-run/tabular-dataset-inference-iris.ipynb .
But I am getting an error. Please help me fix it.

Here is my code:
92040-screenshot-1.png

92147-screenshot-2.png
92165-screenshot-3.png

Here are my errors:

========================================================================================================================  
2  
. Please ignore this if the GPUs don't utilize NVIDIA® NVLink® switches.  
2021-04-28T12:53:39Z Starting output-watcher...  
2021-04-28T12:53:39Z IsDedicatedCompute == False, starting polling for Low-Pri Preemption  
2021-04-28T12:53:39Z Executing 'Copy ACR Details file' on 10.0.0.4  
2021-04-28T12:53:39Z Copy ACR Details file succeeded on 10.0.0.4. Output:   
>>>     
>>>     
Login Succeeded  
Using default tag: latest  
latest: Pulling from azureml/azureml_af590fdfaae8ba3ead1eba5ea12b0fb3  
4007a89234b4: Pulling fs layer  
5dfa26c6b9c9: Pulling fs layer  
0ba7bf18aa40: Pulling fs layer  
4c6ec688ebe3: Pulling fs layer  
574f361512d6: Pulling fs layer  
db4d1e2d7079: Pulling fs layer  
e544ee0f522d: Pulling fs layer  
c655136086be: Pulling fs layer  
2ec37f44090c: Pulling fs layer  
5fba3bd4a2c4: Pulling fs layer  
7e0ea9d0a1ab: Pulling fs layer  
da005f826951: Pulling fs layer  
6e842608b724: Pulling fs layer  
6b1a4187f1d0: Pulling fs layer  
db4d1e2d7079: Waiting  
c763bae43813: Pulling fs layer  
490d7c37a7d7: Pulling fs layer  
791bb1082f38: Pulling fs layer  
e544ee0f522d: Waiting  
e863af755720: Pulling fs layer  
c655136086be: Waiting  
4c6ec688ebe3: Waiting  
0cb6e30b3f1c: Pulling fs layer  
88468e3f4c2c: Pulling fs layer  
77d6ac8c0bf7: Pulling fs layer  
574f361512d6: Waiting  
2ec37f44090c: Waiting  
da005f826951: Waiting  
5fba3bd4a2c4: Waiting  
6e842608b724: Waiting  
6b1a4187f1d0: Waiting  
c763bae43813: Waiting  
490d7c37a7d7: Waiting  
791bb1082f38: Waiting  
e863af755720: Waiting  
0cb6e30b3f1c: Waiting  
88468e3f4c2c: Waiting  
77d6ac8c0bf7: Waiting  
7e0ea9d0a1ab: Waiting  
0ba7bf18aa40: Verifying Checksum  
0ba7bf18aa40: Download complete  
5dfa26c6b9c9: Verifying Checksum  
5dfa26c6b9c9: Download complete  
4c6ec688ebe3: Verifying Checksum  
4c6ec688ebe3: Download complete  
4007a89234b4: Download complete  
db4d1e2d7079: Verifying Checksum  
db4d1e2d7079: Download complete  
e544ee0f522d: Verifying Checksum  
e544ee0f522d: Download complete  
574f361512d6: Verifying Checksum  
574f361512d6: Download complete  
4007a89234b4: Pull complete  
5dfa26c6b9c9: Pull complete  
0ba7bf18aa40: Pull complete  
4c6ec688ebe3: Pull complete  
5fba3bd4a2c4: Download complete  
c655136086be: Verifying Checksum  
c655136086be: Download complete  
7e0ea9d0a1ab: Verifying Checksum  
7e0ea9d0a1ab: Download complete  
da005f826951: Verifying Checksum  
da005f826951: Download complete  
6e842608b724: Download complete  
6b1a4187f1d0: Download complete  
c763bae43813: Verifying Checksum  
c763bae43813: Download complete  
2ec37f44090c: Verifying Checksum  
2ec37f44090c: Download complete  
490d7c37a7d7: Verifying Checksum  
490d7c37a7d7: Download complete  
0cb6e30b3f1c: Verifying Checksum  
0cb6e30b3f1c: Download complete  
e863af755720: Verifying Checksum  
e863af755720: Download complete  
77d6ac8c0bf7: Verifying Checksum  
77d6ac8c0bf7: Download complete  
88468e3f4c2c: Verifying Checksum  
88468e3f4c2c: Download complete  
574f361512d6: Pull complete  
db4d1e2d7079: Pull complete  
e544ee0f522d: Pull complete  
791bb1082f38: Verifying Checksum  
791bb1082f38: Download complete  
c655136086be: Pull complete  
2ec37f44090c: Pull complete  
5fba3bd4a2c4: Pull complete  
7e0ea9d0a1ab: Pull complete  
da005f826951: Pull complete  
6e842608b724: Pull complete  
6b1a4187f1d0: Pull complete  
c763bae43813: Pull complete  
490d7c37a7d7: Pull complete  
  
Streaming azureml-logs/65_job_prep-tvmps_287cfab3497943a39d90c089311555c3223ca350d504acc72af6aceb3d957ba3_p.txt  
===============================================================================================================  
[2021-04-28T12:54:05.020376] Entering job preparation.  
[2021-04-28T12:54:08.337333] Starting job preparation.  
[2021-04-28T12:54:08.337375] Extracting the control code.  
[2021-04-28T12:54:08.365360] fetching and extracting the control code on master node.  
[2021-04-28T12:54:08.365417] Starting extract_project.  
[2021-04-28T12:54:08.365467] Starting to extract zip file.  
[2021-04-28T12:54:09.302078] Finished extracting zip file.  
[2021-04-28T12:54:09.804262] Using urllib.request Python 3.0 or later  
[2021-04-28T12:54:09.804327] Start fetching snapshots.  
[2021-04-28T12:54:09.804373] Start fetching snapshot.  
[2021-04-28T12:54:09.804391] Retrieving project from snapshot: f4a38de4-3230-4038-ac4b-cde33bdd63e5  
Starting the daemon thread to refresh tokens in background for process with pid = 51  
[2021-04-28T12:54:10.714200] Finished fetching snapshot.  
[2021-04-28T12:54:10.714233] Start fetching snapshot.  
[2021-04-28T12:54:10.714251] Retrieving project from snapshot: b71de588-0f3c-44ae-b144-ea24a905546e  
[2021-04-28T12:54:24.343681] Finished fetching snapshot.  
[2021-04-28T12:54:24.343714] Finished fetching snapshots.  
[2021-04-28T12:54:24.343728] Finished extract_project.  
[2021-04-28T12:54:24.360941] Finished fetching and extracting the control code.  
[2021-04-28T12:54:24.364330] downloadDataStore - Download from datastores if requested.  
[2021-04-28T12:54:24.365371] Start run_history_prep.  
[2021-04-28T12:54:24.436823] Entering context manager injector.  
Acquired lockfile /tmp/a1c4fded-7336-4024-8c9e-fed19f5d1b37-datastore.lock to downloading input data references  
[2021-04-28T12:54:24.903804] downloadDataStore completed  
[2021-04-28T12:54:24.906597] Job preparation is complete.  
  
Streaming azureml-logs/70_driver_log.txt  
========================================  
2021/04/28 12:54:26 Starting App Insight Logger for task:  runTaskLet  
2021/04/28 12:54:26 Attempt 1 of http call to http://10.0.0.4:16384/sendlogstoartifacts/info  
2021/04/28 12:54:26 Attempt 1 of http call to http://10.0.0.4:16384/sendlogstoartifacts/status  
[2021-04-28T12:54:27.564276] Entering context manager injector.  
[context_manager_injector.py] Command line Options: Namespace(inject=['ProjectPythonPath:context_managers.ProjectPythonPath', '  
2021/04/28 12:54:31 Not exporting to RunHistory as the exporter is either stopped or there is no data.  
Stopped: false  
OriginalData: 1  
FilteredData: 0.  
  
Streaming azureml-logs/75_job_post-tvmps_287cfab3497943a39d90c089311555c3223ca350d504acc72af6aceb3d957ba3_p.txt  
===============================================================================================================  
[2021-04-28T13:02:20.275818] Entering job release  
[2021-04-28T13:02:21.348190] Starting job release  
[2021-04-28T13:02:21.348739] Logging experiment finalizing status in history service.  
Starting the daemon thread to refresh tokens in background for process with pid = 1369  
[2021-04-28T13:02:21.349418] job release stage : upload_datastore starting...  
[2021-04-28T13:02:21.349812] job release stage : start importing azureml.history._tracking in run_history_release.  
[2021-04-28T13:02:21.352029] job release stage : copy_batchai_cached_logs starting...  
[2021-04-28T13:02:21.352142] job release stage : execute_job_release starting...  
[2021-04-28T13:02:21.357651] job release stage : copy_batchai_cached_logs completed...  
[2021-04-28T13:02:21.358513] Entering context manager injector.  
[2021-04-28T13:02:21.372410] job release stage : upload_datastore completed...  
[2021-04-28T13:02:21.595288] job release stage : execute_job_release completed...  
[2021-04-28T13:02:21.628735] job release stage : send_run_telemetry starting...  
[2021-04-28T13:02:21.849387] get vm size and vm region successfully.  
[2021-04-28T13:02:22.175695] get compute meta data successfully.  
[2021-04-28T13:02:22.444070] post artifact meta request successfully.  
[2021-04-28T13:02:22.471466] upload compute record artifact successfully.  
[2021-04-28T13:02:22.471531] job release stage : send_run_telemetry completed...  
[2021-04-28T13:02:22.471747] Job release is complete  
  
StepRun(batch-score) Execution Summary  
=======================================  
StepRun( batch-score ) Status: Failed  
---------------------------------------------------------------------------  
ActivityFailedException                   Traceback (most recent call last)  
<ipython-input-30-49d7d34a142d> in <module>  
      3 # Run the pipeline as an experiment  
      4 pipeline_run = Experiment(ws, 'batc-prediction_pipeline').submit(pipeline)  
----> 5 pipeline_run.wait_for_completion(show_output=True)  
  
/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/pipeline/core/run.py in wait_for_completion(self, show_output, timeout_seconds, raise_on_error)  
    293                             try:  
    294                                 step_run.wait_for_completion(timeout_seconds=timeout_seconds - time_elapsed,  
--> 295                                                              raise_on_error=raise_on_error)  
    296                             except TypeError as e:  
    297                                 # If there are package conflicts in the user's environment, the run rehydration  
  
/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/pipeline/core/run.py in wait_for_completion(self, show_output, timeout_seconds, raise_on_error)  
    735             try:  
    736                 return self._stream_run_output(timeout_seconds=timeout_seconds,  
--> 737                                                raise_on_error=raise_on_error)  
    738             except KeyboardInterrupt:  
    739                 error_message = "The output streaming for the run interrupted.\n" \  
  
/anaconda/envs/azureml_py36/lib/python3.6/site-packages/azureml/pipeline/core/run.py in _stream_run_output(self, timeout_seconds, raise_on_error)  
    823             print(json.dumps(error, indent=4))  
    824         if error and raise_on_error:  
--> 825             raise ActivityFailedException(error_details=json.dumps(error, indent=4))  
    826   
    827         print(final_details)  
  
ActivityFailedException: ActivityFailedException:  
 Message: Activity Failed:  
{  
    "error": {  
        "code": "UserError",  
        "message": "AzureMLCompute job failed.\nJobFailed: Submitted script failed with a non-zero exit code; see the driver log file for details.\n\tReason: Job failed with non-zero exit Code",  
        "messageFormat": "{Message}",  
        "messageParameters": {  
            "Message": "AzureMLCompute job failed.\nJobFailed: Submitted script failed with a non-zero exit code; see the driver log file for details.\n\tReason: Job failed with non-zero exit Code"  
        },  
        "details": [],  
        "innerError": {  
            "code": "UserTrainingScriptFailed"  
        }  
    },  
    "correlation": {  
        "operation": null,  
        "request": "6833f86b6a0c0af1"  
    },  
    "environment": "eastus",  
    "location": "eastus",  
    "time": "2021-04-28T13:02:41.490064Z",  
    "componentName": "execution-worker"  
}  
 InnerException None  
 ErrorResponse   
{  
    "error": {  
        "message": "Activity Failed:\n{\n    \"error\": {\n        \"code\": \"UserError\",\n        \"message\": \"AzureMLCompute job failed.\\nJobFailed: Submitted script failed with a non-zero exit code; see the driver log file for details.\\n\\tReason: Job failed with non-zero exit Code\",\n        \"messageFormat\": \"{Message}\",\n        \"messageParameters\": {\n            \"Message\": \"AzureMLCompute job failed.\\nJobFailed: Submitted script failed with a non-zero exit code; see the driver log file for details.\\n\\tReason: Job failed with non-zero exit Code\"\n        },\n        \"details\": [],\n        \"innerError\": {\n            \"code\": \"UserTrainingScriptFailed\"\n        }\n    },\n    \"correlation\": {\n        \"operation\": null,\n        \"request\": \"6833f86b6a0c0af1\"\n    },\n    \"environment\": \"eastus\",\n    \"location\": \"eastus\",\n    \"time\": \"2021-04-28T13:02:41.490064Z\",\n    \"componentName\": \"execution-worker\"\n}"  
    }  
}  
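The ErrorResponse above wraps a second JSON document inside `error.message`, which makes the actual error codes hard to read. Below is a minimal sketch, assuming only the structure shown in this post, of pulling out the outer code, the inner code, and the first line of the message with the standard library. The `error_response` dictionary is an abridged copy of the payload above and `extract_failure` is a hypothetical helper name, not part of the Azure ML SDK.

```python
import json

# Abridged copy of the ErrorResponse shown above: the real details are a JSON
# document embedded as a string inside error.message.
error_response = {
    "error": {
        "message": "Activity Failed:\n" + json.dumps({
            "error": {
                "code": "UserError",
                "message": "AzureMLCompute job failed.\nJobFailed: Submitted "
                           "script failed with a non-zero exit code; see the "
                           "driver log file for details.\n\tReason: Job failed "
                           "with non-zero exit Code",
                "details": [],
                "innerError": {"code": "UserTrainingScriptFailed"},
            },
            "componentName": "execution-worker",
        }, indent=4)
    }
}


def extract_failure(response):
    """Return (code, inner_code, first_message_line) from the nested payload."""
    message = response["error"]["message"]
    # Everything from the first "{" onward is the embedded JSON document.
    inner = json.loads(message[message.index("{"):])
    err = inner["error"]
    inner_code = (err.get("innerError") or {}).get("code")
    return err["code"], inner_code, err["message"].splitlines()[0]


code, inner_code, summary = extract_failure(error_response)
print(code, inner_code, summary)
```

The inner code `UserTrainingScriptFailed`, together with the "Submitted script failed" message, suggests that the failure was raised by the scoring script itself, so the full traceback is more likely to be in the step's driver log than in this summary.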
  
​  
  
Azure Machine Learning