How to fail an Azure ML run?

We are using Azure ML to run large tests so that we can test our code on CUDA automatically. Things mostly work well, but one thing we cannot figure out is how to fail a job such that the failure
- shows in the UI as Failed (see snapshot), and
- is propagated back to the submitting client (our test code), so that we can fail the test once the run has reached the Failed state.
Here's what we tried:
- Exiting the run process with a non-zero status.
- Using the Run instance on the VM to send the non-zero exit code and a reason.
- Detecting the Failed state or the reason on the submitting client.
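A minimal sketch of the first approach, assuming the job's entry script launches the actual test command as a child process (the wrapper script and the `run_and_propagate` name are hypothetical, not part of our setup or of the Azure ML SDK). The point it illustrates is that the entry script itself must exit with the child's status instead of swallowing it, on the assumption that Azure ML derives the job status from the exit status of that entry process.

```python
import subprocess
import sys
from typing import Sequence


def run_and_propagate(cmd: Sequence[str]) -> int:
    """Run the test command and return its exit status unchanged."""
    # check=False: we want the child's return code, not a CalledProcessError.
    completed = subprocess.run(list(cmd), check=False)
    if completed.returncode != 0:
        print(f"Test command exited with status {completed.returncode}", file=sys.stderr)
    return completed.returncode


if __name__ == "__main__":
    # Hypothetical entry point: everything after the script name is the test command,
    # e.g. `python run_tests.py pytest -x tests`.
    if len(sys.argv) < 2:
        sys.exit("usage: run_tests.py COMMAND [ARGS...]")
    # Exit the entry script with the child's status so the job sees the failure.
    sys.exit(run_and_propagate(sys.argv[1:]))
```

Using `check=False` keeps the failure decision in one place; `subprocess.run(..., check=True)` would raise `CalledProcessError` instead, which also ends the script with a non-zero status but loses the child's exact code.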
When we call the following function on the VM:

    def report_error(returncode: int):
        import sys
        from azureml.core.run import Run

        run = Run.get_context(allow_offline=False)
        print(f"Failing the run with return code={returncode}")
        run.fail(f"A process returned a non-zero status code {returncode}", error_code=returncode)
        sys.exit(returncode)

we can see the exit code in the UI at the top of the run's page, but the run is still marked as Completed.
As a result, the submitting client cannot tell that the job failed.
On the client side, after waiting for the run with:

    result = run.wait_for_completion(show_output=True, raise_on_error=True)

we tried:

    import sys

    if result['status'] != 'Completed' or (result['details'] is not None and
            'A process returned a non-zero status code' in result['details']):
        run.fail(error_details=result['details'], error_code=1)
        sys.exit(1)

Yet the return value of this process, as communicated to the test client, is zero.
Is this a timing issue in obtaining the result details?
What can we do to make sure such jobs actually show as Failed in the UI?
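One way to take timing out of the picture on the client is to poll until the run reaches a terminal status before reading its details, and then turn that status into the client process's exit code explicitly. A minimal sketch, with the status source passed in as a callable so it can be exercised without Azure; the set of terminal statuses and the helper names (`wait_for_terminal_status`, `exit_code_for_run`) are assumptions, not SDK API.

```python
import time
from typing import Any, Callable, Dict

# Assumed terminal states of an Azure ML (v1 SDK) run; adjust if your SDK differs.
TERMINAL_STATUSES = {"Completed", "Failed", "Canceled"}


def wait_for_terminal_status(
    get_status: Callable[[], str],
    poll_seconds: float = 10.0,
    timeout_seconds: float = 3600.0,
) -> str:
    """Poll get_status() until it returns a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Run still in status {status!r} after {timeout_seconds}s")
        time.sleep(poll_seconds)


def exit_code_for_run(status: str, details: Dict[str, Any]) -> int:
    """Map the final status and details of a run to a process exit code (0 = pass)."""
    if status != "Completed":
        return 1
    # Treat an error entry in the details as a failure even if the status says Completed.
    if details.get("error"):
        return 1
    return 0
```

With the v1 SDK this might be wired up as `status = wait_for_terminal_status(run.get_status)` followed by `sys.exit(exit_code_for_run(status, run.get_details()))`. `Run.get_status()` and `Run.get_details()` exist in `azureml.core`, but the `error` key is an assumption to check against the details of one of our own failed runs.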
Second case: trying to fail the job from the script running on the VM.
This is the code used on the VM to attempt to fail the job:

    def report_error(returncode: int):
        import sys
        from azureml.core.run import Run

        run = Run.get_context(allow_offline=False)
        print(f"Failing the run with return code={returncode}")
        run.fail(f"A process returned a non-zero status code {returncode}", error_code=returncode)
        sys.exit(returncode)

The reason is updated, but the job is not marked as Failed.
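Because the reason gets updated while the status does not, one thing worth ruling out is the script's own exit status being lost, for example if the reporting call raises before the exit is reached. A hedged sketch of a `report_error` variant (the `report_error_and_exit` name and its `report` parameter are hypothetical) that tries to record the failure through an injected callable and then always exits non-zero:

```python
import sys
from typing import Callable, NoReturn, Optional

# Hypothetical reporter signature: (message, returncode) -> None.
# On the VM this could wrap Run.fail; it is injected so the sketch runs without Azure.
Reporter = Callable[[str, int], None]


def report_error_and_exit(returncode: int, report: Optional[Reporter] = None) -> NoReturn:
    """Try to record the failure, then always terminate with a non-zero status."""
    message = f"A process returned a non-zero status code {returncode}"
    print(f"Failing the run with return code={returncode}", file=sys.stderr)
    if report is not None:
        try:
            report(message, returncode)
        except Exception as exc:  # a reporting problem must not mask the real failure
            print(f"Could not report the failure: {exc}", file=sys.stderr)
    # Flush so the last messages reach the job logs before the process ends.
    sys.stdout.flush()
    sys.stderr.flush()
    # Never exit with 0 here, even if the caller passed 0 by mistake.
    sys.exit(returncode if returncode != 0 else 1)
```

On the VM, `report` might be `lambda msg, code: Run.get_context(allow_offline=False).fail(error_details=msg, error_code=str(code))`. `Run.fail` takes `error_details` and `error_code` keyword arguments, and `error_code` appears to be documented as an optional string, hence the `str(code)`.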