Azure ML Pipeline Job Status Not Reflecting Failure on Exception

Mallik, Sourav
2024-06-10T13:07:32.6933333+00:00

Description: I have encountered an issue where my Azure ML pipeline job status does not reflect a failure even when an exception is raised in one of the steps. The job continues to show as succeeded despite the error. Here are the details of the problem:

Steps to Reproduce:

  1. Create a Python script that raises an exception.
  2. Define a command component in Azure ML SDK v2 that runs the script.
  3. Create a pipeline that includes this component.
  4. Submit the pipeline job and monitor the status.

Expected Behavior: The pipeline job status should reflect a failure when an exception is raised in one of the steps.

Actual Behavior: The pipeline job status shows as succeeded even when an exception is raised.

Logs and Error Messages:


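import logging
import traceback

from azureml.core import Run

# `args` (parsed command-line arguments) and `get_aml_client()` are defined
# elsewhere in the author's script and are not shown in the post.


def main():
    # Get the run context before the try block so `run` is always defined
    # when the except block uses it.
    run = Run.get_context()
    try:
        logging.info('Getting ML Client using DefaultCredential.')
        ml_client = get_aml_client('cluster')

        registered_data_asset = ml_client.data.get(
            name=args.data_asset_name,
            label="latest")
    except Exception as error:
        logging.error(error)
        traceback.print_exc()
        # Marks the run as failed, but the exception is not re-raised,
        # so main() still returns normally.
        run.fail(error_details=str(error))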
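Because main() catches the exception and only logs it (plus calls run.fail()), the Python process itself finishes normally with exit code 0. The stdlib-only sketch below contrasts that pattern with one that re-raises after logging. It assumes, as is typical for command jobs, that the step's status is derived from the script's exit code; the SWALLOW/RERAISE scripts and the exit_code helper are hypothetical illustrations, not code from the post.

```python
import subprocess
import sys
import textwrap

# Both variants log the error; only the second lets it propagate.
SWALLOW = textwrap.dedent("""
    import logging
    def main():
        try:
            raise ValueError("data asset not found")
        except Exception as error:
            logging.error(error)  # logged, then swallowed
    main()
""")

RERAISE = textwrap.dedent("""
    import logging
    def main():
        try:
            raise ValueError("data asset not found")
        except Exception as error:
            logging.error(error)
            raise  # propagate so the interpreter exits non-zero
    main()
""")


def exit_code(source: str) -> int:
    """Return the exit code of `source` run in a child Python interpreter."""
    return subprocess.run([sys.executable, "-c", source],
                          capture_output=True).returncode


if __name__ == "__main__":
    print("swallowed:", exit_code(SWALLOW))  # 0: looks like success
    print("re-raised:", exit_code(RERAISE))  # 1: looks like failure
```

Applied to the script above, one option would be ending the except block with a bare raise after run.fail(...), so the process exits non-zero; this is a suggestion and has not been verified against this particular workspace.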

Environment Details:

  • Azure ML SDK version: 2
  • Python version: 3.11
  • Compute target: STANDARD_D13_V2
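"SDK version: 2" identifies only the major version, and the script above imports azureml.core (SDK v1) alongside SDK v2 code. A small stdlib helper like the one below could report exact versions from the compute; the package names in PACKAGES are assumptions about what is installed there.

```python
import platform
from importlib import metadata

# Assumed package names: azure-ai-ml is the SDK v2 package, and azureml-core
# is the SDK v1 package that provides azureml.core.Run.
PACKAGES = ("azure-ai-ml", "azureml-core")


def environment_report(packages=PACKAGES) -> dict:
    """Return the Python version and the installed version of each package."""
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```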

Attempted Solution: I tried setting the continue_on_step_failure = False property on the pipeline job, but the job still shows as succeeded even when a step raises an exception. A screenshot of the job status and the code snippet are attached:

Code Snippet:


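from azure.ai.ml.dsl import pipeline

# `ml_client` is an authenticated azure.ai.ml.MLClient created elsewhere (not shown).

# Define a pipeline with component
@pipeline()
def pipeline_for_inferencing_meta_model():
    # pipeline steps definition here (omitted in the post)
    ...

# Build the pipeline job
pipeline_job = pipeline_for_inferencing_meta_model()

# In SDK v2, pipeline-level options are set on `pipeline_job.settings`
pipeline_job.settings.continue_on_step_failure = False
pipeline_job.settings.force_rerun = True

# Submit the pipeline job to the workspace
pipeline_exp = ml_client.jobs.create_or_update(
    pipeline_job,
    experiment_name='predict-pipeline-job',
)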
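continue_on_step_failure decides what happens to the remaining steps after a step has already been reported as failed; it does not make a step whose process exited with code 0 count as failed. The toy model below is a deliberate simplification for illustration (sequential, independent steps, with status names loosely modeled on Azure ML Studio), not Azure ML's actual scheduler; Step and run_pipeline are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    exit_code: int  # what the step's process returned


def run_pipeline(steps, continue_on_step_failure=False):
    """Toy sequential runner: return (per-step statuses, overall status)."""
    statuses = {}
    failed = False
    for step in steps:
        if failed and not continue_on_step_failure:
            # Remaining steps are skipped once a step has failed.
            statuses[step.name] = "NotStarted"
            continue
        if step.exit_code == 0:
            statuses[step.name] = "Completed"
        else:
            statuses[step.name] = "Failed"
            failed = True
    return statuses, ("Failed" if failed else "Completed")


if __name__ == "__main__":
    # The scoring step swallowed its exception, so its process exited with 0.
    swallowed = [Step("load_data", 0), Step("score", 0)]
    for flag in (False, True):
        print(flag, run_pipeline(swallowed, continue_on_step_failure=flag))
```

In this model the flag changes the outcome only when some step returns a non-zero exit code; with a swallowed exception the pipeline reports "Completed" either way.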

Screenshots: a screenshot of the job status was attached.

Azure Machine Learning