As far as I can tell, you're logging the metric immediately after submitting the pipeline, before it finishes execution, and possibly from a different process or context than the one that receives the output or computes the metric. AzureML pipeline jobs run asynchronously, so your code doesn’t wait to actually collect or process the outputs before logging.
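In other words, the symptom matches a pattern like this simplified sketch (names are placeholders): the metric is logged while the pipeline is still queued or running.

```python
# Simplified sketch of the problematic pattern (placeholder names).
# create_or_update() returns as soon as the job is accepted, so the pipeline
# may still be queued or running when the next line executes.
submitted = ml_client.jobs.create_or_update(pipeline_job, experiment_name=experiment_name)
mlflow.log_metric("Avg", avg_score)  # logged before any pipeline output exists
```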
You can fix this in one of two ways:
Option 1: Post-process results after the pipeline run completes (from the same script)
- Wait for the job to finish (the snippet below polls the job status explicitly; `ml_client.jobs.stream(...)` streams the logs while the job runs)
- Extract the outputs from the pipeline job
- Process the output (e.g., run your aggregation)
- Log the metric to the parent MLflow run

Replace the end of your script with something like this:
```python
# Submit the pipeline job (create_or_update returns immediately; the job runs asynchronously)
submitted = ml_client.jobs.create_or_update(
    pipeline_job,
    experiment_name=experiment_name,
    tags={"mlflow.parentRunId": parent_run.info.run_id},
)
print(f"Submitted pipeline job: {submitted.name}")

# Stream logs (optional)
ml_client.jobs.stream(submitted.name)

# Poll until the job reaches a terminal state
# (SDK v2 reports the job status as a string, e.g. "Completed", "Failed", "Canceled")
import time

terminal_states = {"Completed", "Failed", "Canceled"}
while submitted.status not in terminal_states:
    time.sleep(10)
    submitted = ml_client.jobs.get(name=submitted.name)

# If completed, do the post-processing
if submitted.status == "Completed":
    # Retrieve the pipeline output locally; replace "<your_output_name>" with the actual output name
    ml_client.jobs.download(
        name=submitted.name,
        output_name="<your_output_name>",
        download_path="./pipeline_outputs",
    )

    # Run your aggregation logic here (e.g., read the downloaded CSV, compute the average)
    # For now, simulate the metric computation
    avg_score = 2.0  # placeholder

    # Log the metric to the parent MLflow run
    mlflow.set_tracking_uri(tracking_uri)
    mlflow.set_experiment(experiment_name)
    mlflow.start_run(run_id=parent_run.info.run_id)
    mlflow.log_metric("Avg", avg_score)
    mlflow.end_run()
else:
    print(f"❌ Pipeline job finished with status: {submitted.status}")
    mlflow.end_run(status="FAILED")
```
Replace "<your_output_name>"
with the actual output name in the flow_component
if it returns outputs (e.g., a metrics JSON, scores CSV, etc.)
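For illustration, if the flow component happened to write a scores.csv with a numeric score column (both names are assumptions), the placeholder computation above could be replaced with something like:

```python
from pathlib import Path

import pandas as pd  # assumed to be available in your driver environment

# Hypothetical example: the downloaded output contains a scores.csv with a "score" column.
# The exact folder layout depends on how the step writes its output, so search for the file.
csv_path = next(Path("./pipeline_outputs").rglob("scores.csv"))
avg_score = pd.read_csv(csv_path)["score"].mean()
```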
Option 2: Use a final pipeline step to aggregate and log inside the pipeline
You can define a final component step that:
- Accepts all the required intermediate outputs,
- Aggregates them,
- Logs metrics with `mlflow.log_metric(...)` (which logs in the pipeline run context),
- And ensures that MLflow is configured properly inside that step to log into the parent run.
However, metrics logged from inside the pipeline will show up on the pipeline step's run, not on the MLflow run started by your parent Python script.
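If you go this route, here is a minimal sketch of the script behind such a final aggregation step (the input name, file name, and column are assumptions; inside an AzureML job, MLflow is already configured to track to that step's run):

```python
# aggregate_and_log.py - hypothetical script for a final "aggregate" command component.
import argparse
from pathlib import Path

import mlflow
import pandas as pd  # assumed to be in the component's environment


def main() -> None:
    parser = argparse.ArgumentParser()
    # Folder input wired to the intermediate outputs of the earlier steps
    parser.add_argument("--scores_dir", type=str, required=True)
    args = parser.parse_args()

    # Hypothetical layout: one or more scores.csv files with a numeric "score" column
    frames = [pd.read_csv(p) for p in Path(args.scores_dir).rglob("scores.csv")]
    avg_score = pd.concat(frames)["score"].mean()

    # Inside the job this logs to the step's MLflow run
    mlflow.log_metric("Avg", avg_score)


if __name__ == "__main__":
    main()
```

You would then wrap this script in a command component and add it as the last step of the pipeline, binding its `scores_dir` input to the outputs of the earlier steps.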
If the above response helps answer your question, remember to "Accept Answer" so that others in the community facing similar issues can easily find the solution. Your contribution is highly appreciated.
hth
Marcin