
Submitting image segmentation jobs to an existing compute cluster suddenly stopped working in Azure ML workspace

Minh Tran 5 Reputation points
2025-07-06T18:09:33.2733333+00:00

Existing Python code that uses the Azure ML SDK to submit jobs to a compute cluster suddenly stopped working. It prints the following error:

HttpResponseError: (UserError) The AutoMLJob input is invalid. Compute gpu-cluster not found in workspace my-workspace-name

The compute cluster gpu-cluster does exist, and I have verified that it is in a successfully provisioned state. The last time I was able to submit jobs to this cluster with the same code was June 25, 2025.

Running az ml compute list shows the cluster, so I am not sure what is causing this issue.

Azure Machine Learning

2 answers

  1. Minh Tran 5 Reputation points
    2025-07-07T12:03:33.47+00:00

    This was caused by an errant deployment by the Azure team. Another user who had previously reported the same issue (linked below) submitted a support ticket, which led to a rollback of the errant code and resolved the problem.

    https://learn.microsoft.com/en-us/answers/questions/2338361/error-submit-the-automl-job?comment=question&translated=false#newest-question-comment


  2. Amira Bedhiafi 42,046 Reputation points MVP Volunteer Moderator
    2025-07-06T20:23:57.81+00:00

    Hello!

    Thank you for posting on Microsoft Learn.

    Have you checked whether the workspace object in your Python code points to the correct subscription, resource group, and workspace name?

    Try printing the workspace details in your code:

    print(ws.name, ws.location, ws.resource_group)
    

    You might be authenticating in a different Azure subscription or workspace than where gpu-cluster exists.
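
If the code uses the v2 SDK (azure-ai-ml) instead of a v1 Workspace object, a comparable check might look like the sketch below. The helper names describe_target and build_client are hypothetical, and the sketch assumes azure-ai-ml and azure-identity are installed where build_client is called.

```python
def describe_target(ml_client, compute_name):
    """Report which workspace the client points at and whether it lists the cluster."""
    # compute.list() returns the compute targets visible through this client.
    visible = [c.name for c in ml_client.compute.list()]
    return {
        "subscription_id": ml_client.subscription_id,
        "resource_group": ml_client.resource_group_name,
        "workspace": ml_client.workspace_name,
        "compute_found": compute_name in visible,
        "visible_computes": visible,
    }


def build_client(subscription_id, resource_group, workspace):
    """Create an MLClient; imports are local so the helper above has no Azure dependency."""
    from azure.ai.ml import MLClient
    from azure.identity import DefaultAzureCredential

    return MLClient(
        credential=DefaultAzureCredential(),
        subscription_id=subscription_id,
        resource_group_name=resource_group,
        workspace_name=workspace,
    )


# Example usage (placeholder values, not taken from the thread):
# client = build_client("<subscription-id>", "<resource-group>", "my-workspace-name")
# print(describe_target(client, "gpu-cluster"))
```

If compute_found is True here but job submission still reports the cluster as missing, the client-side configuration is probably not the cause.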

    There could also have been a breaking change in azureml-sdk or azure-ai-ml if you recently upgraded, or if a managed compute environment was updated.

    Check your installed versions:

    pip show azure-ai-ml
    pip show azureml-sdk
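
The same check can be run from inside the Python environment that submits the jobs, which avoids inspecting a different interpreter than the one actually in use. This is a minimal sketch using only the standard library; the helper name installed_versions is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("azure-ai-ml", "azureml-sdk")):
    """Map each package name to its installed version, or None if it is absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'not installed'}")
```

Recording these versions alongside the error timestamp also gives a support ticket something concrete to compare against.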
    

    If you are using azure-ai-ml, verify that you have a compatible version (>=1.15.0) and that your code goes through the MLClient API:

    from azure.ai.ml import MLClient
    

    If the above doesn’t help, try re-creating the compute cluster with a different name and update your code to reference the new cluster. This can help isolate the issue to a potential internal registration bug.
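
Re-creating the cluster under a new name could be sketched as follows with the v2 SDK. The helper names and the new cluster name gpu-cluster-v2 are hypothetical, and the sketch assumes the existing cluster is an AmlCompute target whose size and scale limits should be copied as-is.

```python
def clone_cluster_spec(existing, new_name):
    """Copy the VM size and scale limits of an existing cluster under a new name."""
    return {
        "name": new_name,
        "size": existing.size,
        "min_instances": existing.min_instances,
        "max_instances": existing.max_instances,
    }


def create_cluster(ml_client, spec):
    """Create the cluster and block until provisioning finishes."""
    # Imported locally so clone_cluster_spec can be used without azure-ai-ml.
    from azure.ai.ml.entities import AmlCompute

    poller = ml_client.compute.begin_create_or_update(AmlCompute(**spec))
    return poller.result()


# Example usage (ml_client is an existing MLClient for the workspace):
# old = ml_client.compute.get("gpu-cluster")
# create_cluster(ml_client, clone_cluster_spec(old, "gpu-cluster-v2"))
```

Only the name changes here, so a job that succeeds on the new cluster points at the registration of the old one and not at its configuration.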

    If the issue persists, raise a support ticket with Azure (include your subscription ID, workspace name, cluster name, and the timestamp of the error).
