Azure ML real-time endpoints stuck in 'Transitioning' state
I am trying to deploy Azure ML models as web service endpoints using an AKS cluster. I have deployed the real-time inference pipeline using the Azure ML Studio interface.
The endpoints deploy successfully and quickly reach a "Healthy" state. Occasionally, however, a deployment gets stuck in the "Transitioning" state indefinitely. While this lasts, testing the endpoint from the portal is disabled and we are unable to consume the web service.
Any idea why this might be happening, or what I can do to fix this?
Is there a limit to the number of endpoints available on an inference cluster?
@Sumedha Singh Users have reported endpoints getting stuck in the "Transitioning" state before. In most cases the cause is a missing configuration or an issue on the AKS cluster. The logs of the individual endpoint deployment, available from the backend or from the output+logs tab of the deployment, can provide more details. We also recommend reporting the issue through the Azure ML portal (ml.azure.com) using the option in the top right corner; this lets our team check such deployments and contact customers about resolving them. Please provide details of the deployments that are stuck in "Transitioning" so our team can check and respond.
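If you prefer to check the state and pull the logs from code rather than the portal, something like the following might help. This is only a rough sketch, assuming the v1 Python SDK (azureml-core) and a workspace config.json on disk; the helper names wait_until_settled and load_service, the timeout values, and the endpoint name are made up for illustration and are not part of the SDK.

```python
import time


def wait_until_settled(service, timeout_s=600, poll_s=15, sleep=time.sleep):
    """Poll until the service leaves 'Transitioning' or the timeout elapses.

    `service` is assumed to behave like azureml.core.webservice.Webservice,
    i.e. it exposes `state`, `update_deployment_state()` and `get_logs()`.
    Returns (final_state, logs); logs stays None when the service is Healthy.
    """
    waited = 0
    service.update_deployment_state()  # refresh the in-memory state
    while service.state == "Transitioning" and waited < timeout_s:
        sleep(poll_s)
        waited += poll_s
        service.update_deployment_state()

    logs = None
    if service.state != "Healthy":
        # Still stuck or unhealthy: try to fetch logs, but do not crash
        # if log retrieval itself fails.
        try:
            logs = service.get_logs()
        except Exception as exc:
            logs = f"could not fetch logs: {exc}"
    return service.state, logs


def load_service(name):
    """Hypothetical helper: look up a deployed web service by name."""
    # Imported lazily so the polling helper can be used without the SDK.
    from azureml.core import Workspace
    from azureml.core.webservice import Webservice

    ws = Workspace.from_config()  # assumes a local config.json
    return Webservice(workspace=ws, name=name)
```

For example, `state, logs = wait_until_settled(load_service("my-endpoint"))`, where "my-endpoint" is a placeholder for your endpoint name. If the log call fails here as well, the returned error text is worth including when you report the issue.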
Hello @romungi-MSFT, thank you for your response!
I have followed your instructions and reported the issue on the Azure ML portal.
However, I am also unable to get any logs; an error is thrown when I try to retrieve the deployment logs.
@Sumedha Singh Thanks for posting the details from the portal. It looks like there is an issue with your subscription or workspace. You can also share the trace ID with our team, who can check why the details are not visible from the portal.