We are getting frequent timeouts on MLflow in Azure Machine Learning

Stéphane Renou 0 Reputation points
2025-04-23T15:13:09.16+00:00

Our code frequently times out after 2 minutes when trying to get the status of an MLflow job.

A recent example shows the following response:

Retrying (Retry(total=6, connect=7, read=6, redirect=7, status=7)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='francecentral.api.azureml.ms', port=443): Read timed out. (read timeout=120)")': /mlflow/v2.0/subscriptions/28377576-89a8-4e51-8209-45265b06caee/resourceGroups/rg-prod-sales-azure/providers/Microsoft.MachineLearningServices/workspaces/mlw-sales-azure-prod-lc/api/2.0/mlflow/runs/search
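To help quantify how often this happens, here is a minimal sketch that pulls the remaining retry budget, host, port and read timeout out of a urllib3 retry warning like the one above. The function name `parse_retry_warning` and the regular expression are hypothetical helpers written for this message format only, not part of mlflow or urllib3.

```python
import re

# Matches only the fields visible in the warning quoted above.
RETRY_RE = re.compile(
    r"Retry\(total=(?P<total>\d+).*?"
    r"host='(?P<host>[^']+)', port=(?P<port>\d+)\).*?"
    r"read timeout=(?P<timeout>\d+(?:\.\d+)?)\)"
)


def parse_retry_warning(line):
    """Return the parsed fields as a dict, or None if the line does not match."""
    match = RETRY_RE.search(line)
    if match is None:
        return None
    return {
        "retries_left": int(match.group("total")),
        "host": match.group("host"),
        "port": int(match.group("port")),
        "read_timeout_s": float(match.group("timeout")),
    }


# Shortened copy of the warning from this post (request path omitted).
sample = (
    "Retrying (Retry(total=6, connect=7, read=6, redirect=7, status=7)) "
    "after connection broken by 'ReadTimeoutError(\"HTTPSConnectionPool("
    "host='francecentral.api.azureml.ms', port=443): Read timed out. "
    "(read timeout=120)\")'"
)

print(parse_retry_warning(sample))
```

Running this over the application log and counting matches per hour would show whether the timeouts cluster at particular times.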

We are making the call from this compute:

/subscriptions/28377576-89a8-4e51-8209-45265b06caee/resourceGroups/RG-PROD-MAIN/providers/Microsoft.Compute/virtualMachines/LearnVM

The call fails at:

/usr/local/lib/python3.11/site-packages/urllib3/connectionpool.py

while doing: urlopen

We are using the mlflow library in Python, and we don't understand why these calls frequently time out after 2 minutes when all of our infrastructure is located next to each other.

We are using the following libraries:
azure-ai-ml version 1.26.0
azureml-mlflow version 1.59.0.post1
mlflow-skinny version 2.21.1
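For anyone trying to reproduce this, a rough sketch of how the status call could be timed is below. It rests on assumptions: `timed_call` and `fake_status_call` are hypothetical names, the fake call stands in for the real mlflow run search, and the two environment variables are, to my knowledge, the ones mlflow reads for its REST request timeout and retry count (the 120-second read timeout in the warning matches the usual default). Whether azureml-mlflow honors them on this code path is not confirmed here, and the values are illustrative only.

```python
import os
import time

# Assumption: mlflow reads these for its REST client; values are illustrative.
os.environ.setdefault("MLFLOW_HTTP_REQUEST_TIMEOUT", "30")     # seconds per request
os.environ.setdefault("MLFLOW_HTTP_REQUEST_MAX_RETRIES", "3")  # retries before failing


def timed_call(fn, *args, **kwargs):
    """Run fn and return (result, error, elapsed_seconds) without raising."""
    start = time.monotonic()
    try:
        result, error = fn(*args, **kwargs), None
    except Exception as exc:  # keep the exception so it can be logged with the timing
        result, error = None, exc
    return result, error, time.monotonic() - start


def fake_status_call(run_id):
    """Hypothetical stand-in for the real status lookup (e.g. an mlflow run search)."""
    return {"run_id": run_id, "status": "FINISHED"}


result, error, elapsed = timed_call(fake_status_call, "hypothetical-run-id")
print(result, error, round(elapsed, 3))
```

Logging `elapsed` for every call, successful or not, would show whether slow responses build up gradually or the requests hang outright until the read timeout.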

Azure Machine Learning