Azure ML Compute (instance or cluster) times out mounting blob storage with BFSMountError

Steve Mandras 1 Reputation point
2021-11-24T19:33:31.833+00:00

We've recently spun up an Azure ML environment to do some initial testing of its capabilities. For background: we have quite a few other services deployed, all encapsulated in our VNET, which has no ingress or egress to the internet, just a VPN gateway to our offices. We use private endpoints and Private Link across the board.

We followed the instructions for setting up a secure ML workspace exactly (https://learn.microsoft.com/en-us/azure/machine-learning/tutorial-create-secure-workspace). We are using private endpoints for the workspace, storage, ACR, and KV, and everything ML-related is in its own subnet within our VNET. Compute instances and clusters are deployed to that same subnet.

When we try to run one of the sample designer pipelines, Automobile Price Prediction, we get the following error on both a compute instance and a compute cluster:

AzureMLCompute job failed.
BFSMountError: Unable to mount blob fuse file system
Info: Mounting of azureml-blobstore-XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX container from ${storageaccountname} account timed out
Info: Failed to setup runtime for job execution: Job environment preparation failed on ${Compute IP Address} with err exit status 1.

Any ideas or things to look at?

Thanks in advance

Azure Machine Learning

1 answer

  1. Steve Mandras 1 Reputation point
    2021-12-08T16:41:58.993+00:00

    Hi folks, it turns out that the DNS entry for the blob storage private endpoint on our ML storage account was misconfigured. That explains the timeout: the compute was trying to mount the blob storage at the wrong IP for that FQDN. I found this by enabling SSH access on the test compute node, logging in while it was trying to mount, and resolving the storage account's FQDN, which came back with an incorrect IP for some reason. After correcting the private DNS entry for our ML blob storage account, everything executed as expected. @romungi-MSFT , I'm not sure whether this is the bug you mentioned above, but thanks again for your help.
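
    For anyone wanting to repeat the check from the compute node, here is a minimal Python sketch of the resolution test described above. It is not the exact command that was run. It assumes the public Azure cloud blob suffix (`blob.core.windows.net`), and the helper names (`blob_fqdn`, `resolve_ips`, `check_blob_endpoint`) are made up for illustration. With a working private endpoint, the account's blob FQDN should resolve to the private IP of the endpoint inside the VNET.

```python
import ipaddress
import socket
import sys


def blob_fqdn(account_name):
    """Build the blob endpoint host name (assumes the public Azure cloud suffix)."""
    return f"{account_name}.blob.core.windows.net"


def resolve_ips(hostname):
    """Return the sorted, de-duplicated IPv4 addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(
        hostname, 443, family=socket.AF_INET, proto=socket.IPPROTO_TCP
    )
    return sorted({info[4][0] for info in infos})


def check_blob_endpoint(account_name, resolver=resolve_ips):
    """Map each resolved IP to True if it is in a private range, else False.

    The resolver argument is injectable so the logic can be exercised offline.
    """
    host = blob_fqdn(account_name)
    return {ip: ipaddress.ip_address(ip).is_private for ip in resolver(host)}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_dns.py <storage-account-name>
    for ip, private in check_blob_endpoint(sys.argv[1]).items():
        label = "private" if private else "PUBLIC (private DNS record likely missing)"
        print(f"{blob_fqdn(sys.argv[1])} -> {ip} [{label}]")
```

    A private result is necessary but not sufficient: a stale record can still point at the wrong private address, so compare the printed IP with the one shown on the private endpoint's network interface.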
