Training a TensorFlow model in Azure ML

Ali Davoudian 61 Reputation points
2022-04-07T21:32:04.903+00:00

I am following the link below for training a TensorFlow model in Azure ML:

https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/ml-frameworks/tensorflow/train-hyperparameter-tune-deploy-with-tensorflow/train-hyperparameter-tune-deploy-with-tensorflow.ipynb

However, as my training dataset is in a container named "sample-datasets" in ADLS Gen2, I changed the code from the link above to point at paths in my data lake, replacing code A (from the link) with code B (my code).

Code A:

urllib.request.urlretrieve('https://azureopendatastorage.blob.core.windows.net/mnist/train-images-idx3-ubyte.gz', filename=os.path.join(data_folder, 'train-images-idx3-ubyte.gz'))
urllib.request.urlretrieve('https://azureopendatastorage.blob.core.windows.net/mnist/train-labels-idx1-ubyte.gz', filename=os.path.join(data_folder, 'train-labels-idx1-ubyte.gz'))
urllib.request.urlretrieve('https://azureopendatastorage.blob.core.windows.net/mnist/t10k-images-idx3-ubyte.gz', filename=os.path.join(data_folder, 't10k-images-idx3-ubyte.gz'))
urllib.request.urlretrieve('https://azureopendatastorage.blob.core.windows.net/mnist/t10k-labels-idx1-ubyte.gz', filename=os.path.join(data_folder, 't10k-labels-idx1-ubyte.gz'))

Code B:

from azureml.core.dataset import Dataset
urllib.request.urlretrieve('https://lakehousestgenrichedzone.dfs.core.windows.net/sample-datasets/train-images-idx3-ubyte.gz', filename=os.path.join(data_folder, 'train-images-idx3-ubyte.gz'))
urllib.request.urlretrieve('https://lakehousestgenrichedzone.dfs.core.windows.net/sample-datasets/train-labels-idx1-ubyte.gz', filename=os.path.join(data_folder, 'train-labels-idx1-ubyte.gz'))
urllib.request.urlretrieve('https://lakehousestgenrichedzone.dfs.core.windows.net/sample-datasets/t10k-images-idx3-ubyte.gz', filename=os.path.join(data_folder, 't10k-images-idx3-ubyte.gz'))
urllib.request.urlretrieve('https://lakehousestgenrichedzone.dfs.core.windows.net/sample-datasets/t10k-labels-idx1-ubyte.gz', filename=os.path.join(data_folder, 't10k-labels-idx1-ubyte.gz'))

But I receive the following error:

HTTPError: HTTP Error 401: Server failed to authenticate the request. Please refer to the information in the www-authenticate header.

Can you please let me know how I can train the model using my data stored in the data lake? More precisely, how can my Python code copy the training dataset from the data lake into data_folder?

PS: Please note that I have already granted the Storage Blob Data Contributor role on my data lake storage account to the managed identity of my Azure ML workspace.


Accepted answer
  1. romungi-MSFT 43,451 Reputation points Microsoft Employee
    2022-04-08T12:21:38.973+00:00

    I have not worked on ADLS scenarios with Azure ML, but I have added the ADLS tag to this thread so others can chip in and add their views.

    Based on the documentation, the ADLS Gen2 REST API supports Azure Active Directory (Azure AD), Shared Key, and shared access signature (SAS) authorization for the operations that download files from storage. So a direct, unauthenticated download will not work in this case.
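    One of those routes, SAS, can be used with the original urlretrieve approach by appending a SAS token to each file URL. The sketch below is hypothetical: the token value is a placeholder you would generate on the storage account (Portal or az CLI), and the container/account names are taken from the question.

```python
# Hypothetical sketch: authenticate the direct download by appending a SAS
# token to each file URL. The token value below is a placeholder, not real.
import os
import urllib.request

base_url = "https://lakehousestgenrichedzone.dfs.core.windows.net/sample-datasets"
sas_token = "sv=...&sig=PLACEHOLDER"  # generate a real read-only SAS on the account

def sas_url(file_name):
    """Build the file URL with the SAS query string appended."""
    return f"{base_url}/{file_name}?{sas_token}"

# Then the original calls work unchanged apart from the URL, e.g.:
# urllib.request.urlretrieve(sas_url('train-images-idx3-ubyte.gz'),
#                            filename=os.path.join(data_folder, 'train-images-idx3-ubyte.gz'))
```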

    I think the easiest way to get your files locally from ADLS is to use the Python SDK to authenticate with an account key or Azure AD, as described here.
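    A minimal sketch of that SDK route, assuming the azure-storage-file-datalake and azure-identity packages and the account/container names from the question; this is illustrative, not the only way to structure it.

```python
# Hedged sketch: download the MNIST files from ADLS Gen2 using the
# azure-storage-file-datalake SDK with Azure AD auth (DefaultAzureCredential).
# Account URL and container name are taken from the question.
import os

ACCOUNT_URL = "https://lakehousestgenrichedzone.dfs.core.windows.net"
FILE_SYSTEM = "sample-datasets"
MNIST_FILES = [
    "train-images-idx3-ubyte.gz",
    "train-labels-idx1-ubyte.gz",
    "t10k-images-idx3-ubyte.gz",
    "t10k-labels-idx1-ubyte.gz",
]

def download_mnist(data_folder):
    """Download the four MNIST files from ADLS Gen2 into data_folder."""
    # Imported here so the sketch can be read without the SDK installed:
    # pip install azure-storage-file-datalake azure-identity
    from azure.identity import DefaultAzureCredential
    from azure.storage.filedatalake import DataLakeServiceClient

    os.makedirs(data_folder, exist_ok=True)
    service = DataLakeServiceClient(ACCOUNT_URL, credential=DefaultAzureCredential())
    fs = service.get_file_system_client(FILE_SYSTEM)
    for name in MNIST_FILES:
        with open(os.path.join(data_folder, name), "wb") as f:
            f.write(fs.get_file_client(name).download_file().readall())
```

    DefaultAzureCredential will pick up a managed identity when run on Azure compute, or your az login locally.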

    If you have many files that need to be downloaded and referenced in your ML experiments, you may also consider the Import Data module for designer experiments, or registering the files as a dataset from the Datasets tab of ml.azure.com, which can then be referenced via the Azure ML SDK.
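    Once registered, pulling the files down from a training script is short. The sketch below assumes the v1 Azure ML SDK (azureml-core) and a dataset registered under the hypothetical name "mnist-adls".

```python
# Hypothetical sketch: download a registered FileDataset into data_folder.
# Requires azureml-core; the dataset name "mnist-adls" is an assumption.
def download_registered_dataset(data_folder, dataset_name="mnist-adls"):
    # Imported here so the sketch can be read without azureml-core installed.
    from azureml.core import Dataset, Workspace

    ws = Workspace.from_config()  # picks up the workspace config.json
    dataset = Dataset.get_by_name(ws, name=dataset_name)
    dataset.download(target_path=data_folder, overwrite=True)
```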



1 additional answer

  1. Ali Davoudian 61 Reputation points
    2022-04-19T15:42:44.427+00:00

    I solved the problem by assigning a user-assigned managed identity to the target compute to access my ADLS Gen2 storage.
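    In code, that approach can look like the hedged sketch below (v1 Azure ML SDK): the ADLS Gen2 container is registered as a datastore with no explicit credentials, so access falls back to the compute's managed identity. The datastore name "adls_enriched" and the "/*.gz" path pattern are assumptions.

```python
# Hedged sketch (v1 Azure ML SDK): register the ADLS Gen2 container as a
# credential-less datastore so access uses the compute's managed identity.
# Datastore name "adls_enriched" and the "/*.gz" pattern are assumptions.
def download_from_adls(data_folder):
    # Imported here so the sketch can be read without azureml-core installed.
    from azureml.core import Dataset, Datastore, Workspace

    ws = Workspace.from_config()
    datastore = Datastore.register_azure_data_lake_gen2(
        workspace=ws,
        datastore_name="adls_enriched",           # hypothetical name
        filesystem="sample-datasets",             # container from the question
        account_name="lakehousestgenrichedzone",  # account from the question
    )  # no key/SAS passed: identity-based data access
    dataset = Dataset.File.from_files(path=(datastore, "/*.gz"))
    dataset.download(target_path=data_folder, overwrite=True)
```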
