Accessing files with the URI_FOLDER type data-assets

Vishal Mahajan 35 Reputation points
2024-02-05T13:33:12.15+00:00

I have come across the following example at https://learn.microsoft.com/en-us/training/modules/make-data-available-azure-machine-learning/4-create-data-asset

import argparse 
import glob 
import pandas as pd 
parser = argparse.ArgumentParser() 
parser.add_argument("--input_data", type=str) 
args = parser.parse_args() 
data_path = args.input_data 
all_files = glob.glob(data_path + "/*.csv") 
df = pd.concat((pd.read_csv(f) for f in all_files), sort=False)

When I create a data asset and try to list the files in its directory with the os or glob packages, it fails:

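from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes

my_path = "../data/"
# set the version number of the data asset
v1 = "initial"

my_data = Data(name="<name>",
               version=v1,
               description="Credit card data",
               path=my_path,
               type=AssetTypes.URI_FOLDER)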

my_data.path
# output: azureml://datastores/workspaceblobstore/paths/LocalUpload/SOMETHING/data

os.listdir(my_data.path)
# FileNotFoundError: [Errno 2] No such file or directory:

# copying and using complete path from the data-stores 
os.listdir("azureml://subscriptions/<sid>/resourcegroups/<rid>/workspaces/<wid>/datastores/workspaceblobstore/paths/LocalUpload/<cid>/data")
# FileNotFoundError: [Errno 2] No such file or directory:
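
One way to see why both calls fail, as a minimal sketch: the string is a URI with an azureml scheme, not a path on the local file system, so os treats it as the literal name of a local entry that does not exist. The helper name path_kind is hypothetical and only serves the illustration; the URI is the placeholder value from the output above.

```python
import os
from urllib.parse import urlparse


def path_kind(path: str) -> str:
    """Return 'azureml' for azureml:// URIs, otherwise 'local'."""
    return "azureml" if urlparse(path).scheme == "azureml" else "local"


# Placeholder URI copied from the output above; SOMETHING is not a real folder name.
uri = "azureml://datastores/workspaceblobstore/paths/LocalUpload/SOMETHING/data"

print(path_kind(uri))         # the scheme marks this as a datastore URI
print(path_kind("../data/"))  # no scheme, so os and glob can handle it
print(os.path.exists(uri))    # os looks for a local entry with this literal name
```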

I have been able to make it work only with the azureml.fsspec package, as shown in the following code. I think the tutorials and documentation should be clear about that. Even the applied skills labs have code that tries to access the files without this package. Please correct me if I am doing something wrong.

from azureml.fsspec import AzureMachineLearningFileSystem
fs = AzureMachineLearningFileSystem(my_data.path)
fs.ls()
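
Building on the snippet above, a small helper could accept either kind of location. This is a sketch under assumptions, not an official pattern: the name list_csv_files is hypothetical, the azureml-fsspec package is imported only when an azureml:// URI is passed, and the shape of the values returned by fs.ls() (plain path strings, or dicts with a "name" key) is assumed rather than confirmed by the thread.

```python
import glob
import os
from typing import List


def list_csv_files(path: str) -> List[str]:
    """List CSV files under a local folder or an azureml:// folder URI."""
    if path.startswith("azureml://"):
        # Imported lazily so the local branch works without azureml-fsspec installed.
        from azureml.fsspec import AzureMachineLearningFileSystem

        fs = AzureMachineLearningFileSystem(path)
        entries = fs.ls()
        # Assumption: entries are path strings, or dicts carrying a "name" key.
        names = [e["name"] if isinstance(e, dict) else e for e in entries]
        return sorted(n for n in names if n.endswith(".csv"))
    # Local folder: plain glob is enough, as in the training module script.
    return sorted(glob.glob(os.path.join(path, "*.csv")))
```

With a local folder such as "../data/" the function never touches azureml.fsspec; passing my_data.path would take the first branch instead.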

Azure | Azure Training

Accepted answer
  1. Rakesh Gurram 15,715 Reputation points Microsoft External Staff Moderator
    2024-02-07T13:21:31.8133333+00:00

    Hi Vishal Mahajan,

    Sorry for the inconvenience.  

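    Based on your code, you are trying to access the files in the data asset with the os module. The os module works only on local file system paths, so it cannot resolve an azureml:// URI, which is why os.listdir raises FileNotFoundError. To access the files from a notebook or another interactive session, use the AzureMachineLearningFileSystem class from the azureml.fsspec module instead. The training module script is a different case: when it runs as a job with the data asset passed as a URI_FOLDER input, Azure Machine Learning mounts or downloads the folder on the compute and passes a local path to --input_data, so glob can read it.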

    If you are still facing any issue, please let us know in the comments. We are glad to help you.

    Thank you.
