I came across the following example at https://learn.microsoft.com/en-us/training/modules/make-data-available-azure-machine-learning/4-create-data-asset:
import argparse
import glob
import pandas as pd
parser = argparse.ArgumentParser()
parser.add_argument("--input_data", type=str)
args = parser.parse_args()
data_path = args.input_data
all_files = glob.glob(data_path + "/*.csv")
df = pd.concat((pd.read_csv(f) for f in all_files), sort=False)
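As far as I can tell, the glob pattern in that script only matches files when --input_data is a folder on the local filesystem. Here is a minimal sketch of that behaviour. It assumes a temporary folder with two made-up CSV files (a.csv and b.csv are hypothetical), and it uses the standard csv module instead of pandas so that it runs without extra packages. The helper name read_all_csv is mine, not from the tutorial.

```python
import csv
import glob
import os
import tempfile


def read_all_csv(data_path):
    """Collect the rows of every *.csv file directly inside data_path."""
    rows = []
    # sorted() gives a deterministic file order; glob itself promises none
    for file_name in sorted(glob.glob(os.path.join(data_path, "*.csv"))):
        with open(file_name, newline="") as handle:
            rows.extend(csv.DictReader(handle))
    return rows


# Hypothetical sample data written to a temporary local folder.
with tempfile.TemporaryDirectory() as folder:
    for name, amount in [("a.csv", "10"), ("b.csv", "20")]:
        with open(os.path.join(folder, name), "w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(["amount"])
            writer.writerow([amount])
    local_rows = read_all_csv(folder)

# The same pattern applied to an azureml:// URI: glob treats the string as a
# local path, finds no such folder, and returns no matches.
remote_rows = read_all_csv(
    "azureml://datastores/workspaceblobstore/paths/LocalUpload/SOMETHING/data"
)

print(len(local_rows), len(remote_rows))
```

Note that glob does not raise an error for the URI, it simply returns an empty list. In the original script that empty list would then be handed to pd.concat, which to my knowledge fails with "No objects to concatenate".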
For example, if I create the data asset and then try to list the files in the folder with the os or glob package, it fails:
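import os
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.entities import Data
my_path = "../data/"
# set the version number of the data asset
v1 = "initial"
my_data = Data(
    version=v1,
    description="Credit card data",
    path=my_path,
    type=AssetTypes.URI_FOLDER)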
my_data.path
# output: azureml://datastores/workspaceblobstore/paths/LocalUpload/SOMETHING/data
os.listdir(my_data.path)
# FileNotFoundError: [Errno 2] No such file or directory:
# copying and using complete path from the data-stores
os.listdir("azureml://subscriptions/<sid>/resourcegroups/<rid>/workspaces/<wid>/datastores/workspaceblobstore/paths/LocalUpload/<cid>/data")
# FileNotFoundError: [Errno 2] No such file or directory:
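My understanding of both errors is that os.listdir only knows about local paths, so it looks for a local directory literally named after the URI. A small sketch of how one could tell the two kinds of path apart before choosing how to list them. The helper uri_scheme is hypothetical, and the "://" check is just my assumption about what counts as a remote URI here.

```python
import os
from urllib.parse import urlparse


def uri_scheme(path):
    """Return the URI scheme of path ("azureml", "https", ...) or "" for plain paths."""
    # Only treat the prefix as a scheme when it is followed by "://",
    # so Windows drive letters such as "C:\\data" are not mistaken for one.
    return urlparse(path).scheme if "://" in path else ""


asset_uri = "azureml://datastores/workspaceblobstore/paths/LocalUpload/SOMETHING/data"

print(uri_scheme(asset_uri))
print(uri_scheme("../data/"))

try:
    os.listdir(asset_uri)
    listed = True
except OSError as error:
    # os.listdir treats the URI as a local directory name, which does not exist
    listed = False
    print(type(error).__name__)
```

A path with a non-empty scheme would need a filesystem layer that understands that scheme, while a plain path can go straight to os or glob.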
I have only been able to make it work with the azureml.fsspec package, as shown below. I think the tutorials and documentation should be clear about that. Even the Applied Skills labs have code that tries to access the files without this package.
Please correct me if I am doing something wrong.
from azureml.fsspec import AzureMachineLearningFileSystem
fs = AzureMachineLearningFileSystem(my_data.path)
fs.ls()