Azure Machine Learning - input dataset is found, but the run cannot open its files: '[Errno 2] No such file or directory'

Sanne 6 Reputation points
2021-05-20T09:26:55.9+00:00

I'm using Microsoft Azure Machine Learning to train a CNN. The model is stored in this GitHub repository: https://github.com/rodekruis/caladrius/tree/handle_imbalance/caladrius. The code already works; I'm now trying to run the model (both training and testing) in my own Azure Machine Learning environment. I have the data needed for training, and by following 'Tutorial: use your own data' (https://learn.microsoft.com/en-us/azure/machine-learning/tutorial-1st-experiment-bring-data) I uploaded it to a datastore of type 'Azure Blob Storage'. To start training, the 'run.py' file from the repository has to be executed, so I created the following control script to submit it in the Azure Machine Learning environment:

from azureml.core import Run, Workspace, Datastore, Dataset, Experiment, ScriptRunConfig, Environment  
from azure.identity import DefaultAzureCredential  
from azureml.data.datapath import DataPath  
from azureml.data.dataset_consumption_config import DatasetConsumptionConfig  
from azureml.data import OutputFileDatasetConfig  

if __name__ == "__main__":  
    run = Run.get_context()  
    credential = DefaultAzureCredential()  
    ws = Workspace.from_config()  
    datastore = Datastore.get(ws, 'xview')  
    dataset_small = Dataset.File.from_files(path=(datastore, '/test_small/**'))  
    #print(dataset_small.to_path())  
    #data_path = DataPath(datastore=datastore, path_on_datastore='test_small/')  

    checkpoint = OutputFileDatasetConfig(destination=(datastore, '/test_small/runs/'))  

    experiment = Experiment(workspace=ws, name='thesis-sanne')  
    config = ScriptRunConfig(
        source_directory='',
        script='run.py',
        compute_target='standardK80GPU',
        arguments=[
            '--data-path', dataset_small.as_named_input('input').as_mount(),
            '--output-type', 'classification',
            '--run-name', 'test1',
            '--checkpoint-path', checkpoint,
        ],
    )

    env = Environment.from_conda_specification(name='caladriusenv', file_path='caladriusenv.yml')
    config.run_config.environment = env

    run = experiment.submit(config)  
    aml_url = run.get_portal_url()  
    print(aml_url)  

When I submit this run, it fails with the following error:

UserError: [Errno 2] No such file or directory: '9e3238da-8b8f-441a-a71a-2f6911f58f33/train/labels.txt'

See also this screenshot: Error message from run

The exact error message given in the 70_driver_log.txt file is:

[2021-05-16T08:26:21.977916] The experiment failed. Finalizing run...
2021-05-16 08:26:21,978 main INFO Exiting context: TrackUserError
[2021-05-16T08:26:21.984462] Writing error with error_code UserError and error_hierarchy UserError/FileNotFoundError to hosttool error file located at /mnt/batch/tasks/workitems/073df44a-2932-4bfb-a484-64e6382d81a1/job-1/thesis-sanne_1621153_5c453238-c8de-474e-b42c-44f791fbccea/wd/runTaskLetTask_error.json
Starting the daemon thread to refresh tokens in background for process with pid = 85
2021-05-16 08:26:22,061 main INFO Exiting context: RunHistory
2021-05-16 08:26:22,061 main INFO Exiting context: Dataset
2021-05-16 08:26:22,062 main INFO Exiting context: ProjectPythonPath
Traceback (most recent call last):
  File "run.py", line 59, in <module>
    main()
  File "run.py", line 46, in main
    run_report, datasets, args.number_of_epochs, args.selection_metric
  File "/mnt/batch/tasks/shared/LS_root/jobs/azureaccount/azureml/thesis-sanne_1621153514_fc6d0c0e/mounts/workspaceblobstore/azureml/thesis-sanne_1621153514_fc6d0c0e/model/trainer.py", line 395, in train
    train_set, train_loader = datasets.load("train")
  File "/mnt/batch/tasks/shared/LS_root/jobs/azureaccount/azureml/thesis-sanne_1621153514_fc6d0c0e/mounts/workspaceblobstore/azureml/thesis-sanne_1621153514_fc6d0c0e/model/data.py", line 144, in load
    augment_type=self.augment_type,
  File "/mnt/batch/tasks/shared/LS_root/jobs/azureaccount/azureml/thesis-sanne_1621153514_fc6d0c0e/mounts/workspaceblobstore/azureml/thesis-sanne_1621153514_fc6d0c0e/model/data.py", line 75, in __init__
    os.path.join(self.directory, self.labels_filename) #self.labels_filename) # "labels.txt")
FileNotFoundError: [Errno 2] No such file or directory: '9e3238da-8b8f-441a-a71a-2f6911f58f33/train/labels.txt'

However, the dataset itself is found correctly and is listed under 'Input datasets' in the picture, and it includes the 'train/labels.txt' file I'm trying to open. I verified this in the script by printing:

print(dataset_small.to_path())

The output of this call included the file '/train/labels.txt'.

I think my problem is that the script correctly references the dataset, but the data path it uses to open the files is incorrect. To solve this, I've already tried the following (a small diagnostic sketch follows the list):

  1. Instead of '--data-path', dataset_small.as_named_input('input').as_mount() I used:
    • DataPath(datastore=datastore, path_on_datastore='test_small/'); this doesn't work because a DataPath object is not JSON serializable
    • DataReference(datastore, path_on_datastore='./test_small/', mode='mount'); this doesn't work because a DataReference object is not JSON serializable
    • DatasetConsumptionConfig('dataset', dataset_small, mode='direct', path_on_compute=None); this is essentially what I already do with Dataset.File.from_files() and thus also doesn't work
    • DataPathComputeBinding(mode='mount', path_on_compute=None, overwrite=False); this doesn't work because a DataPathComputeBinding object is not JSON serializable
    • datastore_paths = [(blob_datastore, 'test_small')]; this doesn't work because this object is not JSON serializable
  2. Instead of accessing the datastore through the Azure Machine Learning environment, I tried accessing the container holding the data directly from my Azure Storage account/container. However, the same problem occurred.
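As a sanity check (this snippet is not part of the original run.py, it's only an illustrative sketch), something like the following at the top of the training script would show what actually arrives in --data-path; the argument name matches the control script above, everything else is hypothetical:

# Hypothetical diagnostic snippet for the start of run.py (illustrative only).
# With dataset_small.as_named_input('input').as_mount(), the --data-path argument
# should arrive as an absolute mount point on the compute target.
import argparse
import os

parser = argparse.ArgumentParser()
parser.add_argument("--data-path", dest="data_path", type=str)
args, _ = parser.parse_known_args()

print("data-path received:", args.data_path)
print("is absolute:", os.path.isabs(args.data_path))
if os.path.isdir(args.data_path):
    print("top-level contents:", os.listdir(args.data_path))
print("labels.txt exists:", os.path.exists(os.path.join(args.data_path, "train", "labels.txt")))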

In short: my Azure Machine Learning workspace does find the dataset I want to use, and it contains everything needed, but the model cannot open the data files. The run claims they don't exist, even though I'm sure they do and the environment clearly knows where to find them; it just doesn't seem to access/open them properly.

It seems like an easy problem to fix, but I've tried every approach I could think of or found proposed online, and nothing has worked yet. Hopefully someone here can help me, thank you so much in advance! If extra information is needed to clarify things, I'd be glad to provide it.


1 answer

  1. Ramr-msft 17,826 Reputation points
    2021-05-20T15:44:44.003+00:00

@Sanne Thanks for the question. I usually find OutputFileDatasetConfig very powerful. Please follow the example below to reference a folder/directory with ScriptRunConfig in Azure ML:

    from azureml.core import Workspace, ScriptRunConfig, Experiment
    from azureml.data import OutputFileDatasetConfig

    ws = Workspace.from_config()
    def_data_store = ws.get_default_datastore()

    # Whatever the run writes into this folder is uploaded to the datastore destination.
    output_port = OutputFileDatasetConfig(
        destination=(def_data_store, "outputs/test_diroutputFileDatasetConfig/"), name="dir_test"
    )

    experiment = Experiment(ws, 'MyExperiment')
    config = ScriptRunConfig(source_directory='modules/test_output_dir/',
                             script='copy.py',
                             arguments=['--output', output_port],
                             compute_target='<compute-target-name>')  # set to an existing compute target
    script_run = experiment.submit(config)
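
    For context, here is a minimal sketch of the script side (a hypothetical copy.py, not from the sample above). At run time the --output argument resolves to a writable local folder on the compute target, and its contents are uploaded to the OutputFileDatasetConfig destination when the run completes:

    # Hypothetical copy.py (illustrative only).
    import argparse
    import os

    parser = argparse.ArgumentParser()
    parser.add_argument("--output", type=str)
    args = parser.parse_args()

    # args.output is a local directory on the compute target; files written here
    # are uploaded to the OutputFileDatasetConfig destination after the run.
    os.makedirs(args.output, exist_ok=True)
    with open(os.path.join(args.output, "result.txt"), "w") as f:
        f.write("hello from the run\n")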

    https://learn.microsoft.com/en-us/azure/machine-learning/how-to-create-register-datasets#create-a-filedataset
    https://learn.microsoft.com/en-us/azure/machine-learning/how-to-create-register-datasets#create-a-tabulardataset

    MachineLearningNotebooks/file-dataset-image-inference-mnist.ipynb at master · Azure/MachineLearningNotebooks (github.com)

