OSError: Cannot save file into a non-existent directory

Ankit19 Gupta 46 Reputation points
2022-10-21T15:43:56.92+00:00

I am using Azure ML Studio to read data from a CSV file through a data asset named test5, and to write data to a CSV file in my current working directory (the write is what fails). I am submitting a job that uses a compute cluster and a custom environment, following the instructions from the tutorial: https://learn.microsoft.com/en-us/azure/machine-learning/tutorial-azure-ml-in-a-day

I have written the code in a notebook cell as:

# Handle to the workspace  
from azure.ai.ml import MLClient  
  
# Authentication package  
from azure.identity import DefaultAzureCredential  
credential = DefaultAzureCredential()  
  
# Get a handle to the workspace  
ml_client = MLClient(  
    credential=credential,  
    subscription_id="abc",  
    resource_group_name="xyz",  
    workspace_name="pqr",  
)  
from azure.ai.ml import command  
from azure.ai.ml import Input  
  
registered_model_name = "read_data"  
env_name = "docker-context"  
job = command(inputs=dict(  
        data=Input(  
            type="uri_file",  
            path="azureml:test5:1",  
        ),  
        registered_model_name=registered_model_name  
    ),     
    code="./src/",  # location of source code  
    command="python main.py --data ${{inputs.data}} --registered_model_name ${{inputs.registered_model_name}}",
    environment="docker-context:10",  
    compute="amlcluster01",  
    experiment_name="read_data1",  
    display_name="read_data2",  
    )  
ml_client.create_or_update(job)  
  

This works fine. The content of main.py is:

import os  
import argparse  
import pandas as pd  
  
def main():  
    print("Hello")  
    # input and output arguments
    parser = argparse.ArgumentParser()  
    parser.add_argument("--data", type=str, help="path to input data")  
    parser.add_argument("--registered_model_name", type=str, help="model name")  
    args = parser.parse_args()  
    print(" ".join(f"{k}={v}" for k, v in vars(args).items()))  
    print("input data:", args.data)  
    read_data=pd.read_csv(args.data)  
    #read_data=pd.read_parquet(args.data, engine='pyarrow')  
    #credit_df = pd.read_excel(args.data, header=1, index_col=0)  
    print(read_data)  
    read_data.to_csv(r'/home/azureuser/cloudfiles/code/Users/Ankit19.Gupta/azureml-in-a-day/src/file3.csv')  
  
    print("Hello World !")  
  
if __name__ == "__main__":  
    main()  
  

Here, every line of code runs fine except read_data.to_csv(r'/home/azureuser/cloudfiles/code/Users/Ankit19.Gupta/azureml-in-a-day/src/file3.csv').

It fails with the error: OSError: Cannot save file into a non-existent directory: /home/azureuser/cloudfiles/code/Users/Ankit19.Gupta/azureml-in-a-day/src

Can anyone please help me with how to save a DataFrame to a CSV file in my current working directory from a job? Any help would be appreciated.
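The error means the parent directory of the target path does not exist on the machine where main.py is running. Assuming the job executes on a compute-cluster node rather than on the compute instance that hosts the notebook (so the /home/azureuser/cloudfiles/... folder may simply not be there), a minimal diagnostic sketch is to log the working directory and check the parent folder just before writing. The helper name check_output_dir is hypothetical.

```python
import os
from pathlib import Path


def check_output_dir(target: str) -> bool:
    """Hypothetical helper: report the working directory and whether target's parent folder exists."""
    parent = Path(target).parent
    print("current working directory:", os.getcwd())
    print("target parent:", parent, "exists:", parent.is_dir())
    return parent.is_dir()


# Example: call this right before read_data.to_csv(...) in main.py
check_output_dir("/home/azureuser/cloudfiles/code/Users/Ankit19.Gupta/azureml-in-a-day/src/file3.csv")
```

If the job log shows the parent as missing, the write has to target a folder that exists on that node, such as a folder the job itself provides.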


1 answer

  1. YutongTie-MSFT 46,566 Reputation points
    2022-10-23T19:18:49.723+00:00

    Hello @Ankit19 Gupta

    Thanks for using the Microsoft Q&A platform. For the official guidance on "Reading and Writing data in a job", please refer to the sample below for ML SDK V2 - https://github.com/Azure/azureml-examples/blob/sdk-preview/sdk/assets/data/data.ipynb
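A sketch of the job-output approach, under stated assumptions: the hard-coded /home/azureuser/cloudfiles/... path most likely exists only on the compute instance where the notebook runs, not on the cluster node that executes the job, so main.py can instead write into a folder that the job passes in. On the submission side this would mean importing Output from azure.ai.ml, adding something like outputs=dict(output_data=Output(type="uri_folder", mode="rw_mount")) to the command() call, and appending --output ${{outputs.output_data}} to the command string. The names output_data and --output are illustrative, so please check them against the linked sample. The script side could look like this:

```python
import argparse
import os

import pandas as pd


def main(argv=None):
    """Read the input CSV and write a copy into the output folder supplied by the job."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--data", type=str, help="path to input data")
    parser.add_argument("--registered_model_name", type=str, help="model name")
    # Hypothetical argument: receives the folder path from ${{outputs.output_data}}
    parser.add_argument("--output", type=str, help="folder to write results into")
    args = parser.parse_args(argv)

    read_data = pd.read_csv(args.data)
    # Create the folder if it is missing; exist_ok avoids an error if it is already mounted
    os.makedirs(args.output, exist_ok=True)
    out_path = os.path.join(args.output, "file3.csv")
    read_data.to_csv(out_path)
    print("wrote", out_path)
    return out_path
```

main() takes an optional argv list so it can be exercised outside a job; in main.py the existing if __name__ == "__main__": main() guard would stay as it is. After the job finishes, the CSV should appear under the job's outputs (by default, typically in the workspace's default datastore) rather than in the notebook folder.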

    If that's not what you want: I did some research and found a Stack Overflow thread about the same issue - https://stackoverflow.com/questions/47143836/pandas-dataframe-to-csv-raising-ioerror-no-such-file-or-directory

    According to that thread, to_csv creates the file if it doesn't exist, but it does not create directories that don't exist. Make sure the directory you are saving the file into has been created first, as below:

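    import os

    import pandas as pd

    df = pd.DataFrame({"a": [1, 2]})  # placeholder: use your own DataFrame here

    outname = 'name.csv'
    outdir = './dir'
    # to_csv creates the file but not missing directories, so create them first
    os.makedirs(outdir, exist_ok=True)

    fullname = os.path.join(outdir, outname)
    df.to_csv(fullname)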
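A compact variant of the same idea using pathlib, as a sketch (the helper name save_csv is hypothetical): Path.mkdir(parents=True, exist_ok=True) creates any missing nested folders and does nothing if they already exist. The example first shows to_csv failing on a missing folder, then succeeding once the folder is created.

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_csv(df: pd.DataFrame, path: Path) -> Path:
    """Hypothetical helper: write df to path, creating missing parent directories first."""
    path.parent.mkdir(parents=True, exist_ok=True)  # to_csv will not create these itself
    df.to_csv(path, index=False)
    return path


df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "nested" / "dir" / "file3.csv"
    try:
        df.to_csv(target)  # the parent directories do not exist yet
    except OSError as exc:  # FileNotFoundError in older pandas is also an OSError
        print("to_csv failed:", exc)
    save_csv(df, target)
    print("saved:", target.exists())
```

Note that on a compute cluster this only creates the folder on the node running the job; it does not make the file appear in the notebook's folder.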
    

    Please give it a try. I hope the above helps; let me know how it goes, and we are happy to help.

    Regards,
    Yutong

    -Please accept the answer if you found it helpful, to support the community. Thanks a lot.