I am using Azure ML Studio to read data from a CSV file through a data asset named test5 and to write that data to a CSV file in my current working directory (the write is failing). I am submitting a Job that uses a Compute Cluster and a Custom Environment, following the tutorial at https://learn.microsoft.com/en-us/azure/machine-learning/tutorial-azure-ml-in-a-day
I have written the following code in a notebook cell:
# Handle to the workspace
from azure.ai.ml import MLClient
# Authentication package
from azure.identity import DefaultAzureCredential

credential = DefaultAzureCredential()

# Get a handle to the workspace
ml_client = MLClient(
    credential=credential,
    subscription_id="abc",
    resource_group_name="xyz",
    workspace_name="pqr",
)

from azure.ai.ml import command
from azure.ai.ml import Input

registered_model_name = "read_data"
env_name = "docker-context"

job = command(
    inputs=dict(
        data=Input(
            type="uri_file",
            path="azureml:test5:1",
        ),
        registered_model_name=registered_model_name,
    ),
    code="./src/",  # location of source code
    command="python main.py --data ${{inputs.data}} --registered_model_name ${{inputs.registered_model_name}}",
    environment="docker-context:10",
    compute="amlcluster01",
    experiment_name="read_data1",
    display_name="read_data2",
)
ml_client.create_or_update(job)
This works fine. The content of main.py is:
import os
import argparse
import pandas as pd


def main():
    print("Hello")
    # input and output arguments
    parser = argparse.ArgumentParser()
    parser.add_argument("--data", type=str, help="path to input data")
    parser.add_argument("--registered_model_name", type=str, help="model name")
    args = parser.parse_args()
    print(" ".join(f"{k}={v}" for k, v in vars(args).items()))
    print("input data:", args.data)
    read_data = pd.read_csv(args.data)
    # read_data = pd.read_parquet(args.data, engine='pyarrow')
    # credit_df = pd.read_excel(args.data, header=1, index_col=0)
    print(read_data)
    read_data.to_csv(r'/home/azureuser/cloudfiles/code/Users/Ankit19.Gupta/azureml-in-a-day/src/file3.csv')
    print("Hello World !")


if __name__ == "__main__":
    main()
Here, every line works except the final write: read_data.to_csv(r'/home/azureuser/cloudfiles/code/Users/Ankit19.Gupta/azureml-in-a-day/src/file3.csv').
It fails with the error message: OSError: Cannot save file into a non-existent directory: /home/azureuser/cloudfiles/code/Users/Ankit19.Gupta/azureml-in-a-day/src
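A minimal diagnostic sketch for narrowing this down, assuming (not confirmed here) that the job runs on a cluster node where the folder visible in the notebook is simply not present. The helper name `describe_target` is hypothetical; it only reports the working directory of the script and whether the parent folder of the target path exists.

```python
import os


def describe_target(path):
    """Report where the script is running and whether the target folder exists."""
    parent = os.path.dirname(os.path.abspath(path))
    return {
        "cwd": os.getcwd(),
        "parent": parent,
        "parent_exists": os.path.isdir(parent),
    }


# Hypothetical check to place in main.py just before the to_csv call.
target = "/home/azureuser/cloudfiles/code/Users/Ankit19.Gupta/azureml-in-a-day/src/file3.csv"
print(describe_target(target))
```

If `parent_exists` comes back as False inside the job, the path exists only on the machine running the notebook, which would match the OSError.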
Can anyone please explain how to save a DataFrame to a CSV file in my current working directory from a Job? Any help would be appreciated.
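One possible direction, sketched under assumptions: write to a path relative to the job's own working directory instead of the notebook path, and create the folder first. The helper `output_path` and the folder name `outputs` are illustrative choices. As far as I understand, Azure ML uploads files written under `./outputs` with the job, but that behaviour should be checked against the documentation.

```python
from pathlib import Path


def output_path(filename, base="outputs"):
    """Return base/filename, creating the base folder if it is missing."""
    folder = Path(base)
    folder.mkdir(parents=True, exist_ok=True)
    return folder / filename


# In main.py this would replace the hard-coded path, for example:
#     read_data.to_csv(output_path("file3.csv"), index=False)
print(output_path("file3.csv"))
```

Because the folder is created before the write, pandas no longer hits a non-existent directory. The file would then be found among the job's outputs, not in the notebook's src folder.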