Using Python in ADF V2 to parse a CSV file

AzeemK 516 Reputation points
2021-01-05T21:56:19.047+00:00

Is there a good sample or example of how to use Python in ADF V2 to parse a CSV or other flat file,
for example via Azure Functions or some other pipeline method?

Thanks

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

Accepted answer
  1. KranthiPakala-MSFT 46,502 Reputation points Microsoft Employee
    2021-01-06T00:24:50.547+00:00

    Hi @AzeemK ,

    Thanks for asking and for using this forum.

    I believe you are looking for this tutorial, which explains how to run Python scripts through Azure Data Factory using a Custom Activity (Azure Batch).

    Here is the tutorial link: Run Python scripts through Azure Data Factory using Azure Batch

    The following Python script is used in the tutorial. It loads the iris.csv dataset from the input container, subsets the records, and saves the result to the output container.

    # Load libraries  
    from azure.storage.blob import BlobServiceClient  
    import pandas as pd  
      
    # Define parameters  
    storageAccountURL = "<storage-account-url>"  
    storageKey         = "<storage-account-key>"  
    containerName      = "output"  
      
    # Establish connection with the blob storage account  
    blob_service_client = BlobServiceClient(account_url=storageAccountURL,
                                            credential=storageKey)
      
    # Load iris dataset from the task node  
    df = pd.read_csv("iris.csv")  
      
    # Subset records  
    df = df[df['Species'] == "setosa"]  
      
    # Save the subset of the iris dataframe locally in task node  
    df.to_csv("iris_setosa.csv", index = False)  
      
    # Upload iris dataset  
    container_client = blob_service_client.get_container_client(containerName)  
    with open("iris_setosa.csv", "rb") as data:  
        blob_client = container_client.upload_blob(name="iris_setosa.csv", data=data)  
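
If pandas is not installed on the Batch node, the same subset step could probably be done with Python's standard-library csv module. The following is only a minimal sketch and is not part of the tutorial: the function name filter_csv_rows and the two-column sample data are made up for illustration, and it works on in-memory text, so reading the file and uploading the result still have to be handled as in the script above.

```python
import csv
import io


def filter_csv_rows(csv_text, column, value):
    """Return CSV text containing only the rows where `column` equals `value`.

    Hypothetical helper for illustration; not part of the tutorial script.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    # Reuse the header from the input so the output has the same columns.
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if row[column] == value:
            writer.writerow(row)
    return out.getvalue()


# Made-up sample rows; the real iris.csv has more columns.
sample = (
    "Sepal.Length,Species\n"
    "5.1,setosa\n"
    "7.0,versicolor\n"
    "4.9,setosa\n"
)
result = filter_csv_rows(sample, "Species", "setosa")
print(result)
```

For a file on the task node, the text could be read with open("iris.csv", newline="") and passed to the helper in the same way.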
    

    Hope this info helps.

    ----------

    Thank you
    Please consider clicking "Accept Answer" and "Upvote" on the post that helps you, as it can be beneficial to other community members.

