Azure Functions v2 (Python): for loop that chunks data does not output to blob storage

jim01011 0 Reputation points
2024-04-22T09:14:36.15+00:00

I'm trying to write a simple for loop inside an Azure Functions v2 (Python) app that cuts a list into chunks and writes each chunk to blob storage as a CSV.

import logging
import random

import azure.functions as func
import pandas as pd


app = func.FunctionApp()

@app.blob_trigger(arg_name="myblob", path="old/{name}", connection="Storage")
@app.blob_output(arg_name="outputblob", path="new/{name}", connection="Storage")
def BlobTrigger(myblob: func.InputStream, outputblob: func.Out[str]):
    results = random.sample(range(1, 500), 7)
    step = 10
    for i in range(0, len(results), step): 
        x = i 
        results_chunked = results[x:x+step]
        df = pd.DataFrame(results_chunked)
        filename = '/tmp/'+ str(i) +'.csv'
        output_csv = df.to_csv(filename,index = False)
        outputblob.set(output_csv)

The function runs without errors ("Blob function was executed successfully" in the logs), but the chunked data (output_csv) never arrives at path="new/{name}". Any assistance with this?


1 answer

  1. SaravananGanesan-3177 1,665 Reputation points
    2024-04-28T17:46:06.99+00:00

    Hi,

    In your code you're trying to output CSV files to Azure Blob Storage from an Azure Function, but there are a couple of issues:

    1. The outputblob parameter in your function is of type func.Out[str], which expects a string. You're setting it with the return value of df.to_csv(filename), which is None whenever a file path is passed. Call df.to_csv(index=False) without a path and it returns the CSV content as a string you can hand to the binding (see the snippet after this list).
    2. You're writing to a hard-coded local path (/tmp/). Azure Functions only provide a small, ephemeral temporary area, and creating a file there does not by itself upload anything to blob storage.
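
    For reference, here's the to_csv() difference in isolation (plain pandas, outside any function app):

    import pandas as pd

    df = pd.DataFrame([1, 2, 3])
    print(df.to_csv('out.csv', index=False))  # None: the CSV was written to the file
    print(df.to_csv(index=False))             # '0\n1\n2\n3\n': returned as a string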

    Here's a revised version of your function:

    import logging
    import os
    import random
    import tempfile

    import azure.functions as func
    import pandas as pd

    app = func.FunctionApp()

    @app.blob_trigger(arg_name="myblob", path="old/{name}", connection="Storage")
    @app.blob_output(arg_name="outputblob", path="new/{name}", connection="Storage")
    def BlobTrigger(myblob: func.InputStream, outputblob: func.Out[str]):
        results = random.sample(range(1, 500), 7)
        step = 3  # Adjust the chunk size as needed
        for i in range(0, len(results), step):
            results_chunked = results[i:i+step]
            df = pd.DataFrame(results_chunked)
            # to_csv() with no path returns the CSV content as a string
            csv_data = df.to_csv(index=False)
            # Write the CSV data to a temporary file
            file_name = str(i) + '.csv'
            temp_file_path = os.path.join(tempfile.gettempdir(), file_name)
            with open(temp_file_path, 'w') as temp_file:
                temp_file.write(csv_data)
            # Read the file back and hand its contents to the output binding
            with open(temp_file_path, 'r') as temp_file:
                outputblob.set(temp_file.read())
            # Clean up the temporary file
            os.remove(temp_file_path)
    

    This generates the CSV chunks and pushes them through the function's outputblob binding; adjust the step variable to your chunk size requirements. One caveat: a single blob_output binding writes exactly one blob per invocation (here new/{name}), so each outputblob.set() call overwrites the previous one and only the last chunk ends up in storage. To persist one blob per chunk, write the blobs yourself with the storage SDK.
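
    Here's a minimal sketch of that SDK approach using the azure-storage-blob package. It assumes the connection string is exposed through an app setting named Storage and that the target container new already exists; adjust both to your setup:

    import os
    import random

    import azure.functions as func
    import pandas as pd
    from azure.storage.blob import BlobServiceClient

    app = func.FunctionApp()

    @app.blob_trigger(arg_name="myblob", path="old/{name}", connection="Storage")
    def BlobTrigger(myblob: func.InputStream):
        # Connect to the same storage account the trigger uses
        service = BlobServiceClient.from_connection_string(os.environ["Storage"])
        container = service.get_container_client("new")

        results = random.sample(range(1, 500), 7)
        step = 3
        for i in range(0, len(results), step):
            df = pd.DataFrame(results[i:i+step])
            csv_data = df.to_csv(index=False)
            # Each chunk becomes its own blob: new/0.csv, new/3.csv, ...
            container.upload_blob(name=f"{i}.csv", data=csv_data, overwrite=True)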
