Python function app error when reading a Parquet file

LEANDRO DARIVA PINTO
2022-11-18T22:46:08.053+00:00

I am developing a Python script that will run as an Azure function app. It should read one Parquet file from our Data Lake Storage Gen1 account and do some processing on it.
When running in debug mode in VS Code it works perfectly, but when I deploy the script to the function app it fails with a not very meaningful message:

Executed 'Functions.get_warehouse_from_sap' (Failed, Id=227a48b8-0486-4c3f-8758-1f6298afaf68, Duration=9122ms)

This happens when it tries to read the Parquet file. I tried both pyarrow and pandas.read_parquet, and both give the same error. I put a try/except around this particular point of the code, but no exception is caught. To read the data lake I am using AzureDLFileSystem from the azure.datalake.store.core Python library. Here is part of my code:

import logging

import pandas as pd
from azure.datalake.store import lib
from azure.datalake.store.core import AzureDLFileSystem

adlCreds = lib.auth(tenant_id=tenant_id,
                    client_id=client_id,
                    client_secret=secret_key,
                    resource='https://datalake.azure.net/')
adlsFileSystemClient = AzureDLFileSystem(adlCreds, store_name='<repository name>')

# ls() returns the matching paths, not the file contents; up to here it works fine
f = adlsFileSystemClient.ls('<path to my file>')

# here is where the problem happens
try:
    # open() returns a file-like object that pandas can read from
    with adlsFileSystemClient.open('<path to my file>', 'rb') as fh:
        df = pd.read_parquet(fh)
except Exception:
    # log the full traceback, not just the message
    logging.exception("Failed to read the parquet file")

Any idea?
Thanks

Azure Functions

1 answer

  1. LEANDRO DARIVA PINTO
    2022-11-29T14:05:20.55+00:00

    I finally managed to solve the problem. The cause was memory consumption. Because the app still runs in a dev environment, it has only 1.5 GB of memory, and reading the whole Parquet file consumed approximately 2.5 GB. I changed my code to use pyarrow.parquet.read_table with the filters option and to read only the necessary columns from the Parquet file. This reduced the memory consumption, and the function app started working.
