I finally managed to solve the problem: it was memory consumption. As I am still running the app in a dev environment, it has just 1.5 GB of memory, while reading the whole parquet file consumed approximately 2.5 GB. I changed my code to use pyarrow.parquet.read_table with the filters option and to read only the necessary columns from the parquet file. This reduced the memory consumption and the function app started working.
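For reference, here is a minimal sketch of the fixed read, assuming the same AzureDLFileSystem client shown in the code below; the column names and the filter tuple are hypothetical placeholders for whatever your processing actually needs:

import pyarrow.parquet as pq

# Open the file through the Data Lake client and let pyarrow materialize
# only a subset of it. 'material', 'quantity' and 'plant' are made-up
# column names for illustration.
with adlsFileSystemClient.open('<path to my file>', 'rb') as f:
    table = pq.read_table(
        f,
        columns=['material', 'quantity'],   # read only the needed columns
        filters=[('plant', '=', '1000')],   # drop non-matching rows while reading
    )
df = table.to_pandas()

Loading only the required columns and rows is what brought the memory consumption below the dev environment's 1.5 GB limit.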
Python function app error when reading a parquet file
I am developing a Python script that will run as an Azure function app. It should read one parquet file from our Gen1 Data Lake and do some processing on it.

When running in debug mode in VS Code it works perfectly, but when I deploy the script to the function app it returns an error with a not very meaningful message:
Executed 'Functions.get_warehouse_from_sap' (Failed, Id=227a48b8-0486-4c3f-8758-1f6298afaf68, Duration=9122ms)
This happens when it tries to read the parquet file. I tried both pyarrow and the pandas.read_parquet function, but they give the same error. I put a try/except around this particular point of the code, but no exception is caught. To read the data lake I am using AzureDLFileSystem from the azure.datalake.store.core Python library. Here is part of my code:
from azure.datalake.store import lib
from azure.datalake.store.core import AzureDLFileSystem
import logging
import pandas as pd

# Authenticate against the Data Lake with a service principal.
adlCreds = lib.auth(tenant_id=tenant_id,
                    client_id=client_id,
                    client_secret=secret_key,
                    resource='https://datalake.azure.net/')

adlsFileSystemClient = AzureDLFileSystem(adlCreds, store_name='<repository name>')

f = adlsFileSystemClient.ls('<path to my file>')
# Until here it works fine. It can open the file.
# Here is where the problem happens.
try:
    df = pd.read_parquet(f)
except Exception as e:
    logging.info(str(e))
Any ideas?

Thanks