I finally managed to solve the problem. The cause was memory consumption. As I am still running the app in a dev environment, it has only 1.5 GB of memory, and reading the whole Parquet file consumed approximately 2.5 GB. I changed my code to use pyarrow.parquet.read_table with the filters option, reading just the necessary columns from the Parquet file. This reduced the memory consumption and the function app started working.
Python function app error when reading a Parquet file
I am developing a Python script that will run as a function app. It should read one Parquet file from our Gen1 data lake and do some processing on it.
When running in debug mode in VS Code it works perfectly, but when I deploy the script to the function app it fails with a not very meaningful message:
Executed 'Functions.get_warehouse_from_sap' (Failed, Id=227a48b8-0486-4c3f-8758-1f6298afaf68, Duration=9122ms)
This happens when it tries to read the Parquet file. I tried both pyarrow and the pandas.read_parquet function, but both give the same error. I put a try/except around this particular point of the code, but no exception is raised. To read from the data lake I am using AzureDLFileSystem from the azure.datalake.store.core Python library. Here is part of my code.
import logging

import pandas as pd
from azure.datalake.store import lib
from azure.datalake.store.core import AzureDLFileSystem

adlCreds = lib.auth(tenant_id=tenant_id,
                    client_id=client_id,
                    client_secret=secret_key,
                    resource='https://datalake.azure.net/')
adlsFileSystemClient = AzureDLFileSystem(adlCreds, store_name='<repository name>')

# until here it works fine. It can open the file
f = adlsFileSystemClient.ls('<path to my file>')

# here is where the problem happens.
try:
    df = pd.read_parquet(f)
except Exception as e:
    logging.info(str(e))
@LEANDRO DARIVA PINTO, instead of using the adlsFileSystemClient, could you please push the file to Blob storage and try to read it using a function app input binding or the Blob Storage SDK?
Could you explain this a little better?
I don't have much experience with Blob storage, but what is the difference between Blob storage and the Data Lake Gen1 where the file I want to read is already saved?
@LEANDRO DARIVA PINTO, if it is already saved there, it is better to read it using the adlsFileSystemClient. Since this requires deeper troubleshooting, I would request you to open a support ticket with MS support. Please let me know if you don't have a support plan.
Yes, we have a support plan here at my company (Braskem). I will check how to open a ticket.