Azure Functions “The operation has timed out.” for timer trigger blob archival

ddx1 171 Reputation points
2020-11-18T19:52:01.383+00:00

I have a Python Azure Functions timer trigger that runs once a day and archives files from a general-purpose v2 hot storage container to a general-purpose v2 cold storage container. I'm using the Linux Consumption plan. The code looks like this:

import logging

from azure.core.exceptions import ResourceExistsError, ResourceNotFoundError
from azure.storage.blob import BlobClient, ContainerClient

# Collect the names of all blobs under the hot data directory.
container = ContainerClient.from_connection_string(conn_str=hot_conn_str,
                                                   container_name=hot_container_name)
blob_list = container.list_blobs(name_starts_with=hot_data_dir)
files = []
for blob in blob_list:
    files.append(blob.name)

for file in files:
    # Download each blob from the hot container...
    blob_from = BlobClient.from_connection_string(conn_str=hot_conn_str,
                                                  container_name=hot_container_name,
                                                  blob_name=file)
    data = blob_from.download_blob()
    # ...and re-upload it into the cold container under archive/.
    blob_to = BlobClient.from_connection_string(conn_str=cold_conn_str,
                                                container_name=cold_container_name,
                                                blob_name=f'archive/{file}')
    try:
        blob_to.upload_blob(data.readall())
    except ResourceExistsError:
        logging.debug(f'file already exists: {file}')
    except ResourceNotFoundError:
        logging.debug(f'file does not exist: {file}')
    container.delete_blob(blob=file)

This has been working for me for the past few months with no problems, but for the past two days I have been seeing this error halfway through the archive process:
The operation has timed out.
There is no meaningful error message beyond that. If I manually call the function through the UI, it will successfully archive the rest of the files. The size of the blobs ranges from a few KB to about 5 MB, and the timeout error seems to be happening on files that are 2-3 MB. There is only one invocation running at a time, so I don't think I'm exceeding the 1.5 GB memory limit on the Consumption plan (I've seen "python exited with code 137" from memory issues in the past). Why am I getting this error all of a sudden when it has been working flawlessly for months?

Tags: Azure Data Lake Storage, Azure Functions, Azure Storage Accounts, Azure Blob Storage

2 answers

  1. ddx1 171 Reputation points
    2020-11-18T20:55:31.76+00:00

    I think I'm going to use the method described here to archive instead so I don't have to store the blob contents in memory: https://www.europeclouds.com/blog/moving-files-between-storage-accounts-with-azure-functions-and-event-grid
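
    If I've read the linked post correctly, the key change is a server-side copy: ask the storage service to copy the blob directly between accounts rather than downloading and re-uploading it, so the function never holds the blob contents in memory. A minimal sketch of that idea (my reading, not the post's exact code; sas_token is assumed to be a read SAS for the source blob, which generate_blob_sas can produce):

        from azure.storage.blob import BlobClient

        blob_from = BlobClient.from_connection_string(conn_str=hot_conn_str,
                                                      container_name=hot_container_name,
                                                      blob_name=file)
        blob_to = BlobClient.from_connection_string(conn_str=cold_conn_str,
                                                    container_name=cold_container_name,
                                                    blob_name=f'archive/{file}')
        # The copy runs inside the storage service; the function only issues
        # the request, so nothing is buffered in the function's memory.
        blob_to.start_copy_from_url(f'{blob_from.url}?{sas_token}')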


  2. JayaC-MSFT 5,526 Reputation points
    2020-11-26T08:22:35.237+00:00

    Hello @TomTakiyama-3838, upon further investigation we found that there was invocation latency, which caused the timeout. Could you confirm the points below? We believe you are already following these guidelines:

    1. If your functions are async, ensure that you're following correct async coding practices. Any calls to Task.Wait(), Task.Result, or Task.GetAwaiter().GetResult() are troublesome because they block threads, which can prevent background renewal tasks from running. Ensure your async functions are "async all the way down".
    2. Ensure you're using asynchronous APIs for any IO operations; synchronous/blocking IO is troublesome (a sketch of this for your code is below the list).
    3. Look for ways you may be overloading the host. Are you running too many functions on a single instance, stressing the host?
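
    For the code you shared, a rough async version of the same loop might look like this (a sketch only, reusing the hot_conn_str, cold_conn_str, and related names from your question; the async clients live in azure.storage.blob.aio in the same azure-storage-blob package):

        import logging
        import azure.functions as func
        from azure.core.exceptions import ResourceExistsError
        from azure.storage.blob.aio import BlobClient, ContainerClient

        async def main(mytimer: func.TimerRequest) -> None:
            async with ContainerClient.from_connection_string(
                    conn_str=hot_conn_str,
                    container_name=hot_container_name) as hot:
                # Collect the names first, as in your original code.
                files = [b.name async for b in
                         hot.list_blobs(name_starts_with=hot_data_dir)]
                for file in files:
                    # Each await yields the event loop while IO is in flight.
                    stream = await hot.get_blob_client(file).download_blob()
                    data = await stream.readall()
                    blob_to = BlobClient.from_connection_string(
                        conn_str=cold_conn_str,
                        container_name=cold_container_name,
                        blob_name=f'archive/{file}')
                    async with blob_to:
                        try:
                            await blob_to.upload_blob(data)
                        except ResourceExistsError:
                            logging.debug(f'file already exists: {file}')
                    await hot.delete_blob(blob=file)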

    Hope this helps!