Python aiohttp Azure Function works locally, but throws ClientConnectorError on Azure
Hi there,
I'm working on a TimerTrigger Azure Function in Python that makes heavy use of the aiohttp library to make concurrent requests to a file cache, grab ~8K JSON files, and prepare them to be loaded into a database. I've been able to run the process end to end without issue on my local machine (macOS): with Azure Functions Core Tools, I can `func start` the host, kick off the job with a POST request to `http://localhost:7071/admin/functions/NameOfMyFunction`, and everything works just fine.
However, when I publish this function to my Azure Function App, the TimerTrigger kicks off as expected, but not far into the "concurrently fetching the JSON files" step the execution fails with the error below (I've redacted the actual URL and IP address for confidentiality):
```
Result: Failure
Exception: ClientConnectorError: Cannot connect to host https://FILE-CACHE-URL:443 ssl:default [Connect call failed ('XX.XXX.XXX.XXX', 443)]
Stack:
  File "/azure-functions-host/workers/python/3.8/LINUX/X64/azure_functions_worker/dispatcher.py", line 370, in _handle__invocation_request
    call_result = await self._loop.run_in_executor(
  File "/usr/local/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/azure-functions
```
I'm not sure how much more detail is appropriate here, but a peek at the actual code I'm running might be helpful, so here are the most critical excerpts.
From the `run.py` file, the entry point of the Azure Function:
```python
import asyncio

import azure.functions as func

from helpers.doctor_info import fetch_doctor_profiles


def main(myTimer: func.TimerRequest) -> None:
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    doc_profiles = loop.run_until_complete(fetch_doctor_profiles())
```
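(Aside, in case the manual event-loop management looks suspicious: my understanding is that the Python worker will also await a coroutine entry point directly, so the sketch below should be equivalent. I'm not assuming this is related to the error.)

```python
# Sketch of an equivalent async entry point -- the Azure Functions Python
# worker awaits coroutine functions, so no explicit event loop is needed.
import azure.functions as func

from helpers.doctor_info import fetch_doctor_profiles


async def main(myTimer: func.TimerRequest) -> None:
    doc_profiles = await fetch_doctor_profiles()
```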
From the `doctor_info.py` file, the helper that is imported to fetch the profiles. Given a big list of the ~8K files I need to grab from the cache, it splits them into batches of 50 and fetches the files in each batch concurrently, allowing for pauses in between:
```python
import asyncio

import aiohttp
from aiohttp import ClientSession

# get_cache_file_paths_from_manifest, make_batches_of_paths, and
# unpack_fetched_profiles are defined elsewhere in this module.


async def fetch_doctor_profiles(batch_size=50, max_concurrent_requests=15,
                                use_trust_env=True):
    json_list = []
    file_paths = get_cache_file_paths_from_manifest()
    path_batches = make_batches_of_paths(paths=file_paths, size=batch_size)
    sem = asyncio.Semaphore(max_concurrent_requests)
    connector = aiohttp.TCPConnector(verify_ssl=False)
    async with ClientSession(connector=connector, trust_env=use_trust_env) as session:
        json_batches = await asyncio.gather(*[fetch_jsons_in_batch(sem, session, batch)
                                              for batch in path_batches])
    for jsons in json_batches:
        unpack_fetched_profiles(profile_list=jsons, out_list=json_list)
    return json_list
```
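For completeness, `fetch_jsons_in_batch` isn't shown above; stripped of logging and error handling it boils down to something like the following sketch (`fetch_single_json` is a stand-in name for my per-file helper, and the `pause_seconds` value is a placeholder):

```python
async def fetch_single_json(sem, session, path):
    # The shared semaphore caps in-flight requests across all batches
    # at max_concurrent_requests.
    async with sem:
        async with session.get(path) as response:
            return await response.json()


async def fetch_jsons_in_batch(sem, session, batch, pause_seconds=1):
    # Fetch every file in the batch concurrently, then pause before
    # returning this batch's results. All batches run under a single
    # gather, so the semaphore is what actually throttles requests.
    jsons = await asyncio.gather(*[fetch_single_json(sem, session, path)
                                   for path in batch])
    await asyncio.sleep(pause_seconds)
    return jsons
```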
As you can see from the `TCPConnector(verify_ssl=False)` line in the excerpt above, I originally suspected an SSL-handshake issue and experimented with disabling SSL validation, with no luck: things continue to work fine when hosted on my laptop, but break in Azure.
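(For reference, `verify_ssl=False` is the older aiohttp spelling; on newer versions the equivalent is the `ssl` argument, which I'd expect to behave identically:)

```python
import aiohttp

# Equivalent connector on newer aiohttp versions, where the deprecated
# verify_ssl flag is replaced by the ssl argument (ssl=False disables
# certificate verification entirely).
connector = aiohttp.TCPConnector(ssl=False)
```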
Given that this always works locally and has never worked in deployment, I figure the root of the issue is some difference in environment once the process is hosted in the cloud, but I'm at a loss as to how to diagnose exactly what that difference is.
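If it would help narrow things down, I'm happy to temporarily add a bare-bones outbound-connectivity probe to the function and report what it logs, along these lines (the file path in the URL is a placeholder):

```python
# Hypothetical one-off probe: a single GET to the cache host, outside the
# batching machinery, to check whether basic outbound connectivity from
# the Function App works at all.
import logging

import aiohttp


async def probe_cache_host(url="https://FILE-CACHE-URL/some-known-file.json"):
    async with aiohttp.ClientSession() as session:
        async with session.get(url, ssl=False) as resp:
            logging.info("Probe status: %s", resp.status)
```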
If there are MS Azure Functions maintainers monitoring this thread, I'd be happy to provide function invocation IDs, full error traces, or any other detail beyond the above.
Thanks very much, and any help would be appreciated.