Parallelize URL scraping in Function Apps (Python)

Micaela 21 Reputation points
2022-11-09T13:37:36.283+00:00

Hi,

I am facing issues/errors when running a timer-triggered function app. The function triggers the code, which runs just fine until I add the parallelization. The code is below:

df_list = []
with multiprocessing.Pool(processes=4) as p:
    for result in p.imap(web_scraper_function, url_list_to_scrape):
        price = result[1]
        item_id = result[2]
        df_list.append([price, item_id])
df_final = pd.DataFrame(df_list)
df_final.to_sql('table1', AZURE_CONN, schema='one', if_exists='append', index=False)

The issues/errors I am facing are:
(1) After scraping 6000/7000 URLs, I get:
(a) Timeout value of 00:05:00 exceeded by function 'Functions.TimerTest123456' (Id: 'xxxx'). Initiating cancellation.
(b) Executed '{functionName}' ({status}, Id={invocationId}, Duration={executionDuration}ms)
(c) Executed 'Functions.TimerTest123456' (Failed, Id=xxxx, Duration=300142ms)
(2) It never gets to send df_final to our database (hosted in Azure).

Would anyone be able to help me get this code to write the df to the database? Or, alternatively (though not preferably), to change the way I am approaching the parallelization so that it works?

Azure Functions
An Azure service that provides an event-driven serverless compute platform.

Accepted answer
  1. MughundhanRaveendran-MSFT 12,411 Reputation points
    2022-11-11T06:50:25.687+00:00

    Hi @Micaela ,

    Thanks for posting this query in Q&A forum.

    Instead of parallelizing in the code, you can use the built-in ability to run the function on more threads. Setting the app setting PYTHON_THREADPOOL_THREAD_COUNT to a value between 2 and 32 lets you achieve parallelization.

    https://learn.microsoft.com/en-us/azure/azure-functions/functions-app-settings#python_threadpool_thread_count
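
Separately from the app setting, the second problem in the question (nothing reaches the database once the timeout hits) may be eased by saving results in batches rather than once at the end. The following is only a minimal sketch under stated assumptions: `web_scraper_function` here is a stand-in stub, since the real scraper is not shown in the question; `chunked`, `scrape_in_batches`, and `write_batch` are hypothetical names introduced for illustration; and it swaps `multiprocessing.Pool` for the standard library's `ThreadPoolExecutor`, on the assumption that the scraping is mostly waiting on the network.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def web_scraper_function(url):
    # Stand-in stub (assumption): the real scraper is not shown in the question.
    # It returns a tuple shaped like the original result: (url, price, item_id).
    return (url, 9.99, url.rsplit("/", 1)[-1])


def chunked(items, size):
    # Yield successive lists of at most `size` items.
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def scrape_in_batches(urls, write_batch, batch_size=500, max_workers=8):
    # Scrape one batch at a time on a thread pool and pass its rows to
    # `write_batch` before starting the next batch, so rows from finished
    # batches are already saved if a later batch hits the timeout.
    total = 0
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for batch in chunked(urls, batch_size):
            # pool.map returns results in the same order as the input URLs.
            rows = [[r[1], r[2]] for r in pool.map(web_scraper_function, batch)]
            write_batch(rows)
            total += len(rows)
    return total


# Example run with an in-memory list standing in for the database.
saved = []
url_list_to_scrape = [f"https://example.com/item/{i}" for i in range(10)]
count = scrape_in_batches(url_list_to_scrape, saved.extend, batch_size=4, max_workers=2)
```

In the function app, `write_batch` could wrap the `pd.DataFrame(rows).to_sql('table1', AZURE_CONN, schema='one', if_exists='append', index=False)` call from the question, so each batch is appended as soon as it finishes. The batch size and worker count shown are placeholders, not tuned values.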

    Hope this helps!


