Hi,
I am getting errors when running a timer-triggered Azure Function app. The function triggers the code, which runs just fine until I add the parallelization. The code is shown below:
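import multiprocessing
import pandas as pd

df_list = []
with multiprocessing.Pool(processes=4) as p:
    # Scrape the URLs in 4 worker processes; imap yields results in input order
    for result in p.imap(web_scraper_function, url_list_to_scrape):
        price = result[1]
        item_id = result[2]
        df_list.append([price, item_id])
# Write all scraped rows to the database in a single call
df_final = pd.DataFrame(df_list)
df_final.to_sql('table1', AZURE_CONN, schema='one', if_exists='append', index=False)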
The issues/errors I am facing are:
(1) After scraping 6000/7000 URLs, I get:
(a) Timeout value of 00:05:00 exceeded by function 'Functions.TimerTest123456' (Id: 'xxxx'). Initiating cancellation.
(b) Executed '{functionName}' ({status}, Id={invocationId}, Duration={executionDuration}ms)
(c) Executed 'Functions.TimerTest123456' (Failed, Id=xxxx, Duration=300142ms)
(2) The function never gets as far as writing df_final to our database (hosted in Azure).
Would anyone be able to help me make this code work so that the DataFrame is written to the database? Or, alternatively (though not preferably), to change the way I am approaching the parallelization so that it works?
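The log in (1a) shows the invocation being cancelled at the 5-minute limit, so the single to_sql call at the end is never reached. One direction that might help is to write the results in batches while scraping, so that rows collected before the cut-off are already in the database. Below is a minimal sketch of that idea, not a tested fix for this app: write_in_batches, write_batch and batch_size are hypothetical names introduced only for illustration, and the commented usage assumes the same web_scraper_function, url_list_to_scrape and AZURE_CONN as above.

```python
from itertools import islice


def write_in_batches(results, write_batch, batch_size=500):
    """Consume `results` lazily and pass each batch of rows to `write_batch`.

    `results` is any iterable of scraper results (for example the iterator
    returned by Pool.imap). `write_batch` is a hypothetical callback that
    appends one list of [price, item_id] rows to the database.
    Returns the total number of rows handed to `write_batch`.
    """
    results = iter(results)
    total = 0
    while True:
        # Same fields as the original loop: result[1] is price, result[2] is item_id.
        rows = [[r[1], r[2]] for r in islice(results, batch_size)]
        if not rows:
            break
        write_batch(rows)
        total += len(rows)
    return total


# Hypothetical usage with the names from the post:
#
# def write_batch(rows):
#     pd.DataFrame(rows).to_sql('table1', AZURE_CONN, schema='one',
#                               if_exists='append', index=False)
#
# with multiprocessing.Pool(processes=4) as p:
#     write_in_batches(p.imap(web_scraper_function, url_list_to_scrape),
#                      write_batch)
```

This does not make the scraping itself any faster, so the run could still hit the timeout; it only means that the batches written before the cancellation are not lost.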