a compatibility or interaction problem between Scrapy's asynchronous nature and the Azure Functions environment.
Nanduri Sai Venkata Raju
Compatibility issue when an Azure Function triggers a Scrapy class. I have a specific requirement: whenever there is a URL, I should call my web crawling code. But the web crawling code is written with Scrapy's CrawlerProcess, and I am getting: "signal only works in main thread of the main interpreter"
from scrapy.crawler import CrawlerProcess

def crawl_websites_from_old(start_urls, max_depth):
    # process = CrawlerRunner()
    process = CrawlerProcess(settings)
    process.crawl(SiteDownloadSpider, input='inputargument', url=start_urls, depth=max_depth)
    process.start()
ERROR:
Exception: ValueError: signal only works in main thread of the main interpreter
Stack: File "C:\Program Files (x86)\Microsoft\Azure Functions Core Tools\workers\python\3.10\WINDOWS\X64\azure_functions_worker\dispatcher.py", line 493, in _handle__invocation_request
call_result = await self._loop.run_in_executor(
File "C:\Users\nandurisai.venkatara\AppData\Local\Programs\Python\Python310\lib\concurrent\futures\thread.py", line 52, in run
result = self.fn(*self.args, **self.kwargs)
File "C:\Program Files (x86)\Microsoft\Azure Functions Core Tools\workers\python\3.10\WINDOWS\X64\azure_functions_worker\dispatcher.py", line 762, in _run_sync_func
return ExtensionManager.get_sync_invocation_wrapper(context,
File "C:\Program Files (x86)\Microsoft\Azure Functions Core Tools\workers\python\3.10\WINDOWS\X64\azure_functions_worker\extension.py", line 215, in _raw_invocation_wrapper
result = function(**args)
File "C:\Users\nandurisai.venkatara\projects\ai-kb-bot\function_app.py", line 76, in crawling
process.start()
File "C:\Users\nandurisai.venkatara\projects\ai-kb-bot\venv\lib\site-packages\scrapy\crawler.py", line 420, in start
install_shutdown_handlers(self._signal_shutdown)
File "C:\Users\nandurisai.venkatara\projects\ai-kb-bot\venv\lib\site-packages\scrapy\utils\ossignal.py", line 28, in install_shutdown_handlers
reactor._handleSignals()
File "C:\Users\nandurisai.venkatara\projects\ai-kb-bot\venv\lib\site-packages\twisted\internet\posixbase.py", line 142, in _handleSignals
_SignalReactorMixin._handleSignals(self)
File "C:\Users\nandurisai.venkatara\projects\ai-kb-bot\venv\lib\site-packages\twisted\internet\base.py", line 1281, in _handleSignals
signal.signal(signal.SIGINT, reactorBaseSelf.sigInt)
File "C:\Users\nandurisai.venkatara\AppData\Local\Programs\Python\Python310\lib\signal.py", line 47, in signal
handler = _signal.signal(_enum_to_int(signalnum), _enum_to_int(handler))
Function code:
@app.function_name(name="Crawling")
@app.queue_trigger(arg_name="azqueue", queue_name=AzureConstants.queue_name_crawl, connection="AzureWebJobsStorage")
@app.queue_output(arg_name="trainmessage",queue_name=AzureConstants.queue_name_train,connection="AzureWebJobsStorage")
def crawling(azqueue: func.QueueMessage,trainmessage: func.Out[str]):
url,depth=azqueue.get_body().decode('utf-8').split("|")
crawl_websites_from_old(url,depth)
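The stack trace shows the Azure Functions Python worker running the sync function body on a thread-pool executor (run_in_executor), so CrawlerProcess.start() is not executing in the main thread and cannot install Twisted's SIGINT handler. One common workaround, sketched below under the assumption that settings and SiteDownloadSpider are importable at module level, is to launch the crawl in a child process so that start() runs in that process's main thread; the run_crawl helper and the use of multiprocessing are illustrative, not part of the original code.

# Hypothetical sketch: run the Scrapy crawl in a separate process so that
# CrawlerProcess.start() executes in that process's main thread, where the
# signal module allows handlers to be installed.
import multiprocessing

from scrapy.crawler import CrawlerProcess


def run_crawl(start_urls, max_depth):
    # Runs inside the child process; settings and SiteDownloadSpider are
    # assumed to be defined/imported in this module, as in the question.
    process = CrawlerProcess(settings)
    process.crawl(SiteDownloadSpider, input='inputargument', url=start_urls, depth=max_depth)
    process.start()


def crawl_websites_from_old(start_urls, max_depth):
    # The Functions worker thread only spawns the child and waits for it,
    # so no signal handlers are installed in the host process.
    p = multiprocessing.Process(target=run_crawl, args=(start_urls, max_depth))
    p.start()
    p.join()

Depending on the installed Scrapy version, CrawlerProcess.start() may also accept install_signal_handlers=False, which skips the signal registration entirely; check the documentation for the version in your virtual environment before relying on it.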