a compatibility or interaction problem between Scrapy's asynchronous nature and the Azure Functions environment.

Nanduri Sai Venkata Raju 0 Reputation points
2024-02-09T09:28:05.65+00:00

Compatibility issue when an Azure Function triggers a Scrapy class. I have a specific requirement: whenever there is a URL, I should call web-crawling code. But the web-crawling code is written with Scrapy's CrawlerProcess, and I am getting: "signal only works in main thread of the main interpreter".

from scrapy.crawler import CrawlerProcess

def crawl_websites_from_old(start_urls, max_depth):
    # settings and SiteDownloadSpider are defined elsewhere in the project
    # process = CrawlerRunner()
    process = CrawlerProcess(settings)
    process.crawl(SiteDownloadSpider, input='inputargument', url=start_urls, depth=max_depth)
    process.start()
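For reference, the same function runs without this error when invoked from a plain script, because module-level code executes on the process's main thread, where Python allows installing signal handlers. A minimal, illustrative invocation (the URL and depth below are placeholders):

# Illustrative standalone run: the __main__ block executes on the main thread,
# so Scrapy/Twisted can install their SIGINT/SIGTERM shutdown handlers.
if __name__ == "__main__":
    crawl_websites_from_old(["https://example.com"], max_depth=2)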

ERROR:

Exception: ValueError: signal only works in main thread of the main interpreter
Stack:   File "C:\Program Files (x86)\Microsoft\Azure Functions Core Tools\workers\python\3.10\WINDOWS\X64\azure_functions_worker\dispatcher.py", line 493, in _handle__invocation_request
    call_result = await self._loop.run_in_executor(
  File "C:\Users\nandurisai.venkatara\AppData\Local\Programs\Python\Python310\lib\concurrent\futures\thread.py", line 52, in run
    result = self.fn(*self.args, **self.kwargs)
  File "C:\Program Files (x86)\Microsoft\Azure Functions Core Tools\workers\python\3.10\WINDOWS\X64\azure_functions_worker\dispatcher.py", line 762, in _run_sync_func
    return ExtensionManager.get_sync_invocation_wrapper(context,
  File "C:\Program Files (x86)\Microsoft\Azure Functions Core Tools\workers\python\3.10\WINDOWS\X64\azure_functions_worker\extension.py", line 215, in _raw_invocation_wrapper
    result = function(**args)
  File "C:\Users\nandurisai.venkatara\projects\ai-kb-bot\function_app.py", line 76, in crawling
    process.start()
  File "C:\Users\nandurisai.venkatara\projects\ai-kb-bot\venv\lib\site-packages\scrapy\crawler.py", line 420, in start
    install_shutdown_handlers(self._signal_shutdown)
  File "C:\Users\nandurisai.venkatara\projects\ai-kb-bot\venv\lib\site-packages\scrapy\utils\ossignal.py", line 28, in install_shutdown_handlers
    reactor._handleSignals()
  File "C:\Users\nandurisai.venkatara\projects\ai-kb-bot\venv\lib\site-packages\twisted\internet\posixbase.py", line 142, in _handleSignals
    _SignalReactorMixin._handleSignals(self)
  File "C:\Users\nandurisai.venkatara\projects\ai-kb-bot\venv\lib\site-packages\twisted\internet\base.py", line 1281, in _handleSignals
    signal.signal(signal.SIGINT, reactorBaseSelf.sigInt)
  File "C:\Users\nandurisai.venkatara\AppData\Local\Programs\Python\Python310\lib\signal.py", line 47, in signal
    handler = _signal.signal(_enum_to_int(signalnum), _enum_to_int(handler))
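The first frames of the trace show the cause: the Functions host dispatches the synchronous handler via run_in_executor, so process.start() runs on a thread-pool worker thread, and Python's signal.signal() may only be called from the main thread of the main interpreter. A minimal sketch of that restriction, independent of Scrapy and Azure:

# Installing a signal handler from a non-main thread raises the same ValueError.
import signal
import threading

def install_handler():
    try:
        signal.signal(signal.SIGINT, signal.SIG_DFL)
    except ValueError as exc:
        print(f"Raised as expected: {exc}")

worker = threading.Thread(target=install_handler)
worker.start()
worker.join()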

Function code:

@app.function_name(name="Crawling")
@app.queue_trigger(arg_name="azqueue", queue_name=AzureConstants.queue_name_crawl,                               connection="AzureWebJobsStorage")
@app.queue_output(arg_name="trainmessage",queue_name=AzureConstants.queue_name_train,connection="AzureWebJobsStorage")
def crawling(azqueue: func.QueueMessage,trainmessage: func.Out[str]):	
 	         url,depth=azqueue.get_body().decode('utf-8').split("|")	
	         crawl_websites_from_old(url,depth)
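A commonly suggested workaround, sketched here as an assumption rather than a verified fix: run the crawl in a child process, so CrawlerProcess starts on that process's own main thread and can install its handlers. The helper names below (_run_crawl, crawl_websites_in_subprocess) are hypothetical; settings and SiteDownloadSpider are the objects from the code above.

import multiprocessing

def _run_crawl(start_urls, max_depth):
    # Executes on the child process's main thread, where signal handlers are allowed.
    from scrapy.crawler import CrawlerProcess  # local import keeps the sketch self-contained
    process = CrawlerProcess(settings)
    process.crawl(SiteDownloadSpider, input='inputargument', url=start_urls, depth=max_depth)
    process.start()

def crawl_websites_in_subprocess(start_urls, max_depth):
    # Hypothetical wrapper: spawn the crawl in a child process and wait for it to finish.
    crawler = multiprocessing.Process(target=_run_crawl, args=(start_urls, max_depth))
    crawler.start()
    crawler.join()

The queue-triggered function could then call crawl_websites_in_subprocess(url, int(depth)) in place of crawl_websites_from_old(url, depth); note that depth arrives as a string from the queue message. On Windows, and anywhere else the spawn start method is used, the child re-imports the parent module, so keeping the crawl helpers in their own module rather than in function_app.py avoids surprises. Scrapy's CrawlerRunner (commented out in the original code) does not install signal handlers at all, but it requires managing the Twisted reactor yourself.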