Azure Function that uses scrapy returns 404 with correct URL

Lykos, Manos
2025-03-18T17:15:08.7233333+00:00

Hello,

I have the following function, which uses Scrapy to crawl data from a specific site:

import azure.functions as func 
import os 
import datetime 
import json 
import logging 
import subprocess 
import sys  

sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), 'X_crawler')))  

from scrapy.crawler import CrawlerProcess 
from scrapy.utils.project import get_project_settings  
from X_crawler.spiders.X import X    

app = func.FunctionApp()  

@app.route(route="crawl_X", auth_level=func.AuthLevel.ANONYMOUS)
def crawl_X(req: func.HttpRequest) -> func.HttpResponse:
    logging.info("Scrapy Azure Function triggered.")

    # Tell Scrapy which settings module to use
    os.environ.setdefault('SCRAPY_SETTINGS_MODULE', 'X_crawler.settings')

    # Load the settings module named by SCRAPY_SETTINGS_MODULE
    settings = get_project_settings()
    logging.info("Scrapy settings: %s", settings)
    process = CrawlerProcess(settings)
    process.crawl(X)
    process.start()  # Blocking call: returns only after the crawl finishes

    return func.HttpResponse('Crawling completed successfully9', status_code=200)
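
Separately from the 404: process.start() runs Twisted's reactor, and that reactor cannot be restarted inside the same Python process. Even once the function is reachable, a second request served by the same warm worker would likely fail with ReactorNotRestartable. One common workaround is to run each crawl in a fresh child interpreter. The sketch below uses a hypothetical helper, run_crawl_in_subprocess, built only on subprocess.run; the timeout default is an arbitrary assumption.

```python
import subprocess
import sys


def run_crawl_in_subprocess(cmd, cwd, timeout=120):
    """Run a crawl command in a child process and return (returncode, output).

    Each call starts a new interpreter, so Twisted's reactor is started once
    per child process instead of once per long-lived worker process.
    The 120-second default timeout is an arbitrary placeholder.
    """
    completed = subprocess.run(
        cmd,
        cwd=cwd,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    # Scrapy logs to stderr by default, so return both streams together.
    return completed.returncode, completed.stdout + completed.stderr
```

From the handler it might be called as run_crawl_in_subprocess([sys.executable, "-m", "scrapy", "crawl", "X"], cwd=project_dir), where project_dir is assumed to be the folder containing scrapy.cfg and "X" is assumed to be the spider's name attribute. A non-zero return code could then be turned into a 500 response. This also keeps the Scrapy imports out of the function module, so a broken Scrapy installation shows up as a failed child process rather than as a missing function.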

The problem is that with this code, invoking the URL returns a 404, and nothing shows up in Log Stream, as if the request never happened. When I run the function locally with "func start", however, it works as expected.
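
One plausible explanation: in the Python v2 programming model, the Functions host discovers functions by importing the function app module. If that import raises (for example, because scrapy was not installed in the deployed app), no functions get registered, so the route returns 404 and nothing is logged for the request itself. A quick check is to probe the top-level imports from the deployed environment. The helper below, unimportable_modules, is a hypothetical sketch built on importlib.util.find_spec; the module names in the example are copied from the imports above.

```python
import importlib.util


def unimportable_modules(module_names):
    """Return the entries of module_names that cannot be located or imported.

    For a dotted name such as "scrapy.crawler", find_spec imports the parent
    package ("scrapy") first, so a broken parent package is also reported.
    """
    problems = []
    for name in module_names:
        try:
            spec = importlib.util.find_spec(name)
        except ImportError:
            # ModuleNotFoundError (a subclass) is raised when a parent package is missing.
            spec = None
        if spec is None:
            problems.append(name)
    return problems


if __name__ == "__main__":
    # Module names copied from the imports in the function code above.
    print(unimportable_modules(
        ["scrapy.crawler", "scrapy.utils.project", "X_crawler.spiders.X"]
    ))
```

In a healthy deployment the printed list should be empty; any name it contains points at the import that would stop the function from being indexed.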

When I comment out the last three imports and the code that depends on them, keeping only the logging line and the return statement, the function runs successfully, so I think the problem is related to those imports. Why does this happen, why am I not getting any log output, and how can I fix it while keeping the crawler's code in a separate module, if possible?
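
To keep the crawler in its own module while making import failures visible, one option is to defer those imports until the handler runs. An ImportError then happens inside a request, where it can be logged and returned as a 500, instead of during function indexing. The sketch below uses a hypothetical helper, load_attribute, built on importlib.import_module; the commented handler lines only illustrate possible usage and are not executed.

```python
import importlib
import logging


def load_attribute(module_path, attr_name):
    """Import module_path when called; return (attribute, None) or (None, error_text)."""
    try:
        module = importlib.import_module(module_path)
    except ImportError as exc:
        # Logged at request time, so it should reach the function's logs.
        logging.exception("Could not import %s", module_path)
        return None, f"{type(exc).__name__}: {exc}"
    attribute = getattr(module, attr_name, None)
    if attribute is None:
        return None, f"{module_path} has no attribute {attr_name!r}"
    return attribute, None


# Possible use inside the HTTP handler (sketch only, not run here):
#     spider_cls, error = load_attribute("X_crawler.spiders.X", "X")
#     if error:
#         return func.HttpResponse(error, status_code=500)
#     process_cls, error = load_attribute("scrapy.crawler", "CrawlerProcess")
```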

PS: Note that I get the same error just by importing scrapy.crawler and scrapy.utils.
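
Since importing scrapy.crawler on its own is enough to reproduce the 404, a likely cause is that Scrapy is not installed in the deployed app. For Python function apps, dependencies are installed during deployment from requirements.txt in the project root, and packages that exist only in the local environment used by "func start" are generally not part of the deployment. The helper below, declared_packages, is a hypothetical sketch that parses requirements.txt text and returns the declared package names, normalised as described in PEP 503, so a missing scrapy entry is easy to spot.

```python
import re


def declared_packages(requirements_text):
    """Return the normalised distribution names listed in requirements.txt text."""
    names = set()
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):
            continue  # skip blank lines and pip options such as -r or --index-url
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            # PEP 503 normalisation: runs of "-", "_" and "." are equivalent, case-insensitive.
            names.add(re.sub(r"[-_.]+", "-", match.group(0)).lower())
    return names
```

For example, "scrapy" in declared_packages(open("requirements.txt").read()) should be True before deploying.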
