Scraping a site through an Azure web app

Ludovic LAURENT 0 Reputation points
2024-12-20T13:06:30.8433333+00:00

Hi,

I have created a Python program on my computer to scrape a URL, and it works.

But when I deploy it to Azure as a web application, it doesn't work:

1. If I run it through an SSH session, I get a 403 error.

2. If I try the curl -I command on my URL, I get the same result: error 403.

3. I tried adding headers in the requests.get command, but got the same result: error 403.

 

Do you have any idea how to solve this problem?

My code is:

import requests

response = requests.get("https://www.proclinic.es/tienda/020-195-unitwin-roth-022-5-5-s-i-gnch-3.html")

print(response)

print(response.content)

Thanks.


1 answer

  1. brtrach-MSFT 17,166 Reputation points Microsoft Employee
    2024-12-20T23:22:56.9066667+00:00

    @Ludovic LAURENT The 403 is a forbidden error, which implies that the web server you are trying to scrape is blocking your request.

    Certain sites block IP address ranges belonging to VPN providers or cloud service providers, especially when traffic from those ranges does not arrive in an expected form, for example requests that do not contain a User-Agent header.

    Can you please see if adding a user agent header to mimic a browser request resolves the issue?

    import requests
    
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
    }
    
    response = requests.get("https://www.proclinic.es/tienda/020-195-unitwin-roth-022-5-5-s-i-gnch-3.html", headers=headers)
    
    print(response)
    print(response.content)

    If adding the header works, your target website was only allowing browser level requests.

    If the header does not work, see if you are able to scrape another website. Be sure to check the website's robots.txt file to confirm it has not disallowed scraping. If another website works, then it's likely the original site is blocking cloud provider IP addresses.
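
    For the robots.txt check, here is a minimal sketch using Python's built-in urllib.robotparser; the proclinic.es URLs are simply the ones from your question, and note that fetching robots.txt goes over the same network path, so it can be blocked by the same IP filtering:

    import urllib.robotparser

    # Point the parser at the site's robots.txt and download it.
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url("https://www.proclinic.es/robots.txt")
    rp.read()

    # Check whether a generic crawler ("*") is allowed to fetch the product page.
    url = "https://www.proclinic.es/tienda/020-195-unitwin-roth-022-5-5-s-i-gnch-3.html"
    print(rp.can_fetch("*", url))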

    I hope this helps you with your project.

