Scraping a site through an Azure web app

Ludovic LAURENT 0 Reputation points
2024-12-20T13:06:30.8433333+00:00

Hi,

I have created a Python program on my computer to scrape a URL, and it works.

But when I deploy it to Azure as a web application, it doesn't work:

1. If I run it through an SSH session, I get a 403 error.

2. If I try the curl -I command on my URL, I get the same result: error 403.

3. I tried adding headers in the requests.get command, but got the same result: error 403.

 

Do you have any idea how to solve this problem?

My code is:

import requests

response = requests.get("https://www.proclinic.es/tienda/020-195-unitwin-roth-022-5-5-s-i-gnch-3.html")

print(response)

print(response.content)

Thanks.


1 answer

  1. brtrach-MSFT 17,166 Reputation points Microsoft Employee
    2024-12-20T23:22:56.9066667+00:00

    @Ludovic LAURENT The 403 is a forbidden error, which implies that the web server you are trying to scrape is blocking your request.

    Certain sites block IP address ranges belonging to VPN providers or cloud service providers, especially when traffic from those ranges does not arrive in an expected form, for example requests that do not contain a User-Agent header.

    Can you please see if adding a user agent header to mimic a browser request resolves the issue?

    import requests
    
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
    }
    
    response = requests.get("https://www.proclinic.es/tienda/020-195-unitwin-roth-022-5-5-s-i-gnch-3.html", headers=headers)
    
    print(response)
    print(response.content)

    If adding the header works, your target website was only allowing browser level requests.

    If the header does not work, see if you are able to scrape another website. Be sure to check the website's robots.txt file to confirm it has not disallowed scraping. If another website works, then it's likely the original site is blocking cloud provider IP addresses.
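
    For the robots.txt check, here is a minimal sketch using Python's built-in urllib.robotparser; the proclinic.es URLs are simply the ones from your question, and note that fetching robots.txt goes over the same network path, so it can be blocked by the same IP filtering:

    import urllib.robotparser

    # Point the parser at the site's robots.txt and download it.
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url("https://www.proclinic.es/robots.txt")
    rp.read()

    # Check whether a generic crawler ("*") is allowed to fetch the product page.
    url = "https://www.proclinic.es/tienda/020-195-unitwin-roth-022-5-5-s-i-gnch-3.html"
    print(rp.can_fetch("*", url))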

    I hope this helps you with your project.

