Client Scraper - Azure MVC

JStus 41 Reputation points
2022-09-09T15:56:40.39+00:00

I think the answer is no, but I'll ask anyway. I've created a simple MVC application (hosted in Azure) that uses Selenium to scrape a specific website. When I run it on my local computer, the program launches Chrome and collects some information from the other website. Of course, it doesn't work once I deploy the app to Azure. Is there a way to scrape a website using JavaScript? I have seen several examples using

require('selenium-webdriver')

but I couldn't get it to work.

ASP.NET
Azure App Service

3 answers

  1. Bruce (SqlWork.com) 56,931 Reputation points
    2022-09-09T16:30:35.58+00:00

    You would run the browser in headless mode. Pick a browser driver that matches your requirements.
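
    A minimal sketch of headless mode with the `selenium-webdriver` package mentioned in the question, assuming Node.js, a local Chrome install, and a matching chromedriver. The helper names `headlessChromeArgs` and `scrapeTitle` are hypothetical, and the flags are commonly used ones rather than a required set. Note that headless mode alone does not get around the Azure App Service on Windows restrictions described in the next answer.

    ```javascript
    // Hypothetical helper: Chrome command-line flags commonly used for
    // headless runs on servers (an assumption; adjust for your host).
    function headlessChromeArgs() {
      return [
        '--headless',              // run without a visible browser window
        '--disable-gpu',           // most servers have no usable GPU
        '--no-sandbox',            // often required when Chrome runs as root in a container
        '--disable-dev-shm-usage', // avoid problems with a small /dev/shm in Docker
      ];
    }

    // Hypothetical scraper: opens `url` in headless Chrome and returns the page title.
    async function scrapeTitle(url) {
      // Required lazily so headlessChromeArgs() can be used without the package installed.
      const { Builder } = require('selenium-webdriver');
      const chrome = require('selenium-webdriver/chrome');

      const options = new chrome.Options().addArguments(...headlessChromeArgs());
      const driver = await new Builder()
        .forBrowser('chrome')
        .setChromeOptions(options)
        .build();
      try {
        await driver.get(url);
        return await driver.getTitle();
      } finally {
        await driver.quit(); // always shut down the browser process
      }
    }
    ```

    Usage (on a machine where Chrome can start): `scrapeTitle('https://example.com').then(console.log);`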


  2. VenkateshDodda-MSFT 18,851 Reputation points Microsoft Employee
    2022-09-14T12:04:49.34+00:00

    @JStus Thank you for reaching out to Microsoft Q&A. Based on the information above, we understand that you want to scrape a website from your application hosted on Azure. Could you please confirm whether you are using Azure App Service or an Azure VM to host your application?

    • If you are hosting your application on Azure App Service, then, as @Bruce (SqlWork.com) suggested, you need to run the browser in headless mode to scrape a website. However, Chrome and Chromium, whether headless or not, all require GDI support, and on Azure App Service on Windows this conflicts with the sandbox's Win32k.sys (User32/GDI32) restrictions, as shown in the figure below.

    240909-image.png

    • Other frameworks such as PhantomJS and Selenium are restricted in the same way, as shown below.

    240989-image.png

    So you cannot use Chromium within an Azure Web App on Windows.
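
    If the same code may be deployed to different hosts, a startup check can fail fast with a clear message instead of letting the browser launch error out. This is a sketch under assumptions: it treats the `WEBSITE_SITE_NAME` environment variable (set by App Service for each site) as the signal that the process runs on App Service, and `process.platform === 'win32'` as the Windows sandbox; the function names are hypothetical.

    ```javascript
    // Hypothetical check: true when the process appears to run inside the
    // Azure App Service sandbox on Windows, where Chromium cannot start because
    // Win32k.sys (User32/GDI32) calls are blocked.
    // Assumption: App Service sets WEBSITE_SITE_NAME for every site.
    function inWindowsAppServiceSandbox(env = process.env, platform = process.platform) {
      const onAppService =
        typeof env.WEBSITE_SITE_NAME === 'string' && env.WEBSITE_SITE_NAME.length > 0;
      return onAppService && platform === 'win32';
    }

    // Hypothetical guard to call before launching any browser.
    function assertBrowserCanRun(env = process.env, platform = process.platform) {
      if (inWindowsAppServiceSandbox(env, platform)) {
        throw new Error(
          'Chromium cannot run in the Azure App Service sandbox on Windows; ' +
            'use a Linux-based host or an Azure VM instead.'
        );
      }
    }
    ```

    The environment and platform are parameters (defaulting to the real process values) so the check can be exercised without deploying to Azure.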

    Alternatively, check out this blog post, in which anthonychu, one of the PMs, shares insights on using an [Azure Functions app on Linux](https://anthonychu.ca/post/azure-functions-headless-chromium-puppeteer-playwright/) to run headless Chromium with Puppeteer and Playwright.


  3. Bruce (SqlWork.com) 56,931 Reputation points
    2022-09-20T20:38:13.59+00:00

    If you are not tied to coding in C#, you could use Node.js and Playwright (an npm package from Microsoft):

    https://www.npmjs.com/package/playwright

    A sample Azure Function (Node.js) using Playwright:

    https://dotnetthoughts.net/running-playwright-on-azure-functions/

    A sample Azure Static Web App:

    https://nitya.github.io/learn-playwright/003-aswa-demo-app/
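
    A minimal Node.js sketch of the Playwright approach, assuming the `playwright` package and its Chromium build are installed (for example with `npm install playwright` followed by `npx playwright install chromium`) and that the code runs on a Linux host such as the Linux Functions app mentioned in the previous answer. `scrapeTitleWithPlaywright` is a hypothetical name; the browser type is a parameter so it can be swapped or stubbed.

    ```javascript
    // Hypothetical scraper: launches headless Chromium via Playwright,
    // loads `url`, and returns the page title.
    // `browserType` defaults to Playwright's Chromium; the default is only
    // evaluated (and the package only required) when no argument is passed.
    async function scrapeTitleWithPlaywright(url, browserType = require('playwright').chromium) {
      const browser = await browserType.launch({ headless: true });
      try {
        const page = await browser.newPage();
        await page.goto(url, { waitUntil: 'domcontentloaded' });
        return await page.title();
      } finally {
        await browser.close(); // release the Chromium process even on errors
      }
    }
    ```

    Usage: `scrapeTitleWithPlaywright('https://example.com').then(console.log);`. Passing the browser type in keeps the scraping logic independent of where Chromium comes from, which is convenient when the same function is wrapped in an Azure Function handler.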