Web scraper Azure setup

Monteiro Del Prete 20 Reputation points
2025-03-20T08:10:28.2533333+00:00

I developed a web scraper that runs over several days. I tried a Container App as the execution environment, and it ran for 10 days with a very strange pattern.

[image]

As you can see above, the CPU usage percentage suddenly started going up and down. During this time the scraper's responses were sporadic, as if execution kept pausing and resuming. Then, without any message indicating completion, I got this CPU timeline:

[image]

I was expecting a database insertion at the end of the scraping process, but the only message I received is that it is waiting for new requests (the scraper is API-based and starts with a specific API call).

Container configuration:

  • 4 CPU cores
  • 8Gi memory size

[image]

Azure Container Apps
An Azure service that provides a general-purpose, serverless container platform.

Accepted answer
  1. Arko 4,150 Reputation points Microsoft External Staff Moderator
    2025-03-28T13:58:52.2433333+00:00

    It seems you are dealing with GC-related pauses or memory fragmentation that grow over time (a "slow leak" scenario). This is especially common in long-running Python, Java, or Node.js apps that do heavy in-memory operations. If memory usage spikes or objects are held in memory unnecessarily (e.g., large lists or dicts), the GC eventually struggles to clean up.
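One way to check the "slow leak" hypothesis in a Python scraper is to compare memory snapshots taken some time apart with the standard-library `tracemalloc` module. The following is only a minimal sketch: the helper name `top_growth` and the simulated `retained` list are illustrative assumptions, not part of the original scraper.

```python
import tracemalloc


def top_growth(before, after, limit=5):
    """Return the source lines whose allocated size grew most between two snapshots."""
    stats = after.compare_to(before, "lineno")
    return [s for s in stats if s.size_diff > 0][:limit]


if __name__ == "__main__":
    tracemalloc.start()
    baseline = tracemalloc.take_snapshot()

    # Stand-in for scraped data that is kept in memory longer than needed.
    retained = [bytes(1024) for _ in range(1000)]

    current = tracemalloc.take_snapshot()
    for stat in top_growth(baseline, current):
        print(stat)
```

In a real run, the snapshots would be taken periodically (for example once per hour) and the largest growing lines logged. Lines that keep growing across many snapshots are candidates for objects being held unnecessarily.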

    I would recommend enabling GC profiling/logs:

    • If using Python: call gc.set_debug(gc.DEBUG_STATS); Python writes these statistics to stderr.
    • If using .NET: enable GC ETW events or use Diagnostic Tools.
    • This can confirm whether GC activity aligns with the drop in CPU/network.
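For the Python case, note that `gc.set_debug(gc.DEBUG_STATS)` writes its output to stderr. If timestamped lines on stdout are preferred, the standard-library `gc.callbacks` hook can be used instead. The sketch below assumes the scraper is written in Python; the names `log_gc_event` and `enable_gc_logging` are hypothetical.

```python
import gc
import logging
import sys
import time

logging.basicConfig(stream=sys.stdout, level=logging.INFO,
                    format="%(asctime)s %(message)s")
log = logging.getLogger("gc-monitor")

_started_at = 0.0


def log_gc_event(phase, info):
    """gc callback: log the generation, object counts, and duration of each collection."""
    global _started_at
    if phase == "start":
        _started_at = time.perf_counter()
    elif phase == "stop":
        pause_ms = (time.perf_counter() - _started_at) * 1000
        log.info("gc gen=%d collected=%d uncollectable=%d pause_ms=%.2f",
                 info["generation"], info["collected"],
                 info["uncollectable"], pause_ms)


def enable_gc_logging():
    """Register the callback once; call this at application startup."""
    if log_gc_event not in gc.callbacks:
        gc.callbacks.append(log_gc_event)


if __name__ == "__main__":
    enable_gc_logging()
    gc.collect()  # force one collection so a log line is emitted
```

Comparing the timestamps of these log lines with the CPU chart should show whether long or frequent collections coincide with the irregular periods.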

    Another workaround: since this happens every 2 days, you can schedule a job restart every 48 hours as a temporary measure, via an automation rule or a CRON-triggered stop/start. Also add liveness probes if not already configured, so the system can restart the job if it becomes unresponsive.
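A liveness probe needs an endpoint that reports whether the scraper is still making progress. The sketch below shows one possible heartbeat-based check using only the Python standard library. The `/healthz` path, port 8080, the 5-minute threshold, and all function names are assumptions for illustration. Since the scraper is already API-based, an equivalent route in its existing web framework would serve the same purpose.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumption: a healthy scrape makes progress at least every 5 minutes.
MAX_SILENCE_SECONDS = 300

_last_beat = time.monotonic()
_lock = threading.Lock()


def beat():
    """Call from the scraping loop after each unit of work (e.g. each page)."""
    global _last_beat
    with _lock:
        _last_beat = time.monotonic()


def is_alive(now=None, max_silence=MAX_SILENCE_SECONDS):
    """True if the last heartbeat is recent enough."""
    now = time.monotonic() if now is None else now
    with _lock:
        return (now - _last_beat) <= max_silence


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/healthz":
            status = 404
        else:
            status = 200 if is_alive() else 503
        self.send_response(status)
        self.end_headers()

    def log_message(self, *args):
        pass  # keep probe requests out of the application log


def start_health_server(port=8080):
    """Serve the health endpoint on a background thread."""
    server = HTTPServer(("0.0.0.0", port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

With the probe pointed at this path, a stalled scrape would stop calling `beat()`, the endpoint would start returning 503, and the platform could restart the container. One caveat: while the app is idle and waiting for an API call, no heartbeats occur either, so the check should only be considered meaningful during a scrape, or `beat()` should also be called from the idle loop.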

    1 person found this answer helpful.