Connection issues with Postgres flexible server & PGBouncer

William De Prêtre 0 Reputation points
2025-12-13T21:22:18.9633333+00:00

Our API (sensotec-app-rekenapps-api-productie) has been having a lot of issues with connection issues leading to very disgruntled end-users and a steady flow of users abandoning the product.
This API connects to a PostgreSQL database (sensotec-psql-rekenapps-production) and it's there that the problems start.

Each morning thousands of people log in and around 9 am the first performance issues show up. Response times go from milliseconds to seconds to minutes, users start to hit Refresh and the system slows down even more. Initially we used the instance connection pools but then we switched to the builtin PGBouncer. We started with Pooling=false and a fairly large PGBouncer pool (10k on Friday) but we keep seeing errors : too many clients and transient network failures. Assuming the network failures are a consequence of the max clients being reached we have done the following:

Increase PGBouncer connections was set to 50k, default pool size to 300.
DB Connections now have Pooling=True
Maximim Pool Size = 20
Minimum Pool Size = 2
Connection Idle Lifetime = 300
Connection Pruning interval = 10

Few people use the application during the weekend so the real test will be on Monday.

Can somebody assist us with checking if this configuration is best practice? (Some articles on MSDN recommend Pooling=False).
Can somebody check the metrics on Monday and help us determine if there is another underlying cause that creates these problems?
Kind regards,
William

Azure Database for PostgreSQL
0 comments No comments
{count} votes

1 answer

Sort by: Most helpful
  1. VRISHABHANATH PATIL 2,310 Reputation points Microsoft External Staff Moderator
    2025-12-15T03:15:10.82+00:00

    Hi

    It sounds like you're facing some pretty significant connection issues with your API and PostgreSQL database that are affecting user experience. Here’s a breakdown of your current configuration and a few recommendations based on best practices that might help mitigate those connection issues.

    Current Configuration Review

    PGBouncer Settings:

    • Pooling: You switched to PGBouncer with Pooling=True, which is a good way to manage connection limits.
      • Max Pool Size: Setting it to 20 with a minimum pool of 2 and idle lifetime of 300 seconds is generally reasonable, but you might need to adjust based on the performance test outcomes.
      Connection Issues: The errors like "too many clients" suggest that you might be hitting the maximum connection limits set by your Azure PostgreSQL flexible server tier.

    Recommendations

    Increase Max Connections: Make sure the max connection limit is properly set for your compute tier. You can monitor how close you are to this limit in the Azure portal under Metrics. If you frequently hit this limit (for your tier), consider scaling to a higher compute tier that allows more connections.

    PgBouncer Configuration:

    • Session Pooling vs Transaction Pooling: Given that you are using transaction pooling (default for Azure's PGBouncer), this is usually the recommended pooling mode. It should help manage connections effectively. Keeping Pooling=False isn't advisable, as it would lead to unnecessary resource consumption and can't handle high concurrency.

    Review Resource Metrics:

      - Monitor your **CPU** and **Memory** usage to ensure you are not maxing out the resources. High CPU/memory use can lead to slow responses and timeouts.
      
         - Use tools like Azure's Query Performance Insights to identify any slow queries that could be hogging connections.
         
         **Diagnostic Tools**: Run the diagnostics tool provided within the Azure portal to check for user or system errors.
         
         **Check Network Configuration**: Ensure the security groups and firewall settings are correctly configured to allow traffic on port 6432 for PgBouncer.
         
         **Maintenance Windows**: Set up custom maintenance windows to align with your expected workload, especially if you face issues during peak times (like around 9 am according to your observations).
         
    

    Follow-Up Questions

    To provide more tailored assistance, it might help to get answers to the following questions:

    • What is the compute tier of your PostgreSQL flexible server?
    • Are there any specific errors logged that occur more frequently or seem to correlate with the performance issues?
    • Have you analyzed any specific time frames for peak usage beyond just 9 am?
    • Is there a pattern in user behavior that coincides with connection failures (e.g., specific features or functions being accessed)?

    Resources

    Feel free to update with any additional details or questions! Hope this helps you optimize your configuration!

    0 comments No comments

Your answer

Answers can be marked as 'Accepted' by the question author and 'Recommended' by moderators, which helps users know the answer solved the author's problem.