Connectivity issues between AKS and PostgreSQL Flexible Server

Paul Breton 81 Reputation points
2023-08-21T08:47:29.18+00:00

Hi all,

I have a complex problem to diagnose and solve. I have an AKS cluster whose pods connect to an Azure Database for PostgreSQL Flexible Server. I deployed it first in a test environment and everything worked well. Then, in the production environment, "everything" went wrong:

  • Pods are regularly killed by Kubernetes because their health check takes too long to respond.
  • Pods are Spring Boot applications that just try to connect to their PostgreSQL database on startup.
  • Connections use a Hikari pool of roughly 10 sessions (the default).
  • The health check triggers a call to verify that the pod can connect to the PostgreSQL server. That call fails intermittently with a timeout after ~30 seconds (it sometimes succeeds, but it always fails within ~5-10 minutes).
  • Because many different applications use the same PostgreSQL server, it hosts around 180 sessions in total.

I tried many different things to diagnose the problem, and the conclusion is always the same: connecting from the pods in this AKS cluster ends in a timeout (even manually with psql), whereas connecting from outside the pods works perfectly.
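
The manual psql check can be complemented by a probe at the TCP level, which separates "the network path is slow or dropping packets" from "PostgreSQL is slow to authenticate". The following is a minimal sketch, not something from the original setup: the function name `probe_tcp` and its parameters are made up for illustration, and it only measures the TCP handshake, not a PostgreSQL login.

```python
import socket
import time


def probe_tcp(host, port, attempts=5, timeout=5.0, pause=0.0):
    """Open a TCP connection several times and time each attempt.

    Returns one (ok, seconds, error) tuple per attempt.
    """
    results = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            # create_connection resolves the name and performs the TCP handshake.
            with socket.create_connection((host, port), timeout=timeout):
                results.append((True, time.monotonic() - start, None))
        except OSError as exc:
            # Timeouts, refused connections and DNS failures are all OSError subclasses.
            results.append((False, time.monotonic() - start, repr(exc)))
        time.sleep(pause)
    return results
```

Running something like `probe_tcp("<server FQDN>", 5432, attempts=20, timeout=30.0, pause=1.0)` both from inside a pod and from a machine outside the cluster would show whether the timeouts already occur at connection setup.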

My main lead is that the AKS cluster and the PostgreSQL server are not in the same Availability Zone (though they are in the same region, West Europe). AKS is not pinned to a specific zone, but the PostgreSQL Flexible Server is configured for AZ 2. During my tests, I found that my AKS cluster could communicate correctly with another PostgreSQL Flexible Server in AZ 3 (from another environment, the test environment).
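
To check which zones the AKS nodes actually report, one option is to count nodes per value of the well-known Kubernetes label `topology.kubernetes.io/zone`. The sketch below assumes the output of `kubectl get nodes -o json` has been saved to a file; the function names and the file path are hypothetical, and what value a cluster without a configured zone reports for this label is not something this thread establishes.

```python
import json
from collections import Counter

# Well-known Kubernetes node label that carries the zone.
ZONE_LABEL = "topology.kubernetes.io/zone"


def nodes_per_zone(node_list):
    """Count nodes per zone from parsed `kubectl get nodes -o json` output."""
    zones = Counter()
    for node in node_list.get("items", []):
        labels = node.get("metadata", {}).get("labels") or {}
        zones[labels.get(ZONE_LABEL, "<no zone label>")] += 1
    return dict(zones)


def nodes_per_zone_from_file(path):
    """Load a saved `kubectl get nodes -o json` dump and count nodes per zone."""
    with open(path, encoding="utf-8") as handle:
        return nodes_per_zone(json.load(handle))
```

For example, after `kubectl get nodes -o json > nodes.json`, calling `nodes_per_zone_from_file("nodes.json")` returns a dictionary mapping each zone value to its node count.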

Is it possible for connectivity issues to occur between Azure services located in two different Availability Zones? Would migrating my PostgreSQL server to AZ 3 be a solution?

Thank you for your help,

Azure Kubernetes Service
Azure Database for PostgreSQL

Accepted answer
  1. Carlos Villagomez 1,106 Reputation points Microsoft Employee Moderator
    2023-08-23T14:46:35.44+00:00

    @Paul Breton
    How do I troubleshoot connectivity issues between AKS and PostgreSQL Flexible Server?
    After checking the details, the best course of action is to open a ticket with Support.

    Many elements need to be reviewed for this type of integration, starting with:

    https://supportability.visualstudio.com/AzureDBPostgreSQL/_wiki/wikis/AzureDBPostgreSQL/503215/How-to-check-connectivity-errors

    https://supportability.visualstudio.com/AzureContainers/_wiki/wikis/Containers%20Wiki/540934/How-to-troubleshoot-connectivity-issues-with-databases-from-AKS-clusters

    If you can collect the error messages, they would be a good starting point for the investigation:

    • kubectl get events
    • kubectl describe pods
    • kubectl describe endpoints
    • etc.

    • The only difference is the agent sizes: "Standard_D2_v3" in test and "Standard_DS4_v2" in prod.
    • For PostgreSQL, different combinations were tested across the 4 components: AKS test (no AZ), AKS prod (no AZ), PG test (AZ 3) and PG prod (AZ 2).
    • AKS test -> PG test: works perfectly
    • AKS test -> PG prod: works perfectly
    • AKS prod -> PG test: works perfectly
    • AKS prod -> PG prod: works, but with latency, timeouts and errors
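
    Since the pods are being killed on failed health checks, the event output is usually the quickest place to see when and how often that happens. As a rough sketch (the function name `probe_failures` is made up, and it assumes the events were saved with `kubectl get events -o json`), the following keeps only events whose reason is "Unhealthy" or "Killing", which are the reasons the kubelet normally records for failed probes and container kills:

```python
def probe_failures(event_list, reasons=("Unhealthy", "Killing")):
    """Pick probe-related events from parsed `kubectl get events -o json` output.

    Returns (lastTimestamp, object name, reason, message) tuples, oldest first.
    """
    rows = []
    for event in event_list.get("items", []):
        if event.get("reason") in reasons:
            obj = event.get("involvedObject") or {}
            rows.append((
                event.get("lastTimestamp"),
                obj.get("name"),
                event.get("reason"),
                event.get("message", ""),
            ))
    # Timestamps are ISO 8601 strings, so sorting them as text is chronological;
    # events without a timestamp sort first.
    return sorted(rows, key=lambda row: row[0] or "")
```

    Comparing the timestamps of these events with the connection timeouts seen by the application can show whether the kills line up with the database connectivity failures.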

    To further troubleshoot this issue we're going to need to look at your resources in more detail. Please email the following to AzCommunity@microsoft.com and we'll get back to you promptly:

    • Subject: "Attn: kobulloc - Additional support required"
    • Email body: Your Subscription ID
    • Email body: A link to this thread so we can validate and expedite the request

    If you don't receive a response within 24 hours, please reply to the thread so we can investigate.


1 additional answer

  1. Paul Breton 81 Reputation points
    2023-08-24T12:38:33.7466667+00:00

    Update: I observed that these problems do not happen for PostgreSQL Flexible Servers in AZ 1. I was planning to migrate my server from AZ 2 to AZ 1 using High Availability and a planned failover. The problem is that even this fails. I cannot enable High Availability on AZ 1 because it fails with a timeout error, exactly as when I try to connect to the server from AKS. I tried many times, and the result is always the same: after 2 hours of trying to create the replica, it ends with a timeout error.

