The Azure Database for PostgreSQL flexible server in our production environment has been down for over 20 hours. No connection can be made, whether through the Azure Portal, via pgAdmin4, or through the application.
A critical health event was raised when our issues began. Its log shows:
"properties": {
"title": "Unknown Reason",
"details": "An unexpected problem is preventing restoring access to your Azure Database for PostgreSQL - Flexible server. We are working on resolving the problem.",
"currentHealthStatus": "Available",
"previousHealthStatus": "Available",
"type": "Downtime",
"cause": "PlatformInitiated"
},
A similar issue happened with our QA environment last month, when it was down for approximately 4 hours. A support engineer reached out to me and fixed the problem, after saying that the issue was with the Azure backend and not something that we could solve ourselves.
What we have tried to troubleshoot:
- Restarting the server - This failed with an Internal Server Error
- Stopping the server completely - The server is still in a stopping state, and has been for over an hour. The in-progress health event shows:
"properties": {
"title": "Stopped",
"details": "Your Azure Database for PostgreSQL - Flexible server is stopped state",
"currentHealthStatus": "Unavailable",
"previousHealthStatus": "Unavailable",
"type": "Downtime",
"cause": "UserInitiated"
}
Even though the log is titled "Stopped", the status on the overview page is still "Stopping".
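To make the mismatch in the two health events above easier to see, here is a minimal Python sketch that compares them side by side. The field values are copied from the logs quoted above (with the long "details" text left out); the helper name `looks_inconsistent` and the rule it applies are my own assumptions for illustration, not part of any Azure API.

```python
# Fields copied from the two health events quoted above ("details" omitted).
critical_event = {
    "title": "Unknown Reason",
    "currentHealthStatus": "Available",
    "previousHealthStatus": "Available",
    "type": "Downtime",
    "cause": "PlatformInitiated",
}
stop_event = {
    "title": "Stopped",
    "currentHealthStatus": "Unavailable",
    "previousHealthStatus": "Unavailable",
    "type": "Downtime",
    "cause": "UserInitiated",
}


def looks_inconsistent(event):
    """Hypothetical check: a Downtime event that still reports Available."""
    return (
        event["type"] == "Downtime"
        and event["currentHealthStatus"] == "Available"
    )


# Print a one-line summary per event so the two can be compared.
for name, event in [("critical", critical_event), ("stop", stop_event)]:
    print(
        name,
        event["cause"],
        event["currentHealthStatus"],
        "inconsistent" if looks_inconsistent(event) else "consistent",
    )
```

Under this assumed rule, only the platform-initiated event is flagged: it is typed as "Downtime" yet reports the server as "Available", which does not match what we are seeing.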
This is all happening in only one of our three environments (separate resource groups). All three database servers have the same settings (Burstable, Standard B1ms compute, 32 GB storage). Other environments are working fine.
Our region is Australia East with 1 availability zone. High availability is not enabled.
What could be causing this issue? As this is our production database, this is quite important.