Database Failed

Mark Davis 0 Reputation points
2025-09-15T11:12:57.2933333+00:00

We have observed repeated failures on our production database starting from 11th September 2025. The database has failed every day for the past five consecutive days. This pattern is concerning for service continuity and may indicate an underlying resource or configuration issue.

Impact:

Service reliability is at risk due to repeated outages

Potential disruption to dependent applications and users

Risk of data access delays or downtime during business-critical operations

Request: Please investigate the Resource Health for this database instance to identify the root cause of these repeated failures. We would like to understand:

The specific cause(s) of the failures

Any related alerts or incidents in the service backend

Recommended remediation steps to prevent recurrence

Additional Details:

Service: common-rm-qa-psql-01

Region: North Europe

Failure Dates: 11th September 2025 – 15th September 2025 (daily)

09/15/2025 1 health event(s)
09/14/2025 2 health event(s)
09/13/2025 1 health event(s)
09/12/2025 4 health event(s)
09/11/2025 11 health event(s)
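
For a quick summary of the Resource Health counts listed above, here is a minimal Python sketch. The `HEALTH_EVENTS` mapping and the `summarize` helper are illustrative names of my own, not part of any Azure tooling, and the sketch assumes the repeated 09/15/2025 row is a display duplicate that should be counted once.

```python
from datetime import date

# Daily Resource Health event counts as reported in this post.
# Assumption: the repeated 09/15/2025 row is a duplicate and is counted once.
HEALTH_EVENTS = {
    date(2025, 9, 11): 11,
    date(2025, 9, 12): 4,
    date(2025, 9, 13): 1,
    date(2025, 9, 14): 2,
    date(2025, 9, 15): 1,
}


def summarize(events):
    """Return (total events, day with most events, whether days are consecutive)."""
    days = sorted(events)
    total = sum(events.values())
    worst_day = max(days, key=lambda d: events[d])
    # Days are consecutive when each neighbouring pair is exactly one day apart.
    consecutive = all((b - a).days == 1 for a, b in zip(days, days[1:]))
    return total, worst_day, consecutive


if __name__ == "__main__":
    total, worst_day, consecutive = summarize(HEALTH_EVENTS)
    print(f"total events: {total}")
    print(f"worst day: {worst_day.isoformat()} ({HEALTH_EVENTS[worst_day]} events)")
    print(f"consecutive days: {consecutive}")
```

Under that assumption, the event count is highest on 11 September and lower on each of the following four days, which may be worth mentioning when requesting the root cause.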

Environment: QA

Priority: High – QA Impact

Azure Database for PostgreSQL

1 answer

  1. Anonymous
    2025-09-16T09:13:28.7733333+00:00

    Hi Mark Davis,

    As discussed over private messaging, our internal team identified that the server was being reported as unhealthy.

    Mitigation: The PostgreSQL container was repeatedly restarting due to input/output errors related to the data drive. Despite these container failures, the data drive itself remained in a normal state. Various related services and containers were restarted as part of the recovery process, and the overall server status was eventually restored to a healthy state.

    The backend team has mitigated the issue and the database is back online, so you should not see any further failures. As you confirmed, the database is accessible again. Since you have requested a root cause analysis (RCA) for the unavailability, I am actively following up with the PG team and will share the update with you as soon as I receive it.

    Thanks for your patience!

    Regards,

    Kalyani
