PostgreSQL IOPS monitoring - meaning of absolute values in graphs

Ľubomíra Trnavská 0 Reputation points
2023-03-16T14:24:36.54+00:00

Hi.
I have a slight problem understanding the metrics for IOPS.

What do the absolute values/percentages in graphs represent?

These are the absolute values
User's image

This is the percentage usage
User's image

Both graphs are from the same time span and with 5-minute granularity.

The server's specs are:

  • Azure Database for PostgreSQL flexible server
  • General Purpose, D8s_v3, 8 vCores, 32 GiB RAM, 1024 GiB storage
    - 5000 IOPS
  • plus one read-only replica with the same specs

If the first graph shows real IOPS (averaged per second) and the server is provisioned with 5000 IOPS, then the usage before 2 PM (10K+) should reach 100% or more.
The second graph also shows a peak at that time, but it stays below 50%, so according to the second graph the disk should not be overloaded.
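
The expectation above is simple arithmetic and can be sketched as follows. This is only an illustration: the function name is hypothetical, and it assumes the percentage metric equals observed IOPS divided by the provisioned limit, which is exactly the assumption the two graphs appear to contradict.

```python
def expected_utilization_percent(observed_iops, provisioned_iops):
    """Naive utilization, ASSUMING percentage = observed / provisioned."""
    return 100.0 * observed_iops / provisioned_iops


# Values from the question: roughly 10K IOPS observed, 5000 IOPS provisioned.
print(expected_utilization_percent(10_000, 5_000))  # 200.0
```

Under that assumption the peak would sit at about 200%, not below 50%, which is the discrepancy being asked about.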

The questions are:

  • What do the values in the first graph represent? What does the average of the absolute IOPS values mean?
  • Are IOPS logged and measured every second, so that the first graph is the real representation?
  • If yes, what does the second graph represent, and why is the percentage usage so low?
  • Which graph is more relevant for seeing whether we are experiencing issues with IOPS?



We have tried to understand it better using the disk queue depth metric, but the units/measurements are unclear. What do the absolute values for disk queue depth mean?
Did fewer than 40 disk I/O operations wait in the queue at the peak before 2 PM? How was this measured?
User's image

Azure Database for PostgreSQL

1 answer

  1. Oury Ba-MSFT 15,821 Reputation points Microsoft Employee
    2023-03-29T17:03:55.3266667+00:00

    @Ľubomíra Trnavská

    Thank you for your patience while we worked on this.

    Problem:

    Discrepancy between the IOPS metric and the Disk IOPS Consumed Percentage metric.

    Solution:

    To clarify,

    • The IOPS metric measures input/output operations per second, calculated directly from the Linux diskstats command on the Postgres VM.
    • The Disk IOPS Consumed Percentage, on the other hand, is a saturation metric derived from the default Azure VM Storage IO Utilization Metric. Details here.
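
To illustrate how an IOPS figure can be derived from diskstats, here is a minimal sketch. It assumes the answer refers to the cumulative counters Linux exposes in /proc/diskstats, and it is not the actual Azure implementation. The device name `sdc`, the sample lines, and the function names are hypothetical. The idea is to take two snapshots of the completed-read and completed-write counters and divide the difference by the elapsed time.

```python
def parse_diskstats(text):
    """Map device name -> (reads_completed, writes_completed).

    In /proc/diskstats each line starts with major, minor, device name,
    followed by counters; reads completed is the 4th field and writes
    completed is the 8th field.
    """
    counters = {}
    for line in text.strip().splitlines():
        parts = line.split()
        if len(parts) < 8:
            continue  # skip malformed or truncated lines
        counters[parts[2]] = (int(parts[3]), int(parts[7]))
    return counters


def iops_between(before, after, device, interval_seconds):
    """Average read, write, and total IOPS between two snapshots."""
    reads0, writes0 = parse_diskstats(before)[device]
    reads1, writes1 = parse_diskstats(after)[device]
    read_iops = (reads1 - reads0) / interval_seconds
    write_iops = (writes1 - writes0) / interval_seconds
    return read_iops, write_iops, read_iops + write_iops


# Hypothetical snapshots taken 300 seconds (5 minutes) apart.
SAMPLE_T0 = "8 32 sdc 1000000 0 8000000 5000 2000000 0 16000000 9000 0 7000 14000"
SAMPLE_T1 = "8 32 sdc 1600000 0 12800000 8000 4400000 0 35200000 20000 0 16000 28000"

print(iops_between(SAMPLE_T0, SAMPLE_T1, "sdc", 300))
```

A value computed this way is an average over the sampling interval, so short bursts inside a 5-minute window are smoothed out.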

    During testing, we discovered an issue where the Data Disk IOPS Consumed Percentage was reaching 100%, even though the Read IOPS and Write IOPS were well below the storage maximum of 1000 IOPS. The Azure platform team identified this as a platform bug affecting the Percentage consumed metrics for the disks.

    We are happy to inform you that the team has fixed the issue and pushed the fix. However, it will take a few weeks to be fully rolled out across production.

    Please don't forget to mark this as the accepted answer if the reply was helpful.

    Regards,

    Oury