How to calculate Median Value

Sudip Bhatt 2,271 Reputation points
2020-12-13T19:32:57.437+00:00

Suppose i have table and one column has numeric value. now how could i calculate median value ?

i found one code from this url https://stackoverflow.com/questions/1342898/function-to-calculate-median-in-sql-server
but their approach is not clear

DECLARE @table AS TABLE
(
    Number int not null
);

insert into @table select 2;
insert into @table select 4;
insert into @table select 9;
insert into @table select 15;
insert into @table select 22;
insert into @table select 26;
insert into @table select 37;
insert into @table select 49;

DECLARE @Count AS INT
SELECT @Count = COUNT(*) FROM @table;

WITH MyResults(RowNo, Number) AS
(
    SELECT RowNo, Number FROM
        (SELECT ROW_NUMBER() OVER (ORDER BY Number) AS RowNo, Number FROM @table) AS Foo
)
SELECT AVG(Number) FROM MyResults WHERE RowNo = (@Count+1)/2 OR RowNo = ((@Count+1)%2) * ((@Count+2)/2)

1) RowNo = (@Count+1)/2 not clear
2) RowNo = ((@Count+1)%2) * ((@Count+2)/2) not clear

what they are doing?

any easy approach exist for median calculation? thanks

Transact-SQL
Transact-SQL
A Microsoft extension to the ANSI SQL language that includes procedural programming, local variables, and various support functions.
4,552 questions
{count} votes

Accepted answer
  1. MelissaMa-MSFT 24,176 Reputation points
    2020-12-15T02:48:31.737+00:00

    Hi @Sudip Bhatt ,

    Actually Erland has provided all the details.

    Only adding some notes here.

    this line not clear WHERE ROWASC IN ( ROWDESC, ROWDESC - 1, ROWDESC + 1 )

    If it contains an odd number of values, it will refer 'where ROWASC =ROWDESC'.

    48136-2.png

    If it contains an even number of values, it will refer 'where ROWASC =ROWDESC+1 or ROWASC =ROWDESC-1 '.

    48124-1.png
    So this where condition will include both situations.

    PERCENTILE_CONT(0.5) why .5 is sending ? why not different value ?

    Syntax:
    PERCENTILE_CONT ( numeric_literal )
    WITHIN GROUP ( ORDER BY order_by_expression [ ASC | DESC ] )
    OVER ( [ <partition_by_clause> ] )

    You could send any decimal between 0.0 and 1.0.

    In your situation, PERCENTILE_COUNT(0.5) is similar to finding median.

    You could use below query to find out the differences.

    CREATE TABLE MEDIAN  
      (  
          Number int not null  
      );  
    
      insert into MEDIAN select 2;  
      insert into MEDIAN select 4;  
      insert into MEDIAN select 9;  
      insert into MEDIAN select 15;  
      insert into MEDIAN select 22;  
      insert into MEDIAN select 26;  
      insert into MEDIAN select 37;  
      insert into MEDIAN select 49;  
      insert into MEDIAN select 66;  
    
    SELECT Number,  
    PERCENTILE_CONT(0) WITHIN GROUP (ORDER BY Number) OVER () AS MINCOUNT,  
    PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY Number) OVER () AS QUARTTERCOUNT,  
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY Number) OVER () AS MEDIANCONT ,  
    PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY Number) OVER () AS THREEQUARTTERCOUNT ,  
    PERCENTILE_CONT(1) WITHIN GROUP (ORDER BY Number) OVER () AS MAXCONT  
    FROM   MEDIAN   
    

    what WITHIN GROUP() does ?

    This specifies a list of numeric values to sort and compute the percentile over. Only one order_by_expression is allowed.

    why Over() is blank ?

    It divides the result set produced by the FROM clause into partitions to which the percentile function is applied.

    In your case, there is only one column number, then you could leave it as blank.

    If you have another column, for example Groupid, then you need to put 'PARTITION BY Groupid' inside OVER().

    Please refer below example:

    DROP TABLE IF EXISTS MEDIAN  
    
     CREATE TABLE MEDIAN  
      (  
          Groupid int not null,  
          Number int not null  
      );  
    
      insert into MEDIAN select 1,2;  
      insert into MEDIAN select 1,4;  
      insert into MEDIAN select 1,9;  
      insert into MEDIAN select 1,15;  
      insert into MEDIAN select 2,22;  
      insert into MEDIAN select 2,26;  
      insert into MEDIAN select 2,37;  
      insert into MEDIAN select 2,49;  
      insert into MEDIAN select 2,66;  
    
    SELECT Number,  
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY Number) OVER (PARTITION BY Groupid) AS MEDIANCONT   
    FROM   MEDIAN   
    

    Best regards
    Melissa


    If the answer is helpful, please click "Accept Answer" and upvote it.
    Note: Please follow the steps in our documentation to enable e-mail notifications if you want to receive the related email notification for this thread.

    Hot issues November--What can I do if my transaction log is full?
    Hot issues November--How to convert Profiler trace into a SQL Server table


3 additional answers

Sort by: Most helpful
  1. Erland Sommarskog 100.9K Reputation points MVP
    2020-12-13T20:24:50.367+00:00

    The median is the middle value of a range. If the number is even, you may want the average of the range. If the values are 1, 2, 7, 8, 1000 the median is 7. If the values are 1, 2, 7, 8, 980, 1000, the median is said to be 7.5.

    The solution you post numbers the rows, and then they select the one or two numbers in the middle and take the average of them.

    However, since SQL 2012 there is a more direct way of computing the median, but strangely still not as a true aggregate function_

     DECLARE @table AS TABLE
     (
         Number int not null
     );
    
     insert into @table select 2;
     insert into @table select 4;
     insert into @table select 9;
     insert into @table select 15;
     insert into @table select 22;
     insert into @table select 26;
     insert into @table select 37;
     insert into @table select 49;
    
     SELECT DISTINCT 
            percentile_cont(0.5) WITHIN GROUP(ORDER BY Number) OVER(),
            percentile_disc(0.5) WITHIN GROUP(ORDER BY Number) OVER()
     FROM   @table
    

    The two function serve the same purpose, but the result is different when there is a even number of values. _cont (for continuous) gives the average of the two middle values, whereas _disc (for discrete) gives you the lowest of the two values.

    0.5 says that you want the median, but you can say 0.1 to get deciles etc.

    1 person found this answer helpful.

  2. MelissaMa-MSFT 24,176 Reputation points
    2020-12-14T02:46:12.45+00:00

    Hi @Sudip Bhatt ,

    In SQL calculation, '/' means divide the left operand by the right operand. For example: 23/10=2 and 9/2=4. '%' means divide the left operand by the right operand and return the remainder. For example: 5%2=1 and 9%2=1.

    If the dataset contains an odd number of values, than the middle value of the entire dataset will be considered as a median.
    Median (M) = value of ((X + 1)/2) th item. (x is the number of values in the dataset)

    For example, if we calculate the median of the dataset {1,2,3,4,6,7,8} by using the above formula, then Median (M)= (7+1)/2=4. So, the fourth value of the specified dataset, which is 4 in this case, is the median.

    If the dataset contains an even number of rows, we must determine the two middle values of this dataset, add them together, and divide the result by two.
    Median (M) = value of [(x/2) th item + (x/2 + 1) th item]/2 (x is the number of values in the dataset)

    For example, if we apply this formula to the dataset {1,2,4,6,8,10}, then the median value is calculated as shown below:
    Median (M)= [ 6/2 ] = 3rd value of the dataset + [ 6/2 + 1 ]= 4th value of the dataset.= (4+6)/2 = 5. So, the median value in this case is 5.

    1) RowNo = (@Count+1)/2 not clear
    2) RowNo = ((@Count+1)%2) * ((@Count+2)/2) not clear

    So if there is an odd number of values, (@Count+1)/2 could be the correct value which is actually the middle one.

    If there is an even number of values, we need to have the two middle values,(@Count+1)/2 is the smaller one, ((@Count+1)%2) * ((@Count+2)/2) is the other bigger one.

    any easy approach exist for median calculation?

    We could have another method using Row_number() or rank() as below:

    CREATE TABLE MEDIAN  
     (  
         Number int not null  
     );  
    
     insert into MEDIAN select 2;  
     insert into MEDIAN select 4;  
     insert into MEDIAN select 9;  
     insert into MEDIAN select 15;  
     insert into MEDIAN select 22;  
     insert into MEDIAN select 26;  
     insert into MEDIAN select 37;  
     insert into MEDIAN select 49;  
    
    SELECT AVG(number) AS MEDIAN   
    FROM   (SELECT number,   
                   ROW_NUMBER()   
                     OVER (   
                       ORDER BY number ASC) AS ROWASC,   
                   ROW_NUMBER()   
                     OVER (   
                       ORDER BY number DESC) AS ROWDESC   
            FROM   MEDIAN ) X   
    WHERE  ROWASC IN ( ROWDESC, ROWDESC - 1, ROWDESC + 1 )   
    

    As mentioned by Erland, you could also use the PERCENTILE_CONT function (SQL Server 2012 and later versions). It is capable of calculating the median within a partitioned set. It calculates the median when we pass 0.5 as an argument and specify the order within that dataset.

    SELECT Number,  
           PERCENTILE_CONT(0.5)   
             WITHIN GROUP (ORDER BY Number) OVER () AS MEDIANCONT   
    FROM   MEDIAN   
    

    In addition, you could also create a user-defined Function to calculate the median value of the specified dataset.

    Best regards
    Melissa


    If the answer is helpful, please click "Accept Answer" and upvote it.
    Note: Please follow the steps in our documentation to enable e-mail notifications if you want to receive the related email notification for this thread.

    Hot issues November--What can I do if my transaction log is full?
    Hot issues November--How to convert Profiler trace into a SQL Server table

    1 person found this answer helpful.

  3. Erland Sommarskog 100.9K Reputation points MVP
    2020-12-14T22:13:21.15+00:00

    1) this line not clear WHERE ROWASC IN ( ROWDESC, ROWDESC - 1, ROWDESC + 1 ) what this line is doing?

    That is the same as

    ROWASC = ROWDESC OR ROWASC = ROWDESC -1 OR ROWASC = ROWDESC + 1
    

    ROWASC are the rows numbered in ascending order, ROWDESC numbers them in descending order. Where they meet, you find the median. If there is an odd number of values, there is one value where ROWDESC = ROWASC, but no rows where the other two conditions are true. The opposite applies when there is an even number of values.

    2) PERCENTILE_CONT(0.5) why .5 is sending ? why not different value ?

    Because you asked for the median. I tried to briefly explain this in my post, but it appears that there were some formatting incident. You could pass 0.1 for the decile (that is, the 10% percentile) or something else.

    3) WITHIN GROUP (ORDER BY Number) OVER ()

    >

    what WITHIN GROUP does ?
    why Over is blank ?

    Over is blank, because there was only one column in your table, so there was nothing to partition on. But say that you have a table with lots of rows per customer, and you want the median of some value per customer. Then you would say OVER(PARTITION BY CustomerID).

    WITHIN GROUP then specifies which value you are seeking the percentile of. That is, order the group by this value.

    1 person found this answer helpful.