Group BY gives Wrong result in Azure Synapse Dedicated SQL Pool

KDP-User 0 Reputation points
2024-03-26T06:54:10.5+00:00

We have a Azure Synapse Dedicated SQL Pool, and we run the above query in the datawarehouse.
But we get the result wrong and there is no reason for this simple aggregation goes wrong.

Appreciate if we can find the reason and the root cause for this issue.
For this example, in the first result aggregates from driver_id. We have two driver_ids (704162 and 0 ) and the final result in the second result shows only the count for driver_id 704162.

So strange this simple aggregate shows like this.
User's image

Azure Synapse Analytics
Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.
4,336 questions
{count} votes

1 answer

Sort by: Most helpful
  1. phemanth 5,490 Reputation points Microsoft Vendor
    2024-03-28T10:30:56.09+00:00

    @KDP-User

    1. Use ROLLUP, GROUPING SETS, or CUBE with GROUP BY: These are all extensions of GROUP BY that allow you to specify how you want the data to be grouped and aggregated. By using one of these extensions, you can explicitly define how you want the duplicates to be handled.
    2. Add a DISTINCT clause to your GROUP BY: This will cause the query to return only distinct values for the columns specified in the GROUP BY clause.

    Here is an example of how you can modify your query to use the DISTINCT clause:

    SELECT driver_id, COUNT(DISTINCT 1)
    FROM fact.netBI_tripStops AS ts
    JOIN dim.Trip AS t ON t.trip_key = ts.trip_key AND source_key = 1
    JOIN dim.Date AS d ON ts.date_key = d.date_key
    WHERE ts.is_timing_point = 1
    AND ts.date_key BETWEEN 20210101 AND 20210131
    AND ts.trip_key in (-3738457970739693310)
    GROUP BY driver_id
    
    

    This query will count the number of distinct rows for each driver_id.

    For more information on how GROUP BY works in Azure Synapse Dedicated SQL Pool, you can refer to the Microsoft documentation on GROUP BY with ROLLUP, GROUPING SETS, or CUBE: https://learn.microsoft.com/en-us/sql/t-sql/queries/select-group-by-transact-sql?view=sql-server-ver16

    0 comments No comments