Group BY gives Wrong result in Azure Synapse Dedicated SQL Pool

Question

KDP-User 0

We have an Azure Synapse Dedicated SQL Pool, and we ran the query shown in the screenshot in the data warehouse.
The result is wrong, and we can't see any reason why this simple aggregation should go wrong.

We would appreciate help finding the root cause of this issue.
In this example, the first result set shows the rows being aggregated by driver_id: there are two driver_ids (704162 and 0). The second result set, the final aggregated result, shows a count only for driver_id 704162.

It is strange that such a simple aggregate behaves like this.
[Screenshot: the query and its result sets]

phemanth 15,765 Reputation points Microsoft External Staff Moderator

2024-03-27T07:27:28.25+00:00

@KDP-User

Thanks for reaching out to Microsoft Q&A

Based on the SQL query and results you’ve shared, the issue seems to be related to how the query handles driver_id 0. In T-SQL, 0 is an ordinary value and is never treated as NULL, so if rows with driver_id 0 disappear, they are most likely being filtered out, or stored as NULL rather than 0, somewhere in your data or query. Here are a few things you could check:

NULL or Zero Values: Check if there are any filters or conditions that exclude driver_id 0, and ensure that the data type and values for driver_id are consistent and accurate.

Data Consistency: Verify the data in your fact.netBI_tripStops, dim.Trip, and dim.Date tables. Ensure that there are no inconsistencies or anomalies, especially related to the driver_id.

Query Logic: Review the logic of your SQL query. Make sure that the joins and WHERE clause conditions (such as ts.is_timing_point = 1 and ts.date_key BETWEEN 20210101 AND 20210131) are not causing driver_id 0 to be excluded from the results.

Hope this helps. Do let us know if you have any further queries.
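A minimal sketch of the query-logic check: it reuses the joins and filters from the query in this thread but counts rows per driver_id with a window function instead of GROUP BY. If driver_id 0 appears here with a non-zero count while the GROUP BY result omits it, the rows are reaching the aggregation and are not being removed by a join or WHERE condition. As in the original query, driver_id and source_key are left unqualified; add the correct table alias if either column exists in more than one of the joined tables.

```sql
-- Per-row listing with a windowed count; no GROUP BY is involved.
-- NULL driver_id values, if any, form their own partition and show as NULL.
SELECT driver_id,
       COUNT(*) OVER (PARTITION BY driver_id) AS rows_per_driver
FROM fact.netBI_tripStops AS ts
JOIN dim.Trip AS t ON t.trip_key = ts.trip_key AND source_key = 1
JOIN dim.Date AS d ON ts.date_key = d.date_key
WHERE ts.is_timing_point = 1
  AND ts.date_key BETWEEN 20210101 AND 20210131
  AND ts.trip_key IN (-3738457970739693310)
ORDER BY driver_id;
```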
KDP-User 0 Reputation points

2024-03-28T00:33:25.7366667+00:00

Hi @phemanth,
Thanks for the reply.
Everything is the same in both DWs: we checked partitioning, row counts, etc.
There are no filters that exclude driver_id = 0; the image above shows the result set.

If we run this through a temp table, it shows the expected result in the target DW.
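The temp-table workaround described above can be sketched as follows: the joined and filtered rows are materialized first with CREATE TABLE AS SELECT, and the aggregation then runs against the temp table. The table name #trip_rows, the ROUND_ROBIN distribution, and COUNT(*) as the aggregate are assumptions for illustration, not details from the thread.

```sql
-- Drop the hypothetical temp table if it is left over from an earlier run.
IF OBJECT_ID('tempdb..#trip_rows') IS NOT NULL
    DROP TABLE #trip_rows;

-- Step 1: materialize the rows the aggregate should see (same joins/filters).
CREATE TABLE #trip_rows
WITH (DISTRIBUTION = ROUND_ROBIN)  -- assumed distribution choice
AS
SELECT driver_id
FROM fact.netBI_tripStops AS ts
JOIN dim.Trip AS t ON t.trip_key = ts.trip_key AND source_key = 1
JOIN dim.Date AS d ON ts.date_key = d.date_key
WHERE ts.is_timing_point = 1
  AND ts.date_key BETWEEN 20210101 AND 20210131
  AND ts.trip_key IN (-3738457970739693310);

-- Step 2: aggregate from the materialized rows.
SELECT driver_id, COUNT(*) AS row_count
FROM #trip_rows
GROUP BY driver_id;
```

Comparing this output with the direct GROUP BY output narrows the problem to how the single-statement query is executed, since the input rows are the same.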
phemanth 15,765 Reputation points Microsoft External Staff Moderator

2024-03-29T06:08:42.4933333+00:00

@KDP-User We haven’t heard from you since the last response and were just checking back to see if you have a resolution yet. If you do, please share it with the community, as it can be helpful to others. Otherwise, we will respond with more details and try to help.

Answer 1

@KDP-User

Use ROLLUP, GROUPING SETS, or CUBE with GROUP BY: these extensions of GROUP BY add subtotal and grand-total rows to the result, so you can check the per-group counts against the overall total.
Use DISTINCT inside the aggregate: COUNT(DISTINCT expression) counts only the distinct non-NULL values of the expression within each group.

Here is an example of how you can modify your query to use DISTINCT:

SELECT driver_id, COUNT(DISTINCT 1)
FROM fact.netBI_tripStops AS ts
JOIN dim.Trip AS t ON t.trip_key = ts.trip_key AND source_key = 1
JOIN dim.Date AS d ON ts.date_key = d.date_key
WHERE ts.is_timing_point = 1
AND ts.date_key BETWEEN 20210101 AND 20210131
AND ts.trip_key IN (-3738457970739693310)
GROUP BY driver_id

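Note that because the argument is the constant 1, COUNT(DISTINCT 1) returns 1 for every driver_id group that has at least one row. This query therefore shows which driver_id values survive the joins and filters, but it does not count rows; use COUNT(*) for that.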
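The difference between COUNT(*) and COUNT(DISTINCT 1) can be checked with a small self-contained query. The three rows below are illustrative values built with UNION ALL, not the poster's data, so no tables are needed.

```sql
-- Three illustrative rows: driver_id 704162 twice, driver_id 0 once.
SELECT driver_id,
       COUNT(*)          AS rows_per_driver,  -- counts every row in the group
       COUNT(DISTINCT 1) AS distinct_ones     -- the constant 1 has one distinct value
FROM (
    SELECT 704162 AS driver_id
    UNION ALL SELECT 704162
    UNION ALL SELECT 0
) AS sample_rows
GROUP BY driver_id
ORDER BY driver_id;
-- Expected result:
-- driver_id | rows_per_driver | distinct_ones
-- 0         | 1               | 1
-- 704162    | 2               | 1
```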

For more information on how GROUP BY works in Azure Synapse Dedicated SQL Pool, you can refer to the Microsoft documentation on GROUP BY with ROLLUP, GROUPING SETS, or CUBE: https://learn.microsoft.com/en-us/sql/t-sql/queries/select-group-by-transact-sql?view=sql-server-ver16
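The ROLLUP option can be sketched against the same query as follows. This is an illustration assuming your pool accepts GROUP BY ROLLUP (confirm against the Azure Synapse syntax section of the GROUP BY reference linked above, since dedicated SQL pool does not support every GROUP BY extension); COUNT(*) is an assumed choice of aggregate.

```sql
-- ROLLUP adds one grand-total row (driver_id shown as NULL) after the
-- per-driver_id rows; the per-driver counts should sum to that total.
SELECT driver_id, COUNT(*) AS row_count
FROM fact.netBI_tripStops AS ts
JOIN dim.Trip AS t ON t.trip_key = ts.trip_key AND source_key = 1
JOIN dim.Date AS d ON ts.date_key = d.date_key
WHERE ts.is_timing_point = 1
  AND ts.date_key BETWEEN 20210101 AND 20210131
  AND ts.trip_key IN (-3738457970739693310)
GROUP BY ROLLUP (driver_id);
```

If the grand total is larger than the sum of the per-driver rows that are returned, some group is missing from the output. If driver_id itself can be NULL, a NULL group and the total row will both show NULL in driver_id.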
