Databricks SQL warehouse with inconsistent order of data

Question

Sandeep 21

We have a use case where clients read data from a Databricks SQL warehouse.

Each client fires a query to read 1,000 records per request from a table of 60k records.

Below is a sample of the query they use:

--> select id,name,city from employee where name is not null LIMIT 1000 OFFSET 1000

When a client makes 60 calls to read the complete data set, it sees duplicate records, even though the actual data set does not contain any duplicates.

From what I have read, ORDER BY is needed for consistent results when using the OFFSET clause, but we cannot ask the consumers to add it. Is there a way to maintain the ordering within the warehouse rather than asking clients to update their queries?

PRADEEPCHEEKATLA 91,866 Reputation points

2023-11-15T06:38:32.53+00:00

@Sandeep - Just checking in to see if the below answer provided by @Boris Von Dahle helped. If this answers your query, do click Accept Answer and Yes for was this answer helpful. And, if you have any further query do let us know.

Answer 1

Boris Von Dahle 3,226

Hello,

To maintain the ordering, you could create a view or a materialized view on the employee table that includes an ORDER BY clause:

CREATE VIEW ordered_employee AS
SELECT id, name, city
FROM employee
WHERE name IS NOT NULL
ORDER BY id;

Otherwise, if you have control over the API or middleware that clients use to access the Databricks SQL warehouse, you could implement a layer that automatically adds an ORDER BY clause to incoming queries.
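As a rough illustration of such a layer, here is a minimal Python sketch. The helper name `ensure_order_by` and the default key `id` are hypothetical (they are not part of any Databricks API), and the regex only handles a simple trailing LIMIT/OFFSET; a real layer would more likely use a proper SQL parser. It also assumes `id` is unique, since ordering by a non-unique column still leaves ties in an unspecified order.

```python
import re

# Hypothetical helper for illustration only; not part of any Databricks API.
# Matches a trailing "LIMIT n" or "LIMIT n OFFSET m", optionally followed by ";".
_LIMIT_RE = re.compile(r"\s+LIMIT\s+\d+(\s+OFFSET\s+\d+)?\s*;?\s*$", re.IGNORECASE)
_ORDER_RE = re.compile(r"\bORDER\s+BY\b", re.IGNORECASE)


def ensure_order_by(query: str, key: str = "id") -> str:
    """Insert ORDER BY <key> before a trailing LIMIT/OFFSET if none is present."""
    if _ORDER_RE.search(query):
        return query  # the caller already specified an ordering
    match = _LIMIT_RE.search(query)
    if match is None:
        return query  # no pagination clause, leave the query untouched
    head = query[: match.start()]
    tail = match.group(0).strip()
    return f"{head} ORDER BY {key} {tail}"


rewritten = ensure_order_by(
    "select id,name,city from employee where name is not null LIMIT 1000 OFFSET 1000"
)
print(rewritten)
```

Queries that already contain an ORDER BY, or that have no LIMIT clause, are passed through unchanged.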

Also, you could create a stored procedure that clients can call instead of querying the table directly.

Hope this helps

If this answer helped you, please accept it, so others can benefit from it too.

Regards

Sandeep 21 Reputation points

2023-11-15T14:40:23.5166667+00:00

Thanks for your reply, @Boris Von Dahle. This is one of the plans we have, but we don't want to add one more view, and we have no control over the API. We have recommended that clients use ORDER BY as a workaround, but we wanted to see if there is any suggestion or recommendation to resolve this within the existing job.
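For reference, the ORDER BY workaround on the client side could look roughly like the sketch below. It uses Python's built-in sqlite3 module with an in-memory table purely as a stand-in for the warehouse connection, so the connection setup, the sample rows, and the function name `read_all_pages` are assumptions for illustration. It also assumes `id` is unique, which is what makes the page boundaries stable.

```python
import sqlite3


def read_all_pages(conn, page_size=1000):
    """Read every row page by page, ordering by the unique id column."""
    rows, offset = [], 0
    while True:
        page = conn.execute(
            "SELECT id, name, city FROM employee "
            "WHERE name IS NOT NULL ORDER BY id LIMIT ? OFFSET ?",
            (page_size, offset),
        ).fetchall()
        if not page:
            break  # an empty page means the whole result set has been read
        rows.extend(page)
        offset += page_size
    return rows


# Stand-in data: 60,000 made-up rows in an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(i, f"name{i}", f"city{i % 10}") for i in range(60_000)],
)

rows = read_all_pages(conn)
ids = [r[0] for r in rows]
print(len(rows), len(set(ids)))
```

Because every page is cut from the same total order on `id`, consecutive pages cannot overlap, as long as the underlying data does not change between requests.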
