How to update a column with DATEDIFF function by using SQL Server

messer gatti 41 Reputation points
2022-03-23T09:46:56.97+00:00

Hallo All,

I want to find the difference between two order dates, which have the order numbers 3 and 1 and write it to another column (named fark) for each Cust_id.

Or in another words, I want to write a query to return for each user according to the time elapsed between the first purchasing and the third purchasing, in ascending order by Customer ID(in the column with name =fark)

So, I want to find something as 320 in fark for Charlotte Melton, as a desired result, which is the time difference between 3rd order and 1st order. And do this for each customers. And I want to write this result in the column with the name fark, instead of NULL.

186042-image.png

If the customer have no order number =3, I'll delete these columns later.

This is the last try I've made:

UPDATE [D1367_Alex_Project].[Alex].[calisma3_2]
SET fark = SELECT DATEDIFF (DAY, (SELECT [Alex].[calisma3_2].[order_date] As one FROM [D1367_Alex_Project].[Alex].[calisma3_2] WHERE
[order_number]=1),
(SELECT [Alex].[calisma3_2].[order_date] AS three FROM [D1367_Alex_Project].[Alex].[calisma3_2] WHERE
[order_number]=3))
WHERE one.Cust.ID = three.Cust.ID

But I always encounter with an error. Have you got an idea to solve this?

You can create the table as: (As you can see from the table's photo:)

CREATE TABLE [D1367_Alex_Project].[Alex].[calisma3_2] (Cust_id VARCHAR(20),
Customer_Name VARCHAR(50),
order_date DATE,
order_number INT,
fark INT NULL)

I use the latest version for SQL Server.

Thank you very much in advance..

SQL Server
SQL Server
A family of Microsoft relational database management and analysis systems for e-commerce, line-of-business, and data warehousing solutions.
10,930 questions
4,388 questions
{count} votes

Accepted answer
  1. Viorel 102.4K Reputation points
    2022-03-23T12:10:38.963+00:00

    Try a statement like this too:

    UPDATE t
    SET fark = DATEDIFF (DAY,
        (SELECT order_date FROM calisma3_2 WHERE Cust_Id = t.Cust_Id and order_number = 1 ),
        (SELECT order_date FROM calisma3_2 WHERE Cust_Id = t.Cust_Id and order_number = 3 ))
    from  calisma3_2 t
    
    1 person found this answer helpful.

3 additional answers

Sort by: Most helpful
  1. Ronen Ariely 15,041 Reputation points
    2022-03-23T11:23:01.73+00:00

    If I understand you correctly then you want to calculate the differences in the dates (I assume differences in days) between rows with the same Cust_id value. You claim that for each Cust_id you have two rows, one with the value 1 and one with the value 3 in the column order_number.

    What is your expected result if there is only one row for specific Cust_id? What if there are more the than 2 rows for the same Cust_id or maybe 2 rows with value 3? and so on...

    It seems like your design of the database might be problematic and open for inconsistency for your needs. At least make sure that you prevent such cases (for example using constraints).

    Solution...

    Assuming all the data fits this theoretical description above (each Cust_id has exactly two rows with order_number values 1 and 3) then you can use LAG or LEAD function instead of subquery. If you order the rows by Cust_id, order_number then you have the value of your tow rows with the same Cust_id in a single row. This way, you can directly compare values from these two rows

    Please provide sample data if the explanation not clear so we will be abel to provide a simple query which we tested on the data

    0 comments No comments

  2. messer gatti 41 Reputation points
    2022-03-23T12:21:38.42+00:00

    Hallo, Thank you very much for your time..

    I've filtered this table from a big database. I want to update the column fark,that I've added.
    All the customers have only one order, with the order number one or three. Some customers have no order with the order number three.

    You can also insert values to this table by this way as sample:

    CREATE TABLE [D1367_Alex_Project].[Alex].[calisma3_2] (Cust_id VARCHAR(20),
    Customer_Name VARCHAR(50),
    order_date DATE,
    order_number INT)

    INSERT INTO [D1367_Alex_Project].[Alex].[calisma3_2] ( Cust_1, 'Muhammed Macintyre', '2010-10-13', 1),
    (Cust_10, 'CLAUDIA MINER', '2012-08-04',1),
    (Cust_100, 'CHARLETTO MELTON', '2009-06-21',1),
    (Cust_100, 'CHARLETTO MELTON', '2010-05-07',3),
    (Cust_1000, 'JANE WACO', '2010-10-06',1),
    (Cust_1000, 'JANE WACO', '2012-02-06',3),
    (Cust_1001, 'SANDRA FLANAGAN', '2009-01-08',1),
    (Cust_1001, 'SANDRA FLANAGAN', '2011-10-24',3),
    (Cust_1002, 'STEVE NGUYEN', '2010-01-05',1),
    (Cust_1002, 'STEVE NGUYEN', '2010-12-14',3),
    ... (and so on)
    Then I've added the column (with name fark) to the right side with:

    ALTER TABLE [Alex].[calisma3_2] ADD fark INT (Because of that, all of them are NULL)

    As you can see, some customers have order with order number three. My aim is to change column (named fark) with the time difference between order number three and order number one. We can write these in the rows with order_number (for example 3) and erase the rows with order number 1

    I'll also delete the customers who have no order with order number three..

    Every customer has their own customer number

    0 comments No comments

  3. Ronen Ariely 15,041 Reputation points
    2022-03-23T12:53:09.787+00:00

    Awesome! Now with the DDL+DML you gave us we can provide a solution :-)

    The following solution based on the assumption that each Cust_id as not more than 2 columns. If this is not exactly your case, then we can change it a bit to fit your needs, but this should be the approach for the solution

    Please check this simple query without any subqueries

    SELECT t0.Cust_id,t0.order_date, t0.order_number, 
     MyDif = ABS(DATEDIFF(DAY,
     t0.order_date,
     COALESCE(
     LEAD(t0.order_date) OVER (partition by t0.Cust_id order by t0.order_number),
     LAG(t0.order_date) OVER (partition by t0.Cust_id order by t0.order_number)
     )
     ))
    FROM [calisma3_2] t0
    

    You can directly UPDATE the table from the above SELECT or simply forget about the extra column which is not seeded and only include duplicate data. Using the extra column or not is a question of how you use the data but in general you can get the value on-the-fly whenever is needed using LAG and LEAD functions

    Check performance! Compare the IO for the solutions you get before you select the solution that fits your needs

    -- First we will create the clustered index which fits both queries
    CREATE CLUSTERED INDEX Inx_Cust_id_order_date ON [calisma3_2](Cust_id,order_number);
    GO
    
    SET STATISTICS IO ON
    GO
    
    ;With MyCTE as(
     SELECT t0.Cust_id,t0.order_date, t0.order_number, 
     MyDif = ABS(DATEDIFF(DAY,
     t0.order_date,
     COALESCE(
     LEAD(t0.order_date) OVER (partition by t0.Cust_id order by t0.order_number),
     LAG(t0.order_date) OVER (partition by t0.Cust_id order by t0.order_number)
     )
     ))
     FROM [calisma3_2] t0
    )
    UPDATE [calisma3_2] SET fark = MyDif 
    FROM [calisma3_2] INNER JOIN MyCTE ON [calisma3_2].Cust_id = MyCTE.Cust_id and [calisma3_2].order_number= MyCTE.order_number
    GO
    /*
    Table 'calisma3_2'. Scan count 2, logical reads 43, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
    */
    
     UPDATE t
     SET fark = DATEDIFF (DAY,
         (SELECT order_date FROM calisma3_2 WHERE Cust_Id = t.Cust_Id and order_number = 1 ),
         (SELECT order_date FROM calisma3_2 WHERE Cust_Id = t.Cust_Id and order_number = 3 ))
     from  calisma3_2 t
     GO
     /*
    Table 'calisma3_2'. Scan count 21, logical reads 62, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
    Table 'Worktable'. Scan count 1, logical reads 23, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
    */
    

    Did you noticed the 21 scans as a result of subqueries?!? You have 10 rows and for each row you execute 2 sub-queries which add another scan to the IO statistics + you have 1 main scan of the table 1+10+10 = 21 scans. This sample has 10 rows only! And what if your table has 1000 rows or a real case table with several millions rows?

    Anyway, you should always test on your specific case :-)

    1 person found this answer helpful.