Data Obfuscation

VDT-7677 121 Reputation points
2020-11-04T18:14:33.073+00:00

Hi,

Running SQL Server 2017. I am working on an obfuscation process for production data. I have the following sample query:

UPDATE
[T1]
SET
[T1].[First_Name] = [S1].[First_Name],
[T1].[Last_Name] = [S1].[Last_Name],
FROM
(
SELECT
[Row_ID],
[First_Name],
[Last_Name]
FROM
[dbo].[Test]
) [T1]
CROSS APPLY
(
SELECT TOP 1
*
FROM
[dbo].[Sample]
WHERE
[T1].[Row_ID] = [T1].[Row_ID]
ORDER BY
NEWID() ASC
) [S1]

The first & last name combinations in the sample table are unique. At first glance the above query does the trick but on examination the same sample row is used more than once in the update. As my sample table has more rows than the production table, how can I ensure that a sample row is not used more than once in the update query?

Thanks!

Transact-SQL
Transact-SQL
A Microsoft extension to the ANSI SQL language that includes procedural programming, local variables, and various support functions.
4,625 questions
{count} votes

5 answers

Sort by: Most helpful
  1. Tom Phillips 17,731 Reputation points
    2020-11-04T20:17:22.593+00:00

    This is likely wrong:

    WHERE
    [T1].[Row_ID] = [T1].[Row_ID]
    
    0 comments No comments

  2. VDT-7677 121 Reputation points
    2020-11-04T22:04:06.72+00:00

    Hi Tom,

    Removing the WHERE clause as specified results in all rows in the target table to be updated with the same first and last name.

    Thoughts?

    0 comments No comments

  3. Erland Sommarskog 110.4K Reputation points MVP
    2020-11-04T22:47:42.833+00:00

    I believe this should work, but since I don't have your tables, it's untested.

    ; WITH T1_numbering AS (
        SELECT First_Name, Last_Name, row_number() OVER(ORDER BY Row_ID) AS rowno
        FROM   T1
    ), S_Numbering AS (
       SELECT First_Name, Last_Name, row_number() OVER(ORDER BY newid()) AS rowno
       FROM   Sample
    )
    UPDATE T1
    SET    First_Name = S.First_Name,
           Last_Name  = S.Last_Name
    FROM   T1_Numbering T1
    JOIN   S_Numbering S ON T1.rowno = S.rowno
    
    0 comments No comments

  4. MelissaMa-MSFT 24,196 Reputation points
    2020-11-05T03:02:17.58+00:00

    Hi @VDT-7677 ,

    For this type of problem we recommend that you post CREATE TABLE statements for your tables together with INSERT statements with sample data, enough to illustrate all angles of the problem. We also need to see the expected result of the sample.

    You could try with below which has the least modification based on your code.

    UPDATE [T1]  
    SET  
    [T1].[First_Name] = [S1].[First_Name],  
    [T1].[Last_Name] = [S1].[Last_Name]  
    FROM [dbo].[Test] [T1]  
    CROSS APPLY  
    (  
    SELECT top 1 [First_Name],[Last_Name]  
    FROM  
    [dbo].[Sample]   
     --TABLESAMPLE (1000 ROWS) --add this line if necessary  
    WHERE  
    [T1].[Row_ID] = [T1].[Row_ID]  
    ORDER BY  NEWID(),[T1].[Row_ID]  
    ) [S1]  
    

    In additon, you could also try with Rank() function which is similar with RowNumber() function mentioned by Erland as below:

     ; WITH T1_numbering AS (  
         SELECT First_Name, Last_Name, RANK() OVER(ORDER BY Row_ID) AS RANKID  
         FROM   test  
     ), S_Numbering AS (  
        SELECT First_Name, Last_Name, RANK() OVER(ORDER BY newid()) AS RANKID  
        FROM   Sample  
     )  
     UPDATE T1  
     SET    First_Name = S.First_Name,  
            Last_Name  = S.Last_Name  
     FROM   T1_Numbering T1  
     JOIN   S_Numbering S ON T1.RANKID = S.RANKID  
    

    Best regards
    Melissa


    If the answer is helpful, please click "Accept Answer" and upvote it.
    Note: Please follow the steps in our documentation to enable e-mail notifications if you want to receive the related email notification for this thread.
    Hot issues October--Users always get connection timeout problem when using multi subnet AG via listener. Especially after failover to another subnet


  5. SQLZealots 276 Reputation points
    2020-11-09T02:55:54.34+00:00

    Try the function in the below post if that helps you.

    https://sqlzealots.com/2019/10/08/generate-random-character-in-sql-server/

    0 comments No comments

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.