Alternative to DB Sync from local db to Azure

Paul Guillou 1 Reputation point
2021-08-24T11:22:26.163+00:00

We have a local SQL database and I have been trying to use MS DB Sync on a few tables so we can consume data in our web application and APIs from Salesforce via the Azure DB - mainly for scalability and bandwidth. DB Sync is not working because some of our tables have clustered primary keys and the values in the clustered columns can change, so sync thinks a new record has been created instead of being changed.

What other options are there? Someone suggested a data warehouse as it doesn't rely on primary keys. What about table storage? All suggestions welcome.

Thanks,

Paul.

Azure SQL Database

2 answers

  1. Ronen Ariely 15,206 Reputation points
    2021-08-25T06:09:24.803+00:00

    Good day @Paul Guillou

    DB Sync is not working because some of our tables have clustered primary keys and the values in the clustered columns can change, so sync thinks a new record has been created instead of being changed.

    General

    This is a pretty good question and a common one. Therefore, a quick Google search can return a lot of good blogs on the subject :-)

    I cannot cover all the options and explain the pros and cons of each one in a forum thread. For that we would need a very long post, or even several posts/lectures. With that being said, let me mention some of the options which can be used in order to sync from a local SQL Server to an Azure database (not all of them necessarily fit well in your case, but they are common sync solutions from local to Azure):

    1. DACPAC
    2. BACPAC
    3. Replication
    4. bcp
    5. Linked server + jobs on the local server (see the sketch after this list)
    6. Azure SQL Data Sync
    7. SQL Server Integration Services (SSIS)
    8. Azure Data Factory
    9. Azure Elastic Jobs
    10. and many more...
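
    For example, option 5 can be a scheduled SQL Server Agent job on the local server that pushes changes through a linked server. Here is a minimal sketch, assuming a linked server named [AzureDb] that points at the Azure SQL database and a hypothetical table dbo.Customers that exists on both sides with the same schema (note that MERGE cannot target a remote table, so the sync is split into separate statements):

    -- Push new rows to Azure
    INSERT INTO [AzureDb].[TargetDb].dbo.Customers (CustomerID, Name)
    SELECT s.CustomerID, s.Name
    FROM dbo.Customers AS s
    WHERE NOT EXISTS (SELECT 1 FROM [AzureDb].[TargetDb].dbo.Customers AS t
                      WHERE t.CustomerID = s.CustomerID);

    -- Push changed rows
    UPDATE t
       SET t.Name = s.Name
    FROM [AzureDb].[TargetDb].dbo.Customers AS t
    INNER JOIN dbo.Customers AS s ON s.CustomerID = t.CustomerID
    WHERE t.Name <> s.Name;

    -- Remove rows that were deleted locally
    DELETE t
    FROM [AzureDb].[TargetDb].dbo.Customers AS t
    WHERE NOT EXISTS (SELECT 1 FROM dbo.Customers AS s
                      WHERE s.CustomerID = t.CustomerID);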

    Note: Personally, I do not have a blog post or a lecture on this topic, since I usually prefer to focus on internals and other advanced topics, but if you search for my lecture about The Internals of the External SqlPackage Tool - Inside a bacpac File, then you can find recordings in Hebrew and in English. During that lecture I mention these solutions (VERY briefly, during the "Wrapping Up") and it might give you some insights.


    Let's focus on your specific scenario

    SQL Server must have some means of uniquely identifying every row. If the clustered index is not unique, then the server adds a hidden uniqueifier column behind the scenes. In your case you use clustered primary keys, which means the clustered key column is used to uniquely identify the row.

    The big issue in solving your scenario automatically is that the clustered primary keys can be changed, which makes it impossible to recognize which row was changed, since the clustered index is the entity that identifies the row. In fact, behind the scenes, when you change the clustered index column, the server does not UPDATE the value! Behind the scenes the server creates a new row and marks the old row as deleted (and, if needed, makes sure the new row is in the right location according to the order of the clustered values).

    Before continuing the discussion, let's prove it!

    Note: This part uses an advanced, undocumented procedure for checking what we actually have in the database file.

    USE master
    GO

    DROP DATABASE IF EXISTS test
    GO

    CREATE DATABASE test
    GO

    USE test
    GO

    CREATE TABLE dbo.test01 (
        ID INT PRIMARY KEY CLUSTERED
    );

    INSERT test01(ID) VALUES (999)
    GO

    -- Find the page which includes the data (PageType = 1 is the data page)
    DBCC IND('test','test01',-1)
    GO

    -- Read the content of the page - use the data page number from the previous query
    -- (312 in this run; your page number may differ)
    DBCC TRACEON(3604)
    DBCC PAGE('test',1,312,3)
    GO

    --> Notice this in the result:
    --     Slot 0 Offset 0x60 Length 11
    -- The position of the row is at offset 0x60 from the start of the page.
    -- 0x60 hexadecimal is 96 decimal, which fits the fact that every page has a
    -- 96-byte header. So this row exists right after the header of the page.

    -- What will happen when we UPDATE the row's clustered column?
    UPDATE test01
     SET ID = 888
     WHERE ID = 999
    GO

    -- Check if the page number was changed.
    -- If the page was changed then it is proof that the server did not really update
    -- the row but created a new row in a different location in the file!
    DBCC IND('test','test01',-1)
    GO

    -- If the page was not changed then we should check where the row is in the page now
    DBCC TRACEON(3604)
    DBCC PAGE('test',1,312,3)
    GO
    -- Notice this in the result: Slot 0 Offset 0x6b Length 11
    -- You can see that the server DID NOT UPDATE the row but created a new row at
    -- position 0x6b, which is 107 in decimal.
    -- The server created the new row 11 bytes after the location of the original row,
    -- since the original row was at position 96 and has Length 11 (96+11=107).

    OK, so we can understand the issue: even the local server treats the scenario as creating a new row.

    So, querying the data in the database after the change, you have no way to "understand" if the changed row is a new one which was created using INSERT, or the old one which was UPDATEd, since you changed the identifier of the row.

    You can deal with these rows as "deleted" and "inserted" (just as it is actually done behind the scenes), or you can get the information about the action from the transaction log, which includes the information about each DML query.
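
    For example, here is a minimal sketch of reading the log with the undocumented fn_dblog function (run it against the test database from the demo above; treat it as an inspection tool only, not something to build production sync on):

    -- Read the active portion of the transaction log.
    -- For the UPDATE of the clustered key above you should see LOP_DELETE_ROWS
    -- followed by LOP_INSERT_ROWS on the clustered index, instead of
    -- LOP_MODIFY_ROW - more proof that the "update" was a delete + insert.
    SELECT [Current LSN], Operation, Context, AllocUnitName
    FROM fn_dblog(NULL, NULL)
    WHERE AllocUnitName LIKE 'dbo.test01%'
    ORDER BY [Current LSN];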

    So, what should you use?

    Without more information, at first glance it seems like merge replication is your best solution.

    Why?

    Merge replication is implemented by the SQL Server Snapshot Agent and Merge Agent. It monitors changes using triggers.

    To track changes, merge replication must be able to uniquely identify every row. To bypass issues caused by changes to the clustered column, it does not rely on the clustered column. Instead, merge replication adds a uniqueidentifier column (with the ROWGUIDCOL property set) to every table (if there is no such column), as the sketch below shows.
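
    A minimal sketch of the kind of column merge replication relies on (when you publish a table, merge replication adds an equivalent column named rowguid automatically; adding it yourself ahead of time, as below, is optional, and the constraint name here is hypothetical):

    -- A stable row identity that survives changes to the clustered primary key.
    ALTER TABLE dbo.test01
        ADD rowguid UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL
            CONSTRAINT DF_test01_rowguid DEFAULT NEWSEQUENTIALID();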

    More

    Note! You can use most of the other options which I mentioned above, in one way or another. For example, you can create an automated procedure based on a dacpac (create a dacpac every X amount of time and deploy it to a staging table -> use Elastic Jobs to sync from the staging table to the production table). In most cases, using a staging table can solve your needs as a first step as well; then you simply need to sync that staging table to the production table, as sketched below.
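
    A minimal sketch of that staging-to-production step, assuming hypothetical tables dbo.Orders_Staging and dbo.Orders with a key column OrderID and a data column Amount (here MERGE is fine, because the target is a local table):

    -- Sync the freshly loaded staging table into the production table.
    MERGE dbo.Orders AS t
    USING dbo.Orders_Staging AS s
        ON t.OrderID = s.OrderID
    WHEN MATCHED AND t.Amount <> s.Amount THEN
        UPDATE SET t.Amount = s.Amount
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (OrderID, Amount) VALUES (s.OrderID, s.Amount)
    WHEN NOT MATCHED BY SOURCE THEN
        DELETE;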

    Note that if you change the values of the clustered index columns then (not necessarily, but) there is a very good chance that your database design has flaws, and you might need to get some help from a database architect to re-design your system.


  2. Alberto Morillo 34,671 Reputation points MVP Volunteer Moderator
    2021-08-24T15:06:06.947+00:00

    You can create your own sync application as explained here.

    Have you tried using transactional replication? You can implement replication as explained here.
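
    If you go down the transactional replication route, here is a minimal sketch of the publisher-side setup using the standard replication stored procedures (database, publication, and object names are hypothetical; the full setup also needs a Distributor and the Snapshot Agent, which are omitted here, and Azure SQL Database can only be a subscriber):

    -- On the local (publisher) server, in the published database:
    EXEC sp_replicationdboption
         @dbname = N'LocalDb', @optname = N'publish', @value = N'true';

    EXEC sp_addpublication
         @publication = N'AzurePub', @status = N'active';

    EXEC sp_addarticle
         @publication = N'AzurePub',
         @article = N'Customers',
         @source_object = N'Customers',
         @source_owner = N'dbo';

    -- The Azure SQL database is added as a push subscription:
    EXEC sp_addsubscription
         @publication = N'AzurePub',
         @subscriber = N'myserver.database.windows.net',
         @destination_db = N'AzureDb',
         @subscription_type = N'Push';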

    A non-Microsoft alternative to SQL Data Sync is SymmetricDS. During my contributions on Stack Overflow, I became aware that a good number of users use SymmetricDS as an alternative to SQL Data Sync, and they had a good opinion of the product. I have not used this product personally.

