What is the best way of archiving a table to a new database?

Rathirojini Sangilipandy 241 Reputation points
2022-01-13T08:54:59.827+00:00

Hi,

I need to archive a log table from one database to another, copying everything from the last archived logs up to the most recent logs.

E.g. if the last archive ran on 5th of January and I execute the procedure today, it should copy the logs from 6th of January until now.

We will create a stored procedure to do this. What is the best way to insert/transfer these rows into the archive database?

Thanks!

SQL Server | Other

5 answers

  1. Erland Sommarskog 121.9K Reputation points MVP Volunteer Moderator
    2022-01-13T22:50:56.02+00:00

    With such a loose description, it is difficult to give any exact advice, as there are many "it depends".

    But the column you archive by should preferably be indexed, so that you can retrieve the rows quickly.
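
    For illustration, a minimal sketch of such an index, assuming a hypothetical log table dbo.LogTable whose archiving is driven by a LogDate column:

        -- Hypothetical table and column names; adjust to your schema.
        CREATE INDEX ix_LogTable_LogDate ON dbo.LogTable (LogDate);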

    How many rows would you copy at a time, do you think? Depending on the volume, you may have to do it in batches.

    Since it is a cross-database issue, there can also be security considerations, depending under which security context this archival operation will run.


  2. CathyJi-MSFT 22,396 Reputation points Microsoft External Staff
    2022-01-14T08:05:51.407+00:00

    Hi anonymous user,

    You can use SQL Server partitioning, or you can create a SQL Agent job that archives the records from one table to another (insert and delete) based on the dates, through stored procedures.
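
    A minimal sketch of such a procedure, assuming hypothetical names (dbo.LogTable with a LogDate column in the source database, ArchiveDB.dbo.LogTable as the target) and using the newest date already archived as the watermark:

        -- Hypothetical table and column names; adjust the column lists to your schema.
        CREATE OR ALTER PROCEDURE dbo.ArchiveLogTable
        AS
        BEGIN
            SET NOCOUNT ON;

            DECLARE @last_archived datetime2;

            -- Watermark: the newest row already present in the archive database.
            SELECT @last_archived = MAX(LogDate)
            FROM   ArchiveDB.dbo.LogTable;

            -- Copy everything newer than the watermark.
            INSERT INTO ArchiveDB.dbo.LogTable (LogDate, LogMessage)
            SELECT LogDate, LogMessage
            FROM   dbo.LogTable
            WHERE  LogDate > COALESCE(@last_archived, '19000101');
        END;

    Such a procedure can then be scheduled with a SQL Agent job; deleting the copied rows can be handled along the lines discussed in the later answers.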




  3. Erland Sommarskog 121.9K Reputation points MVP Volunteer Moderator
    2022-01-14T22:10:46.717+00:00

    OK, so you are only going to copy the rows nightly, but only delete at the end of the month. And since the table is partitioned, you could do that with partition switching.

    Are there any LOB columns of any size in this table?

    If not, I think you can do a straight INSERT into the log table. 1.5 million rows of a normal size is not really that much. I would suggest that you use some form of snapshot, so that you don't block writing to the table. If the database has READ_COMMITTED_SNAPSHOT enabled, this is already done for you.

    Else you could consider enabling it, although there are situations where using a snapshot can yield incorrect results, because you are reading stale data. Whether this is an issue or not is very much application dependent. An alternative is to enable ALLOW_SNAPSHOT_ISOLATION, and then make sure that the archiving process runs SET TRANSACTION ISOLATION LEVEL SNAPSHOT. The advantage here is that you can use snapshot selectively where you know that it is safe.
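
    As an illustration only, a sketch of that approach with hypothetical names (CurrentDB and ArchiveDB for the databases, dbo.LogTable with a LogDate column). Note that in a cross-database transaction under SNAPSHOT isolation, both databases must allow snapshot isolation:

        -- One-time settings (the database names are assumptions).
        ALTER DATABASE CurrentDB SET ALLOW_SNAPSHOT_ISOLATION ON;
        ALTER DATABASE ArchiveDB SET ALLOW_SNAPSHOT_ISOLATION ON;

        -- In the archiving procedure, run in the source database:
        SET TRANSACTION ISOLATION LEVEL SNAPSHOT;

        DECLARE @last_archived datetime2 =
            (SELECT MAX(LogDate) FROM ArchiveDB.dbo.LogTable);

        BEGIN TRANSACTION;

        INSERT INTO ArchiveDB.dbo.LogTable (LogDate, LogMessage)   -- adjust column list
        SELECT LogDate, LogMessage
        FROM   dbo.LogTable
        WHERE  LogDate > COALESCE(@last_archived, '19000101');

        COMMIT TRANSACTION;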

    By all means, do not use NOLOCK. This can lead to your archiving process archiving incorrect data. Not only can it read uncommitted data, it may also fail to read rows it should have read.

    Then again, if you use partition switching to switch out the past month at the end of the month, you could run the archiving of the data after the partition switch, before that table is truncated. That switched-out table is isolated from the rest of the system, and you can lock it all day long without causing issues. The only reason to do the daily archiving is really that you want the archive database to be more up to date.
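
    A sketch of that monthly step, assuming hypothetical objects: a partition function pf_LogDate partitioning dbo.LogTable by month on LogDate, and an empty staging table dbo.LogTable_Switch with identical structure on the same filegroup as the partition being switched out:

        -- Switch out last month's partition (a metadata-only operation).
        ALTER TABLE dbo.LogTable
            SWITCH PARTITION $PARTITION.pf_LogDate('20211201')
            TO dbo.LogTable_Switch;

        -- Archive from the switched-out table; nothing else in the system reads it.
        INSERT INTO ArchiveDB.dbo.LogTable (LogDate, LogMessage)   -- adjust column list
        SELECT LogDate, LogMessage
        FROM   dbo.LogTable_Switch;

        -- Once the data is safely archived, empty the staging table.
        TRUNCATE TABLE dbo.LogTable_Switch;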


  4. Rathirojini Sangilipandy 241 Reputation points
    2022-01-19T19:45:32.593+00:00

    @Erland Sommarskog, what is your opinion on the options below?

    1. Can Azure Data Factory be considered for monthly data copying from one DB to another?
    2. Create an SSIS package to copy data every night from the source DB to the destination DB.

  5. Erland Sommarskog 121.9K Reputation points MVP Volunteer Moderator
    2022-01-19T22:08:22.307+00:00

    "Thanks for the detailed steps. If the table isn't partitioned, what would your recommendation have been? Just curious to know your thoughts on this."

    Short answer: it depends.

    I think with partitioning in place, that is such an obvious solution, that there is little reason not to use it. But since I'm not a big fan of partitioning, I don't think my prime suggestion would be to partition the table if the table had been unpartitioned.

    I discussed snapshot isolation earlier, but that was based on the idea that we are only doing the archiving. If the table is not partitioned, we have the headache of deleting the data, and this is more prone to cause problems. So I would do deletion and archiving at the same time. And I would look into doing this in batches of a fairly small size. The exact size depends on how many indexes there are on the table. If there are four non-clustered indexes on the table, so that each deleted row takes around five locks, I would go for a batch size of just below 1000 rows. This ensures that the delete operation does not result in lock escalation to table level, which permits the delete operation to run while the system is live. (The limit for lock escalation is 5000 locks.) It is very, very important here that there is an index to support the deletion criteria.
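
    A minimal sketch of such a batched delete-and-archive loop, assuming hypothetical names (dbo.LogTable with a LogDate column, ArchiveDB.dbo.LogTable as the target) and a table without an IDENTITY column:

        -- Hypothetical names; adjust the column lists to your schema.
        DECLARE @cutoff datetime2 = DATEADD(month, -1, SYSDATETIME());
        DECLARE @rows   int = 1;

        -- Empty work table with the same structure as the log table.
        SELECT TOP (0) * INTO #batch FROM dbo.LogTable;

        WHILE @rows > 0
        BEGIN
            BEGIN TRANSACTION;

            -- Just below 1000 rows per batch to stay under the 5000-lock
            -- threshold for lock escalation with several indexes in play.
            DELETE TOP (900) FROM dbo.LogTable
            OUTPUT deleted.* INTO #batch
            WHERE  LogDate < @cutoff;

            SET @rows = @@ROWCOUNT;

            INSERT INTO ArchiveDB.dbo.LogTable
            SELECT * FROM #batch;

            TRUNCATE TABLE #batch;

            COMMIT TRANSACTION;
        END;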

    I might also consider adding the command SET LOCK_TIMEOUT 100 to the operation, and then trap the lock-timeout error and retry after a short wait of 500 ms. The idea here is that if the DELETE operation conflicts with another process, the DELETE process is the one that should yield to prevent deadlocks.
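
    As an illustration, the retry logic could look something like this around the DELETE from the previous sketch (#batch and @cutoff as defined there; error 1222 is the lock-timeout error):

        SET LOCK_TIMEOUT 100;

        DECLARE @done bit = 0;

        WHILE @done = 0
        BEGIN
            BEGIN TRY
                DELETE TOP (900) FROM dbo.LogTable
                OUTPUT deleted.* INTO #batch
                WHERE  LogDate < @cutoff;

                SET @done = 1;
            END TRY
            BEGIN CATCH
                IF ERROR_NUMBER() <> 1222
                    THROW;                       -- only swallow lock timeouts

                -- Yield to the competing process and retry after a short wait.
                WAITFOR DELAY '00:00:00.500';
            END CATCH;
        END;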

    Then again, if you tell me that there is a monthly maintenance window which is long enough to run the deletion and archiving job, I would go for a lot simpler solution where archiving is done in bigger batches, as that will be faster.

    As for SSIS or ADF, I would not consider them, for a very simple reason: I don't know either of these products! (But as I understand it, ADF would only be an option if your database is in the cloud, but maybe that is the case?)

