What is the best way of archiving a table to a new database?

Rathirojini Sangilipandy 241 Reputation points
2022-01-13T08:54:59.827+00:00

Hi,

I need to archive a log table from one database to another, copying everything from the last archived logs up to the most recent logs.

E.g. if the last archive ran on 5th of January and I execute the procedure today, it should copy the logs from 6th of January until now.

We will create a stored procedure to do this. What is the best way to insert/transfer these rows into the archive database?

Thanks!

SQL Server | Other

5 answers

  1. Erland Sommarskog 121.9K Reputation points MVP Volunteer Moderator
    2022-01-13T22:50:56.02+00:00

    With such a loose description, it is difficult to give any exact advice, as there are many "it depends".

    But the column you archive by should preferably be indexed, so that you can retrieve the rows quickly.
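
    For illustration, a minimal sketch of such an index, assuming a hypothetical log table dbo.LogTable whose archiving is driven by a LogDate column:

        -- Hypothetical table and column names; adjust to your schema.
        CREATE INDEX ix_LogTable_LogDate ON dbo.LogTable (LogDate);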

    How many rows would you copy at a time, do you think? Depending on the volume, you may have to do it in batches.

    Since it is a cross-database issue, there can also be security considerations, depending under which security context this archival operation will run.


  2. CathyJi-MSFT 22,396 Reputation points Microsoft External Staff
    2022-01-14T08:05:51.407+00:00

    Hi anonymous user,

    You can use SQL Server partitioning, or you can create a SQL Agent job that archives the records from one table to another (insert and delete) based on the dates, through stored procedures.
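
    A minimal sketch of such a procedure, assuming hypothetical names (dbo.LogTable with a LogDate column in the source database, ArchiveDB.dbo.LogTable as the target) and using the newest date already archived as the watermark:

        -- Hypothetical table and column names; adjust the column lists to your schema.
        CREATE OR ALTER PROCEDURE dbo.ArchiveLogTable
        AS
        BEGIN
            SET NOCOUNT ON;

            DECLARE @last_archived datetime2;

            -- Watermark: the newest row already present in the archive database.
            SELECT @last_archived = MAX(LogDate)
            FROM   ArchiveDB.dbo.LogTable;

            -- Copy everything newer than the watermark.
            INSERT INTO ArchiveDB.dbo.LogTable (LogDate, LogMessage)
            SELECT LogDate, LogMessage
            FROM   dbo.LogTable
            WHERE  LogDate > COALESCE(@last_archived, '19000101');
        END;

    Such a procedure can then be scheduled with a SQL Agent job; deleting the copied rows can be handled along the lines discussed in the later answers.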




  3. Erland Sommarskog 121.9K Reputation points MVP Volunteer Moderator
    2022-01-14T22:10:46.717+00:00

    OK, so you are only going to copy the rows nightly, but only delete at the end of the month. And since the table is partitioned, you could do that with partition switching.

    Are there any LOB columns of any size in this table?

    If not, I think you can do a straight INSERT into the log table. 1.5 million rows of a normal size is not really that much. I would suggest that you use some form of snapshot, so that you don't block writing to the table. If the database has READ_COMMITTED_SNAPSHOT enabled, this is already done for you.

    Else you could consider enabling it, although there are situations where using a snapshot can yield incorrect results, because you are reading stale data. Whether this is an issue or not is very much application dependent. An alternative is to enable ALLOW_SNAPSHOT_ISOLATION, and then make sure that the archiving process runs SET TRANSACTION ISOLATION LEVEL SNAPSHOT. The advantage here is that you can use snapshot selectively where you know that it is safe.
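
    As an illustration only, a sketch of that approach with hypothetical names (CurrentDB and ArchiveDB for the databases, dbo.LogTable with a LogDate column). Note that in a cross-database transaction under SNAPSHOT isolation, both databases must allow snapshot isolation:

        -- One-time settings (the database names are assumptions).
        ALTER DATABASE CurrentDB SET ALLOW_SNAPSHOT_ISOLATION ON;
        ALTER DATABASE ArchiveDB SET ALLOW_SNAPSHOT_ISOLATION ON;

        -- In the archiving procedure, run in the source database:
        SET TRANSACTION ISOLATION LEVEL SNAPSHOT;

        DECLARE @last_archived datetime2 =
            (SELECT MAX(LogDate) FROM ArchiveDB.dbo.LogTable);

        BEGIN TRANSACTION;

        INSERT INTO ArchiveDB.dbo.LogTable (LogDate, LogMessage)   -- adjust column list
        SELECT LogDate, LogMessage
        FROM   dbo.LogTable
        WHERE  LogDate > COALESCE(@last_archived, '19000101');

        COMMIT TRANSACTION;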

    By all means, do not use NOLOCK. This can lead to your archiving process archiving incorrect data. Not only can it read uncommitted data, it may also fail to read rows it should have read.

    Then again, if you use partition switching to switch out the past month at the end of the month, you could run the archiving of the data after the partition switch, before that table is truncated. That switched-out table is isolated from the rest of the system, and you can lock it all day long without causing issues. The only reason to do the daily archiving is really that you want the archive database to be more up to date.
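
    A sketch of that monthly step, assuming hypothetical objects: a partition function pf_LogDate partitioning dbo.LogTable by month on LogDate, and an empty staging table dbo.LogTable_Switch with identical structure on the same filegroup as the partition being switched out:

        -- Switch out last month's partition (a metadata-only operation).
        ALTER TABLE dbo.LogTable
            SWITCH PARTITION $PARTITION.pf_LogDate('20211201')
            TO dbo.LogTable_Switch;

        -- Archive from the switched-out table; nothing else in the system reads it.
        INSERT INTO ArchiveDB.dbo.LogTable (LogDate, LogMessage)   -- adjust column list
        SELECT LogDate, LogMessage
        FROM   dbo.LogTable_Switch;

        -- Once the data is safely archived, empty the staging table.
        TRUNCATE TABLE dbo.LogTable_Switch;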


  4. Rathirojini Sangilipandy 241 Reputation points
    2022-01-19T19:45:32.593+00:00

    @Erland Sommarskog, what is your opinion on the options below?

    1. Can Azure Data Factory be considered for monthly data copying from one DB to another?
    2. Create an SSIS package to copy data every night from the source DB to the destination DB.

  5. Erland Sommarskog 121.9K Reputation points MVP Volunteer Moderator
    2022-01-19T22:08:22.307+00:00

    "Thanks for the detailed steps. If the table isn't partitioned, what would your recommendation have been? Just curious to know your thoughts on this."

    Short answer: it depends.

    I think with partitioning in place, that is such an obvious solution, that there is little reason not to use it. But since I'm not a big fan of partitioning, I don't think my prime suggestion would be to partition the table if the table had been unpartitioned.

    I discussed snapshot isolation earlier, but that was based on the idea that we are only doing the archiving. If the table is not partitioned, we have the headache of deleting the data, and this is more prone to cause problems. So I would do deletion and archiving at the same time. And I would look into doing this in batches of a fairly small size. The exact size depends on how many indexes there are on the table. If there are four non-clustered indexes on the table, so that each deleted row takes around five locks, I would go for a batch size of just below 1000 rows. This ensures that the delete operation does not result in lock escalation to table level, which permits the delete operation to run while the system is live. (The limit for lock escalation is 5000 locks.) It is very, very important here that there is an index to support the deletion criteria.
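
    A minimal sketch of such a batched delete-and-archive loop, assuming hypothetical names (dbo.LogTable with a LogDate column, ArchiveDB.dbo.LogTable as the target) and a table without an IDENTITY column:

        -- Hypothetical names; adjust the column lists to your schema.
        DECLARE @cutoff datetime2 = DATEADD(month, -1, SYSDATETIME());
        DECLARE @rows   int = 1;

        -- Empty work table with the same structure as the log table.
        SELECT TOP (0) * INTO #batch FROM dbo.LogTable;

        WHILE @rows > 0
        BEGIN
            BEGIN TRANSACTION;

            -- Just below 1000 rows per batch to stay under the 5000-lock
            -- threshold for lock escalation with several indexes in play.
            DELETE TOP (900) FROM dbo.LogTable
            OUTPUT deleted.* INTO #batch
            WHERE  LogDate < @cutoff;

            SET @rows = @@ROWCOUNT;

            INSERT INTO ArchiveDB.dbo.LogTable
            SELECT * FROM #batch;

            TRUNCATE TABLE #batch;

            COMMIT TRANSACTION;
        END;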

    I might also consider adding the command SET LOCK_TIMEOUT 100 to the operation, and then trap the lock-timeout error and retry after a short wait of 500 ms. The idea here is that if the DELETE operation conflicts with another process, the DELETE process is the one that should yield to prevent deadlocks.
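
    As an illustration, the retry logic could look something like this around the DELETE from the previous sketch (#batch and @cutoff as defined there; error 1222 is the lock-timeout error):

        SET LOCK_TIMEOUT 100;

        DECLARE @done bit = 0;

        WHILE @done = 0
        BEGIN
            BEGIN TRY
                DELETE TOP (900) FROM dbo.LogTable
                OUTPUT deleted.* INTO #batch
                WHERE  LogDate < @cutoff;

                SET @done = 1;
            END TRY
            BEGIN CATCH
                IF ERROR_NUMBER() <> 1222
                    THROW;                       -- only swallow lock timeouts

                -- Yield to the competing process and retry after a short wait.
                WAITFOR DELAY '00:00:00.500';
            END CATCH;
        END;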

    Then again, if you tell me that there is a monthly maintenance window which is long enough to run the deletion and archiving job, I would go for a lot simpler solution where archiving is done in bigger batches, as that will be faster.

    As for SSIS or ADF, I would not consider them, for a very simple reason: I don't know either of these products! (But as I understand it, ADF would only be an option if your database is in the cloud, but maybe that is the case?)

