I can see two solutions, one decently simple, and one not simple at all.
The first is to take a copy of production. Yes, that is 5 TB, but the storage cost is a lot cheaper than the other solution. In this copy environment, you create a database snapshot. When a tester has completed his or her evil tests, you revert from the database snapshot, so that you go back to a known starting point. Reverting to a snapshot has the side effect of setting the log size to 0.5 MB, which would be really bad for a production database, but assuming that these testers will only make a limited amount of updates, that should not pose a problem in this environment.
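Just as a rough sketch of that cycle, assuming the copy is named TestCopy and has a single data file with the logical name TestCopy_Data (the names and the snapshot file path are placeholders for your environment):

   -- Create the snapshot before the testers start.
   CREATE DATABASE TestCopy_Snap
   ON (NAME = TestCopy_Data,
       FILENAME = 'D:\Snapshots\TestCopy_Data.ss')
   AS SNAPSHOT OF TestCopy;

   -- When the testers are done, revert to get back to the starting point.
   -- All other connections must be closed, and if there is more than one
   -- snapshot of the database, the others must be dropped before you revert.
   USE master;
   ALTER DATABASE TestCopy SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
   RESTORE DATABASE TestCopy FROM DATABASE_SNAPSHOT = 'TestCopy_Snap';
   ALTER DATABASE TestCopy SET MULTI_USER;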
The other solution? As Jeffery says, what you outline is a big challenge. Exactly how big depends on the complexity of your data model. It is certainly easier if you only have 20 tables than if you have 2000.
I did write a specification for a customer of my client a couple of years back. Their production database is around 3 TB, and they wanted smaller databases for their developers, but the data should still be realistic. The idea was to extract data for some accounts and take it from there. In my spec I detailed which tables should be extracted and how. An important part was to find customers that had traded with different types of instruments and had other specific features that were needed for the testing.
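Just to give a flavour of what such an extraction looks like, here is a hypothetical fragment. The tables Accounts and Trades, the column InstrumentType and the temp table #SelectedAccounts are invented for illustration, and I assume the production and dev databases are on the same instance; the real spec has to repeat this pattern for every table, in dependency order:

   -- Pick accounts that cover the features the testers need, for instance
   -- accounts that have traded in several types of instruments.
   CREATE TABLE #SelectedAccounts (AccountID int NOT NULL PRIMARY KEY);

   INSERT INTO #SelectedAccounts (AccountID)
   SELECT AccountID
   FROM   ProdDB.dbo.Trades
   GROUP  BY AccountID
   HAVING COUNT(DISTINCT InstrumentType) >= 3;

   -- Copy the selected accounts and their trades into the smaller dev database.
   -- (With identity columns you would also need explicit column lists and
   -- SET IDENTITY_INSERT.)
   INSERT INTO DevDB.dbo.Accounts
   SELECT A.*
   FROM   ProdDB.dbo.Accounts A
   JOIN   #SelectedAccounts S ON S.AccountID = A.AccountID;

   INSERT INTO DevDB.dbo.Trades
   SELECT T.*
   FROM   ProdDB.dbo.Trades T
   JOIN   #SelectedAccounts S ON S.AccountID = T.AccountID;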
I don't remember how much time I spent on writing that spec, but it was surely more than 40 hours. Mind you, this is a decently complex database with 1800 tables in total, not all of which my spec covered. On the other hand, I have been working with this system for many years, so I have very good knowledge of the tables.
Whether this spec was ever implemented, I don't know, as I am no longer working much with this client. I don't recall my time estimate, but maybe it was 500 hours. Of course, this cannot be translated directly to your site, as your database may be simpler. But it can still give an idea of what the cost might be for your virtual pipeline.