Share via


Sample: Joining tables from different Azure SQL Databases

Abstract:
The Elastic Database Query feature allows you to perform cross-database queries to access remote tables. It is a great feature if you plan to send straight-forward queries with well-defined Where clauses to the remote database. But as soon as you need to join a remote table with a local table, you are in for a surprise.

For example, if you run a query like the following:
SELECT * FROM dbo.ExternalTable x INNER JOIN dbo.LocalTable l on x.ID = l.ID;
all rows from the remote table will be pulled over, and the join will be performed locally. No problem if you only have a few rows remotely. Bad idea if the remote table holds thousands or millions of rows.

This article will show you how you may perform the join remotely at the external database, and return the resultset back to the local database.

Possible Solutions:
This sample assumes that you have a central database used as entry point for your application. It also assumes that you have at least one external database for storing additional data outside of your central database.

The central database holds a Subscriptions table with a few SubscriptionIDs. The goal is to select all matching rows from an external AccountDetails table.

The suggested solution is to create a Stored Procedure inside the external database. The procedure accepts the join IDs as a parameter list, inserts them into a table variable, and then joins that table variable with the remote external table.

(1) Prepare the connectivity between central and external database
/*** RUN THIS ON YOUR CENTRAL AZURE SQL DATABASE ***/ -- create a master key, a credential, and an external data source CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'strongpassword'; GO CREATE DATABASE SCOPED CREDENTIAL ElasticDBQueryCred WITH IDENTITY = 'username', SECRET = 'strongpassword'; GO CREATE EXTERNAL DATA SOURCE MyElasticDBQueryDataSrc WITH (TYPE = RDBMS, LOCATION = 'yourserver.database.windows.net', DATABASE_NAME = 'YourExternalDatabase', CREDENTIAL = ElasticDBQueryCred, ) ; GO
(2) Create an external sample table and the stored procedure for performing the remote join:
/*** RUN THIS ON YOUR EXTERNAL AZURE SQL DATABASE ***/ -- create a table for the rows to be returned to the central database CREATE TABLE dbo.AccountDetails ( AccountID INT PRIMARY KEY NOT NULL, AccountName VARCHAR(50) NULL, MailAddress VARCHAR(50) NOT NULL, SubscriptionID UNIQUEIDENTIFIER NOT NULL ); INSERT INTO dbo.AccountDetails (AccountID, AccountName, MailAddress, SubscriptionID) VALUES (1, 'Wizard 1', 'some@where.com', '11111111-2222-3333-4444-000000000001'); INSERT INTO dbo.AccountDetails (AccountID, AccountName, MailAddress, SubscriptionID) VALUES (2, 'Wizard 2', 'over@rainbow.com', '11111111-2222-3333-4444-000000000002'); GO -- create the stored procedure that will return the rows to the central database CREATE PROCEDURE [dbo].[sp_GetAccountDetails10] ( @SubscriptionID uniqueidentifier = NULL, @1SubscriptionID uniqueidentifier = NULL, @2SubscriptionID uniqueidentifier = NULL, @3SubscriptionID uniqueidentifier = NULL, @4SubscriptionID uniqueidentifier = NULL, @5SubscriptionID uniqueidentifier = NULL, @6SubscriptionID uniqueidentifier = NULL, @7SubscriptionID uniqueidentifier = NULL, @8SubscriptionID uniqueidentifier = NULL, @9SubscriptionID uniqueidentifier = NULL ) AS begin SET NOCOUNT ON declare @SubIDs table (SubscriptionID uniqueidentifier); if @SubscriptionID is not null INSERT INTO @SubIDs (SubscriptionID) VALUES (@SubscriptionID); if @1SubscriptionID is not null INSERT INTO @SubIDs (SubscriptionID) VALUES (@1SubscriptionID); if @2SubscriptionID is not null INSERT INTO @SubIDs (SubscriptionID) VALUES (@2SubscriptionID); if @3SubscriptionID is not null INSERT INTO @SubIDs (SubscriptionID) VALUES (@3SubscriptionID); if @4SubscriptionID is not null INSERT INTO @SubIDs (SubscriptionID) VALUES (@4SubscriptionID); if @5SubscriptionID is not null INSERT INTO @SubIDs (SubscriptionID) VALUES (@5SubscriptionID); if @6SubscriptionID is not null INSERT INTO @SubIDs (SubscriptionID) VALUES (@6SubscriptionID); if @7SubscriptionID is not null INSERT INTO @SubIDs (SubscriptionID) VALUES (@7SubscriptionID); if @8SubscriptionID is not null INSERT INTO @SubIDs (SubscriptionID) VALUES (@8SubscriptionID); if @9SubscriptionID is not null INSERT INTO @SubIDs (SubscriptionID) VALUES (@9SubscriptionID); select AccountID, AccountName, MailAddress, SubscriptionID from dbo.AccountDetails where SubscriptionID in (select SubscriptionID from @SubIDs); end; GO -- Test: -- exec [sp_GetAccountDetails10] '11111111-2222-3333-4444-000000000001', '11111111-2222-3333-4444-000000000002', '4B1DE4FD-9051-4839-86BA-0000A5CCF12A'

(3) Create a central sample table, extract the IDs to be joined, and call the remote stored procedure to retrieve the results:

/*** RUN THIS ON THE CENTRAL DATABASE: ***/ -- create a temporary table that holds the subscription IDs of interest -- these values would be joined to the external table declare @Subscriptions TABLE (SubscriptionID UNIQUEIDENTIFIER PRIMARY KEY NOT NULL, SubscriptionName NVARCHAR(256) NOT NULL); insert into @Subscriptions (SubscriptionID, SubscriptionName) values ('11111111-2222-3333-4444-000000000001', 'Subscription1'); insert into @Subscriptions (SubscriptionID, SubscriptionName) values ('11111111-2222-3333-4444-000000000002', 'Subscription2'); insert into @Subscriptions (SubscriptionID, SubscriptionName) values ('2150482E-D354-4E61-AF4D-38705F095C2C', 'Subscription3'); -- prepare the parameter string declare @subid_list nvarchar(500) = NULL, @cmd nvarchar(3000) = '' select top 10 @subid_list = case when @subid_list is null then '''' + cast(SubscriptionID as nvarchar(36)) + '''' else @subid_list + ', ''' + cast(SubscriptionID as nvarchar(36)) + '''' end from @Subscriptions -- prepare the command string select @cmd = N'sp_GetAccountDetails10 ' + @subid_list -- select @cmd -- returns: sp_GetAccountDetails10 '11111111-2222-3333-4444-000000000001', '11111111-2222-3333-4444-000000000002', '2150482E-D354-4E61-AF4D-38705F095C2C' -- call the remote stored procedure EXEC sp_execute_remote @data_source_name = N'MyElasticDBQueryDataSrc', @stmt = @cmd; -- alternate solution: run the select statement directly at the external database select @cmd = N'select AccountID, AccountName, MailAddress, SubscriptionID from dbo.AccountDetails where SubscriptionID in (' + @subid_list + ')' -- select @cmd -- returns: select AccountID, AccountName, MailAddress, SubscriptionID from dbo.AccountDetails where SubscriptionID in ('11111111-2222-3333-4444-000000000001', '11111111-2222-3333-4444-000000000002', '2150482E-D354-4E61-AF4D-38705F095C2C') EXEC sp_execute_remote @data_source_name = N'MyElasticDBQueryDataSrc', @stmt = @cmd;

As a side note, this technique is derived from a stored procedure used in Transactional Replication, where it turned out to be the most efficient way to move data result sets between the publisher and distribution database. The replication procedure is called with more than 240 parameters. So depending on the maximum number of IDs you are expecting to join, you may increase the number of parameters accordingly. The NULL checks and the inserts into the remote table variable are consuming just a few CPU cycles, so would be negligible compared to transferring thousands of rows from the external table.