Migrate to Innovate Summit:
Learn how migrating and modernizing to Azure can boost your business's performance, resilience, and security, enabling you to fully embrace AI.Register now
This browser is no longer supported.
Upgrade to Microsoft Edge to take advantage of the latest features, security updates, and technical support.
The split-merge tool is intended for Azure Web Apps. The end of life for the Cloud Services (Classic) is August 31, 2024. If you were using the split-merge tool on Cloud Services (Classic), migrate to Azure Web Apps before August 31, 2024.
Prerequisites
Create a SQL database to be used as the split-merge status database. Go to the Azure portal. Create a new SQL database. Name the database and create a new administrator and password. Be sure to record the name and password for later use.
Ensure that your logical server in Azure allows Azure Services to connect to it. In the Azure portal, in the Firewall Settings for your logical server, ensure the Allow access to Azure Services setting is set to On. Select the Save icon.
Create an Azure Storage account for diagnostics output.
Use the public split-merge docker images, or push split-merge docker images to either Azure Container Service or your docker registry of choice.
Populate the following information: Image source: Docker hub Access type: Public Image and tag: mcr.microsoft.com/splitmerge/splitmergeworker:20240812.1
Use Review + create to create the web app.
UI Web App
To create the UI web app, follow the same steps you used to create the Worker web app with one difference:
A different docker image in the Image and tag field: mcr.microsoft.com/splitmerge/splitmergeweb:20240812.1
For the purposes of a simple test deployment for this tutorial, a minimal set of configuration steps are performed to get the service up and running. These steps enable only the one machine/account executing them to communicate with the service.
Create a self-signed certificate and PFX file
Use PowerShell to create a self-signed certificate and PFX file.
First, create a new directory. Then replace the inline values accordingly and run the following PowerShell commands from the new directory:
Valid connection string to the previously created Azure storage.
DataEncryptionPrimaryCertificateThumbprint
Previously generated certificate thumbprint.
MetadataExpirationPeriodInMinutes
20160
MaxRetryCount
5
WEBSITE_LOAD_CERTIFICATES
*
WEBSITE_PULL_IMAGE_OVER_VNET
0
Select Apply and restart the application.
Repeat the same steps for both worker and UI web app.
Troubleshoot the deployment
If your web role fails to come online, it's likely a problem with the security configuration. Check that the TLS/SSL is configured as described previously.
If your worker role fails to come online, but your web role succeeds, it's most likely a problem connecting to the status database that you created earlier.
Make sure that the connection string is accurate.
Check that the server and database exist, and that the user ID and password are correct.
For Azure SQL Database, the connection string should be in the form:
Ensure that the server name doesn't begin with https://.
Ensure that your server allows Azure Services to connect to it. To do this, open your database in the portal and ensure that the Allow access to Azure Services setting is set to On.
Test the service deployment
Connect with a web browser
Go to the Overview of your UI Web App and select Browse. Choose the correct certificate, if prompted.
Test with PowerShell scripts
The deployment and your environment can be tested by running the included sample PowerShell scripts.
Important
The sample scripts run on PowerShell 5.1. They don't currently run on PowerShell 6 or later.
The script files included are:
SetupSampleSplitMergeEnvironment.ps1 - sets up a test data tier for split-merge.
Creates a shard map manager database.
Creates two shard databases.
Creates a shard map for those databases (deletes any existing shard maps on those databases).
Creates a small sample table in both the shards, and populates the table in one of the shards.
Declares the SchemaInfo for the sharded table.
ExecuteSampleSplitMerge.ps1 - executes test operations on the test data tier.
Sends a split request to the Split-Merge Service web frontend, which splits half the data from the first shard to the second shard.
Polls the web frontend for the split request status and waits until the request completes.
Sends a merge request to the Split-Merge Service web frontend, which moves the data from the second shard back to the first shard.
Polls the web frontend for the merge request status and waits until the request completes.
GetMappings.ps1 - top-level sample script that prints the current state of the shard mappings.
ShardManagement.psm1 - helper script that wraps the ShardManagement API.
SqlDatabaseHelpers.psm1 - helper script for creating and managing databases in SQL Database.
Use PowerShell to verify your deployment
Open a new PowerShell window and navigate to the directory where you downloaded the Split-Merge package, and then navigate to the "PowerShell" directory.
Create a server (or choose an existing server) where the shard map manager and shards will be created.
Note
The SetupSampleSplitMergeEnvironment.ps1 script creates all these databases on the same server by default to keep the script simple. This isn't a restriction of the Split-Merge Service itself.
A SQL authentication login with read/write access to the databases is needed for the Split-Merge service to move data and update the shard map. Since the Split-Merge Service runs in the cloud, it doesn't currently support Integrated Authentication.
Make sure the server is configured to allow access from the IP address of the machine running these scripts. You can find this setting under SQL server / Firewalls and virtual networks / Client IP addresses.
Execute the SetupSampleSplitMergeEnvironment.ps1 script to create the sample environment.
Running this script wipes out any existing shard map management data structures on the shard map manager database and the shards. It might be useful to rerun the script if you wish to reinitialize the shard map or shards.
Execute the ExecuteSampleSplitMerge.ps1 script to execute a split operation (moving half the data on the first shard to the second shard) and then a merge operation (moving the data back onto the first shard). If you configured TLS and left the http endpoint disabled, ensure that you use the https:// endpoint instead.
If you receive the following error, it's most likely a problem with your Web endpoint's certificate. Try connecting to the Web endpoint with your favorite Web browser and check if there's a certificate error.
Invoke-WebRequest : The underlying connection was closed: Could not establish trust relationship for the SSL/TLSsecure channel.
If it succeeds, the output should look like the following output:
Output
.\ExecuteSampleSplitMerge.ps1 -UserName 'mysqluser' -Password 'MySqlPassw0rd' -ShardMapManagerServerName 'abcdefghij.database.windows.net' -SplitMergeServiceEndpoint 'http://mysplitmergeservice.cloudapp.net' -CertificateThumbprint 0123456789abcdef0123456789abcdef01234567
Sending split request
Began split operation with id dc68dfa0-e22b-4823-886a-9bdc903c80f3
Polling split-merge request status. Press Ctrl-C to end
Progress: 0% | Status: Queued | Details: [Informational] Queued request
Progress: 5% | Status: Starting | Details: [Informational] Starting split-merge state machine for request.
Progress: 5% | Status: Starting | Details: [Informational] Performing data consistency checks on target shards.
Progress: 20% | Status: CopyingReferenceTables | Details: [Informational] Moving reference tables from source to target shard.
Progress: 20% | Status: CopyingReferenceTables | Details: [Informational] Waiting for reference tables copy completion.
Progress: 20% | Status: CopyingReferenceTables | Details: [Informational] Moving reference tables from source to target shard.
Progress: 44% | Status: CopyingShardedTables | Details: [Informational] Moving key range [100:110) of Sharded tables
Progress: 44% | Status: CopyingShardedTables | Details: [Informational] Successfully copied key range [100:110) for table [dbo].[MyShardedTable]
...
...
Progress: 90% | Status: Completing | Details: [Informational] Successfully deleted shardlets in table [dbo].[MyShardedTable].
Progress: 90% | Status: Completing | Details: [Informational] Deleting any temp tables that were created while processing the request.
Progress: 100% | Status: Succeeded | Details: [Informational] Successfully processed request.
Sending merge request
Began merge operation with id 6ffc308f-d006-466b-b24e-857242ec5f66
Polling request status. Press Ctrl-C to end
Progress: 0% | Status: Queued | Details: [Informational] Queued request
Progress: 5% | Status: Starting | Details: [Informational] Starting split-merge state machine for request.
Progress: 5% | Status: Starting | Details: [Informational] Performing data consistency checks on target shards.
Progress: 20% | Status: CopyingReferenceTables | Details: [Informational] Moving reference tables from source to target shard.
Progress: 44% | Status: CopyingShardedTables | Details: [Informational] Moving key range [100:110) of Sharded tables
Progress: 44% | Status: CopyingShardedTables | Details: [Informational] Successfully copied key range [100:110) for table [dbo].[MyShardedTable]
...
...
Progress: 90% | Status: Completing | Details: [Informational] Successfully deleted shardlets in table [dbo].[MyShardedTable].
Progress: 90% | Status: Completing | Details: [Informational] Deleting any temp tables that were created while processing the request.
Progress: 100% | Status: Succeeded | Details: [Informational] Successfully processed request.
Experiment with other data types. All of these scripts take an optional -ShardKeyType parameter that allows you to specify the key type. The default is Int32, but you can also specify Int64, Guid, or Binary.
Create requests
The service can be used either by using the web UI, or by importing and using the SplitMerge.psm1 PowerShell module, which submits your requests through the web role.
The service can move data in both sharded tables and reference tables. A sharded table has a sharding key column and has different row data on each shard. A reference table isn't sharded so it contains the same row data on every shard. Reference tables are useful for data that doesn't change often and is used to JOIN with sharded tables in queries.
In order to perform a split-merge operation, you must declare the sharded tables and reference tables that you want to move. This is accomplished with the SchemaInfo API. This API is in the Microsoft.Azure.SqlDatabase.ElasticScale.ShardManagement.Schema namespace.
For each sharded table, create a ShardedTableInfo object describing the table's parent schema name (optional, defaults to "dbo"), the table name, and the column name in that table that contains the sharding key.
For each reference table, create a ReferenceTableInfo object describing the table's parent schema name (optional, defaults to "dbo") and the table name.
Add the previous TableInfo objects to a new SchemaInfo object.
Get a reference to a ShardMapManager object, and call GetSchemaInfoCollection.
Add the SchemaInfo to the SchemaInfoCollection, providing the shard map name.
An example of this can be seen in the SetupSampleSplitMergeEnvironment.ps1 script.
The Split-Merge service doesn't create the target database (or schema for any tables in the database) for you. They must be precreated before sending a request to the service.
Known errors
You might see the following message when running the sample PowerShell scripts:
Invoke-WebRequest : The underlying connection was closed: Could not establish trust relationship for the SSL/TLS secure channel.
This error means that your TLS/SSL certificate isn't configured correctly. Follow the instructions in the section Connect with a web browser.
If you can't submit requests, you might see this:
[Exception] System.Data.SqlClient.SqlException (0x80131904): Could not find stored procedure 'dbo.InsertRequest'.
In this case, check your configuration file, in particular the setting for WorkerRoleSynchronizationStorageAccountConnectionString. This error typically indicates that the worker role couldn't successfully initialize the metadata database on first use.
Azure HPC is a purpose-built cloud capability for HPC & AI workload, using leading-edge processors and HPC-class InfiniBand interconnect, to deliver the best application performance, scalability, and value. Azure HPC enables users to unlock innovation, productivity, and business agility, through a highly available range of HPC & AI technologies that can be dynamically allocated as your business and technical needs change. This learning path is a series of modules that help you get started on Azure HPC - you
Administer an SQL Server database infrastructure for cloud, on-premises and hybrid relational databases using the Microsoft PaaS relational database offerings.