2019-01-03

Connect(); Special Issue 2018

Volume 33 Number 13

[SQL]

Introducing Azure SQL Database Hyperscale

By Kevin Farlee; Special Issue 2018

At the Ignite conference, we announced the public preview of Azure SQL Database Hyperscale, a new storage architecture providing a SQL-based and highly scalable service tier for databases that adapts on-demand to your workload’s needs. With Azure SQL Database Hyperscale, databases can quickly auto-scale up to 100TB, eliminating the need to pre-provision storage resources, and significantly expand the potential for app growth without being limited by storage size.

Compared to current Azure SQL Database service tiers, Azure SQL Database Hyperscale provides the following additional capabilities:

Support for 100TB+ database size
Rapid scale up/down and point-in-time restore, regardless of the database size
Fewer size-of-data operations
Higher log throughput than current service tiers
Scale-out read-only workload with read-scale replicas without data copy

New Architecture

Azure SQL Database Hyperscale is a fundamental rearchitecting of the storage engine within the SQL database that was accomplished without making changes in the query engine that processes queries and determines behaviors. So, we’ve added significant new capabilities without introducing any compatibility challenges.

The Azure SQL Database Hyperscale architecture breaks the monolithic SQL engine into several microservices designed for the cloud environment; these services decouple compute, log and storage.

As shown in Figure 1, the Azure SQL Database Hyperscale model comprises three primary components:

Compute is the query engine from SQL Server, the portion of the database engine that executes the query logic, and determines compatibility with other SQL implementations, as well as behaviors when the query is evaluated.
Page Servers is a new scale-out architecture for storing the data that takes the traditional Storage Engine component and splits it into a scale-out set of services, each one managing a defined set of data pages (nominally totaling 1TB of data pages).
Log Service is a new logging architecture that manages the data flow for log data, ensuring that logged updates to the data get propagated to all of the replicas of a changed page, and persisted durably for future needs.

Figure 1 The New Azure SQL Database Hyperscale Architecture

Compute Nodes

The compute nodes look like a traditional SQL Server, but without local data files or log files. Compute nodes are the “server” that applications and users interact with, whether updating the data (via the Primary compute node) or doing strictly read-only transactions via one of the readable secondary compute nodes.

The primary compute node writes transaction log records to the Landing Zone of the log service, and fetches data pages from page servers if they’re not found in the local data cache or Resilient Buffer Pool Extension (RBPEX). This is where all of the query execution logic resides, and that logic is unchanged from other SQL deployment modes. By preserving this top layer of the SQL Engine largely unchanged, you can maintain complete compatibility with other deployment modes of SQL while delivering the revolutionary new capabilities the separate storage components offer. Because we’ve separated the compute nodes containing the query engine from the storage, they’re effectively stateless, meaning you can lose a compute node without endangering any data at all. It also means that you can scale the compute resources up or down without moving any data, which provides unparalleled agility for scaling resources up or down at will.

I’ll discuss more high availability (HA) features later.

Log Service

The log service externalizes the transactional log of a Hyperscale database. The primary compute instance writes the log records directly to Azure Premium Storage managed by the log service (The “Landing Zone” in Figure 1). The log service retrieves log records from the landing zone and makes it available to page servers and secondary compute. The log service also offloads log records to cheaper long-term storage to support point-in-time restore.

Because the landing zone is Azure Premium Storage, which has resiliency and HA built into it, log records are persisted and safe as soon as they’re written to the landing zone. With the comparable AlwaysOn Availability Groups HA architecture for on-premises SQL Server, a transaction commit needs to be sent to a quorum of secondary replicas via network protocol, and an acknowledgement received back from that quorum of replicas before the transaction can be deemed to be complete, and the success returned to the calling application. This can take a significantly longer period of time than with the Hyperscale architecture. You can have full HA without waiting for secondary nodes to acknowledge the receipt of the log records. This means that even with full HA, you have end-to-end log latencies in the 2ms range. When the Azure Ultra SSD storage technology becomes available, this latency will shrink from 2.5ms to around 0.4ms for remote, fully durable and resilient data storage.

Finally, log records are persisted in Azure Standard storage for long-term storage, and the space in the landing zone is reclaimed. The log records in the Azure storage are kept as long as the backup retention period configured for the database, so there’s no need to do transaction log backups. You just keep the log data online, which gives you in effect an infinite transaction log (within the bounds of the user-configured backup retention period).

Page Servers

The page servers host and maintain the data files. They consume the log stream from the log service and apply the data modifications described in the log stream to data files. Read requests of data pages that aren’t found in the compute’s local data cache or RBPEX are forwarded to the page servers that own the relevant pages. In page servers, the data files are persisted in Azure Storage and are heavily cached through their own RBPEX SSD-based caches.

Each page server manages pages representing approximately 1TB of data, so the page servers are scaled horizontally in units of approximately 1TB. The Compute Nodes/query engine models each page server as a 1TB data file, so from the page ID (File ID: Page within file), you know exactly which page server hosts the page being requested.

The data for each page server is ultimately persisted in Azure Standard storage, which provides full resilience and availability. Thus, you can survive the loss of an entire page server without risking any data loss, as the data is fully available from the Azure storage, and you can completely rebuild the page server from that data if needed.

The data for each page server is fully cached in its own local SSD RBPEX cache, so that in operation, no data request is ever forwarded to the Azure storage backing the page server, because it can always be satisfied out of the local RBPEX cache.

Multiple page servers will be created for a large database. When the database is growing and available space in existing page servers is lower than a threshold, a new page server is automatically added to the database. Because page servers are working independently, you can grow the database with no local resource constraints.

High Availability and Disaster Recovery (DR)

Automated Backup and Point-in-Time Restore In a Hyperscale database, snapshots of the data files are taken from the page servers, leveraging Azure blob storage capabilities periodically to replace the traditional streaming backup. This allows you to back up a very large database in just a few seconds with no impact to the running workload. Together with the log records stored in the log service, you can restore the database to any point in time during retention—which is seven days in public preview and will be configurable up to 35 days at general availability (GA)—in a very short time, regardless of database size. Each page server’s data is snapshotted independently, with no requirement to synchronize the snapshots, so that potential source of delay is eliminated, as well.

Highly Available Components Each of the components in the Azure SQL Database Hyperscale architecture is designed to be highly available so that there will be no interruptions in database access:

Compute nodes each have at least one replica running and hot at all times. In the event of a compute node failure or failover, the replica would immediately take over the Primary role, keeping the database online and available. A replacement replica can be started up very quickly, and can warm up its caches as a background task without impacting production performance. Other replicas may be configured for read scale-out purposes. With this architecture, the compute nodes are effectively stateless.
Page servers each have a standby replica online with their RBPEX cache fully populated, so they’re available to take over for the active page server in the event of failure. Again, because they’re stateless outside of cached data, a replacement can be online very quickly, without any risk of data loss.
The log service doesn’t have a hot standby, but this is unnecessary as the log service has no cached data and can be replaced as quickly as you could failover to a standby replica. The data that’s managed by the log service resides first in the landing zone, which is resilient Azure Premium Storage (soon Ultra SSD storage), and is ultimately persisted in Azure Standard storage for long-term retention.

So, as you can see, there’s no component that represents a single point of failure. All Hyperscale components have active standbys, with the exception of the log service, and all data is backed by fully resilient Azure storage.

Creating Your Own Hyperscale Database

You can create a new Hyperscale Database in the same ways you create any other Azure SQL Database, using the Microsoft Azure Portal, PowerShell or Azure Resource Manager (ARM).

Here, I’ll show the method to create a Hyperscale Database using PowerShell. Note that this is no different from the standard script for creating any Azure SQL Database, with the exception of the -RequestedServiceObjectiveName parameter. In the example that follows, I’ll use the HS_Gen5_8, which is a Hyperscale SQL Database, on Generation 5 hardware, with eight cores. Of course, if you already have a resource group and logical server, you can skip those portions of the script.

The first thing you’ll need to do is to launch Azure Cloud Shell. Cloud Shell is a free, interactive shell that you can use to run the steps mentioned in this article. It has common Azure tools preinstalled and configured to use with your account. You can just click the Copy button to copy the code, then paste the code into the Cloud Shell and press enter to run it.

There are a number of ways to launch Cloud Shell. Among the easiest are to either open Cloud Shell in your browser by going to shell.azure.com/powershell and clicking the Launch Cloud Shell button, or by clicking the Cloud Shell button on the menu in the upper right of the Azure Portal. I recommend you use the second method. Log in to the Azure Portal at portal.azure.com using the account that has access to your Azure subscription. Then click on the link as shown in Figure 2.

Figure 2 Launching Cloud Shell from the Azure Portal

You can then paste the script shown in Figure 3 into the Cloud Shell window and execute it to create a resource group, a SQL Logical Server and a database. Note that the resource group name is set to “myResourceGroup-” plus a random number, for example “MyResourceGroup-2088327474.” Other objects are named similarly. However, you can customize these names if it’s more convenient.

Figure 3 Creating a Resource Group, SQL Logical Server and Database

# Set the resource group name and location for your server
$resourcegroupname = "myResourceGroup-$(Get-Random)"
$location = "westus2"
# Set an admin login and password for your server
$adminlogin = "ServerAdmin"
$password = "ChangeYourAdminPassword1"
# Set server name - the logical server name has to be unique in the system
$servername = "server-$(Get-Random)"
# The sample database name
$databasename = "mySampleDatabase"
# The IP address range that you want to allow to access your server
$startip = "0.0.0.0"
$endip = "0.0.0.0"
# Create a resource group
$resourcegroup = New-AzureRmResourceGroup -Name $resourcegroupname -Location $location
# Create a server with a system-wide unique server name
$server = New-AzureRmSqlServer -ResourceGroupName $resourcegroupname `
  -ServerName $servername `
  -Location $location `
  -SqlAdministratorCredentials $(New-Object -TypeName
    System.Management.Automation.PSCredential -ArgumentList $adminlogin,
    $(ConvertTo-SecureString -String $password -AsPlainText -Force))
# Create a server firewall rule that allows access from the specified IP range
$serverfirewallrule = New-AzureRmSqlServerFirewallRule
  -ResourceGroupName $resourcegroupname `
    -ServerName $servername `
    -FirewallRuleName "AllowedIPs" -StartIpAddress $startip -EndIpAddress $endip
# Create a blank database with the Hyperscale, Gen5, two-core performance level
$database = New-AzureRmSqlDatabase  -ResourceGroupName $resourcegroupname `
  -ServerName $servername `
  -DatabaseName $databasename `
  -RequestedServiceObjectiveName "HS_Gen5_2" `
  -SampleName "AdventureWorksLT"
Echo $servername
# Clean up deployment
# Remove-AzureRmResourceGroup -ResourceGroupName $resourcegroupname

In the script, the server is created in the WestUS2 region. At this time, Hyperscale is not available in all Azure regions, so you can either keep it in WestUS2, which is active, or try another region. If you get an error that the “Specified edition is not available. Please choose a different edition,” this indicates that the region you’ve chosen has not yet enabled Hyperscale. You can either try another region, or use WestUS2.

Another object that’s configured for you is the firewall rule that grants access to your server. The example script sets both startip and endip to 0.0.0.0, which doesn’t allow any access. You can add the address for your specific computer by finding the new server in the Portal, clicking on “show firewall settings” and then clicking on “Add client IP,” which will add a firewall rule allowing your current machine to connect to the database.

Once your database and server are created, and the firewall rules are adjusted, you can connect to your database using SQL Server Management Studio (SSMS). The script prints out the name it assigned to the SQL Server. Simply append “.database.windows.net” to the server name when you connect. The user name and password are contained in this script (of course, you can and should change them to something only you know). At that point, you’re connected to your new Hyperscale database.

Other methods for creating Azure Hyperscale SQL Databases can be found at bit.ly/2QoTLbj.

The Next-Generation Database Architecture

With this brief tour, you’ve seen that Azure SQL Database Hyperscale is a revolutionary new architecture that has the unique benefit of providing full compatibility with previous generations of SQL engines. Azure SQL Database Hyperscale is truly cloud-born, and has significant advantages in scale, performance and manageability. By eliminating common size-of-data operations, it’s able to deliver Very Large Database (VLDB) capabilities without the typical VLDB challenges. For more details, please refer to the “Hyperscale Service Tier (Preview) for up to 100TB” page at bit.ly/2TTJkeK.

Kevin Farlee has more than 30 years in the industry, in both database and storage management software. In his current role as a princial program manager on the Microsoft SQL Database team, he’s engaged in developing the Hyperscale VLDB features of Azure SQL DB.

Thanks to the following Microsoft technical experts for reviewing this article: Alain Dormehl, Xiaochen Wu
Alain Dormehl is a Senior Program Manager with the Azure SQL Database team at Microsoft, based in Redmond. He has held various roles at Microsoft and is now focusing on feature and product development for Azure SQL Databases. He has over a decade of experience with SQL Server and was an early adopter and evangelist for Azure. Follow him on Twitter @APSolutely

Xiaochen Wu is a senior program manager on SQL team at Microsoft. He has been working on several features and products in SQL Server and Azure SQL Databases in the past 10 years. Follow him on Twitter @allenwux

Discuss this article in the MSDN Magazine forum

Share via