Azure SQL Hyperscale - 0 secondary replicas?

iffak 26 Reputation points
2021-10-29T09:37:24.497+00:00

Backdrop

I develop a forecasting engine (time series) for different purposes. Processing, modeling and forecasting modules are written in Python, and data is currently stored in an Azure SQL database. Currently the database is General Purpose (vCore-based) service tier, Provisioned compute tier and Gen5 (12 vCores) hw config. I'm approaching the limit of maximum storage (approx 3 TB), but since I read almost the entire database daily (cold start models only), I do not see many other options than increasing the storage size. Truncating parts of historical data is out of the question.

Problem

At 12 vCores, maximum storage is approx 3 TB, and increasing vCores to enable the approx 4 TB maximum storage size is not feasible in a $-perspective (especially since it is storage, not compute, that is the bottleneck). I have read a bit about the alternative services / tiers on the Azure platform, and see that Hyperscale could possibly solve my problem: I can keep vCores untouched and have up to 100 TB storage. A config with zero secondary replicas (other things equal) will end up in the same $-range as before (see "Backdrop"). I get the impression that secondary replicas (read only nodes) are central to the Hyperscale architecture, so I'm not sure if such an outlined setup with zero secondary replicas is abuse / misuse. E.g. would it give the same performance, or could I expect a performance hit (even with the same vCore config)? Will the primary read / write node basically resemble a non-Hyperscale node? Other aspects I should think about? Adding a secondary replica (or several) might be relevant in the future (e.g. in combination with decreasing vCores), but is $-wise not an option atm.

Microsoft states that "The capability to change from Hyperscale to another service tier is not supported" (really?), so I would like to clarify this up front to avoid a semi-manual data migration (plus delta migration) and running two instances side by side if things go wrong. Given the scope of such a reconfiguration and of the forecasting system as a whole, I do not think it is feasible to run small- or full-scale tests in advance to get representative benchmarks. If there is anything else I should think about (related or semi-related), feel free to point me in the right direction.

Azure SQL Database

Answer accepted by question author
  1. Oury Ba-MSFT 21,121 Reputation points Microsoft Employee Moderator
    2021-10-31T00:57:25.467+00:00

    Hi @iffak Thank you for posting your Question on Microsoft Q&A.

    The great thing about Hyperscale is that you don't have to use any replicas if you don't want to. Any failover that has to occur would be faster if you have replicas, but they are not required.

    If you want to add replicas later, you can add as many as you wish, up to the maximum.

    Let me know if you have additional queries

    Regards,
    Oury


1 additional answer

  1. Davide 11 Reputation points Microsoft Employee
    2021-11-05T17:30:03.613+00:00

    I'm one of the PMs on the Azure SQL DB team. I see that Oury already gave you an extensive answer, so I'm chiming in to complete the picture. Azure SQL DB Hyperscale offers different types of secondary replicas. The replicas that help you get a higher SLA are called High-Availability (HA) replicas. You can use 0 replicas without any issue. What happens is that if the primary replica becomes unavailable for any reason, we need to spin up a new compute replica from scratch (as there is no HA replica available). That can take some time (usually minutes), during which your service will not be available. Having an HA replica drastically reduces the time during which the database is unavailable.
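Since a failover with 0 HA replicas can leave the database unreachable for some minutes, a daily Python job may want to retry rather than fail on the first dropped connection. The following is only a minimal sketch under assumptions: `run_with_retry`, its parameters, and the default timings are hypothetical names and values chosen for illustration, not anything prescribed by Azure. The built-in `ConnectionError` and `TimeoutError` stand in for whatever exception your database driver raises.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def run_with_retry(
    operation: Callable[[], T],
    *,
    max_wait_s: float = 600.0,   # assumed budget: give up after ~10 minutes of waiting
    first_delay_s: float = 5.0,  # assumed initial pause between attempts
    max_delay_s: float = 60.0,   # assumed cap on the pause between attempts
    sleep: Callable[[float], None] = time.sleep,
    retryable: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> T:
    """Call operation() until it succeeds or the waiting budget is used up."""
    waited = 0.0
    delay = first_delay_s
    while True:
        try:
            return operation()
        except retryable:
            # Re-raise the last error once another pause would exceed the budget.
            if waited + delay > max_wait_s:
                raise
            sleep(delay)
            waited += delay
            # Exponential backoff, capped so we keep probing at a steady rate.
            delay = min(delay * 2, max_delay_s)
```

With a real driver you would pass its transient error class through `retryable` and wrap the connect-and-query step in `operation`. Which errors are safe to retry depends on the driver and on whether the interrupted work is idempotent, so treat the defaults as placeholders.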

    You can read all the details here:

    https://learn.microsoft.com/en-us/azure/azure-sql/database/service-tier-hyperscale-replicas?tabs=tsql

    The SLAs are defined here:

    https://www.azure.cn/en-us/support/sla/sql-data/

    Regarding performance: unless you are specifically using secondary replicas to offload read-only workloads, you will not see a performance hit from not having an HA replica.
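To make the read-only offload point concrete: a client asks for a readable secondary by adding `ApplicationIntent=ReadOnly` to its connection string. The sketch below only builds ODBC-style connection strings. The helper name `build_connection_string`, the server and database names, and the driver version are assumptions for illustration, and authentication settings are deliberately left out.

```python
def build_connection_string(
    server: str,
    database: str,
    *,
    read_only: bool = False,
    driver: str = "ODBC Driver 18 for SQL Server",  # assumed driver version
) -> str:
    """Build an ODBC connection string (authentication keywords omitted)."""
    parts = [
        f"Driver={{{driver}}}",
        f"Server=tcp:{server},1433",
        f"Database={database}",
        "Encrypt=yes",
    ]
    if read_only:
        # Declares read intent so the service may route the session to a
        # readable secondary replica, if one exists.
        parts.append("ApplicationIntent=ReadOnly")
    return ";".join(parts) + ";"


# Hypothetical server and database names, for illustration only.
read_write = build_connection_string("myserver.database.windows.net", "forecasts")
read_only = build_connection_string(
    "myserver.database.windows.net", "forecasts", read_only=True
)
```

As far as I understand, with 0 HA replicas there is no secondary to route to, so a read-intent session ends up on the primary and competes with the rest of the workload. Running `SELECT DATABASEPROPERTYEX(DB_NAME(), 'Updateability')` on a session shows whether it landed on a `READ_ONLY` or `READ_WRITE` replica.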

