Hi,
I currently have a Data Warehouse on an Azure Managed Instance and SSIS Packages hosted on the SSISDB Catalog of the same instance being executed via Azure-SSIS Integration Runtime.
By looking at the features offered by a Microsoft Managed Instance, it gives me the impression that it’s designed for OLTP (Online Transaction Processing) workloads, and not optimized for heavy ETL processes that involve large volumes of data transformations and loading.
Here are some arguments that supports the statement above and considering moving from Azure SQL Server MI to SQL Server on a VM:
§ Recovery Model
Staging and DW database can be set to SIMPLE or Bulk Logged Recover Model, when applicable.
A database in Full recover model generates a considerable amount of transaction log that needs to be backup up frequently, e.g. every 5 minutes. This is recommended for systems that need to recover to a certain point in time, high availability groups, log shipping, etc. This is not the case for many Data Warehouse solutions, which can be recovery by re-running jobs or restoring last night backup, i.e. RPO of 1 day.
Full recover model puts an unnecessary pressure on the system while Simple allows the transaction log to be recycled automatically.
§ Control of Backups
Currently I have Full Databases backups running once a week, differential ones every day and transaction logs every 5 minutes. I have no control of it, as they are set up automatically by the Managed Instance. I can only input the retention policy, i.e. No. of days to keep a backup. However, the Full and differential backups run in the middle of the night while the SSIS jobs are running. The impact in performance is evident, specially when the Full backup runs.
Backups are also being replicated to another geographical location causing network waits. However, it can be configured to replicate locally.
§ Flexible Resource Configuration
My current solution runs on a SQL MI Standard Configuration, which uses magnetic drives. I know that SSDs can be order of magnitude faster, but the vCore purchasing model of the MI doesn't allow me to upgrade the disks only. I must move to a Premium Performance Configuration in order to upgrade the disks, and even though, I must increase the number of cores until I'm allowed to select the amount of disk I need, which has a high cost implication.
On a VM, I have four options of storage on SSDs to choose from, which is not tied to the number of cores. And also much more freedom to tweak the system for data integration workloads.
§ Replaces the SSIS Integration Runtime
I can kill two birds with one stone as the same SQL Server on a VM can be used to host the databases and the SSIS runtime.
§ Choose SQL Server Edition
As most features of Enterprise Edition is not required for the current solution, it could be replaced by an Standard Edition, which can be 3x cheaper.
§ Full control of SQL Server
o Full control of SQL Server Instance and database configuration
o I can use open source and 3rd-party monitoring and health check tools that are currently designed and tested on VMs only, which is the case for most of tools out there.
Some of the consideration (things you lose) when moving away from MI can be considered as wanted features for most Data Warehouse + ETL environments, such as:
§ You may use manual or automated backups.
§ You need to implement your own High-Availability solution.
§ There is a downtime while changing the resources(CPU/storage).
§ You need SQL Server License (your own or included with the VM).
So, I would really appreciate if you could share your take on this. How do the arguments above sound? Any recommendations?
Thanks so much,
Regards,
Marcelo.