Synapse Environment Design Pattern

saurabh 206 Reputation points
2021-11-26T05:17:41.673+00:00

Hi,

I am looking for some insights from the community into how customers are setting up their Synapse environments for an analytical platform. The platform will be used by, say, ML engineers, data scientists, report developers, and data analysts.

Which of the options below would be considered a good design pattern, considering cost, ease of deployment, ease of use, ease of operations, security, and governance?

a) Multiple Synapse workspaces (in different subscriptions/resource groups/VNets) for Prod and Non-Prod environments. Each workspace has its own storage account.
b) Multiple Synapse workspaces (in the same subscription, different resource groups, different subnets) for Prod and Non-Prod environments. The workspaces share a storage account but use different containers for Prod/Non-Prod.
c) A single Synapse workspace with multiple pools for Prod/Non-Prod workloads. The workspace shares the storage account but uses different containers for Prod/Non-Prod.

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

Accepted answer
  1. PRADEEPCHEEKATLA-MSFT 53,276 Reputation points Microsoft Employee
    2021-11-26T10:59:18.177+00:00

    Hello @saurabh ,

    Thanks for the question and using MS Q&A platform.

    You can go with option b): multiple Synapse workspaces (in the same subscription, different resource groups, different subnets) for Prod and Non-Prod environments, sharing a storage account but using different containers for Prod/Non-Prod.

    Create one Azure Synapse workspace per environment (DEV, UAT, and PRD). For each workspace instance, provide the following configuration values:

    • Workspace Name: syn-cicddemo-workspace-<env_suffix>
    • Resource Group: rg-cicddemo-<env_suffix>
    • Managed Resource Group: mrg-cicddemo-<env_suffix>
    • ADLS Gen2 Account Name: dlscicddemo<env_suffix>
    • File system name: fs-dlscicddemo-<env_suffix>
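
    As a rough sketch, the naming convention above can be derived from a single environment suffix in a provisioning script. The `cicddemo` project name, region, and admin login below are illustrative placeholders (the script only prints the Azure CLI commands it would run, so you can review them before executing against a real subscription):

    ```shell
    # Derive per-environment resource names from one suffix (dev, uat, or prd).
    env_suffix="dev"

    workspace_name="syn-cicddemo-workspace-${env_suffix}"
    resource_group="rg-cicddemo-${env_suffix}"
    storage_account="dlscicddemo${env_suffix}"      # ADLS Gen2 account names allow no hyphens
    file_system="fs-dlscicddemo-${env_suffix}"

    # Print (rather than execute) the provisioning commands for review.
    echo "az group create --name ${resource_group} --location eastus"
    echo "az synapse workspace create --name ${workspace_name}" \
         "--resource-group ${resource_group}" \
         "--storage-account ${storage_account}" \
         "--file-system ${file_system}" \
         "--sql-admin-login-user sqladmin" \
         "--sql-admin-login-password <secure-password>" \
         "--location eastus"
    ```

    Running the same script with `env_suffix="uat"` or `env_suffix="prd"` yields the matching names for the other environments.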

    Continuous integration (CI) is the process of automating the build and testing of code every time a team member commits a change to version control. Continuous delivery (CD) is the process of building, testing, configuring, and deploying from multiple testing or staging environments to a production environment.

    In an Azure Synapse Analytics workspace, CI/CD moves all entities from one environment (development, test, production) to another environment. Promoting your workspace to another workspace is a two-part process. First, use an Azure Resource Manager template (ARM template) to create or update workspace resources (pools and workspace). Then, migrate artifacts like SQL scripts and notebooks, Spark job definitions, pipelines, datasets, and data flows by using Azure Synapse CI/CD tools in Azure DevOps or on GitHub.
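
    The first part of that process can be sketched with the Azure CLI: Synapse Studio publishes `TemplateForWorkspace.json` and `TemplateParametersForWorkspace.json` to the `workspace_publish` branch, and those files can be deployed to the target environment's resource group with `az deployment group create`. The target resource group and the overridden workspace name below are assumptions for illustration (the script echoes the command instead of executing it):

    ```shell
    # Deploy the published workspace ARM template to the target environment,
    # overriding the workspace name so the template lands in UAT rather than DEV.
    target_rg="rg-cicddemo-uat"
    template="TemplateForWorkspace.json"
    parameters="TemplateParametersForWorkspace.json"

    deploy_cmd="az deployment group create \
      --resource-group ${target_rg} \
      --template-file ${template} \
      --parameters @${parameters} \
      --parameters workspaceName=syn-cicddemo-workspace-uat"

    echo "$deploy_cmd"
    ```

    The second part (migrating SQL scripts, notebooks, pipelines, and so on) is handled by the Synapse workspace deployment tooling in Azure DevOps or GitHub, as described in the links below.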

    For more details, refer to Continuous integration and delivery for an Azure Synapse Analytics workspace and How to use CI/CD integration to automate the deploy of a Synapse Workspace to multiple environments.

    Hope this helps. Please let us know if you have any further queries.


1 additional answer

Sort by: Most helpful
  1. Richard Landy 6 Reputation points
    2021-11-29T08:05:41.263+00:00

    Hi @PRADEEPCHEEKATLA-MSFT

    Can you use this process to include a dedicated pool database in the ARM template, or would you still need to manage dedicated pool schema changes using SSDT and a dacpac file?