What type of Synapse architecture should be implemented for organisations that are large and have multiple departments working on the same data?

Question

Parameswaran Serussery Narayanan 35

Hello,

I am looking for input on the Synapse strategy we should adopt. The scenario is as follows.

Company XYZ has several departments (Dept 1, 2, 3 and 4) that want to use a big data platform such as Azure Synapse. The data and insight teams within these departments are made up of data architects and data scientists. Each department has its own requirements to develop, test, train and productionise machine learning and propensity models on the core data. There is also a need to share some data products between departments.

The current Synapse workspace setup is as follows:

Dev environment - 1 workspace in total for all the Departments to use

Prod environment - 1 workspace in total for all the Departments to use

Synapse SQL Datawarehouse - 1 SQL DW with 1000 DWUs reserved in Dev and 1 SQL DW with 5000 DWUs reserved in Prod workspace

Number of pipelines in Prod - 735+ (owned by Dept 1)

Number of entities/artifacts in Prod - 2500+ (owned by Dept 1)

Departments intended to use the workspace setup - Dept 1, 2, 3 and 4 which includes both data architect and data scientists

With each team having its own requirements, would it be better to have just one workspace that everyone uses, or a single data lake store holding the core data with each team having its own Synapse workspace for its own requirements?

I understand that each approach has its own pros and cons. A single workspace means resource contention, and there are hard limits on the number of pipelines and entities allowed in a single workspace. One department already has 735+ pipelines and about 2500+ entities in this workspace. If the remaining departments use the same workspace, we will soon hit those limits, and Microsoft has confirmed that they are hard limits that cannot be raised through a quota request the way soft limits can.

A multi-workspace topology with a single data lake means the workspaces need to be set up and secured via different VNets, but it reduces resource contention and gives better scalability and data governance. Each department can promote changes through a more agile release process, use the same core data, and publish data products in the single data lake for other departments to consume. Cross-workspace queries are possible with serverless SQL pool, which makes data sharing feasible in a multi-workspace implementation.

I have gone through the earlier Microsoft posts by the programme manager (JovanPop - https://techcommunity.microsoft.com/blog/azuresynapseanalyticsblog/the-best-practices-for-organizing-synapse-workspaces-and-lakehouses/3002506) and Microsoft's chief data officer (https://medium.com/data-science/best-practices-for-organizing-synapse-workspaces-977fe14b1fdb), which lean towards the multiple workspace, single data lake topology as the best approach for such large organisations. It would be good to have more input, including architectural input, while I work out which approach is best suited to the scenario above.

Chandra Boorla 14,510 Reputation points Microsoft External Staff Moderator

2025-06-27T08:46:18.11+00:00

@Parameswaran Serussery Narayanan

Just checking in to see if the below suggestion helped. If this answers your query, do click Accept Answer and Yes for was this answer helpful. And, if you have any further query do let us know.

As your feedback is valuable and can assist others in the community facing similar issues.

Answer 1

Chandra Boorla 14,510 Microsoft External Staff Moderator

@Parameswaran Serussery Narayanan

Thank you for the detailed context. Your analysis and understanding of the scenario are spot on.

Given the scale at which Dept 1 is already operating (735+ pipelines and 2500+ artifacts), and with multiple other departments planning to use the same workspace, continuing with a single Synapse workspace architecture will likely run into resource contention and hard platform limits, as you've rightly pointed out.
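To make the limit concern concrete, here is a minimal Python sketch of a headroom check for a shared workspace. The Dept 1 counts come from the question; the two limit values are hypothetical placeholders, not confirmed figures, so substitute the numbers from the current Synapse service-limits documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkspaceUsage:
    """Artifact counts for one department in a shared workspace."""
    department: str
    pipelines: int
    entities: int


def remaining_headroom(usages, pipeline_limit, entity_limit):
    """Return (pipelines_left, entities_left) for a shared workspace.

    The limits are caller-supplied; this sketch does not know the
    real platform limits.
    """
    used_pipelines = sum(u.pipelines for u in usages)
    used_entities = sum(u.entities for u in usages)
    return pipeline_limit - used_pipelines, entity_limit - used_entities


# Dept 1 figures are from the question; the limits are HYPOTHETICAL.
dept1 = WorkspaceUsage("Dept 1", pipelines=735, entities=2500)
HYPOTHETICAL_PIPELINE_LIMIT = 800
HYPOTHETICAL_ENTITY_LIMIT = 5000

pipelines_left, entities_left = remaining_headroom(
    [dept1], HYPOTHETICAL_PIPELINE_LIMIT, HYPOTHETICAL_ENTITY_LIMIT
)
print(f"Pipelines left: {pipelines_left}, entities left: {entities_left}")
```

Adding the expected counts for Dept 2, 3 and 4 to the list shows how quickly a single workspace would run out of room under whatever limits apply.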

Based on your scenario and Microsoft's documented best practices (including those shared by Jovan Pop and others), the recommended approach would be:

Multi-Workspace Architecture with a Shared Data Lake (ADLS Gen2)

This model provides:

Scalability by isolating workloads per department,
Autonomy for individual Dev/Test/Prod cycles and CI/CD per team,
Governance and control via centralized security and Purview integration,
Data sharing and reuse through Serverless SQL Pool queries and shared curated zones in the data lake.
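As an illustration of the data sharing point, a serverless SQL pool in any workspace can read Parquet files from a curated folder in the shared lake with OPENROWSET. The Python sketch below only builds the T-SQL text; the storage account, container and folder names are hypothetical, and the caller's identity is assumed to already have read access to that folder.

```python
def build_openrowset_query(storage_account, container, folder, top=100):
    """Build a serverless SQL query over Parquet files in a lake folder."""
    if top <= 0:
        raise ValueError("top must be positive")
    for value in (storage_account, container, folder):
        # Keep the sketch safe from broken string literals.
        if "'" in value:
            raise ValueError("single quotes are not allowed in path parts")
    url = (
        f"https://{storage_account}.dfs.core.windows.net/"
        f"{container}/{folder.strip('/')}/*.parquet"
    )
    return (
        f"SELECT TOP {top} *\n"
        f"FROM OPENROWSET(\n"
        f"    BULK '{url}',\n"
        f"    FORMAT = 'PARQUET'\n"
        f") AS [result];"
    )


# Hypothetical names: Dept 2 reading a data product published by Dept 1.
print(build_openrowset_query("contosolake", "curated", "dept1/propensity_scores"))
```

Because the query reads the files in place, no copy of the data product is needed in the consuming department's workspace.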

While this does introduce more complexity in setup (VNet integration, RBAC, and lake ACLs), the benefits in terms of long-term agility, performance, and maintainability outweigh those challenges, especially for large enterprises like yours.

I hope this information helps. Please do let us know if you have any further queries.

Kindly consider upvoting the comment if the information provided is helpful. This can assist other community members in resolving similar issues.

Thank you.

Parameswaran Serussery Narayanan 35 Reputation points

2025-06-25T17:47:02.24+00:00

Hi Chandra, Thank you for the response and appreciate taking time in detailing it out.

I have a question with respect to VNet, RBAC and ACLs. If we create individual workspaces for each department within the same resource group, would that reduce the admin work further, since roles set at the resource group level would cascade to the resources in the group? I understand that each subscription can have up to 20 Synapse workspaces. We have team-specific RBAC roles created to control access, but for sharing data products between teams, would that mean extending those roles to further Synapse workspaces that need access instead of creating new roles? I understand that the VNet will still need to be set up along with private endpoints, as that is the practice followed to secure the system.

Regards,

Param

Chandra Boorla 14,510 Reputation points Microsoft External Staff Moderator

2025-06-25T18:54:43.55+00:00

@Parameswaran Serussery Narayanan

Thank you for the kind words, I’m glad the response was helpful!

Great question regarding the use of a shared resource group and how it impacts RBAC, ACLs, and VNet setup in a multi-workspace architecture.

RBAC & Resource group strategy

Yes, placing all Synapse workspaces within the same resource group can help reduce administrative overhead for managing resource-level access:

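Azure RBAC roles assigned at the resource group level (e.g., Contributor, Reader) will automatically apply to all current and future resources within that group, including new workspaces. Synapse RBAC roles such as Synapse Contributor are a separate layer and are assigned within each workspace, so they do not cascade from the resource group.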

This approach works well for platform teams, DevOps, or support roles that need consistent permissions across all workspaces.

However, for team-specific or data-level access, you’ll still need to manage permissions more granularly.

Sharing data between workspaces

When teams need to consume each other’s data products from a shared ADLS Gen2:

RBAC controls the access to the storage account itself.

ACLs (Access Control Lists) on folders control read/write access to data for specific users or Synapse managed identities.

Rather than creating new roles, you can extend existing team-specific roles to include additional workspace managed identities if cross-access is needed.

This way, access is explicit, auditable, and flexible without creating unnecessary complexity.
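As a rough sketch of the ACL side, the helper below builds the POSIX-style ACL entry string that ADLS Gen2 uses for a named principal, such as another workspace's managed identity. The object ID shown is a placeholder, and applying the entry (through the portal, CLI or an SDK) is left out on purpose.

```python
def acl_entries(principal_object_id, permissions="r-x", include_default=True):
    """Return ADLS Gen2 ACL entries for one security principal.

    'permissions' is a three-character string in rwx order, for
    example 'r-x' for read and traverse without write.
    """
    valid = len(permissions) == 3 and all(
        p in (flag, "-") for p, flag in zip(permissions, "rwx")
    )
    if not valid:
        raise ValueError("permissions must look like 'r-x' or 'rwx'")
    # Managed identities are addressed as named users in the ACL.
    entries = [f"user:{principal_object_id}:{permissions}"]
    if include_default:
        # Default entries are inherited by items created later in the folder.
        entries.append(f"default:user:{principal_object_id}:{permissions}")
    return ",".join(entries)


# Placeholder object ID for a consuming workspace's managed identity.
DEPT2_WORKSPACE_IDENTITY = "00000000-0000-0000-0000-000000000000"
print(acl_entries(DEPT2_WORKSPACE_IDENTITY))
```

Read access to a curated folder also needs execute permission on each parent folder, so the same helper with "--x" can be used for the path leading down to it.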

VNet & Private Endpoints

You're absolutely right, even in a shared resource group:

Each Synapse workspace will still require its own VNet integration and private endpoints to secure communication with the Data Lake, Key Vault, and other services.

This ensures network isolation, data exfiltration protection, and adheres to best practices for enterprise security.

If you're using Infrastructure-as-Code (like Bicep or ARM templates), this process can be standardized and automated to simplify deployment across multiple workspaces.
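To show what standardizing this could look like, here is a small Python sketch that generates the list of private endpoints to feed into a template for each workspace. The resource names are hypothetical, and the set of endpoints per workspace (three Synapse endpoints plus one each for the lake and Key Vault) is an assumption; the exact set depends on your network design, for example whether a managed VNet with managed private endpoints is used.

```python
# Private endpoint sub-resources (group IDs) assumed for this sketch.
SYNAPSE_GROUP_IDS = ("Sql", "SqlOnDemand", "Dev")
STORAGE_GROUP_ID = "dfs"
KEY_VAULT_GROUP_ID = "vault"


def private_endpoint_plan(workspaces, storage_account, key_vault):
    """List the private endpoints to template for each workspace."""
    plan = []
    for workspace in workspaces:
        targets = [(workspace, group_id) for group_id in SYNAPSE_GROUP_IDS]
        targets.append((storage_account, STORAGE_GROUP_ID))
        targets.append((key_vault, KEY_VAULT_GROUP_ID))
        for target, group_id in targets:
            plan.append(
                {
                    "workspace": workspace,
                    "target": target,
                    "group_id": group_id,
                    "name": f"pe-{workspace}-{group_id}".lower(),
                }
            )
    return plan


# Hypothetical workspace, storage account and Key Vault names.
plan = private_endpoint_plan(
    ["syn-dept1-prod", "syn-dept2-prod"], "sharedlake", "kv-shared"
)
for endpoint in plan:
    print(endpoint["name"], "->", endpoint["target"], endpoint["group_id"])
```

Each dictionary can be passed as a parameter set to a Bicep or ARM deployment, so onboarding a new department becomes one more entry in the workspace list.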

Summary

Using a shared resource group simplifies initial RBAC setup, especially for shared roles.

For cross-team collaboration, you can extend existing roles and ACLs to new Synapse workspaces; there is no need to create entirely new ones.

VNet and private endpoint setup remains necessary per workspace, but it can be templatized for consistency.

I hope this information helps.

Parameswaran Serussery Narayanan 35 Reputation points

2025-06-25T21:32:18.3166667+00:00

@Chandra Boorla I once again thank you for taking the time to confirm that my understanding of the Azure platform is correct. I have a few more questions about the underlying VNet and private endpoints that will need to be set up for these additional workspaces.

I understand that Microsoft doesn't change the VNet address spaces and private endpoints once they are assigned to a resource; they stay static unless they are deleted and recreated for a specific reason. Given that, is it accurate to say that the VNet and private endpoint setup is a one-time activity for these new workspaces and that, once set, they need minimal to negligible maintenance?

Also, with respect to cost, I understand that in the cloud you either pay as you go or reserve resources over a period of years (periods: 1, 3, 5) for cost effectiveness. Is it right to say that the cost of running 10 pipelines in a single workspace versus running 1 pipeline in each of 10 workspaces would be the same (a like-for-like pipeline comparison)? This will help me further understand the cost analysis part of this approach.

Chandra Boorla 14,510 Reputation points Microsoft External Staff Moderator

2025-06-26T19:12:28.27+00:00

@Parameswaran Serussery Narayanan

Thank you once again for your thoughtful questions, and it’s great to see how deeply you’re thinking through both the architectural and operational aspects.

VNet and Private Endpoint – One-Time Setup?

Yes, your understanding is accurate. Once the VNet and Private Endpoints are configured for a Synapse workspace (e.g., for ADLS Gen2, SQL pools, Key Vault, etc.), they are generally:

Static and persistent: Microsoft does not modify them unless you explicitly delete and recreate them.

One-time setup per workspace, assuming there’s no change in networking architecture or security policy.

Minimal to negligible maintenance afterward, perhaps occasional DNS zone updates or NSG tweaks if your governance model evolves.

This setup can be further standardized using Infrastructure-as-Code (Bicep/ARM templates), which simplifies rollout for additional departments.

Pipeline Cost: One Workspace vs. Many?

Again, you're right in your assumption. Azure Synapse pipelines are billed based on execution (activity run time, IR usage, etc.), not on the number of pipelines or workspaces. So, running 10 pipelines in one workspace vs. 1 pipeline each across 10 workspaces, assuming identical logic, activity type, compute used, and frequency, will incur similar costs.
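The like-for-like claim can be checked with simple arithmetic. In the Python sketch below, the rates and volumes are hypothetical and only two billing components (orchestration activity runs and data movement hours) are modelled, so treat it as an illustration of the reasoning rather than a pricing calculator.

```python
import math


def pipeline_cost(activity_runs, diu_hours, rate_per_1000_runs, rate_per_diu_hour):
    """Execution cost: orchestration runs plus data movement hours."""
    orchestration = activity_runs / 1000 * rate_per_1000_runs
    data_movement = diu_hours * rate_per_diu_hour
    return orchestration + data_movement


# HYPOTHETICAL rates and per-pipeline volumes, for illustration only.
RATE_PER_1000_RUNS = 1.00
RATE_PER_DIU_HOUR = 0.25
RUNS_PER_PIPELINE = 100
DIU_HOURS_PER_PIPELINE = 2.0

# Ten pipelines in one workspace: usage is summed, then priced.
single_workspace = pipeline_cost(
    10 * RUNS_PER_PIPELINE,
    10 * DIU_HOURS_PER_PIPELINE,
    RATE_PER_1000_RUNS,
    RATE_PER_DIU_HOUR,
)

# One pipeline in each of ten workspaces: priced per workspace, then summed.
ten_workspaces = sum(
    pipeline_cost(
        RUNS_PER_PIPELINE,
        DIU_HOURS_PER_PIPELINE,
        RATE_PER_1000_RUNS,
        RATE_PER_DIU_HOUR,
    )
    for _ in range(10)
)

print(math.isclose(single_workspace, ten_workspaces))
```

Because the cost is linear in usage and has no per-workspace term, splitting the same work across workspaces leaves the total unchanged.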

However, do note:

Costs may slightly increase if each workspace has its own dedicated Spark pool, IR, or if you're duplicating infrastructure components.

Serverless SQL and Copy activities behave predictably across workspaces and do not create additional overhead by default.

For data movement, using a shared IR across workspaces can help reduce costs and simplify monitoring.

So overall, the cost of pipeline execution is consistent, but it's good to plan for workspace-level infrastructure costs, especially if certain departments require heavy compute.

Conclusion

VNet/Private Endpoint setup is a one-time activity per workspace with minimal maintenance.

Pipeline execution costs remain like-for-like, regardless of how many workspaces are involved.

Any additional cost differences would come from infrastructure duplication (e.g., Spark pools, IRs), not from pipelines themselves.

I hope this information helps.

Parameswaran Serussery Narayanan 35 Reputation points

2025-06-26T20:39:23.91+00:00

@Chandra Boorla Thanks once again for the detailed explanation; I appreciate it. Regarding infrastructure such as Spark pools and IRs: in a single workspace, Spark pools would be created separately for each department to isolate load and to ensure critical processes do not compete for the same underlying resources. For example, in a single workspace, Dept 1 would have Spark Pool 1, Dept 2 would have Spark Pool 2 and so on, whereas if each department has a separate workspace, these Spark pools would be created in the respective workspaces. With this approach, the multi-workspace setup would not really duplicate infrastructure, is that right? Also, with the multi-workspace, single data lake store (Gen2) approach, the core data is shared between workspaces and cross-workspace queries can still be done with serverless SQL pool, which reduces the dependency on IRs. In this case too we would not be creating duplicate infra components, thereby reducing the chances of duplicating the infra.

Chandra Boorla 14,510 Reputation points Microsoft External Staff Moderator

2025-06-27T02:24:52.9266667+00:00

@Parameswaran Serussery Narayanan

Thank you once again, and you're absolutely right in your line of thinking.

Spark Pools – Isolation vs Duplication

Yes, your approach is spot on. Whether in a single workspace or multiple workspaces, allocating separate Spark pools per department is a sound practice for isolating workloads and ensuring critical processes aren't impacted by competing jobs.

In the single workspace setup, having SparkPool_Dept1, SparkPool_Dept2, etc., does provide logical separation, but the risk of shared workspace-level metadata contention and potential for overlapping role boundaries still exists.

In the multi-workspace setup, each department manages its own workspace and Spark pool, which:

Maintains the same isolation benefits, and

Avoids infrastructure duplication, since each pool serves a distinct team with its own compute needs.

So yes, in both setups, this isn’t considered redundant infrastructure but rather purpose-built isolation to maintain performance and control.

Integration Runtimes (IRs) and Cross-Workspace Access

You’re also correct about reducing the reliance on Integration Runtimes in the multi-workspace + shared ADLS Gen2 model:

Data sharing between departments can happen through Serverless SQL Pools querying shared curated zones in the lake.

This makes data accessible across workspaces without needing IR-based data movement, especially for read-heavy or analytical workloads.

IRs would only be needed for external data movement or integration scenarios, and even then, they can often be centralized or shared via VNet configuration.

This approach significantly reduces infra redundancy and simplifies both cost management and governance.

Summary

Using separate Spark pools per department, whether in one workspace or across multiple, is a scalable and intentional isolation strategy, not duplication.

With a shared ADLS Gen2 lake and Serverless SQL Pools, cross-workspace data access becomes seamless and reduces reliance on IRs.

Overall, your proposed model supports better workload management, clearer separation of concerns, and operational efficiency.

I hope this information helps. Please do let us know if you have any further queries.

If this answers your query, do click Accept Answer and Yes for was this answer helpful. And, if you have any further query do let us know.

As your feedback is valuable and can assist others in the community facing similar issues.

Thank you.

Parameswaran Serussery Narayanan 35 Reputation points

2025-06-27T06:41:34.9466667+00:00

@Chandra Boorla Thanks for all the valuable input and for taking the time to provide detailed information on each of my queries. I appreciate it, and it will be really helpful in making the architectural decision.

Answer 2

Deleted

This answer has been deleted due to a violation of our Code of Conduct. The answer was manually reported or identified through automated detection before action was taken. Please refer to our Code of Conduct for more information.

Answer 3

Hi,

I would suggest implementing multiple workspaces (one workspace per department) with a shared data lake.

This would be better for scalability and governance.

Security can be handled with ACLs, which are easy to manage.

Workspaces could be named like this:

syn-dept1-dev

syn-dept1-prod

syn-dept2-dev etc
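A naming scheme like this is easy to generate, which helps when the same list drives deployment scripts. The Python sketch below follows the pattern shown above; the department and environment lists are assumptions, and the function does not validate names against Azure naming rules.

```python
def workspace_names(departments, environments=("dev", "prod"), prefix="syn"):
    """Return workspace names in the form prefix-department-environment."""
    return [
        f"{prefix}-{department}-{environment}"
        for department in departments
        for environment in environments
    ]


names = workspace_names(["dept1", "dept2", "dept3", "dept4"])
print(names)
print(f"{len(names)} workspaces in total")
```

Four departments with two environments each gives eight workspaces, which can be compared against the per-subscription workspace limit mentioned earlier in the thread.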

Since (assuming) all departments or teams share the same data lake, they can use

data product contracts
shared curated data sets
shared tables etc.

The risks of a single workspace, as mentioned above, would be

Hard to scale when new teams are onboarded
Cross-team conflicts
Slower releases
Difficult to enforce fine-grained team-level RBAC and networking
Shared pipelines and Spark/Synapse pools may lead to degraded performance
