
Microsoft Fabric capacity planning guide: Plan your first deployment

This article is part 1 of the Microsoft Fabric capacity planning guide. It helps Microsoft Fabric capacity admins, tenant admins, Center of Excellence (COE) leads, and analytics team leaders plan Microsoft Fabric capacity for your organization's first deployment. Strategic capacity planning lets you budget, scale, and optimize your analytics solution from proof of concept (POC) to production.

Scenario

An organization is evaluating Microsoft Fabric as a unified analytics platform. The Fabric tenant admin has studied the Microsoft Fabric Adoption Roadmap. Now they're ready to build their first analytics solution on Fabric, such as a sales dashboard or a data science solution, and they need to plan capacity: How many capacity units do they need? Should they start small or large? How should they budget for capacity as the project grows?

Flowchart of a Microsoft Fabric analytics dashboard that shows capacity planning metrics.

Guidance for Microsoft Fabric capacity planning

When you start with Fabric, it's smart to start small, learn, and scale gradually. The Microsoft Fabric Adoption Roadmap recommends starting with early projects that involve a lot of exploration and experimentation. Use a trial or small capacity for a proof of concept, then gradually increase capacity as you move to pilot and production. This phased approach manages cost and risk, so you invest in larger capacity only when you have data to justify it.

Phase 1: Proof of concept (POC)

In phase 1, the goal is to validate your solution idea on Fabric with minimal cost and risk. You typically run a small-scale project to test Fabric's capabilities and get early feedback. Microsoft's adoption roadmap calls this the “exploration” stage of a solution, characterized by experimentation, a narrow scope, and involvement of only a small group of users.

Key actions in phase 1

  • Use the free trial capacity: Start with the Microsoft Fabric trial capacity, which lasts 60 days and provides compute at no cost. This option is perfect for a proof of concept. It lets you build and run Fabric content (Lakehouse, pipelines, and more) without buying capacity up front.

    Important

    Make sure the team reviews considerations, limitations, and FAQs for trial capacity during planning to avoid issues later. For example, check if the proof of concept scope includes AI and Copilot, which region to create the trial capacity in, and which region to use for pilot and production capacity. The types of items in the workspace can affect your ability to change license modes or move the workspace to a capacity in a different region. See Moving data around for details.

  • Keep the scope narrow: Define a focused use case for the proof of concept. This use case might be a single report or a subset of data. Limit the data volume and user count so the trial capacity can easily handle it. Microsoft Fabric adoption phase guidance suggests a proof of concept should be "purposely narrow in scope… A small group of users test the proof of concept solution and provide feedback."

  • Isolate the proof of concept environment: Use a dedicated workspace or capacity (preferably trial capacity) for the proof of concept. This setup ensures you don't impact any production systems (if they exist) and clearly signals to users that this environment is a test.

    Animation of isolating a proof of concept environment in Microsoft Fabric.

    In the isolation animation, we see the process of isolating a proof of concept environment in Microsoft Fabric:

    • Approach:
      • Provide isolated capacity for key items built by experienced developers
    • Pros:
      • Easy
      • Provides isolation from items built by inexperienced developers and rapid unplanned usage growth
      • Flexibility in capacity settings and governance
    • Cons:
      • Cost
      • Can lead to frustration of lower priority content developers and consumers
  • Measure usage and feedback: Even at this early stage, start using the Fabric Capacity Metrics app to monitor how the proof of concept uses resources. Track the capacity units consumed by background operations, such as pipeline runs that refresh data, and by interactive operations, such as report queries. Also, gather feedback from the proof of concept users: Was their experience with the solution fast? Useful? Were there any issues? The combination of metrics and user feedback shows whether the solution approach works and guides improvements. If the trial capacity shows signs of strain (for example, approaching its limits), review the scope, simplify the proof of concept, or plan for more capacity in the next phase. A well-scoped proof of concept typically runs comfortably on a trial capacity.
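The utilization figures you track in the Capacity Metrics app come down to simple arithmetic. The following Python sketch, a minimal illustration rather than the app's exact calculation, converts the CU-seconds consumed in each timepoint into a utilization percentage and summarizes peak and average usage. It assumes 30-second timepoints and that an F SKU's number equals its CU count (for example, F64 provides 64 CUs); the function names and sample readings are hypothetical.

```python
"""Sketch: turn per-timepoint CU-second readings into utilization percentages.

Assumptions (verify against your own Capacity Metrics app data):
- Usage is reported in 30-second timepoints.
- An F SKU provides as many CUs as its number (for example, F64 = 64 CUs).
- The readings below are hypothetical values, not real measurements.
"""

TIMEPOINT_SECONDS = 30


def utilization_percent(cu_seconds_used: float, capacity_cus: int) -> float:
    """Return the share of one timepoint's CU budget that was consumed, in percent."""
    budget = capacity_cus * TIMEPOINT_SECONDS  # CU-seconds available per timepoint
    return 100.0 * cu_seconds_used / budget


def summarize(readings: list[float], capacity_cus: int) -> dict[str, float]:
    """Return peak and average utilization across a list of timepoint readings."""
    percents = [utilization_percent(r, capacity_cus) for r in readings]
    return {
        "peak_percent": max(percents),
        "average_percent": sum(percents) / len(percents),
    }


if __name__ == "__main__":
    # Hypothetical CU-second readings from a proof of concept on a 64-CU capacity.
    poc_readings = [240.0, 480.0, 960.0, 1440.0, 720.0]
    print(summarize(poc_readings, capacity_cus=64))
```

With these sample readings, the busiest timepoint uses 75 percent of the capacity and the average is 40 percent. Tracking the peak, not only the average, matters because the peak is what approaches the capacity's limits.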

Phase 2: Development to pilot

Phase 2 involves building out the full solution and testing it in a controlled pilot. Now that the concept is proven, get a suitable capacity and onboard more users, but not the whole company. The objectives in this phase are to properly size your capacity, finish development, and validate with a broader audience before the big production launch.

Key actions in phase 2

  • Estimate required capacity and acquire it: Based on proof of concept learnings, estimate which capacity stock-keeping unit (SKU) you need for the full solution. Microsoft provides the Fabric SKU Estimator (Preview) tool to help with this. You enter expected data volumes, usage patterns, and number of users, and it suggests a SKU (for example, an F16 or F32).

    Note

    Fabric SKU Estimator is a starting point, not an absolute answer. As mentioned in Evaluate and optimize your Microsoft Fabric capacity, capacity planning is a continuous exercise until you find the right balance between optimization and cost. It's always a good strategy to start small and then gradually increase the size as needed.

  • Develop on a separate capacity: It's best practice to do development and testing on a non-production capacity. This environment gives developers more flexibility, and because it's not the final production capacity, they can push it and even overload it with stress tests without impacting end users.

  • Gradually roll out a pilot: With the solution fully built and internally tested, run a pilot on the smaller, separate capacity. Invite a larger set of users (for example, a particular department or 10-15 percent of the eventual user base) to use the solution in their day-to-day work. During the pilot, proactively monitor capacity usage with the Fabric Capacity Metrics app. Look for peak utilization percentages and any signs of throttling (the metrics app shows whether any operations were delayed or rejected due to overuse). Ideally, you want to see high but not maxed-out usage.

  • Optimize and fine-tune: Use the pilot phase to catch performance issues and optimize. If certain actions or times of day cause heavy load, adjust accordingly. When a capacity is under strain, use one of three strategies: optimize content, scale up, or scale out. During the pilot, focus mainly on optimization (tuning queries, repartitioning data, and so on), because scaling up or out usually comes with the move to production. By the end of the pilot, you should have a well-tuned solution and a clear idea of how much capacity the full launch needs.

    Screenshot of optimizing capacity in Microsoft Fabric during pilot phase.

    In the optimization animation, we see the process of optimizing capacity in Microsoft Fabric during the pilot phase:

    • Approach:
      • Work with content creators to follow best practices and reduce capacity unit (CU) consumption
    • Pros:
      • Avoids increased cost
      • Learning carries over to future content
    • Cons:
      • Can be difficult or time consuming
  • Forecast for production: Analyze the pilot's data to extrapolate what happens with the full user count. This can be as simple as linear scaling (four times as many users might mean roughly four times the load), but consider usage patterns: not everyone uses the solution at the same time.

    Note

    We recommend proactively monitoring with the Capacity Metrics app to plan and decide on capacity: "Scale up your capacity so that it covers your utilization" as observed in the pilot. In other words, pick a capacity size (also called a SKU) where the highest usage from the pilot sits comfortably under 100 percent of that SKU.
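The forecasting and sizing steps in this phase can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the method the Fabric SKU Estimator uses: it scales the pilot's peak CU usage linearly by user growth, applies a hypothetical concurrency_ratio because not every production user is active at once, and then picks the smallest F SKU that keeps the forecast peak under a hypothetical 80 percent target. It assumes F SKU sizes from F2 to F2048, with each SKU's number equal to its CU count.

```python
"""Sketch: extrapolate pilot usage to production and pick an F SKU.

Assumptions (illustrative, not an official sizing method):
- Load scales linearly with the number of users.
- `concurrency_ratio` is a hypothetical adjustment for production users not
  all being active at the same time.
- "Comfortably under 100 percent" is modeled as a hypothetical 80 percent target.
- F SKU sizes run from F2 to F2048, where the number is the CU count.
"""

F_SKU_CUS = [2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048]


def forecast_peak_cus(pilot_peak_cus: float, pilot_users: int,
                      production_users: int, concurrency_ratio: float = 1.0) -> float:
    """Scale the pilot's peak CU usage by user growth and the concurrency adjustment."""
    growth = production_users / pilot_users
    return pilot_peak_cus * growth * concurrency_ratio


def choose_sku(peak_cus: float, target_utilization: float = 0.8) -> str:
    """Return the smallest F SKU whose CUs keep the peak at or under the target."""
    for cus in F_SKU_CUS:
        if peak_cus <= cus * target_utilization:
            return f"F{cus}"
    raise ValueError("Forecast peak exceeds the largest F SKU; consider scaling out.")


if __name__ == "__main__":
    # Hypothetical pilot: an F16 peaking at 75 percent (12 CUs) with 100 users.
    adjusted = forecast_peak_cus(12.0, pilot_users=100, production_users=400,
                                 concurrency_ratio=0.5)
    linear = forecast_peak_cus(12.0, pilot_users=100, production_users=400)
    print(adjusted, choose_sku(adjusted))  # concurrency-adjusted forecast
    print(linear, choose_sku(linear))      # plain linear forecast
```

In this hypothetical example, the concurrency-adjusted forecast is 24 CUs and the sketch selects F32, while plain linear scaling forecasts 48 CUs and selects F64. The gap shows why usage patterns matter as much as user counts.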

Phase 3: Pilot to production

In phase 3, deploy your Fabric solution to a production capacity and open it to all intended users. Focus on reliability, performance, and governance at scale. Use what you learn to build a robust operational setup.

Key actions in phase 3

  • Scale up for production: Switch to a production capacity that handles the full load. You might upgrade your pilot capacity or buy a new one at a higher SKU.

  • Monitor continuously and set alerts: When the solution is live, continuous capacity monitoring is critical. Use the Fabric Capacity Metrics app to monitor capacity usage and throttling events. Set up notifications for capacity usage exceedance so that admins or specified contacts are notified when usage exceeds the configured threshold. When admins are notified, use the Capacity troubleshooting guide. Review Monitor capacities to learn about bursting and smoothing.

  • Use Fabric's auto-management features: Fabric capacities have bursting and smoothing built in. Unlike traditional systems that stay within hard compute limits and let jobs wait, fail, or run suboptimally, capacities self-manage to some extent. Bursting lets jobs run at peak performance so they finish fast. Smoothing spreads the cost of jobs over a longer time period, which prevents scheduling issues. These features absorb occasional peaks. In production, you might notice some burst usage; that's fine if it's infrequent. Sustained overload still leads to throttling. Use the Fabric Capacity Metrics app to check whether your capacity is throttling.

    Bursting lets jobs run at peak performance. Fewer delays reduce the perception of slowness, and users are happier because jobs finish faster. Smoothing reduces the impact of compute spikes: you pay for the compute from your future capacity, so there's no need to schedule jobs to run one after another.

    Animation that shows bursting and smoothing features in Microsoft Fabric capacity management.

  • Continuous optimization through a Center of Excellence: When a capacity is under strain, use one of three strategies: optimize content, scale up, or scale out. Even in production, optimization doesn't stop. Keep looking for ways to improve efficiency. Follow optimization best practices for each workload, such as Power BI, Warehouse, Spark, and Data Factory. Use community practices such as a Center of Excellence to encourage best practices and build excellence through communities across the organization. Over time, solutions change (more data, more users, new reports), so ongoing tuning helps capacity keep up without frequent upgrades.

    Animation that shows scaling up capacity in Microsoft Fabric.

    In the scale-up animation, we see the process of scaling up capacity in Microsoft Fabric:

    • Approach:
      • Move to a bigger F SKU
      • Schedule scale using Azure Automation, Fabric CLI, or notebook
    • Pros:
      • Add CUs for all items
      • Easy
    • Cons:
      • Cost
      • Noisy neighbors (items with unintentionally high CU burn) can still be a problem
  • Plan for scale-up or scale-out: As adoption grows, plan a scaling strategy. If more users or projects join, decide whether to scale up the existing capacity (for example, go from F32 to F64) or scale out by adding another capacity and splitting workloads. See Optimize capacity for guidance on scale up versus scale out. Scaling up is straightforward: one bigger pool for everything. Scaling out gives isolation. Using multiple capacities is a good way to isolate compute for high-priority items from self-service or development content.

    Animation that shows scaling out capacity in Microsoft Fabric.

    In the scale-out animation, we see the process of scaling out capacity in Microsoft Fabric:

    • Approach:
      • Create multiple smaller F SKUs based on organization, type of work, and so on
    • Pros:
      • Easy
      • Provides some isolation from noisy neighbors (items with unintentionally high CU burn)
      • Flexibility in capacity settings/governance
    • Cons:
      • Cost
      • High CU items have an increased chance of throttling
      • Workload thresholds should be observed
  • Implement governance: Running in production means setting up governance and management processes. Assign clear responsibility for capacity monitoring (who checks metrics and when). Use surge protection (a Fabric feature that automatically protects against background overuse) as a safety net. Governance can also include regular reviews with stakeholders about capacity costs and needs, such as monthly reports that show how the capacity is used to justify the return on investment (ROI).
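To show why smoothing lets an occasional spike fit on a smaller capacity, the following Python sketch compares charging a background job only while it runs with spreading its cost evenly across a 24-hour window, the smoothing window Fabric documents for background operations. It's a deliberately simplified model: it ignores interactive operations, carryforward, and throttling stages, and the job size, capacity size, and 80 percent alert threshold are hypothetical values chosen for illustration. Check Monitor capacities for the current smoothing rules before relying on these numbers.

```python
"""Sketch: compare a background job's utilization with and without smoothing.

Simplified model (assumptions, not Fabric's exact accounting):
- Background usage is spread evenly over a 24-hour window.
- Usage is reported in 30-second timepoints.
- Interactive usage, carryforward, and throttling stages are ignored.
- The job size, capacity size, and alert threshold are hypothetical.
"""

TIMEPOINT_SECONDS = 30
BACKGROUND_WINDOW_SECONDS = 24 * 60 * 60
TIMEPOINTS_PER_WINDOW = BACKGROUND_WINDOW_SECONDS // TIMEPOINT_SECONDS  # 2,880 timepoints
ALERT_THRESHOLD_PERCENT = 80.0  # hypothetical notification threshold


def unsmoothed_percent(job_cu_seconds: float, job_seconds: float, capacity_cus: int) -> float:
    """Utilization if the job's whole cost were charged only while it runs."""
    return 100.0 * job_cu_seconds / (capacity_cus * job_seconds)


def smoothed_percent(job_cu_seconds: float, capacity_cus: int) -> float:
    """Utilization per timepoint after spreading the cost over the 24-hour window."""
    per_timepoint = job_cu_seconds / TIMEPOINTS_PER_WINDOW  # CU-seconds per timepoint
    return 100.0 * per_timepoint / (capacity_cus * TIMEPOINT_SECONDS)


def exceeds_alert(percent: float, threshold: float = ALERT_THRESHOLD_PERCENT) -> bool:
    """True when utilization is above the (hypothetical) notification threshold."""
    return percent > threshold


if __name__ == "__main__":
    job_cu_seconds = 69_120  # hypothetical nightly refresh
    job_seconds = 600        # runs for 10 minutes
    capacity_cus = 32        # an F32 capacity

    raw = unsmoothed_percent(job_cu_seconds, job_seconds, capacity_cus)
    smooth = smoothed_percent(job_cu_seconds, capacity_cus)
    print(f"Charged while running: {raw:.1f}% (alert: {exceeds_alert(raw)})")
    print(f"Smoothed over 24 hours: {smooth:.1f}% (alert: {exceeds_alert(smooth)})")
```

In this example, the 10-minute job would need 360 percent of an F32 if charged only while it runs, which is the kind of peak bursting absorbs. After smoothing, it adds 24 CU-seconds to each 30-second timepoint, or 2.5 percent of the capacity. That's why infrequent spikes are fine, while many such jobs every day add up to sustained load that can trigger notifications and throttling.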

Conclusion

By following this three-phase approach, organizations introduce Microsoft Fabric in a controlled, cost-effective way and build confidence in their capacity planning. This approach sets a strong foundation for expanding Fabric to more solutions.

The next articles cover advanced capacity planning topics like scaling strategies for multiple solutions, cost management, and capacity governance at enterprise scale. Now that your first Fabric solution is running, you've navigated the journey from proof of concept to production, and you're ready to build on this success.
