Elastic data map in Microsoft Purview
The Microsoft Purview Data Map provides the foundation for data discovery and data governance. It captures metadata about data present in analytics, software-as-a-service (SaaS), and operational systems in hybrid, on-premises, and multicloud environments. The data map stays up to date with its built-in scanning and classification system.
All Microsoft Purview accounts have a data map that starts at one capacity unit, and can elastically grow. They scale up and down based on request load and metadata stored within the data map.
The elastic data map has two components- metadata storage and operation throughput, represented as a capacity unit (CU). All Microsoft Purview accounts, by default, start with one capacity unit and elastically grow based on usage. Each data Map capacity unit includes a throughput of 25 operations/sec and 10 GB of metadata storage limit.
Operations are the throughput measure of the Microsoft Purview Data Map. They include any Create, Read, Write, Update, and Delete operations on metadata stored in the Data Map. Some examples of operations are:
- Create an asset in Data Map
- Add a relationship to an asset such as owner, steward, parent, lineage, etc.
- Edit an asset to add business metadata such as description, glossary term, etc.
- Keyword search returning results to search result page.
Storage is the second component of Data Map and includes the storage of technical, business, operational, and semantic metadata.
The technical metadata includes schema, data type, columns, and so on, that are discovered during Microsoft Purview scanning. The business metadata includes automated (for example, promoted from Power BI datasets, or descriptions from SQL tables) and manual tagging of descriptions, glossary terms, and so on. Examples of semantic metadata include the collection mapping to data sources, or classifications. The operational metadata includes Data factory copy and data flow activity run statuses, and runs time.
Elastic Data Map with auto-scale – you start with a Data Map as low as one capacity unit that can autoscale based on load. For most organizations, this feature leads to increased savings and a lower price point for starting data governance projects. This feature impacts pricing.
Enhanced scanning & ingestion – you can track and control the population of the data assets, classification, and lineage across both the scanning and ingestion processes. This feature impacts pricing.
Claudia is an Azure admin at Contoso who wants to create a new Microsoft Purview account from Azure portal. She doesn’t know the required size of Microsoft Purview Data Map to support the future state of the platform. However, she knows that the Microsoft Purview Data Map is billed using Capacity Units, which are affected by storage and operations throughput. She wants to create the smallest Data Map to keep the cost low and grow the Data Map size elastically based on consumption.
Claudia can create a Microsoft Purview account with the default Data Map size of one capacity unit that can automatically scale up and down. The autoscaling feature also allows for capacity to be tuned based on intermittent or planned data bursts during specific periods. Claudia follows the next steps in the creation experience to set up network configuration and completes the creation.
In the Azure portal, in the metrics tab for the Microsoft Purview account, Claudia can see the consumption of the Data Map storage and operations throughput. She can further set up an alert when the storage or operations throughput reaches a certain limit to monitor the consumption and billing of the new Microsoft Purview account.
Customers are billed for one capacity unit (25 ops/sec and 10 GB) and extra billing is based on the consumption of each extra capacity unit rolled up to the hour. The Data Map operations scale in the increments of 25 operations/sec and metadata storage scales in the increments of 10-GB size. Microsoft Purview Data Map can automatically scale up and down within the elasticity window (check current limits). However, to get the next level of elasticity window, a support ticket needs to be created.
Data Map capacity units come with a cap on operations throughput and storage. If storage exceeds the current capacity unit, customers are charged for the next capacity unit even if the operations throughput isn't used. The below table shows the Data Map capacity unit ranges. Contact support if the Data Map capacity unit goes beyond 100 capacity unit.
Data Map Capacity Unit | Operations/Sec throughput | Storage capacity in GB |
---|---|---|
1 | 25 | 10 |
2 | 50 | 20 |
3 | 75 | 30 |
4 | 100 | 40 |
5 | 125 | 50 |
6 | 150 | 60 |
7 | 175 | 70 |
8 | 200 | 80 |
9 | 225 | 90 |
10 | 250 | 100 |
.. | .. | .. |
100 | 2500 | 1000 |
Microsoft Purview Data Map’s operation throughput for the given hour is less than or equal to 25 Ops/Sec and storage size is 1 GB. Customers are billed for one capacity unit.
Microsoft Purview Data Map’s operation throughput for the given hour is less than or equal to 25 Ops/Sec and storage size is 15 GB. Customers are billed for two capacity units.
Microsoft Purview Data Map’s operation throughput for the given hour is 50 Ops/Sec and storage size is 15 GB. Customers are billed for two capacity units.
Microsoft Purview Data Map’s operation throughput for the given hour is 50 Ops/Sec and storage size is 25 GB. Customers are billed for three capacity units.
Microsoft Purview Data Map’s operation throughput for the given hour is 250 Ops/Sec and storage size is 15 GB. Customers are billed for 10 capacity units.
The Data Map billing example shows a Data Map with growing metadata storage and variable operations per second over a six-hour window from 12 PM to 6 PM. The red line in the graph is operations per second consumption, and the blue dotted line is metadata storage consumption over this six-hour window:
Each Data Map capacity unit supports 25 operations/second and 10 GB of metadata storage. The Data Map is billed hourly. It's billed for the maximum Data Map capacity units needed within the hour, with a minimum of one capacity unit. At times, you might need more operations/second within the hour, and more operations increase the number of capacity units needed within that hour. At other times, your operations/second usage could be low, but you might still need a large volume of metadata storage. The metadata storage is what determines how many capacity units you need within the hour.
The table shows the maximum number of operations/second and metadata storage used per hour for this billing example:
Based on the Data Map operations/second and metadata storage consumption in this period, this Data Map would be billed for 22 capacity-unit hours over this six-hour period (1 + 3 + 4 + 5 + 6 + 3):
Important
Microsoft Purview Data Map can automatically scale up and down within the elasticity window (check current limits). To get the next level of the elasticity window, a support ticket needs to be created.
The default limit for maximum operations per second is 10 capacity units. If you're working with a large Microsoft Purview environment and require a higher throughput, you can request a larger capacity of elasticity window by creating a quota request. Select "Data map capacity unit" as the quota type. Provide as much relevant information as you can about your environment and the extra capacity you would like to request.
Important
There's no default limit for metadata storage. As you add more metadata to your data map, it will elastically increase.
Increasing the operations throughput limit also increases the minimum number of capacity units. If you increase the throughput limit to 20, the minimum capacity units you're charged is 2 CUs. The below table illustrates the possible throughput options. The number you enter in the quota request is the minimum number of capacity units on the account.
Minimum capacity units | Operations throughput limit |
---|---|
1 | 10 (Default) |
2 | 20 |
3 | 30 |
4 | 40 |
5 | 50 |
6 | 60 |
7 | 70 |
8 | 80 |
9 | 90 |
10 | 100 |
The metrics data map capacity units and the data map storage size can be monitored in order to understand the data estate size and the billing.
Go to the Azure portal, and navigate to the Microsoft Purview accounts page and select your Purview account
Select Overview and scroll down to observe the Monitoring section for Data Map Capacity Units and Data Map Storage Size metrics over different time periods
For other settings, navigate to the Monitoring --> Metrics to observe the Data Map Capacity Units and Data Map Storage Size.
Select the Data Map Capacity Units to view the data map capacity unit usage over the last 24 hours. Observe that hovering the mouse over the line graph indicates the data map capacity units consumed at that particular time on the particular day.
Select the Local Time: Last 24 hours (Automatic - 1 hour) at the top right of the screen to modify time range displayed for the graph.
Customize the graph type by selecting the option:
Select the New chart to add the graph for the Data Map Storage Size chart.
With elastic Data Map, Microsoft Purview provides low-cost barrier for customers to start their data governance journey. Microsoft Purview Data Map can grow elastically with pay as you go model starting from as small as one Capacity unit. Customers don’t need to worry about choosing the correct Data Map size for their data estate at creation time.