Mirroring Azure Cosmos DB (Preview)

Mirroring in Microsoft Fabric provides a seamless no-ETL experience to integrate your existing Azure Cosmos DB data with the rest of your data in Microsoft Fabric. Your Azure Cosmos DB data is continuously replicated directly into Fabric OneLake in near real-time, without any performance impact on your transactional workloads or consuming Request Units (RUs).

Data in OneLake is stored in the open-source Delta Lake format and automatically made available to all analytical engines on Fabric.

You can use built-in Power BI capabilities to access data in OneLake in Direct Lake mode. With Copilot enhancements in Fabric, you can use generative AI to get key insights on your business data. In addition to Power BI, you can use T-SQL to run complex aggregate queries or use Spark for data exploration. You can also access the data in notebooks and use data science to build machine learning models.

Important

Mirroring for Azure Cosmos DB is currently in preview. Production workloads aren't supported during preview. Currently, only Azure Cosmos DB for NoSQL accounts are supported.

Why use mirroring in Fabric?

With Mirroring in Fabric, you don't need to piece together different services from multiple vendors. Instead, you can enjoy a highly integrated, end-to-end, and easy-to-use product that is designed to simplify your analytics needs and built for openness.

If you're looking for BI reporting or analytics on your operational data in Azure Cosmos DB, mirroring provides:

  • No-ETL, cost-effective near real-time access to your Azure Cosmos DB data without affecting your request unit consumption
  • Ease of bringing data across various sources into Fabric OneLake
  • Delta table optimizations with V-Order for fast reads
  • One-click integration with Power BI with Direct Lake and Copilot
  • Rich business insights by joining data across various sources
  • Richer app integration to access queries and views

OneLake data is stored in the open-source Delta Lake format, allowing you to use it with various solutions within and outside of Microsoft. This data format helps make it easier to build a single data estate for your analytical needs.

What analytics experiences are built in?

Mirrored databases are an item in Fabric Synapse Data Warehousing distinct from the Warehouse and SQL analytics endpoint.

Diagram of Fabric Mirroring for Azure Cosmos DB.

Every Mirrored Azure Cosmos DB database has three items you can interact with in your Fabric workspace: 

  • The mirrored database item. Mirroring manages the replication of data into OneLake and conversion to Parquet, in an analytics-ready format. This enables downstream scenarios like data engineering, data science, and more.
  • SQL analytics endpoint, which is automatically generated
  • Default semantic model, which is automatically generated

Mirrored database

The mirrored database shows the replication status and the controls to stop or start replication in Fabric OneLake. You can also view your source database, in read-only mode, using the Azure Cosmos DB data explorer. Using data explorer, you can view your containers in your source Azure Cosmos DB database and query them. These operations consume request units (RUs) from your Azure Cosmos DB account. Any changes to the source database are reflected immediately in Fabric's source database view. Writing to the source database isn't allowed from Fabric, as you can only view the data.

SQL analytics endpoint

Each mirrored database has an autogenerated SQL analytics endpoint that provides a rich analytical experience on top of the OneLake Delta tables created by the mirroring process. You have access to familiar T-SQL commands that can define and query data objects, but you can't manipulate the data from the SQL analytics endpoint because it's a read-only copy.

You can perform the following actions in the SQL analytics endpoint:

  • Explore Delta Lake tables using T-SQL. Each table is mapped to a container from your Azure Cosmos DB database.
  • Create no-code queries and views and explore them visually without writing a line of code.
  • Join and query data in other mirrored databases, Warehouses, and Lakehouses in the same workspace.
  • Visualize and build BI reports with a single click, based on SQL queries or views.

In addition to the Microsoft Fabric SQL Query Editor, there's a broad ecosystem of tooling. These tools include Visual Studio Code, Azure Data Studio, SQL Server Management Studio, and even GitHub Copilot. You can supercharge analysis and insights generation from the tool of your choice.

Semantic model

The default semantic model is an automatically provisioned Power BI Semantic Model. This feature enables business metrics to be created, shared, and reused. For more information, see semantic models.

How does near real-time replication work?

When you enable mirroring on your Azure Cosmos DB database, insert, update, and delete operations on your online transaction processing (OLTP) data continuously replicate into Fabric OneLake for analytics consumption.

The continuous backup feature is a prerequisite for mirroring. You can enable either 7-day or 30-day continuous backup on your Azure Cosmos DB account. If you are enabling continuous backup specifically for mirroring, 7-day continuous backup is recommended, as it is free of cost.

Note

Mirroring does not use Azure Cosmos DB's analytical store or change feed as a change data capture source. You can continue to use these capabilities independently, along with mirroring.

It could take a few minutes to replicate your Azure Cosmos DB data into Fabric OneLake. Depending on the size of your data's initial snapshot or the frequency of updates and deletes, replication could take longer in some cases. Replication doesn't affect the request units (RUs) you allocated for your transactional workloads.

What to expect from mirroring

Review the following considerations and supported scenarios before you set up mirroring.

Setup considerations

A database must already be provisioned in Azure before you can mirror it. You must also enable continuous backup on the account as a prerequisite.

  • You can mirror only one database at a time. You can choose which database to mirror.
  • You can mirror the same database multiple times within the same workspace. As a best practice, a single copy of the database can be reused across lakehouses, warehouses, or other mirrored databases. You shouldn't need to set up multiple mirrors to the same database.
  • You can also mirror the same database across different Fabric workspaces or tenants.
  • Changes to Azure Cosmos DB containers, such as adding new containers and deleting existing ones, are replicated seamlessly to Fabric. You can start mirroring an empty database with no containers, for example, and mirroring seamlessly picks up the containers added at a later point in time.

Support for nested data

Nested data is shown as a JSON string in SQL analytics endpoint tables. You can use OPENJSON, CROSS APPLY, and OUTER APPLY in T-SQL queries or views to expand this data selectively. If you're using Power Query, you can also apply the ToJson function to expand this data.
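
As a rough illustration of what expanding nested data selectively means, the following Python sketch takes rows whose nested property arrives as a JSON string and produces one flat row per array element, similar in spirit to combining OPENJSON with CROSS APPLY. The row shape, the `items` column name, and the `expand_nested` helper are hypothetical and aren't part of any Fabric API. The sketch also assumes the nested value is an array of objects.

```python
import json

def expand_nested(rows, column):
    """Yield one flat row per element of the JSON array stored in `column`.

    Rows whose column is null or an empty array produce no output,
    which resembles CROSS APPLY (OUTER APPLY would keep such rows).
    Assumes each array element is a JSON object.
    """
    for row in rows:
        raw = row.get(column)
        elements = json.loads(raw) if raw else []
        for element in elements:
            # Copy every column except the JSON string itself.
            flat = {k: v for k, v in row.items() if k != column}
            # Prefix nested keys with the column name to avoid clashes.
            flat.update({f"{column}_{k}": v for k, v in element.items()})
            yield flat

# Hypothetical rows as they might appear in a SQL analytics endpoint table.
rows = [
    {"id": "1", "items": '[{"sku": "A", "qty": 2}, {"sku": "B", "qty": 1}]'},
    {"id": "2", "items": None},
]
expanded = list(expand_nested(rows, "items"))
```

Here the first row expands into two flat rows, and the second row, which has no nested data, is dropped.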

Note

Fabric has a limitation for string columns of 8 KB in size. For more information, see data warehouse limitations.

Handle schema changes

Mirroring automatically replicates properties across Azure Cosmos DB items, including schema changes. Any new properties discovered in an item are shown as new columns, and any missing properties are represented as null in Fabric.

If you rename a property in an item, Fabric tables retain both the old and new columns. The old column will show null and the new one will show the latest value, for any items that are replicated after the renaming operation.

If you change the data type of a property in Azure Cosmos DB items, the changes are supported for compatible data types that can be converted. If the data types aren't compatible for conversion in Delta, they're represented as null values.
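
The effect of added and renamed properties can be pictured with a small Python sketch. It isn't how mirroring is implemented. It only models the outcome described above, where columns are the union of all property names and a missing property shows as null. The `to_table` helper and the sample items are hypothetical.

```python
def to_table(items):
    """Build a column list and rows from items with differing properties.

    Columns are the union of all property names in first-seen order;
    a property missing from an item becomes None (null).
    """
    columns = []
    for item in items:
        for name in item:
            if name not in columns:
                columns.append(name)
    rows = [[item.get(name) for name in columns] for item in items]
    return columns, rows

# The second item renames "city" to "town" and adds "zip" (hypothetical data).
items = [
    {"id": "1", "city": "Oslo"},
    {"id": "2", "town": "Bergen", "zip": "5003"},
]
columns, rows = to_table(items)
```

In this model the table keeps both `city` and `town`. The item written after the rename has null in the old column and its value in the new one.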

SQL analytics endpoint tables convert Delta data types to T-SQL data types.

Duplicate column names

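Azure Cosmos DB property names are case-sensitive, following the JSON standard, so an item can contain properties whose names differ only by case. Such properties collide as column names in Fabric tables. Mirroring supports these duplicate column names by appending _n to the column name, where n is a numeric value.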

For example, if the Azure Cosmos DB item has addressName and AddressName as unique properties, Fabric tables have corresponding addressName and AddressName_1 columns. For more information, see replication limitations.
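
A minimal sketch of this kind of renaming, in Python, is shown below. The numbering rule (first occurrence keeps its name, later case-insensitive collisions get _1, _2, and so on) is an assumption inferred from the single example above, and `dedupe_columns` is a hypothetical helper, not a Fabric function. The sketch doesn't handle the case where a generated name collides with an existing property.

```python
def dedupe_columns(names):
    """Rename properties that collide case-insensitively by appending _n."""
    seen = {}    # lower-cased name -> number of times seen so far
    result = []
    for name in names:
        key = name.lower()
        count = seen.get(key, 0)
        # The first occurrence keeps its name; later ones get a suffix.
        result.append(name if count == 0 else f"{name}_{count}")
        seen[key] = count + 1
    return result

columns = dedupe_columns(["id", "addressName", "AddressName"])
```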

Security

Connections to your source database are based on account keys for your Azure Cosmos DB accounts. If you rotate or regenerate the keys, you need to update the connections to ensure replication works. For more information, see connections.

Account keys aren't directly visible to other Fabric users once the connection is set up. You can limit who has access to the connections created in Fabric. Writes aren't permitted to Azure Cosmos DB database either from the data explorer or analytics endpoint in your mirrored database.

Mirroring doesn't currently support authentication using read-only account keys, single sign-on (SSO) with Microsoft Entra ID and role-based access control, or managed identities.

Once the data is replicated into Fabric OneLake, you need to secure access to this data.

Data protection features

Granular security can be configured in the mirrored database in Microsoft Fabric. For more information, see granular permissions in Microsoft Fabric.

You can apply column filters and predicate-based row filters on tables for roles and users in Microsoft Fabric.

You can also mask sensitive data from non-admin users by using dynamic data masking.

Network security

Currently, mirroring doesn't support private endpoints or customer managed keys (CMK) on OneLake. Mirroring isn't supported for Azure Cosmos DB accounts with network security configurations less permissive than all networks, using service endpoints, using private endpoints, using IP addresses, or using any other settings that could limit public network access to the account. Azure Cosmos DB accounts should be open to all networks to work with mirroring.

Disaster recovery and replication latency

In Fabric, you can deploy content to data centers in regions other than the home region of the Fabric tenant. For more information, see multi-geo support.

For an Azure Cosmos DB account with a primary write region and multiple read regions, mirroring chooses the Azure Cosmos DB read region closest to the region where Fabric capacity is configured. This selection helps provide low-latency replication for mirroring.

When you switch your Azure Cosmos DB account to a recovery region, mirroring automatically selects the nearest Azure Cosmos DB region again.

Note

Mirroring does not support accounts with multiple write regions.

Your Cosmos DB data replicated to OneLake needs to be configured to handle region-wide outages. For more information, see disaster recovery in OneLake.

Explore your data with mirroring

You can directly view and access mirrored data in OneLake. You can also seamlessly access mirrored data without further data movement.

Learn more on how to access OneLake using ADLS Gen2 APIs or SDK, the OneLake File explorer, and Azure Storage explorer.

You can connect to the SQL analytics endpoint from tools such as SQL Server Management Studio (SSMS) or using drivers like Microsoft Open Database Connectivity (ODBC) and Java Database Connectivity (JDBC). For more information, see SQL analytics endpoint connectivity.
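
For an ODBC connection from Python, a sketch might look like the following. It assumes the Microsoft ODBC Driver 18 for SQL Server and the third-party pyodbc package are installed, and that interactive Microsoft Entra sign-in is acceptable for your environment. The helper names and the placeholder server and database values are hypothetical, so substitute the values for your own SQL analytics endpoint.

```python
def build_connection_string(server, database):
    """Assemble an ODBC connection string for a SQL analytics endpoint."""
    parts = {
        "Driver": "{ODBC Driver 18 for SQL Server}",
        "Server": server,
        "Database": database,
        # Assumption: interactive Microsoft Entra sign-in is used.
        "Authentication": "ActiveDirectoryInteractive",
        "Encrypt": "yes",
    }
    return ";".join(f"{key}={value}" for key, value in parts.items())

def run_query(connection_string, sql):
    """Run a read-only query; requires the third-party pyodbc package."""
    import pyodbc  # imported lazily so the helper above works without it
    connection = pyodbc.connect(connection_string)
    try:
        cursor = connection.cursor()
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        connection.close()

# Placeholder values only; replace them with your endpoint and database name.
conn_str = build_connection_string("<your-endpoint>", "<your-mirrored-database>")
```

Because the endpoint is read-only, `run_query` is only useful for SELECT statements and similar queries.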

You can also access mirrored data with services such as:

  • Azure services like Azure Databricks, Azure HDInsight, or Azure Synapse Analytics
  • Fabric Lakehouse using shortcuts for data engineering and data science scenarios
  • Other mirrored databases or warehouses in the Fabric workspace

You can also build medallion architecture solutions, cleaning and transforming the data that is landing into mirrored database as the bronze layer. For more information, see medallion architecture support in Fabric.

Pricing

The compute used to replicate your Cosmos DB data into Fabric OneLake is free of cost. Storage in OneLake is free of cost under certain conditions. For more information, see OneLake pricing for mirroring. The compute usage for querying data via SQL, Power BI, or Spark is still charged based on the Fabric capacity.

If you're using the data explorer in Fabric mirroring, you accrue typical costs based on request unit (RU) usage to explore the containers and query the items in the source Azure Cosmos DB database. The Azure Cosmos DB continuous backup feature is a prerequisite to mirroring, and standard charges for continuous backup apply. Mirroring adds no additional charges to continuous backup billing. For more information, see Azure Cosmos DB pricing.
