Azure Synapse Link for Azure Cosmos DB enables users to run near real-time analytics over operational data in Azure Cosmos DB. However, there are times when some data needs to be aggregated and enriched to serve data warehouse users. Curating and exporting Azure Synapse Link data can be done with just a few cells in a notebook.
In this tutorial, you'll connect to the analytical store so there's no impact on the transactional store (it won't consume any Request Units). We'll go through the following steps:
1. Read the Azure Cosmos DB HTAP container into a Spark dataframe.
2. Aggregate the results in a new dataframe.
3. Ingest the data into a dedicated SQL pool.
In this example, we use an HTAP container called RetailSales. It's part of a linked service called ConnectedData and includes, among other columns, productCode, weekStarting, quantity, and price.
We'll aggregate the sales (quantity, and revenue as price x quantity) by productCode and weekStarting for reporting purposes. Finally, we'll export that data into a dedicated SQL pool table called dbo.productsales.
Create a Spark notebook with Spark (Scala) as the main language. We use the notebook's default settings for the session.
Read the Azure Cosmos DB HTAP container with Spark into a dataframe in the first cell.
// Read the RetailSales HTAP container from the analytical store through the ConnectedData linked service
val df_olap = spark.read.format("cosmos.olap").
    option("spark.synapse.linkedService", "ConnectedData").
    option("spark.cosmos.container", "RetailSales").
    load()
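Before transforming the data, you can optionally confirm what was loaded by using standard dataframe actions such as printSchema and show. This check isn't part of the walkthrough itself, it's just a quick way to inspect the container's columns and a few sample rows:

// Optional: inspect the inferred schema and preview a few rows of the HTAP container
df_olap.printSchema()
df_olap.show(10)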
In the second cell, we run the transformations and aggregations needed for the new dataframe before loading it into a dedicated SQL pool database.
// Spark SQL functions (col, sum) used in the transformations below
import org.apache.spark.sql.functions._

// Select relevant columns and create revenue (price x quantity)
val df_olap_step1 = df_olap.select("productCode","weekStarting","quantity","price").withColumn("revenue",col("quantity")*col("price"))

// Aggregate revenue and quantity sold, then derive avg. price by week and product ID
val df_olap_aggr = df_olap_step1.groupBy("productCode","weekStarting").agg(sum("quantity") as "Sum_quantity",sum("revenue") as "Sum_revenue").
    withColumn("AvgPrice",col("Sum_revenue")/col("Sum_quantity"))
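If you want to sanity-check the aggregation before writing it out, you can preview the new dataframe. The ordering below is only for readability and isn't required:

// Optional: preview the aggregated weekly sales per product
df_olap_aggr.orderBy("weekStarting","productCode").show(10)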
In the third cell, we load the data into a dedicated SQL pool. The connector automatically creates a temporary external table, external data source, and external file format, which are deleted once the job is done.
// Write the aggregated dataframe to dbo.productsales as an internal table in the dedicated SQL pool named userpool
df_olap_aggr.write.sqlanalytics("userpool.dbo.productsales", Constants.INTERNAL)
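The sqlanalytics method and the Constants object come from the Synapse dedicated SQL pool connector; if they aren't already in scope in your notebook session, add the connector's imports first. The same connector can also read the table back into a dataframe, which is a convenient way to verify the load from the notebook. This is a minimal sketch, assuming the dedicated SQL pool is named userpool as above:

// Connector imports (add these if sqlanalytics or Constants aren't resolved in the session)
import com.microsoft.spark.sqlanalytics.utils.Constants
import org.apache.spark.sql.SqlAnalyticsConnector._

// Optional check: read the newly created table back from the dedicated SQL pool
val df_check = spark.read.sqlanalytics("userpool.dbo.productsales")
df_check.show(10)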
You can query the result with a simple SQL script such as the following:
SELECT [productCode]
,[weekStarting]
,[Sum_quantity]
,[Sum_revenue]
,[AvgPrice]
FROM [dbo].[productsales]
You can display the query results in chart mode.
Training
Module: Implement Azure Synapse Link with Azure Cosmos DB
Certification: Microsoft Certified: Azure Cosmos DB Developer Specialty - Write efficient queries, create indexing policies, and manage and provision resources in the SQL API and SDK with Microsoft Azure Cosmos DB.
Documentation
Analytics with Azure Synapse Link - Azure Cosmos DB: Azure Synapse Link for Azure Cosmos DB lets you run near real-time analytics (HTAP) over operational data in Azure Cosmos DB using Azure Synapse Analytics.
Quickstart: Connect to Azure Synapse Link for Azure Cosmos DB - Azure Synapse Analytics: How to connect Azure Cosmos DB to a Synapse workspace with Synapse Link.
Interact with Azure Cosmos DB using Apache Spark 3 in Azure Synapse Link - Azure Synapse Analytics.