Greetings @Jiacheng Zhang
You’re correct in considering the use of Azure Cosmos DB’s Analytical Store for your scenario. The Analytical Store is a fully isolated column store that enables large-scale analytics against the operational data in your Azure Cosmos DB account, without any impact on your transactional workloads. It’s designed to address the complexity and latency challenges that come with traditional ETL pipelines.
The Analytical Store automatically syncs your operational data from the transactional (row) store into a separate column store that is well suited to large-scale analytical queries. This means you can run near-real-time, large-scale analytics on your operational data.
To connect to the Analytical Store, you can use Azure Synapse Link, which links your Azure Cosmos DB account directly to Azure Synapse Analytics.
In terms of changes to your current approach: instead of using `spark.read.format('cosmos.oltp')`, you would query the data through the serverless SQL pool in Azure Synapse Analytics. This allows you to analyze data in Azure Cosmos DB containers that are enabled for Azure Synapse Link in near real time, without affecting the performance of your transactional workloads. The full SELECT surface area is supported through the OPENROWSET function.
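To make the shape of such a query concrete, here is a small sketch that assembles the T-SQL a serverless SQL pool would run against a Cosmos DB analytical store via OPENROWSET. All names (account, database, container, server-level credential) are placeholders, not values from your environment:

```python
def cosmosdb_openrowset_query(account: str, database: str, container: str,
                              credential: str, top_n: int = 10) -> str:
    """Build a T-SQL query that reads a Cosmos DB container's analytical
    store through the serverless SQL pool's OPENROWSET function."""
    return (
        f"SELECT TOP {top_n} *\n"
        "FROM OPENROWSET(\n"
        "    PROVIDER = 'CosmosDB',\n"
        f"    CONNECTION = 'Account={account};Database={database}',\n"
        f"    OBJECT = '{container}',\n"
        f"    SERVER_CREDENTIAL = '{credential}'\n"
        ") AS rows;"
    )

# Placeholder names -- substitute your own account, database, container,
# and the server-scoped credential that holds your Cosmos DB key.
print(cosmosdb_openrowset_query("myaccount", "mydb", "mycontainer", "mycred"))
```

You would paste the generated statement into the serverless SQL pool; the SERVER_CREDENTIAL referenced here must already exist and hold your Cosmos DB account key.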
Please note that you need to enable the Analytical Store on your Azure Cosmos DB containers. Also, ensure that your Azure Cosmos DB analytical storage is in the same region as the serverless SQL pool.
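For the enablement step, one option is the Azure CLI: setting an analytical TTL on a container turns on the analytical store. This is a sketch with placeholder resource names (resource group, account, database, container), and assumes a SQL (Core) API container:

```shell
# Enable the analytical store on an existing container by setting an
# analytical TTL. -1 means analytical data is retained indefinitely.
# All resource names below are placeholders -- substitute your own.
az cosmosdb sql container update \
    --resource-group my-rg \
    --account-name myaccount \
    --database-name mydb \
    --name mycontainer \
    --analytical-storage-ttl -1
```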
I hope this helps!