Join Cosmos DB when processing streaming data from Event Hubs in Synapse

Jiacheng Zhang 20 Reputation points
2024-04-24T17:24:07.47+00:00

Hi Team,

Good morning! I would like to ask about using a Spark notebook in Synapse to process Event Hubs streaming data, where we join the streaming data from Event Hubs with Cosmos DB data. We have tried both ways of connecting: the Cosmos DB transactional store (OLTP) and the analytical store (OLAP). It seems that only the OLTP connection keeps picking up Cosmos DB updates while the streaming process is running; with the OLAP connection, the Cosmos DB data is only refreshed when we manually run that connection cell again, and it is not updated while the streaming process runs. May I know why? Thanks so much!
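Our notebook looks roughly like this (simplified; the connection string, linked service, container, and join key below are all placeholders):

```python
# Simplified sketch of our notebook; all names are placeholders.
connection_string = "<event-hubs-connection-string>"
eh_conf = {
    "eventhubs.connectionString":
        sc._jvm.org.apache.spark.eventhubs.EventHubsUtils.encrypt(connection_string)
}

# Streaming source: Event Hubs
stream_df = spark.readStream.format("eventhubs").options(**eh_conf).load()
# (parsing of the event body into columns omitted here)

# Cosmos DB side of the join via Synapse Link:
# "cosmos.oltp" for the transactional store, "cosmos.olap" for the analytical store
cosmos_df = (spark.read.format("cosmos.oltp")
    .option("spark.synapse.linkedService", "CosmosDbLinkedService")
    .option("spark.cosmos.container", "MyContainer")
    .load())

# Stream-static join on a placeholder key
joined = stream_df.join(cosmos_df, "deviceId")
```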

Azure Synapse Analytics
Azure Cosmos DB

1 answer

  1. Smaran Thoomu 9,845 Reputation points Microsoft Vendor
    2024-04-25T05:32:37.0333333+00:00

    Hi @Jiacheng Zhang

    Thanks for the question and using MS Q&A platform.

    It's great to hear that you are using Azure Synapse to process Event Hubs streaming data and join it with Cosmos DB data. I can help you understand why the OLAP connection to Cosmos DB does not pick up updates during the streaming process.

    When you connect to the Cosmos DB analytical store (OLAP), the read is a batch snapshot. If custom partitioning is used, it also creates a partitioned store in the ADLS Gen2 primary storage account linked to your Azure Synapse workspace, and that partitioned store holds the data read from the analytical store. Neither the snapshot nor the partitioned store is refreshed while the streaming process runs; the data is only updated when you manually run the connection cell again.
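    For reference, a plain analytical-store read looks like the following (a minimal sketch; the linked service and container names are placeholders):

    ```python
    # Batch read from the Cosmos DB analytical store via Synapse Link.
    # The resulting DataFrame is a snapshot: re-running this cell is what
    # refreshes the data, which matches the behavior you are seeing.
    olap_df = (spark.read.format("cosmos.olap")
        .option("spark.synapse.linkedService", "CosmosDbLinkedService")  # placeholder
        .option("spark.cosmos.container", "MyContainer")                 # placeholder
        .load())
    ```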

    On the other hand, when you connect to the Cosmos DB transactional store (OLTP), queries run against the live container, so the joined data is updated in near real-time while the streaming process runs. The transactional store is also the path Spark uses when writing data into an Azure Cosmos DB container.
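    By contrast, an OLTP read targets the live container; only the format string changes (again a sketch with placeholder names):

    ```python
    # Read from the Cosmos DB transactional store (the live container).
    # A stream-static join against this DataFrame picks up container updates
    # as new micro-batches are processed.
    oltp_df = (spark.read.format("cosmos.oltp")
        .option("spark.synapse.linkedService", "CosmosDbLinkedService")  # placeholder
        .option("spark.cosmos.container", "MyContainer")                 # placeholder
        .load())
    ```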

    If you need custom partitioning when connecting to the analytical store, you can use the option method to specify custom partition keys. You can refer to the Azure documentation on Configure custom partitioning to partition analytical store data for more information, and see the sketch below.
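    For example, a partitioning job can be executed like this (a sketch following the options described in that document; the partition key and base path are placeholders):

    ```python
    # Run a custom partitioning job over the analytical store data.
    # The partitioned store is written to the ADLS Gen2 account linked to
    # the workspace and is only refreshed when this job is executed.
    (spark.read.format("cosmos.olap")
        .option("spark.synapse.linkedService", "CosmosDbLinkedService")   # placeholder
        .option("spark.cosmos.container", "MyContainer")                  # placeholder
        .option("spark.cosmos.asns.execute.partitioning", "true")
        .option("spark.cosmos.asns.partition.keys", "readDate String")    # placeholder key
        .option("spark.cosmos.asns.basePath", "/mnt/CosmosDBPartitionedStore/")  # placeholder path
        .load())
    ```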

    I hope this helps! Let me know if you have any further questions.