Join Cosmos DB when processing streaming data from Event Hubs in Synapse

Jiacheng Zhang 20 Reputation points
2024-04-24T17:24:07.47+00:00

Hi Team,

Good morning! I would like to ask about using a Spark notebook in Synapse to process Event Hubs streaming data, where we join the streaming data from Event Hubs with Cosmos DB data. We have tried both ways of connecting: the Cosmos DB transactional store (OLTP) and the analytical store (OLAP). It seems that only the OLTP connection keeps picking up Cosmos DB updates while the streaming process is running; with the OLAP connection, the Cosmos DB data is only refreshed when we manually run that connection cell again, and it is not updated while the streaming process runs. May I know why? Thanks so much!
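Our notebook looks roughly like this (simplified; the connection string, linked service, container, and join key below are all placeholders):

```python
# Simplified sketch of our notebook; all names are placeholders.
connection_string = "<event-hubs-connection-string>"
eh_conf = {
    "eventhubs.connectionString":
        sc._jvm.org.apache.spark.eventhubs.EventHubsUtils.encrypt(connection_string)
}

# Streaming source: Event Hubs
stream_df = spark.readStream.format("eventhubs").options(**eh_conf).load()
# (parsing of the event body into columns omitted here)

# Cosmos DB side of the join via Synapse Link:
# "cosmos.oltp" for the transactional store, "cosmos.olap" for the analytical store
cosmos_df = (spark.read.format("cosmos.oltp")
    .option("spark.synapse.linkedService", "CosmosDbLinkedService")
    .option("spark.cosmos.container", "MyContainer")
    .load())

# Stream-static join on a placeholder key
joined = stream_df.join(cosmos_df, "deviceId")
```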

Azure Synapse Analytics
Azure Cosmos DB

1 answer

  1. Smaran Thoomu 9,845 Reputation points Microsoft Vendor
    2024-04-25T05:32:37.0333333+00:00

    Hi @Jiacheng Zhang

    Thanks for the question and using MS Q&A platform.

    It's great to hear that you are using Azure Synapse to process Event Hubs streaming data and join it with Cosmos DB data. I can help you understand why the OLAP connection to Cosmos DB does not pick up updates during the streaming process.

    When you connect to the Cosmos DB analytical store (OLAP), the read is a batch snapshot. If custom partitioning is used, it also creates a partitioned store in the ADLS Gen2 primary storage account linked to your Azure Synapse workspace, and that partitioned store holds the data read from the analytical store. Neither the snapshot nor the partitioned store is refreshed while the streaming process runs; the data is only updated when you manually run the connection cell again.
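    For reference, a plain analytical-store read looks like the following (a minimal sketch; the linked service and container names are placeholders):

    ```python
    # Batch read from the Cosmos DB analytical store via Synapse Link.
    # The resulting DataFrame is a snapshot: re-running this cell is what
    # refreshes the data, which matches the behavior you are seeing.
    olap_df = (spark.read.format("cosmos.olap")
        .option("spark.synapse.linkedService", "CosmosDbLinkedService")  # placeholder
        .option("spark.cosmos.container", "MyContainer")                 # placeholder
        .load())
    ```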

    On the other hand, when you connect to the Cosmos DB transactional store (OLTP), queries run against the live container, so the joined data is updated in near real-time while the streaming process runs. The transactional store is also the path Spark uses when writing data into an Azure Cosmos DB container.
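    By contrast, an OLTP read targets the live container; only the format string changes (again a sketch with placeholder names):

    ```python
    # Read from the Cosmos DB transactional store (the live container).
    # A stream-static join against this DataFrame picks up container updates
    # as new micro-batches are processed.
    oltp_df = (spark.read.format("cosmos.oltp")
        .option("spark.synapse.linkedService", "CosmosDbLinkedService")  # placeholder
        .option("spark.cosmos.container", "MyContainer")                 # placeholder
        .load())
    ```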

    If you need custom partitioning when connecting to the analytical store, you can use the option method to specify custom partition keys. You can refer to the Azure documentation on Configure custom partitioning to partition analytical store data for more information, and see the sketch below.
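    For example, a partitioning job can be executed like this (a sketch following the options described in that document; the partition key and base path are placeholders):

    ```python
    # Run a custom partitioning job over the analytical store data.
    # The partitioned store is written to the ADLS Gen2 account linked to
    # the workspace and is only refreshed when this job is executed.
    (spark.read.format("cosmos.olap")
        .option("spark.synapse.linkedService", "CosmosDbLinkedService")   # placeholder
        .option("spark.cosmos.container", "MyContainer")                  # placeholder
        .option("spark.cosmos.asns.execute.partitioning", "true")
        .option("spark.cosmos.asns.partition.keys", "readDate String")    # placeholder key
        .option("spark.cosmos.asns.basePath", "/mnt/CosmosDBPartitionedStore/")  # placeholder path
        .load())
    ```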

    I hope this helps! Let me know if you have any further questions.