What is the max degree of parallelism supported by Stream Analytics?

Question

Hi,

We are using a Stream Analytics job with 1 streaming unit. The input is an IoT Hub and the output is a Cosmos DB container.
The IoT Hub receives 8 messages at the same time, and these are the input to Stream Analytics. Are the messages processed in parallel by Stream Analytics? If yes, is there a limit to the degree of parallelism?

Thanks.

Answer

Hello @Satyam Chauhan ,
The max degree of parallelism depends on the three components of a Stream Analytics Job: Input, Query and Output.
I recommend reading the documentation on Optimizing your Stream Analytics Job, especially stream-analytics-streaming-unit-consumption and stream-analytics-parallelization. Besides making use of the partitions and partition keys of your input and output, your query needs to be aligned with that partitioning.

In your specific case, max parallelism for 8 simultaneous messages is achieved with 8 partitions in IoT Hub/Event Hub. This would allow 8 parallel readers, each with its own cursor on a single partition. Depending on the message frequency and your processing requirements, your solution might run well with fewer.
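To make the bound concrete, here is a minimal sketch in Python. It only models the input side and assumes each device's messages go to exactly one partition. The function name `max_busy_partitions` is made up for illustration and is not part of any Azure SDK. The real degree of parallelism also depends on the query and the output being partition-aligned and on the SUs assigned.

```python
def max_busy_partitions(partition_count: int, active_device_count: int) -> int:
    """Upper bound on input partitions that can hold data at the same time.

    Assumption: every device writes to exactly one partition, so at most one
    partition per active device can receive messages, and never more than
    the number of partitions that exist.
    """
    if partition_count < 1:
        raise ValueError("partition_count must be at least 1")
    if active_device_count < 0:
        raise ValueError("active_device_count must not be negative")
    return min(partition_count, active_device_count)


# 8 devices sending at once, for different partition counts.
for partitions in (2, 4, 8, 16):
    bound = max_busy_partitions(partitions, active_device_count=8)
    print(f"{partitions} partitions -> at most {bound} parallel readers with data")
```

This is an upper bound only: two devices can still end up on the same partition, in which case fewer readers have work.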

Be aware: if the messages come from the same device, IoT Hub will add them to the same partition to ensure in-order processing. Please keep this in mind, especially when your devices have very different or varying send behavior. For example, in a connected vehicle scenario, one device might send many messages while its engine is on, while the other 7 vehicles are not powered and only send occasional keep-alive messages. In this case the partition for the active vehicle would be flooded, which could result in backlogged messages, while the other partitions sit empty.
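The connected vehicle example can be sketched as a small Python simulation. Everything here is illustrative: the device names and message counts are made up, and `partition_for` uses a SHA-256 based stand-in hash, which is an assumption and not the algorithm IoT Hub actually uses to map a device to a partition.

```python
import hashlib
from collections import Counter


def partition_for(device_id: str, partition_count: int) -> int:
    """Map a device to a partition with a stable stand-in hash.

    Assumption: this is NOT IoT Hub's real mapping. It only reproduces the
    property that one device always lands on the same partition.
    """
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def partition_load(messages_per_device: dict[str, int], partition_count: int) -> Counter:
    """Count how many messages each partition receives."""
    load = Counter({p: 0 for p in range(partition_count)})
    for device_id, message_count in messages_per_device.items():
        load[partition_for(device_id, partition_count)] += message_count
    return load


# One vehicle with the engine on, seven sending only keep-alives (made-up numbers).
traffic = {"vehicle-0": 5000, **{f"vehicle-{i}": 2 for i in range(1, 8)}}
load = partition_load(traffic, partition_count=8)

for partition, count in sorted(load.items()):
    print(f"partition {partition}: {count} messages")
```

Whichever partition the busy vehicle lands on carries almost all of the traffic, so adding partitions or readers does not relieve it. Only one reader works through that backlog.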

As written in the docs, figuring out the right number of SUs is very much trial and error, as it depends on the data stream to be processed and the complexity of the query.

Answer

Hello @Satyam Chauhan ,
Thanks for the question and for using the MS Q&A platform.
As we understand it, the ask here is how to set up parallel processing; please let us know if that is not accurate.
I think you should explore the PARTITION BY clause, which should help.

https://learn.microsoft.com/en-us/azure/stream-analytics/stream-analytics-parallelization#partitions-in-inputs-and-outputs
One quick thing: since you have Cosmos DB as the sink, please keep an eye on the request units (RUs).

Please do let me know if you have any queries.
Thanks
Himanshu
