Hi @Khamylov, Oleksandr. My understanding is that you are trying to copy data from CosmosDB to a Sink while preserving the order of events. You added a Sort transformation between the Source and Sink with the Partition option set to Single partition. However, the data in the Sink is not in the expected order, even though the data preview shows the expected order. You were able to achieve the expected result only by setting Batch size = 1 in the Sink configuration, but the processing speed was extremely low. You are wondering whether there is a way to get ordered events in the Sink without the Batch size = 1 workaround, while keeping reasonable throughput.
Based on the information you have provided, you have taken the right approach by using the Sort transformation to sort the incoming rows in the data stream. However, as you mentioned, the data preview shows the expected order while the data in the Sink does not. This is likely because data flows are executed on Spark clusters, which distribute data across multiple nodes and partitions: if the data is repartitioned in a subsequent transformation, the sort order is lost when the rows are reshuffled.
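Since data flows run on Spark, you can reproduce this behavior directly. Below is a minimal PySpark sketch (the `events` DataFrame and its columns are made up for illustration) showing how a repartition after a global sort discards the ordering:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[4]").appName("sort-order-demo").getOrCreate()

# Hypothetical event data: (event_id, event_time)
events = spark.createDataFrame(
    [(i, 1000 + i) for i in range(20)],
    ["event_id", "event_time"],
)

# A global sort arranges rows across range partitions in order...
ordered = events.orderBy("event_time")

# ...but a later repartition reshuffles rows across partitions,
# so a downstream writer no longer sees a globally ordered stream.
reshuffled = ordered.repartition(4)

print([r.event_id for r in ordered.collect()])     # globally ordered
print([r.event_id for r in reshuffled.collect()])  # ordering no longer guaranteed
```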
To maintain the sort order in your data flow, what you did is the right approach: set the Single partition option in the Optimize tab of the Sort transformation, and keep the Sort transformation as close to the Sink as possible. This ensures no reshuffling happens between sorting and writing, so the data is still sorted when it reaches the Sink.
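Conceptually, the Single partition + Sort setup corresponds to a Spark pattern like the one below (again with hypothetical data): all rows are forced into one partition and sorted there, leaving nothing to reshuffle before the write, at the cost of parallelism.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[4]").appName("single-partition-demo").getOrCreate()

events = spark.createDataFrame(
    [(i, 1000 + i) for i in range(20)],
    ["event_id", "event_time"],
)

# Force everything into one partition and sort within it; the single
# partition is then written as-is, so the output preserves event_time order.
single_sorted = events.repartition(1).sortWithinPartitions("event_time")
single_sorted.write.mode("overwrite").csv("/tmp/ordered_events", header=True)
```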
In general, increasing the Batch size in the Sink configuration is recommended to improve processing speed. However, this works against your goal here: increasing the Batch size may cause events to land in the Sink out of order.
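The sink internals aren't documented here, but one plausible way to picture this trade-off is batches being flushed concurrently: each batch preserves its internal order, while the batches themselves can complete in any order. A toy Python sketch (not ADF code) of that effect:

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor

rows = list(range(12))                 # an already-ordered input stream
batch_size = 3
batches = [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]

sink = []

def write_batch(batch):
    time.sleep(random.random() / 100)  # simulate variable write latency per batch
    sink.extend(batch)                 # each batch keeps its internal order

with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(write_batch, batches))

print(sink)  # batches may land out of order, e.g. [3, 4, 5, 0, 1, 2, ...]
```

With batch_size = 1 every row is its own batch, which restores strict ordering but maximizes per-write overhead, matching the slow behavior you observed.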
I'm reaching out to the internal team to see whether there are any other workarounds that preserve the order of events with better throughput, and I will get back to you as soon as I hear from them.
Thank you for your patience.