Hello @Rohit Sapru ,
Welcome to the Microsoft Q&A platform.
Yes, this is expected behavior when you attach multiple foreachBatch operations to a single streaming query.
foreachBatch lets you apply arbitrary operations and custom write logic to the output of a streaming query. It differs slightly from foreach: foreach applies custom write logic to every row, while foreachBatch applies arbitrary operations and custom logic to the output of each micro-batch.
ForeachBatch: foreachBatch(...) lets you specify a function that is executed on the output data of every micro-batch of a streaming query. It has been supported in Scala, Java, and Python since Spark 2.4. The function takes two parameters: a DataFrame or Dataset containing the output data of the micro-batch, and the unique ID of that micro-batch.
Syntax:
def foreach_batch_function(df, epoch_id):
    # Transform and write the micro-batch DataFrame (df) here
    pass
streamingDF.writeStream.foreachBatch(foreach_batch_function).start()
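Rather than attaching several foreachBatch calls to the same query, the usual pattern is to write to all of your sinks from inside a single foreachBatch function. A minimal sketch (the function name, paths, and formats are illustrative assumptions, not from your code):

```python
def write_to_multiple_sinks(batch_df, epoch_id):
    # Cache the micro-batch so the writes below do not recompute it.
    batch_df.persist()

    # Sink 1: append the micro-batch to a Parquet location (hypothetical path).
    batch_df.write.mode("append").format("parquet").save("/tmp/sink_one")

    # Sink 2: append the same micro-batch to a second location.
    batch_df.write.mode("append").format("parquet").save("/tmp/sink_two")

    # Release the cached micro-batch once both writes are done.
    batch_df.unpersist()

# Attach the single function to the streaming query, e.g.:
# streamingDF.writeStream.foreachBatch(write_to_multiple_sinks).start()
```

Persisting the batch DataFrame before writing it out twice avoids re-evaluating the source for each sink; remember to unpersist it afterwards.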
For more details, refer to the links below, which explain the usage of foreachBatch in more depth:
https://albertusk95.github.io/posts/2019/11/spark-structured-streaming-multiple-queries/
https://docs.databricks.com/spark/latest/structured-streaming/foreach.html
Hope this helps. Do let us know if you have any further queries.
------------
- Please accept an answer if correct. Original posters help the community find answers faster by identifying the correct answer. Here is how.
- Want a reminder to come back and check responses? Here is how to subscribe to a notification.
Hello @Rohit Sapru ,
Glad to know that it helped.