Optimizing Spark Streaming applications

vikranth-0706 100 Reputation points

I am investigating the process of submitting and executing Spark streaming applications within Synapse Spark pools.

In Synapse pipelines, the 'timeout' parameter for the 'spark job definition' activity specifies the maximum duration an activity can run. By default, this duration is set to 12 hours, with a maximum limit of 7 days. However, since streaming applications operate continuously, the 7-day limit is not suitable.

What are the recommended best practices for effectively running Spark streaming applications within Synapse, considering the continuous nature of these applications and the limitations imposed by the timeout parameter?

Azure Synapse Analytics

Accepted answer
  1. Smaran Thoomu 8,885 Reputation points Microsoft Vendor

    Hi @vikranth-0706

    Thank you for reaching out to the community forum with your query.

    When you start a Spark job from a pipeline, the activity timeout defaults to 12 hours and can be raised to a maximum of seven days. Pipelines are designed for processing data in batches rather than for continuous streaming, so running a pipeline indefinitely isn't recommended.

    For batch processing, it's a good idea to break down the pipeline into smaller jobs.

    If a streaming application needs to run for more than seven days, automate the restart process using Azure Functions or Azure Logic Apps. These tools offer more flexibility in scheduling the jobs than the pipeline activity does.
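As a rough illustration of the restart idea, here is a minimal sketch of the decision logic that a scheduled function could run. It is not a Synapse API: `needs_restart`, `restart_if_due`, `stop_job`, `submit_job`, and the one-hour `RESTART_MARGIN` are all hypothetical names and values chosen for the example. The only figure taken from this thread is the seven-day limit.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional

# Seven-day maximum from the question; the margin is an assumed safety buffer.
MAX_ACTIVITY_TIMEOUT = timedelta(days=7)
RESTART_MARGIN = timedelta(hours=1)


def needs_restart(
    started_at: datetime,
    now: datetime,
    timeout: timedelta = MAX_ACTIVITY_TIMEOUT,
    margin: timedelta = RESTART_MARGIN,
) -> bool:
    """Return True once the job is within `margin` of hitting `timeout`."""
    return now - started_at >= timeout - margin


def restart_if_due(
    started_at: Optional[datetime],
    now: datetime,
    stop_job: Callable[[], None],
    submit_job: Callable[[], None],
) -> datetime:
    """Start or restart the job when needed; return the current start time.

    `stop_job` and `submit_job` are hypothetical callables that would wrap
    whatever mechanism you use to cancel and submit the Spark job.
    """
    if started_at is None:
        # No job is running yet: submit it and record the start time.
        submit_job()
        return now
    if needs_restart(started_at, now):
        # Stop the old run before it is killed by the timeout, then resubmit.
        stop_job()
        submit_job()
        return now
    # Still well inside the timeout window: leave the job alone.
    return started_at


if __name__ == "__main__":
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    check = start + timedelta(days=6, hours=23)
    print(needs_restart(start, check))
```

A timer-triggered function could call `restart_if_due` on a schedule and persist the returned start time between runs. For the restarted job to resume from where the previous run stopped, the streaming query would typically need a checkpoint location configured.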

    Hope this helps. Do let us know if you have any further queries.

    If this answers your query, please click Accept Answer and select Yes for "Was this answer helpful".

    1 person found this answer helpful.