Spark streaming in Synapse Spark pools

parag kesar 20 Reputation points
2023-06-02T23:46:30.6633333+00:00

I’m looking into how to submit and run Spark streaming applications in Synapse Spark pools.

Below is the 'timeout' setting for the 'Spark job definition' activity in a Synapse pipeline:

'timeout' -> Maximum amount of time an activity can run. Default is 12 hours, and the maximum amount of time allowed is 7 days. The format is D.HH:MM:SS.

Streaming applications run continuously, so a 7-day limit will not work.

What are the recommended best practices for running Spark streaming applications in Synapse?

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

Answer accepted by question author
  1. Bhargava-MSFT 31,361 Reputation points Microsoft Employee Moderator
    2023-06-05T21:43:33.1266667+00:00

    Hello parag kesar,

    Welcome to the MS Q&A platform.

    When you run a Spark job definition through a pipeline, the default timeout is 12 hours and the maximum allowed is seven days. This is because a pipeline is a batch service, not a streaming service, and it is not advised to run any pipeline forever.

    For batch processing, dividing the pipeline into multiple smaller jobs is a good approach.
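
    One pattern for splitting a streaming workload into smaller jobs (not stated in the original answer, offered as a sketch) is Structured Streaming's availableNow trigger (Spark 3.3+): each pipeline-triggered run processes whatever data has arrived, writes a checkpoint, and stops well inside the activity timeout. The paths, schema, and storage account below are hypothetical placeholders.

    ```python
    # Sketch: a bounded Structured Streaming run that fits inside a pipeline timeout.
    # Assumes Spark 3.3+ on the Synapse pool; all paths and the schema are placeholders.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Read new events from a hypothetical landing folder in ADLS Gen2.
    events = (
        spark.readStream
        .format("json")
        .schema("id STRING, ts TIMESTAMP, value DOUBLE")
        .load("abfss://raw@<storage-account>.dfs.core.windows.net/events/")
    )

    # Process everything available right now, then stop; the checkpoint lets the
    # next scheduled run resume where this one left off.
    query = (
        events.writeStream
        .format("delta")
        .option("checkpointLocation",
                "abfss://curated@<storage-account>.dfs.core.windows.net/_checkpoints/events/")
        .trigger(availableNow=True)
        .start("abfss://curated@<storage-account>.dfs.core.windows.net/events/")
    )
    query.awaitTermination()
    ```

    Scheduling such a run every few minutes from a pipeline trigger gives near-streaming latency while every individual execution stays far below the 7-day limit.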

    If a streaming application needs to run beyond the 7-day limit, the restart process has to be automated, for example with Azure Functions or Logic Apps, which give more flexibility in scheduling the jobs.
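
    If the application must instead run continuously, the restart automation could, as one possible approach, call the Synapse Spark batch (Livy) API on a schedule and resubmit the job when it is no longer alive. The sketch below assumes the azure-identity and azure-synapse-spark Python packages; the workspace, pool, storage paths, and job name are placeholders, and in an Azure Function the same logic would sit inside a timer-triggered handler.

    ```python
    # Sketch: resubmit a streaming job if it is not currently running.
    # Assumes azure-identity and azure-synapse-spark are installed; names are placeholders.
    from azure.identity import DefaultAzureCredential
    from azure.synapse.spark import SparkClient
    from azure.synapse.spark.models import SparkBatchJobOptions

    client = SparkClient(
        credential=DefaultAzureCredential(),
        endpoint="https://<workspace-name>.dev.azuresynapse.net",
        spark_pool_name="<spark-pool-name>",
    )

    JOB_NAME = "streaming-app"

    def ensure_streaming_job_running():
        # List recent batch jobs on the pool and check whether ours is still alive.
        jobs = client.spark_batch.get_spark_batch_jobs(detailed=True)
        alive = [
            j for j in (jobs.sessions or [])
            if j.name == JOB_NAME and j.state in ("not_started", "starting", "running")
        ]
        if alive:
            return

        # Not running: submit it again. The application's own Structured Streaming
        # checkpoint lets it resume from where the previous run stopped.
        options = SparkBatchJobOptions(
            name=JOB_NAME,
            file="abfss://jobs@<storage-account>.dfs.core.windows.net/streaming_app.py",
            driver_memory="8g",
            driver_cores=4,
            executor_memory="8g",
            executor_cores=4,
            executor_count=2,
        )
        client.spark_batch.create_spark_batch_job(options, detailed=True)

    if __name__ == "__main__":
        ensure_streaming_job_running()
    ```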

    I hope this helps. Please let me know if you have any further questions.

    1 person found this answer helpful.

0 additional answers
