Self-hosted integration runtime in Synapse notebooks

Venkat Ram Saran Tummala 0 Reputation points
2024-08-19T21:19:52.6366667+00:00

Hi, I have a few REST APIs that are only accessible through a self-hosted integration runtime (SHIR).

Need a solution for: I am having trouble finding a way to use that SHIR to make the API calls from PySpark Synapse notebooks.

or

Need a solution for: I need a way to make the Spark sessions for the notebook activities in my pipelines start up much faster.

Works but not optimized: I can create a Web activity in a pipeline, select the SHIR, and call the API there. However, this Web activity depends on a Synapse notebook and also supplies its response to another notebook.

Each notebook starts its own Spark session, which makes the whole pipeline very inefficient time-wise.

Azure Synapse Analytics
An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.

1 answer

  1. Vinodh247 22,696 Reputation points
    2024-08-20T13:56:33.28+00:00

    Hi Venkat Ram Saran Tummala,

    Thanks for reaching out to Microsoft Q&A.

    To optimize the startup time of Spark sessions for notebook activities in pipelines, consider the following:

    Warm-up Notebooks:

    • Create a "warm-up" notebook that runs lightweight operations to pre-warm the Spark pool. You can schedule this notebook to run periodically or just before your main pipeline.

    Pipeline Design Optimization:

    • If multiple notebooks are dependent on each other, consider consolidating operations within a single notebook where possible. This reduces the number of separate Spark sessions that need to start.
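    As a rough structural sketch (plain Python, with hypothetical stage names), consolidation means the stages that previously ran as separate notebook activities become functions inside one notebook, so they share a single Spark session:

```python
# Hypothetical sketch: three pipeline stages that previously ran as
# separate notebook activities (each paying the Spark session start-up
# cost) become functions in one notebook, sharing one session.

def extract():
    # In Synapse this would read the source data, e.g. spark.read.parquet(...)
    return [1, 2, 3]

def transform(rows):
    # Intermediate logic that used to live in a second notebook.
    return [r * 2 for r in rows]

def load(rows):
    # Final step that used to live in a third notebook.
    return sum(rows)

# One session start-up instead of three.
result = load(transform(extract()))
print(result)
```

    If you prefer to keep separate notebooks for maintainability, Synapse's `mssparkutils.notebook.run()` lets a parent notebook invoke child notebooks so they run on the parent's Spark pool rather than each spinning up its own session.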

    Keep Spark Session Alive:

    • You might configure a longer idle timeout for your Spark pool, keeping the session alive longer between notebook executions.
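    For example, you can raise the Spark pool's auto-pause delay so the pool stays warm between notebook runs (at extra compute cost). A hedged Azure CLI sketch — the resource names are placeholders, and flag names may vary by CLI version:

```shell
# Hypothetical resource names; raise the auto-pause delay to 60 minutes
# so the Spark pool stays warm between notebook executions.
az synapse spark pool update \
  --name mysparkpool \
  --workspace-name myworkspace \
  --resource-group myresourcegroup \
  --enable-auto-pause true \
  --delay 60
```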

    Use Synapse Pipeline Caching:

    • For data that is repeatedly accessed, use Synapse's built-in caching features or Delta Lake to store intermediate results, reducing the need to recompute them in subsequent notebooks.
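    The caching idea can be sketched in plain Python (the file path and compute step are hypothetical; in Synapse the store would be a Delta or Parquet table in the lake, not a local JSON file):

```python
import json
import os
import tempfile

# Hypothetical sketch: persist an intermediate result once and reuse it,
# instead of recomputing it in every downstream notebook. In Synapse the
# store would be a Delta/Parquet table, not a local file.
cache_path = os.path.join(tempfile.gettempdir(), "intermediate_result.json")

def expensive_computation():
    # Stand-in for a costly transformation shared by several notebooks.
    return {"row_count": 3, "rows": [1, 2, 3]}

if not os.path.exists(cache_path):
    # First run: compute and persist the intermediate result.
    with open(cache_path, "w") as f:
        json.dump(expensive_computation(), f)

# Downstream steps read the stored result instead of recomputing it.
with open(cache_path) as f:
    cached = json.load(f)
```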

    By implementing these optimizations, you can significantly reduce the startup time and improve the overall efficiency of your Synapse notebook pipelines.

    Please 'Upvote'(Thumbs-up) and 'Accept' as an answer if the reply was helpful. This will benefit other community members who face the same issue.

