
Facing issue while calling mssparkutils.notebook.runMultiple([notebook_list along with parameters]) in Azure Synapse Analytics

karthik raja 0 Reputation points
2024-10-10T06:48:34.6166667+00:00

Here I am trying to call a few notebooks from a common driver notebook. Magic commands do not give me control over which notebook executes, so I tried mssparkutils.notebook.runMultiple() instead. In this scenario I want those notebooks to run in parallel without waiting for one another to complete, but the call fails with the error message shown in the attached picture.
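For reference, the parallel call the question describes would look roughly like this (notebook names are placeholders; in a Synapse notebook `mssparkutils` is preloaded, so no import is needed):

```python
# Hypothetical notebook names standing in for the real ones.
notebook_list = ["NotebookA", "NotebookB", "NotebookC"]

# Passing a plain list runs all referenced notebooks in parallel
# within the current Spark session. (Only runnable inside a
# Synapse notebook, hence commented out here.)
# mssparkutils.notebook.runMultiple(notebook_list)
```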

Azure Synapse Analytics

An Azure analytics service that brings together data integration, enterprise data warehousing, and big data analytics. Previously known as Azure SQL Data Warehouse.


1 answer

Sort by: Most helpful
  1. PRADEEPCHEEKATLA 91,861 Reputation points
    2024-10-10T09:20:13.4433333+00:00

    @karthik raja - Thanks for the question and using MS Q&A platform.

    It seems you are facing an error while running notebooks in Azure Synapse Analytics. The error message "Py4JJavaError: failed with status code: 400, response: system level submitter mismatch" indicates a mismatch between the submitter of the job and the system-level submitter. This error can occur when the user who submitted the job is different from the user who is running it.

    To resolve this issue, you can try the following steps:

    • Transient issues can sometimes occur in Azure Synapse Analytics, and retrying the operation can often resolve the issue.
    • You can try creating a new notebook or a new Spark SQL pool and see if the issue persists.
    • If the issue continues to occur, check whether there are any service outages or maintenance activities that could be the cause. Please share the Synapse Spark runtime version and the region of the Synapse workspace.

    I tried running mssparkutils.notebook.help("runMultiple") from our end and was able to execute it without any issues.

    The method mssparkutils.notebook.runMultiple() allows you to run multiple notebooks in parallel or with a predefined topological structure. The API uses a multi-thread implementation within a single Spark session, which means the compute resources are shared by the referenced notebook runs.
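    Beyond the plain list form, runMultiple() also accepts a DAG definition that adds dependencies, per-notebook arguments, and a concurrency cap. A minimal sketch, assuming the documented DAG schema (notebook names, paths, and argument values here are placeholders):

    ```python
    # DAG definition for mssparkutils.notebook.runMultiple().
    # "activities" lists the notebooks to run; "dependencies" names the
    # activities that must finish first, so Notebook1 and Notebook2 start
    # in parallel and Notebook3 waits for both.
    dag = {
        "activities": [
            {"name": "Notebook1", "path": "Notebook1",
             "timeoutPerCellInSeconds": 120, "args": {"param": 1}},
            {"name": "Notebook2", "path": "Notebook2",
             "timeoutPerCellInSeconds": 120, "args": {"param": 2}},
            {"name": "Notebook3", "path": "Notebook3",
             "dependencies": ["Notebook1", "Notebook2"]},
        ],
        "timeoutInSeconds": 3600,  # overall timeout for the whole run
        "concurrency": 2,          # max notebooks running at once
    }

    # Inside a Synapse notebook (mssparkutils is preloaded there):
    # mssparkutils.notebook.runMultiple(dag)
    ```

    The dependency-free activities (Notebook1 and Notebook2) run in parallel, matching the behavior described above where both shared one Spark application.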

    [screenshot]

    Here is the status view of notebook run: Notebook1

    [screenshot]

    Here is the status view of notebook run: Notebook2

    [screenshot]

    In the above example, both notebooks (Notebook1 and Notebook2) ran under the same Apache Spark application, Livy ID 12.

    In case you are experiencing the same issue, I would suggest sharing the Synapse Spark runtime version and the region of the Synapse workspace, along with a screenshot of the error message, so we can investigate further.

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, do click Accept Answer and Yes for was this answer helpful. And, if you have any further query do let us know.


