Trying to retrieve Synapse node manager logs.

Question

David Beavon 996

Does anyone know how to open the YARN node manager logs in Synapse Spark?

I found the place where I can download the "events" data after the fact, but I haven't found the raw logs from the node manager. I believe they should be available after a Spark job has completed. Here is the Spark configuration for the log location, as shown in the Spark UI:

Log Location
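For anyone who wants to pull the same settings from inside a session instead of the Spark UI, here is a minimal sketch. It assumes a PySpark notebook where `spark` is the predefined session, and it relies on `SparkConf.getAll()` returning a list of (key, value) tuples. The helper name `filter_settings` and the sample values are hypothetical, made up for illustration. This only shows where the configuration points; it does not retrieve the node manager logs themselves.

```python
from typing import Iterable, List, Tuple


def filter_settings(
    pairs: Iterable[Tuple[str, str]], needles: Iterable[str]
) -> List[Tuple[str, str]]:
    """Return the (key, value) pairs whose key contains any needle.

    Matching is case-insensitive and the result is sorted by key.
    `filter_settings` is a hypothetical helper, not part of Spark.
    """
    lowered = [n.lower() for n in needles]
    return sorted(
        (key, value)
        for key, value in pairs
        if any(n in key.lower() for n in lowered)
    )


# In a notebook the input would come from the live session:
#     pairs = spark.sparkContext.getConf().getAll()
# The pairs below are placeholders so the sketch runs without a cluster;
# the values are illustrative, not taken from a real Synapse pool.
sample_pairs = [
    ("spark.app.name", "demo"),
    ("spark.eventLog.enabled", "true"),
    ("spark.eventLog.dir", "<event-log-dir>"),
    ("spark.executor.memory", "4g"),
]

# Print every setting whose key mentions "log".
for key, value in filter_settings(sample_pairs, ["log"]):
    print(f"{key} = {value}")
```

Passing additional needles such as `"yarn"` or `"dynamicAllocation"` would list the related settings in the same way.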

Any help would be appreciated.

More info: Lately I have been struggling with a large number of job failures that are apparently due to the unexpected death of VMs (and of the executors running on those VMs). The unexpected loss of VMs is a big problem for us, primarily because the loss is not directly discernible, and also because we use Spark session features like "persist()" and "localCheckpoint()". Those features don't handle the loss of a VM very well, and they surface other, more obscure problems after the fact. We have already applied the "idle timeout" settings that are normally supposed to prevent our dynamically allocated executors from being decommissioned, but those settings don't appear to withstand the sudden death of the parent VM.

I suspect that only the YARN node manager would understand why my VMs suddenly go "poof". (The stderr file on the driver mentions that it observed the loss of the VM, but it doesn't give any explanation of the reason for the loss.)

This problem is growing worse over time. In the past I observed that our Spark pool VMs were dying once a month, but now it happens many times each week. It seems serious for VMs to suddenly die for no apparent reason. I wish this were surfaced prominently in the "monitor" blade. It is probably the type of thing that is NOT expected to happen, and so it is at the bottom of the list of things for the UI team to work on.

AnnuKumari-MSFT 34,571 Reputation points Microsoft Employee Moderator

2023-12-14T05:48:04.0433333+00:00

Hi David Beavon ,

Thank you for using the Microsoft Q&A platform, and thanks for posting your query here.

I am taking your query forward to the internal team and will keep you posted once I hear back from them regarding the ask. Thank you.

David Beavon 996 Reputation points

2023-12-21T18:32:22.1266667+00:00

Hi @AnnuKumari-MSFT

Were you able to determine where the YARN logs can be found? I am still waiting for an update.

Please let me know.

Thanks, David