Trying to retrieve Synapse node manager logs.
Does anyone know how to open the YARN node manager logs in Synapse Spark?
I found where to download the "events" data after the fact, but I haven't found the raw logs from the node manager. I believe they should be available after a Spark job has completed. Here is the Spark configuration for them, as shown in the Spark UI:
Any help would be appreciated.
More info: Lately I have been struggling with a large number of job failures that are apparently caused by the unexpected death of VMs (and of the executors running on those VMs). The unexpected loss of VMs is a big problem for us, primarily because the loss is not directly discernible, and also because we use Spark features like "persist()" and "localCheckpoint()". Those features don't handle the loss of a VM very well, and they surface other, obscure problems after the fact. We already use the "idle timeout" settings that are normally supposed to prevent dynamically allocated executors from being decommissioned, but those settings don't protect against the sudden death of the parent VM.
I suspect that only the YARN node manager would know why my VMs suddenly go "poof". (The stderr file on the driver shows that it observed the loss of the VM, but it doesn't explain why the VM was lost.)
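Until the node manager logs are reachable, one way to at least quantify the losses is to scan the driver's stderr for the scheduler's lost-executor messages. Below is a minimal Python sketch, assuming each loss is logged as a line containing `Lost executor <id> on <host>: <reason>` (the exact wording may differ between Spark versions). The function names are hypothetical, and this only shows when and on which host an executor was lost, not why the VM died.

```python
import re
from collections import Counter
from typing import Iterable, List, NamedTuple

# Assumed line shape (may vary by Spark version), e.g.:
# "... ERROR YarnScheduler: Lost executor 3 on vm-0001: Container marked as failed"
LOST_EXECUTOR = re.compile(r"Lost executor (\S+) on ([^:\s]+): (.*)")


class LostExecutor(NamedTuple):
    executor_id: str
    host: str
    reason: str


def find_lost_executors(lines: Iterable[str]) -> List[LostExecutor]:
    """Return one record per 'Lost executor' line found in a driver log."""
    found = []
    for line in lines:
        match = LOST_EXECUTOR.search(line)
        if match:
            found.append(LostExecutor(*match.groups()))
    return found


def losses_per_host(records: Iterable[LostExecutor]) -> Counter:
    """Count lost executors per host, to show which VMs disappeared most often."""
    return Counter(r.host for r in records)


if __name__ == "__main__":
    # Made-up sample lines; in practice, read the downloaded driver stderr file.
    sample = [
        "INFO SparkContext: Running Spark version 3.3.1",
        "ERROR YarnScheduler: Lost executor 3 on vm-0001: Container marked as failed",
        "ERROR YarnScheduler: Lost executor 4 on vm-0001: Container released on a *lost* node",
    ]
    print(losses_per_host(find_lost_executors(sample)))  # Counter({'vm-0001': 2})
```

Tracking these counts per run would at least make the trend (once a month versus several times a week) visible without relying on the "Monitor" blade.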
This problem is growing worse over time. In the past our Spark pool VMs died about once a month; now it happens many times each week. It seems serious for VMs to die suddenly for no apparent reason. I wish this were surfaced prominently in the "Monitor" blade. It is probably the type of thing that is NOT expected to happen, and so it is at the bottom of the UI team's list of things to work on.