Spark U/I is not loading for recently "terminated" job clusters

David Beavon 971 Reputation points
2021-04-07T20:49:06.167+00:00

I've been having trouble with a feature in Azure Databricks. The Spark U/I will not load for job clusters that have recently completed ("terminated"). There is a link to the Spark U/I in the Databricks portal, but when you click the link, you are presented with a status message that just says "Loading":

Loading old UI for cluster "whatever"... This may take a few minutes.

This screen will remain the same for a long period of time (hours) and I will eventually lose patience and close the window. I haven't yet tried to wait overnight ... and even if I tried, I'm not sure if it would be reasonable to wait that amount of time for the U/I to respond.

When I previously encountered this issue and opened a support ticket with Databricks/Azure Databricks, they were not able to confirm any outages during the period in question. So far we have established that there is a "Spark History Server UI", which is a shared resource that can become congested with requests from multiple customers. I'm assuming this implies that the issue is simultaneously affecting multiple customers, although we haven't yet established that for certain.

I've been using Azure Databricks in production for a few months now, and I'm not familiar enough to know if the issue could be specific to us, or if it might be a chronic issue that affects others in the same region as well. I googled for the problem and was unable to find any results for my search. So I thought it would be good to start a new discussion about the problem here in the Q&A.

Please let me know if anyone has an explanation, or has experienced this themselves. I'm also eager to hear if there are any tricks to get the workspace to start working properly. Whenever I encountered this issue, the problem wouldn't go away on its own until a day or two had passed. I haven't yet gotten any acknowledgement of these outages from Microsoft's perspective.

Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.

7 answers

  1. CB 0 Reputation points
    2023-02-07T08:48:25.3166667+00:00

    I have hit this error for the first time (it eventually stops loading with "An error occurred while loading the UI for driver...").

    I'll be enabling logs so that replaySparkEvents can be used, but I thought it appropriate to flag here that the original error does still occur.


  2. David Beavon 971 Reputation points
    2023-02-07T14:27:38.47+00:00

    Hi CB,

    I worked on this support case for two years. I did my fair share of work to nag at this company and get it fixed. Perhaps I shouldn't have, but I closed my ticket pretty soon after they claimed they fixed the bug, and I didn't see any new recurrences.

    (We are actually trying to move our Spark workloads from Databricks to Synapse, but it is a large undertaking.)

    Can you please open your own ticket with this company? If you are spending more money on the service than we are, then maybe it won't take them two years to work on your support case. Here was the internal ID they gave me when they said this was fixed:

    (DB-I-3506).

    I think that opening support tickets is more likely to lead to a permanent fix (rather than complaining about the problems here in the community). I'm not sure how far you got with that "replaySparkEvents" workaround, but it is a massive pain and is probably a waste of your time as well. Just my 2 cents.
