Determine which cluster node failed for Analytics Platform System
This article describes how to determine the name of the Analytics Platform System (PDW) node that failed after a cluster failover has occurred and a cluster failover alert has been raised. As part of troubleshooting a cluster failover, you must determine the name of the node that failed before contacting Microsoft to help resolve the problem.
Background
For high availability in SQL Server PDW, the Control node and the Compute nodes are configured as active or passive components of Windows failover clusters. When an active server fails to respond to critical system requests, the passive server fails over and performs the functions of the server that failed.
After a cluster failover, when SQL Server PDW reports on node status, the passive server has a failed over status. However, it is not obvious which server or node failed, especially if the server that failed is still online. To troubleshoot the cluster failure, you must determine the name of the node that failed over.
Admin Console solution
Find the name of the node that failed
- Open the Admin Console. For more information about the Admin Console, see Monitor the Appliance by Using the Admin Console (Analytics Platform System). After failover occurs, the failover event is included in the number of alerts on the HEALTH page. There is an HEALTH page for the PDW region and for the fabric region of the appliance. Each Health page has an ALERTS tab. To learn more about an alert, select the Health page, the Alerts tab, and then select an alert.
System view solution
The following SQL statement shows how to use the sys.dm_pdw_component_health_active_alerts system view to find the name of the server that failed.
SELECT
SUBSTRING( component_instance_id, 2, charindex(' ', component_instance_id, 1)-2) AS failed_node_name,
create_time AS failover_time
FROM sys.dm_pdw_component_health_active_alerts
WHERE alert_id = 500139
ORDER BY failed_node_name;