SharePoint and Mirroring: Taking action on failovers
You might have seen the database mirroring whitepaper for SharePoint. It mentions a great article from the SQL folks on alerting on database mirroring events. One might think that monitoring the WMI events in SQL is a good or maybe the only way to take action on failovers. However, I propose a better method: Query the [Master].sys.database_mirroring view via a task or process. Here's the query:
Select mirroring_role from sys.database_mirroring where mirroring_role is not null
If the server is principal each database will return 1. If mirror, each database will return 2. Unfortunately, this isn't the only state you have to handle. Because each SQL server can be both the mirror for one database and principal for another, you have to monitor for this situation or SharePoint will be most unhappy. The query to do this is:
Select count(distinct mirroring_role) from sys.database_mirroring where mirroring_role is not null
If this query returns more than 1, you know that you have what we call a database mirroring split situation and you need to fail over some of the databases to return to a good state. However, you can't simply failover each principal database because it's assumed that the partial failover occurred for a reason. Take for instance, the situation where one of your content databases resides on a storage array that has failed. In that situation, you would not want to fail the database back over to the original node as it would not come online until the array is repaired. To handle that situation, you want to know the original state of SQL before running your failover logic. If the role is principal, but one of the databases has failed over, you want to fail over the rest of the databases. However, if the node is the mirror, you want to ignore the situation and let the principal handle it. This means your application or process to handle failovers needs to be stateful. I've been storing this state in a user created table called roles in the MSDB database, but I will be moving this to the registry as this seems like a more robust way to handle it.
BTW.. I experimented with triggering off of the database mirroring events. I found that there were quite a few events that fired when a database fails over. It was confusing determining how to use this information. Additionally, the events didn't tell me the information I wanted to know. I ended up having to query the sys.database_mirroring table regardless. After spending some time on this, I decided it was best just to create a windows service that queries the table as described above. Additionally, I created a simple SQL agent job that tried to do the same thing as the service. Unfortunately, my SQL skills are pretty poor as I prefer C#, but it's the thought that counts right? This TSQL sample will create a SQL agent job that runs once a minute and writes the current state to a table called roles residing in MSDB. It will also attempt to handle a database mirroring split exception. This is where the code breaks down as I noticed that instead of failing over the database to the mirror, it fails it back over to the principal. Anyway's, here you go. Have fun. Oh.. and as always this code is unsupported and is provided as a sample for you to create your own failover mechanism. I left a gotcha in the code so that you'll have to clean it up before you can use it. ha HAAA! as Phil Seben says.
1: USE [msdb]
2: GO
3: /****** Object: Job [Mirroring_State_Monitor] Script Date: 10/15/2007 11:46:50 ******/
4: BEGIN TRANSACTION
5: DECLARE @ReturnCode INT
6: SELECT @ReturnCode = 0
7: /****** Object: JobCategory [[Uncategorized (Local)]]] Script Date: 10/15/2007 11:46:51 ******/
8: IF NOT EXISTS (SELECT name FROM msdb.dbo.syscategories WHERE name=N'[Uncategorized (Local)]' AND category_class=1)
9: BEGIN
10: EXEC @ReturnCode = msdb.dbo.sp_add_category @class=N'JOB', @type=N'LOCAL', @name=N'[Uncategorized (Local)]'
11: IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
12:
13: END
14:
15: DECLARE @jobId BINARY(16)
16: EXEC @ReturnCode = msdb.dbo.sp_add_job @job_name=N'Mirroring_State_Monitor',
17: @enabled=1,
18: @notify_level_eventlog=0,
19: @notify_level_email=0,
20: @notify_level_netsend=0,
21: @notify_level_page=0,
22: @delete_level=0,
23: @description=N'No description available.',
24: @category_name=N'[Uncategorized (Local)]',
25: @owner_login_name=N'domain\username', @job_id = @jobId OUTPUT
26: IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
27: /****** Object: Step [Check State] Script Date: 10/15/2007 11:46:52 ******/
28: EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'Check State',
29: @step_id=1,
30: @cmdexec_success_code=0,
31: @on_success_action=1,
32: @on_success_step_id=0,
33: @on_fail_action=2,
34: @on_fail_step_id=0,
35: @retry_attempts=0,
36: @retry_interval=0,
37: @os_run_priority=0, @subsystem=N'TSQL',
38: @command=N'
39: -- This script monitors the state of the mirror partners and takes action to fix any problems
40: -- This script uses a user created table [MSDB].dbo.role to store the state of the individual node
41: -- Principal = 1; Mirror = 2;
42: --
43: -- Identify if we have a split state
44: -- Get the role from the role table and make sure we don''t have more than 1 database role
45: IF (Select r.role from [MSDB].dbo.role r ) = 1 AND (Select count(distinct mirroring_role) from sys.database_mirroring where mirroring_role is not null) > 1
46: BEGIN
47: --Generate list of principal databases
48: DECLARE @principalDB as table(dbname varchar(255) NOT NULL);
49: --Insert the names of the principal databases into the table variable
50: INSERT INTO @principalDB SELECT d.name
51: from sys.database_mirroring s, sys.databases d
52: WHERE s.database_id = d.database_id and s.mirroring_role = 1;
53: --Enumerate through the principal databases and fail over
54: WHILE (Select count(*) from @principalDB) > 0
55: BEGIN
56: --Get the first database name from the table
57: DECLARE @dbName as varchar(100)
58: SET @dbName = (SELECT Top 1 dbname from @principalDB)
59: --Create a string to store the failover command
60: Declare @cmdString as varchar(255)
61: Set @cmdString = ''USE MASTER; ALTER DATABASE '' + @dbName + '' SET PARTNER FAILOVER;''
62: --Execute the command string to failover the databases
63: EXECUTE (@cmdString);
64: --Delete the database from the list since it has aleady been failed over
65: DELETE FROM @principalDB WHERE dbname = @dbName
66: END
67: --Finally, update the role to mirror
68: Update [MSDB].dbo.role SET role = 2
69: END
70: ELSE IF (Select r.role from [MSDB].dbo.role r ) = 1
71: AND (Select count(distinct mirroring_role) from sys.database_mirroring where mirroring_role is not null) = 1
72: AND (Select count(mirroring_role) from sys.database_mirroring where mirroring_role = 1) = 0
73: -- This server is principal in name only but should be the mirror
74: -- Update the role to mirror
75: Update [MSDB].dbo.role SET role = 2
76: IF (Select r.role from [MSDB].dbo.role r ) = 2
77: AND (Select count(distinct mirroring_role) from sys.database_mirroring where mirroring_role is not null) = 1
78: AND (Select count(mirroring_role) from sys.database_mirroring where mirroring_role = 2) = 0
79: -- This server is mirror in name only but should be the principal
80: -- Update role to principal
81: Update [MSDB].dbo.role SET role = 1
82:
83: -- We are done
84: -- Other potential states exist, but we don''t need to perform any action.
85: Update [MSDB].dbo.role SET role = 2
86: IF (Select r.role from [MSDB].dbo.role r ) = 2
87: AND (Select count(distinct mirroring_role) from sys.database_mirroring where mirroring_role is not null) = 1
88: AND (Select count(mirroring_role) from sys.database_mirroring where mirroring_role = 2) = 0
89: -- This server is mirror in name only but should be the principal
90: -- Update role to principal
91: Update [MSDB].dbo.role SET role = 1
92:
93: -- We are done
94: -- Other potential states exist, but we don''t need to perform any action.',
95: @database_name=N'master',
96: @flags=0
97: IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
98: EXEC @ReturnCode = msdb.dbo.sp_update_job @job_id = @jobId, @start_step_id = 1
99: IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
100: EXEC @ReturnCode = msdb.dbo.sp_add_jobschedule @job_id=@jobId, @name=N'Schedule 1',
101: @enabled=1,
102: @freq_type=4,
103: @freq_interval=1,
104: @freq_subday_type=4,
105: @freq_subday_interval=1,
106: @freq_relative_interval=0,
107: @freq_recurrence_factor=0,
108: @active_start_date=20071015,
109: @active_end_date=99991231,
110: @active_start_time=0,
111: @active_end_time=235959
112: IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
113: EXEC @ReturnCode = msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)'
114: IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
115: COMMIT TRANSACTION
116: GOTO EndSave
117: QuitWithRollback:
118: IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTION
119: EndSave:
Comments
Anonymous
January 24, 2008
Hello all, I've been meaning to get around to posting this script for the SharePoint community. IAnonymous
January 14, 2009
Actually, I think WMI events are a lot better.. you don't have to store anything anywhere (you just react to the events)... it also does not require any polling since WMI will notify you when failovers happen. I am covering the events and posting samples in my blog (new, so bare with me).