How to wait for a timer job to complete in a multi-server farm?

In our test environment, several projects deploy solutions to a single farm, and lately we have had synchronization issues on multi-server farms. We have created a feature that updates web.config files shared by several projects. The problem is that on multi-server farms, the timer jobs created by SharePoint’s web.config update mechanism keep stepping on each other’s toes. So we needed a way to ensure that each update job has completed on all servers before continuing. Other bloggers have covered this issue before, but we could not find an implementation that we fully trusted. So we had to come up with a solution ourselves, for once ;-)

So what are the steps to accomplish this?

1. We need to know the names of all the front-end servers in the farm.

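private List<string> GetFrontEndServerNames(SPFarm farm)
{
    // A server counts as a front end if it hosts the web application service.
    // The service type name differs between SharePoint versions, so check both.
    return (from server in farm.Servers
            from instance in server.ServiceInstances
            where instance.TypeName.Equals(
                      "Windows SharePoint Services Web Application")
                  || instance.TypeName.Equals(
                      "Microsoft SharePoint Foundation Web Application")
            select server.Name).Distinct().ToList(); // Distinct guards against duplicate names
}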

2. We need to check whether the timer job has completed on a single server. This is the essential part. We check whether the job shows up in the service’s job history with a start time after a given point in time.

private bool IsJobCompleted(SPFarm farm, string serverName, string serviceTypeName,
    string jobTitle, DateTime startTime)
{
    SPService service = farm.Services.FirstOrDefault(
        s => s.TypeName.Equals(serviceTypeName));
    if (service == null)
    {
        return false;
    }

    // The job counts as completed on this server once the history holds an
    // entry for it with a start time later than startTime.
    // Server names are compared case-insensitively.
    return service.JobHistoryEntries.Any(
        job => job.JobDefinitionTitle.Equals(jobTitle) &&
               job.ServerName.Equals(serverName, StringComparison.OrdinalIgnoreCase) &&
               job.StartTime > startTime);
}

3. We need to keep track of timer job status on all servers, and keep checking until all the jobs are done, or a timeout is reached.

private void WaitForOnetimeJobToComplete(
    SPFarm farm,
    ICollection<string> serverNames,
    string serviceTypeName,
    string jobTitle,
    TimeSpan timeout,
    int secondsBeforeRetry)
{
    try
    {
        // Only history entries that start after this point are counted.
        var startTime = DateTime.UtcNow;
        while (serverNames.Count > 0 && DateTime.UtcNow - startTime < timeout)
        {
            // Iterate over a copy so that completed servers can be removed.
            foreach (var serverName in serverNames.ToList())
            {
                if (IsJobCompleted(farm, serverName, serviceTypeName,
                    jobTitle, startTime))
                {
                    serverNames.Remove(serverName);
                }
            }
            if (serverNames.Count > 0)
            {
                Thread.Sleep(1000 * secondsBeforeRetry);
            }
        }
    }
    catch (Exception ex)
    {
        // Log error
    }
}

4. WaitForOnetimeJobToComplete is meant to be generic (although we have not verified it with other jobs), so we need a more to-the-point wrapper method for the web.config update, like this:

private void WaitForWebConfigUpdateTimerJobToComplete(SPFarm farm)
{
    var frontEndServerNames = GetFrontEndServerNames(farm);
    const string typeName = "Microsoft SharePoint Foundation Web Application";
    const string jobTitle = "Microsoft SharePoint Foundation Web.Config Update";
    var timeout = TimeSpan.FromMinutes(2);
    const int secondsBeforeRetry = 5;
    WaitForOnetimeJobToComplete(farm, frontEndServerNames, typeName,
        jobTitle, timeout, secondsBeforeRetry);
}
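
One thing the loop in step 3 does not tell the caller is whether it finished because all servers were done or because the timeout ran out. Below is a minimal sketch of the same polling idea that returns the servers still pending. It is not part of our feature code: the names Polling and WaitForAll are made up for illustration, and the SharePoint check is replaced by a plain delegate so the sketch can run without a farm.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;

public static class Polling
{
    // Hypothetical helper: polls until isDone returns true for every item
    // or the timeout expires. Returns the items that were still pending
    // when the loop ended (an empty list means everything completed in time).
    public static List<string> WaitForAll(
        IEnumerable<string> items,
        Func<string, bool> isDone,
        TimeSpan timeout,
        TimeSpan retryDelay)
    {
        // Work on a copy so the caller's collection is left untouched.
        var pending = new List<string>(items);
        var stopwatch = Stopwatch.StartNew();

        while (true)
        {
            // Drop every item that reports completion on this pass.
            pending.RemoveAll(item => isDone(item));
            if (pending.Count == 0 || stopwatch.Elapsed >= timeout)
            {
                return pending;
            }
            Thread.Sleep(retryDelay);
        }
    }
}

public static class Demo
{
    public static void Main()
    {
        // Simulated farm: WEB01 is done, WEB02 never finishes.
        var pending = Polling.WaitForAll(
            new[] { "WEB01", "WEB02" },
            server => server == "WEB01",
            TimeSpan.FromMilliseconds(200),
            TimeSpan.FromMilliseconds(20));

        Console.WriteLine("Still pending: " + string.Join(", ", pending));
    }
}
```

In the SharePoint case, the delegate would presumably wrap the check from step 2, something like server => IsJobCompleted(farm, server, typeName, jobTitle, startTime), and a non-empty result could be logged or turned into a deployment error.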

That’s it! Thanks to my colleague Roy Lofthus for providing valuable input from his experience in SharePoint deployment.