Workflow Timer Job (job-workflow) is dead on SharePoint 2019

Sumeet Singhal 6 Reputation points
2023-11-06T14:04:38.6066667+00:00

Hi,

The Workflow timer job has not run on my SharePoint 2019 production server for 4 days; all other timer jobs are working fine.

Environment Details: 1 App, 1 WFE

Role: Custom

Microsoft SharePoint Foundation Workflow Timer Service - In Started state on both servers

It had been working ever since it was created. Recently, severe slowness cascaded to SharePoint from the database backup server. The backup system was corrupted, so the backup started again from the beginning, and because the total DB size had reached 1 TB, it made SharePoint slow and we started facing timeout errors in workflows. Once that was identified, the system team paused the backup during the day and ran it at night; it completed successfully, and since then only incremental backups have been running. However, the workflow timer job crashed and never came back.

I tried the different solutions listed below.

  • Restarted Servers
  • Manually Restarted Windows SharePoint Timer Service
  • Reset the configuration cache using the commands shown below
# Get all SharePoint servers in the farm
Write-Host "Resetting configuration cache on all servers...`n" -ForegroundColor Yellow
$Servers = Get-SPServer | Where-Object { $_.Role -ne "Invalid" } | Select-Object -ExpandProperty Address

# Iterate through each server and reset the SharePoint config cache
Invoke-Command -ComputerName $Servers -ScriptBlock {
    try {
        Write-Host "$env:COMPUTERNAME - Stopping timer service" -ForegroundColor Red
        Stop-Service SPTimerV4

        # Get the config cache folder; the ConfigDB Id is read from the '16 hive' registry key
        $ConfigDbId = [Guid](Get-ItemProperty 'HKLM:\SOFTWARE\Microsoft\Shared Tools\Web Server Extensions\16.0\Secure\ConfigDB' -Name Id).Id
        $CacheFolder = Join-Path -Path ([Environment]::GetFolderPath("CommonApplicationData")) -ChildPath "Microsoft\SharePoint\Config\$ConfigDbId"

        Write-Host "$env:COMPUTERNAME - Clearing cache folder $CacheFolder"
        # Delete all XML files
        Get-ChildItem "$CacheFolder\*" -Filter *.xml | Remove-Item

        Write-Host "$env:COMPUTERNAME - Resetting cache ini file"
        # Reset the counter in Cache.ini to 1
        $CacheIni = Get-Item "$CacheFolder\Cache.ini"
        Set-Content -Path $CacheIni -Value "1"
    }
    finally {
        Write-Host "$env:COMPUTERNAME - Starting timer service" -ForegroundColor Green
        Start-Service SPTimerV4
    }
}
  • Checked SharePoint Admin Service Status (SPAdminV4)
  • Checked SharePoint Timer Service Status (SPTimerV4)
  • Checked Application Server Administration Service Timer Job Status (job-application-server-admin-service)

Results of the PowerShell commands for the 3 checks above:

Now checking SharePoint ADMINISTRATION Service SPAdminV4...

Server: WFE Server

Status: Online

Server: APP Server

Status: Online

All Administration Service Instances in the farm are online. No problems found with SPAdminV4.

Now checking SharePoint TIMER Service SPTimerV4...

Server: WFE Server

Status: Online

Allow Service Jobs: False (Do I need to make this true as well?)

Allow Content DB Jobs: False (Do I need to make this true as well?)

Server: App Server

Status: Online

Allow Service Jobs: True

Allow Content DB Jobs: True

All Timer Service Instances in the farm are online. No problems found with SPTimerV4.

Checking Application Server Administration Service Timer Status...

IsDisabled : False

Earlier it showed a last run time, but now it shows N/A (images attached). I tried clicking Run Now manually, but no luck.
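
For anyone who wants to look at the same job outside Central Administration, its state can also be read from the SharePoint Management Shell. The following is a minimal sketch and not the exact commands used above. It assumes it runs on a farm server under an account with farm administrator rights, and it filters on the internal job name job-workflow from the title of this question.

```powershell
# Sketch only: run on a SharePoint 2019 farm server as a farm administrator.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# The workflow timer job exists once per web application, so several may be returned.
$jobs = Get-SPTimerJob | Where-Object { $_.Name -eq "job-workflow" }

foreach ($job in $jobs) {
    # Basic state of the job definition
    [PSCustomObject]@{
        WebApplication = $job.WebApplication.DisplayName
        IsDisabled     = $job.IsDisabled
        LastRunTime    = $job.LastRunTime
        Schedule       = $job.Schedule
    }

    # A few history entries for this job, including any recorded error message
    $job.HistoryEntries |
        Select-Object -First 5 ServerName, Status, StartTime, EndTime, ErrorMessage
}
```

If no history entries come back at all, that would be consistent with the job never being picked up by a timer service instance rather than starting and failing.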

Looking for someone's expert advice and direction to fix this.

Thanks & Regards,

Sumeet

User's image

User's image

Microsoft 365 and Office | SharePoint Server | For business
Microsoft 365 and Office | SharePoint | For business | Windows

1 answer

  1. Yanli Jiang - MSFT 31,611 Reputation points Microsoft External Staff
    2023-11-16T09:44:57.6166667+00:00

    Hi @Sumeet Singhal ,

    This issue is relatively complicated and I could not reproduce it. I also could not find any clues from the error.

    As I said before, it is recommended that you contact Microsoft directly.

    The support team there has the correct escalation channel; they can involve more resources and investigate the behavior from the back end as fast as possible. They can also check the behavior on your end remotely. Given the situation, this is the most efficient way to handle this thread. I am sure that our expert engineers on that side can address this issue effectively and accurately.

    Hope this issue can be resolved as soon as possible.

    Please share any progress here.


