SharePoint timer jobs not running (especially one-time timer jobs)

Problem description:
====================
In your SharePoint farm, you notice that timer jobs are no longer running normally. For example, custom timer jobs or one-time timer jobs do not run when you try to start them. This also prevents the UPA sync service from starting or provisioning, because those operations create one-time timer jobs. Another good way to identify the problem is to run the "Merge-SPLogFile" command, which creates one-time timer jobs to collect logs from all the servers in the farm.
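
As a quick check, a sketch along the following lines can show whether one-time timer jobs are being picked up. It assumes it is run from the SharePoint Management Shell on a farm server; the output file name and the ten-minute window are illustrative values only. If the command hangs for a long time or never completes, that is consistent with the problem described above.

```powershell
# Hypothetical output location in the current user's temp folder; change as needed.
$mergedLog = Join-Path ([System.IO.Path]::GetTempPath()) "FarmMergedLog.log"

if (Get-Command Merge-SPLogFile -ErrorAction SilentlyContinue) {
    # Merge-SPLogFile creates a one-time timer job on each server in the farm.
    # If those jobs are never executed, the command does not complete normally.
    Merge-SPLogFile -Path $mergedLog -StartTime (Get-Date).AddMinutes(-10) -Overwrite
    Write-Host "Merged log written to $mergedLog"
}
else {
    Write-Warning "Merge-SPLogFile is not available. Run this from the SharePoint Management Shell."
}
```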

Cause:
======
1. Although the SharePoint Timer service is started in services.msc, the timer service instance object (a SharePoint farm object) may be set to “Disabled”. Use the script below to get the status of all the timer service instances in the farm.

$farm = Get-SPFarm
$FarmTimers = $farm.TimerService.Instances

# Report the current state of every timer service instance in the farm.
foreach ($FT in $FarmTimers)
{
    Write-Host "Server: " $FT.Server.Name.ToString()
    Write-Host "Status: " $FT.Status
    Write-Host "Allow Service Jobs: " $FT.AllowServiceJobs
    Write-Host "Allow Content DB Jobs: " $FT.AllowContentDatabaseJobs
    Write-Host ""
}

2. The "TimerJobHistory" table in the SharePoint Config DB may contain too many records, which prevents timer jobs from functioning correctly. This usually happens because the "Delete job history" timer job has not run for a long time. This job is supposed to purge timer job history older than 7 days. Use the SQL queries and commands below to check for this.

>> SQL queries to run against your SharePoint Config DB.

SELECT COUNT(*) FROM [SharePoint_Config].[dbo].[TimerJobHistory] WITH (NOLOCK);
SELECT TOP 100 * FROM [SharePoint_Config].[dbo].[TimerJobHistory] WITH (NOLOCK) ORDER BY StartTime;

The output will show a huge number of records, e.g. 5 million. You will also notice that the oldest timer job history record (the "StartTime" column) is much older than expected (normally the oldest record would be about 7 days old). Older records are supposed to be purged by the "Delete job history" timer job, which you can check below.

>> PowerShell command to run on your SharePoint server.

Get-SPTimerJob | Where-Object { $_.DisplayName -eq "Delete job history" } | Format-List

This returns, among other properties, the "LastRunTime" of the timer job. Normally this will be the date when the issue started.
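
To make this check easier to read, the last run time can be compared against the current date. The following is a minimal sketch: the helper name Get-DaysSinceLastRun is hypothetical, the 7-day threshold simply mirrors the retention period mentioned above, and the day count is approximate because it ignores any time zone difference in the stored value.

```powershell
# Hypothetical helper: whole days elapsed since a timer job last ran.
function Get-DaysSinceLastRun {
    param(
        [Parameter(Mandatory = $true)][datetime]$LastRunTime,
        [datetime]$Now = (Get-Date)
    )
    [int][math]::Floor(($Now - $LastRunTime).TotalDays)
}

if (Get-Command Get-SPTimerJob -ErrorAction SilentlyContinue) {
    $job = Get-SPTimerJob |
        Where-Object { $_.DisplayName -eq "Delete job history" } |
        Select-Object -First 1

    if ($null -ne $job) {
        $days = Get-DaysSinceLastRun -LastRunTime $job.LastRunTime
        Write-Host "Delete job history last ran:" $job.LastRunTime "($days days ago)"
        if ($days -gt 7) {
            # Assumption: a gap longer than the retention period suggests the purge job is stuck.
            Write-Warning "The job has not run for more than 7 days; the history table is probably growing."
        }
    }
}
else {
    Write-Warning "Get-SPTimerJob is not available. Run this from the SharePoint Management Shell."
}
```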

Solution:
==========
>> For the 1st issue, run the PowerShell script below to re-provision the instance to the "Online" state.

$farm = Get-SPFarm
$FarmTimers = $farm.TimerService.Instances

# Report the current state of every timer service instance in the farm.
foreach ($FT in $FarmTimers)
{
    Write-Host "Server: " $FT.Server.Name.ToString()
    Write-Host "Status: " $FT.Status
    Write-Host "Allow Service Jobs: " $FT.AllowServiceJobs
    Write-Host "Allow Content DB Jobs: " $FT.AllowContentDatabaseJobs
    Write-Host ""
}

# Re-provision every instance that is not Online.
$disabledTimers = $FarmTimers | Where-Object { $_.Status -ne "Online" }
if ($null -ne $disabledTimers)
{
    foreach ($timer in $disabledTimers)
    {
        Write-Host -ForegroundColor Red "Timer service instance on server " $timer.Server.Name " is NOT Online. Current status:" $timer.Status
        Write-Host -ForegroundColor Green "Attempting to set the status of the service instance to online..."
        $timer.Provision()
        $timer.Start()
        # Name the server whose instance was just re-provisioned.
        Write-Host -ForegroundColor Red "You MUST now go restart the SharePoint timer service on server " $timer.Server.Name
    }
}
else
{
    Write-Host -ForegroundColor Green "All Timer Service Instances in the farm are online. No problems found!"
}
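
The timer service restart that follows re-provisioning can itself be scripted. This is only a sketch: it assumes the Windows service name of the SharePoint Timer service is SPTimerV4 (the name used by SharePoint 2010 and later), that it is run in an elevated session on each affected server, and the function name Restart-SharePointTimerService is hypothetical.

```powershell
# Hypothetical helper: restart the SharePoint Timer Windows service on the local server.
# Returns $true if the service is running afterwards, $false otherwise.
function Restart-SharePointTimerService {
    param([string]$ServiceName = "SPTimerV4")

    $service = Get-Service -Name $ServiceName -ErrorAction SilentlyContinue
    if ($null -eq $service) {
        Write-Warning "Service '$ServiceName' was not found on this server."
        return $false
    }

    Restart-Service -Name $ServiceName
    return ((Get-Service -Name $ServiceName).Status -eq 'Running')
}

# Usage (run locally on each server reported by the script above):
# Restart-SharePointTimerService
```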

>> For the 2nd issue, you will need Microsoft's help to clear out the "TimerJobHistory" table in the SharePoint Config DB. Do not clear it yourself, because modifying the database directly is not a Microsoft-supported operation. Instead, open a support ticket with Microsoft at https://support.microsoft.com/.
Refer to this article: "Support for changes to the databases that are used by Office server products and by Windows SharePoint Services".

Note: Changing timer service instances can cause other issues in the farm. It is better to open a ticket with Microsoft for such issues, as they may be far more complex than they appear.