question

carlchristianson-2540 avatar image
0 Votes"
carlchristianson-2540 asked pituach edited

SQLAgent job randomly cancels

We have an ETL process that runs nightly via SQL Agent. I'd say about 1 out of 5 times the job just stops. The status is failed in the job history but no real error message to see. Using an output file the only message I get is: "Msg 0, Sev 0, State 1: Operation canceled [SQLSTATE HY008]"

If i re-run the job it will start over and run successfully. Looking at that message the only thing I see is that basically an async job was told to cancel and did (para pharasing). There are no errors in the SQL Server log, windows logs, sql agent logs. Nothing that helps me figure out why this job was cancelled. It also doesn't always cancel at the same spot.

The ETL process we use is pretty much a series of sprocs that get called that use select ... into ... < table > type of statements. Nothing too fancy going on. It's completely maddening as it's not consistent in any way. How do I get more information on WHY the job was cancelled? Really any more info would be better then I have now. Very frustrated with sql server/windows at the moment...

sql-server-generalsql-server-transact-sql
· 3
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

What type of job step is this?

What is the output from "SELECT @@version"?

0 Votes 0 ·

Hi,

Usually I do not like to enable Extended Events for long time or for monitoring wide range of events since it can impact performance. But in this case we surly need more information and there are two reason whv this will probably can be done smart for short period of time and will probably catch the time that the issue appears

(1) I assume that you scheduled the job to run when the server is less active. So enabling EE will not be a big deal at that time. (2) It fails 1 out 5 means that there is a good chance 20% to catch the issue if you monitor it when the job run. Therefore, (1) + (2) sound like it is probably makes sense to use Extended Events. Enable it before the job start and disable it after the job end/fail. In theory if you monitor 5 times then you should be able to get the error appears while monitoring the issue.

You need to get more information since there are no real randomly cancels but only what seems like.

0 Votes 0 ·

Hi carlchristianson-2540,

Welcome to Microsoft Q&A.
You can try to use extended event to select sp_statement_completed, sp_statement_starting, sql_batch_completed and other events to see if the problem can be captured.
Please check this article which might be helpful.

Best Regards,
Amelia


0 Votes 0 ·

1 Answer

carlchristianson-2540 avatar image
1 Vote"
carlchristianson-2540 answered pituach edited

Hi Amelia, thx.

So I figured out what was happening. Apparently a former sysadmin had put in (do not know why you would do this) a task scheduler job in that kills and restarts the SQL Server Agent. The ETL job more times then not would complete before that job would kick off, but if it ran just a few minutes long it would get wacked. So basically the SQL Cancel message was actually spot on, if not helpful on figuring out what was issuing the cancel request. It would be great if somehow the SQL Cancel request could have more information to help track that origin down. Previously I had seen that error on a memory pressure issue where a job would get cancelled due to that. In SQL Server 2017 how do (in transact sql) enable extended events to capture more info. I admit to looking and not quite finding it.
thx
Carl

· 3
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Glad to hear that you got it sorted out!

I don't think the cancel message as such can be improved. But possibly Agent could have a special status saying that job was cancelled because of Agent was stopped. I guess Microsoft thought that wasn't necessary, because presumably you know that you stopped Agent. :-)

To trace the event as such, you would trace for attention signal, but that would not tell you much. An attention signal is what the client sends when it does not want to wait for the client anymore. It can be a query timeout or it can be that red button in SSMS. But SQL Server only knows that it should stop running.

0 Votes 0 ·

The SQL Agent error log would indicate it was restarted. I was just about to suggest you look at the agent log to see if it was being restarted. I have seen this kind of problem when Agent crashed for some reason.

Glad you found it.

PS. There is no reason to restart any SQL Server service regularly.

0 Votes 0 ·

Thanks for coming back with the update.

Please close the thread by marking one of the answers which helped you the most or mark your answer, if none of the others led you to the solution

0 Votes 0 ·