About five years ago our company created an app to record the findings of a human evaluation.
A person examines a device, checks it against each criterion onscreen, and then marks it "Pass" or "Fail". Each criterion is one row in our DB.
The app hasn't changed much in the last five years. About 1.3 million evaluations done so far. About 50 million criterion-rows.
Typically it's a two-person process:
- Evaluator-1
- Evaluator-2 (a senior evaluator).
Evaluator-2 doesn't start fresh. Rather the rows of Evalutator-1 are first cloned (but marked as Evaluator-2).
After five years of success, suddenly we are seeing an occasional failure in the cloning process, where Evaluator-2 is mysteriously missing exactly one row. The cloning is done in a stored procedure that looks roughly like this:
Insert Into dbo.Workspace (PassOrFail, EvaluationID, Iteration)
Select PassOrFail, EvaluationID, 'Evaluator-2'
From dbo.Workspace where EvaluationID = @EvalID and Iteration = 'Evaluator-1'
-- The same SP ends with the following validation to verify that all rows got cloned:
Declare @count1 int = (Select count() From dbo.Workspace where EvaluationID = @EvalID and Iteration = 'Evaluation-1');
Declare @count2 int = (Select count() From dbo.Workspace where EvaluationID = @EvalID and Iteration = 'Evaluation-2');
If @count1 <> @count2 Throw 51000, N'Error in cloning', 1
- That last step (the validation-by-count) has been in place for five years, and has never thrown an exception - until now - and it is always exactly ONE ROW that is missing. If we delete the cloned rows and re-run the app, all rows clone properly.
We tried to brainstorm on this - maybe another app is deleting one cloned row? But we can't find anything in our system that would do this.
Frankly, we're fresh out of theories. Any ideas, anyone?
Note:
- The INSERT statement is not wrapped in a try-catch, nor in an explicit tran.
- Seems xact_abort is Off. I suppose we should turn it on?