Silent SSAS shutdown caused by writeback timeout
A number of customers using SSAS instances with writeback functionality reported frequent crashes that were not accompanied by any useful error message.
The only hint found in msmdsrv.log close to time of crash was:
(03.04.2016 13:30:21) Message: OLE DB-Fehler: OLE DB- oder ODBC-Fehler : Query timeout expired; HYT00. (Source: \\?\J:\Log\msmdsrv.log, Type: 3, Category: 289, Event ID: 0xC1210003)
A query timeout as such is no unusual thing. But under normal circumstances a failing query should never result in the SSAS instance going down.
When having a closer look at the issue it turned out that we were at the end (commit phase 2) of a cube writeback transaction, and that the bulk insert to the writeback table took longer than DatabaseConnectionPoolGeneralTimeout (=60 sec by default).
Commit phase 2 is the most sensitive phase of a SSAS write transaction. Anything that goes wrong here can potentially cause a data corruption. That’s why the SSAS instance responds to any error in commit phase 2 of a write transaction with an emergency shutdown. Which is technically not a crash, but hard to distinguish from the outside.
So what can we do here?
If your writeback command is affecting a lot of cells, then you may just need a little longer than the default 60 sec to bulk copy the changed cell values to your writeback table. In this case just increase DatabaseConnectionPoolGeneralTimeout to your needs. But remember that a prolonged commit phase 2 (which needs an exclusive database commit lock) will prevent any new connections to SSAS while it lasts. So it’s probably not advisable to specify a value that goes much beyond 5 minutes. If it takes longer, then you should better use writeback commands that affect less cells.
There is also a possibility that the bulk copy command associated with your writeback is rather quick, but it gets blocked by some other activity on the writeback table. At this point I have to mention that a writeback table should only be accessed by the SSAS instance, and that you should not modify the schema of the system generated writeback table. Especially don’t add any triggers or indexes which are known to cause trouble. For troubleshooting this type of blocking issue it will be useful to collect a sql profiler trace on the relational sql instance that hosts the writeback table. It is more of a relational SQL server than an SSAS issue.
On our side we have also filed an internal change request asking for a better error message for this type of issue.