Hello everyone,
We have a SSAS 2019 CU16 hosting a very small tabular model, about 20-25 tables and only a couple MB in size. As expected the processing time is fairly short, only 2-3 seconds so we currently have an SQL Agent job (on a remote server) doing a full refresh every minute.
The issue is that a few times per week, the job just hangs there and cube processing is stuck. Trying to kill the connections with the following script does not always fix the issue, the connections will be killed but locks are still held on some objects and the agent job won't cancel.
<Cancel xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
<SessionID>A7BBBE75-F009-4DA6-8962-ED2C86F194F8</SessionID>
<SPID>33134</SPID>
<ConnectionID>662</ConnectionID>
<CancelAssociated>1</CancelAssociated>
</Cancel>
Most of the time the only solution I found was to restart the whole SSAS service. Even worse, the service won't stop normally and the process needs to be killed first.
The last time it happened I checked $SYSTEM.discover_locks and saw that there was one TYPE 2 and one TYPE 4 lock that were granted for hours, seemingly stuck.
The last statement run was:
<Batch Transaction="true"
xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
<Refresh xmlns="http://schemas.microsoft.com/analysisservices/2014/engine">
<DatabaseID>TIG</DatabaseID>
<Model>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:sql="urn:schemas-microsoft-com:xml-sql">
<xs:element>
<xs:complexType>
<xs:sequence>
<xs:element type="row"/>
</xs:sequence>
</xs:complexType>
</xs:element>
<xs:complexType name="row">
<xs:sequence>
<xs:element name="RefreshType"
type="xs:long"
sql:field="RefreshType"
minOccurs="0"/>
</xs:sequence>
</xs:complexType>
</xs:schema>
<row xmlns="urn:schemas-microsoft-com:xml-analysis:rowset">
<RefreshType>1</RefreshType>
</row>
</Model>
</Refresh>
<SequencePoint xmlns="http://schemas.microsoft.com/analysisservices/2014/engine">
<DatabaseID>TIG</DatabaseID>
</SequencePoint>
</Batch>
I'm not sure what else to do, I searched around but couldn't find anything that worked. Here are the additional troubleshooting steps I've taken that didn't work:
- Changed CoordinatorSafeJobUnblocking to 0 (Based on a KB that I can't find anymore)
- Adjusted LowMemoryLimit, TotalMemoryLimit and VertiPaqMemoryLimit
- Checked that ForceCommitTimeout is set (it's 30000)
The latest thing I tried that I don't know if it will work or not is set the maxParallelism for this refresh to 1. Even if this works it's still suboptimal since it tripled the refresh time.
I haven't updated to CU17 yet since there was nothing about this in the release logs, but I might try that soon.
I would be grateful for any help that can be provided.
Thanks!