SSAS 2019 - Tabular Process Hanging / Deadlocking

JML 16 Reputation points
2022-08-17T12:30:07.243+00:00

Hello everyone,

We have a SSAS 2019 CU16 hosting a very small tabular model, about 20-25 tables and only a couple MB in size. As expected the processing time is fairly short, only 2-3 seconds so we currently have an SQL Agent job (on a remote server) doing a full refresh every minute.

The issue is that a few times per week, the job just hangs there and cube processing is stuck. Trying to kill the connections with the following script does not always fix the issue, the connections will be killed but locks are still held on some objects and the agent job won't cancel.

    <Cancel xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">  
    	<SessionID>A7BBBE75-F009-4DA6-8962-ED2C86F194F8</SessionID>  
    	<SPID>33134</SPID>  
    	<ConnectionID>662</ConnectionID>	  
    	<CancelAssociated>1</CancelAssociated>  
    </Cancel>  

Most of the time the only solution I found was to restart the whole SSAS service. Even worse, the service won't stop normally and the process needs to be killed first.

The last time it happened I checked $SYSTEM.discover_locks and saw that there was one TYPE 2 and one TYPE 4 lock that were granted for hours, seemingly stuck.

The last statement run was:

<Batch Transaction="true"  
       xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">  
	<Refresh xmlns="http://schemas.microsoft.com/analysisservices/2014/engine">  
		<DatabaseID>TIG</DatabaseID>  
		<Model>  
			<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"  
			           xmlns:sql="urn:schemas-microsoft-com:xml-sql">  
				<xs:element>  
					<xs:complexType>  
						<xs:sequence>  
							<xs:element type="row"/>  
						</xs:sequence>  
					</xs:complexType>  
				</xs:element>  
				<xs:complexType name="row">  
					<xs:sequence>  
						<xs:element name="RefreshType"  
						            type="xs:long"  
						            sql:field="RefreshType"  
						            minOccurs="0"/>  
					</xs:sequence>  
				</xs:complexType>  
			</xs:schema>  
			<row xmlns="urn:schemas-microsoft-com:xml-analysis:rowset">  
				<RefreshType>1</RefreshType>  
			</row>  
		</Model>  
	</Refresh>  
	<SequencePoint xmlns="http://schemas.microsoft.com/analysisservices/2014/engine">  
		<DatabaseID>TIG</DatabaseID>  
	</SequencePoint>  
</Batch>  

I'm not sure what else to do, I searched around but couldn't find anything that worked. Here are the additional troubleshooting steps I've taken that didn't work:

  1. Changed CoordinatorSafeJobUnblocking to 0 (Based on a KB that I can't find anymore)
  2. Adjusted LowMemoryLimit, TotalMemoryLimit and VertiPaqMemoryLimit
  3. Checked that ForceCommitTimeout is set (it's 30000)

The latest thing I tried that I don't know if it will work or not is set the maxParallelism for this refresh to 1. Even if this works it's still suboptimal since it tripled the refresh time.

I haven't updated to CU17 yet since there was nothing about this in the release logs, but I might try that soon.

I would be grateful for any help that can be provided.

Thanks!

SQL Server Analysis Services
SQL Server Analysis Services
A Microsoft online analytical data engine used in decision support and business analytics, providing the analytical data for business reports and client applications such as Power BI, Excel, Reporting Services reports, and other data visualization tools.
1,343 questions
{count} vote

2 answers

Sort by: Most helpful
  1. Joyzhao-MSFT 15,631 Reputation points
    2022-08-18T03:07:42.797+00:00

    Hi @JML ,
    If you suspect you are having some deadlock issues with SQL Analysis Services, you can follow the links below to trouble shoot the problem:
    Analysis Services stops accepting new connections.
    Deadlock Troubleshooting in SQL Server Analysis Services ( SSAS ).
    Best Regards,
    Joy


    If the answer is the right solution, please click "Accept Answer" and kindly upvote it. If you have extra questions about this answer, please click "Comment".
    Note: Please follow the steps in our documentation to enable e-mail notifications if you want to receive the related email notification for this thread.


  2. John Buchanan 1 Reputation point
    2022-10-12T14:59:46.743+00:00

    @JML
    I don't know if I am having the same problem as you but it sounds like it. I have a model that refreshes once a day and usually takes about 7-10 minutes. The refresh is triggered by a SQL Agent job which runs a full refresh on the model.

    {
    "refresh": {
    "type": "full",
    "objects": [
    {
    "database": "Model_FCP"
    }
    ]
    }
    }

    When it does hang up, I have let it run for as long as 8 hours and it will not complete and killing then restarting the job does not work. I have done some testing and it will be a single table in the model that is locking up on refreshing, but not always the same table.

    What I have had to do is change the partition definition on all of the tables, then the refresh will run. The change I make to the partition definition is meaningless, it actually changes nothing. It seems like there is something "corrupted?" in the model and making this change resets it.

    The change I make is this:

    from:
    "query": "SELECT [dbo].[vMEMBER].* FROM [dbo].[vMEMBER] WHERE [Client ID] = 'FCP'"
    to
    "query": "SELECT [dbo].[vMEMBER].* FROM [dbo].[vMEMBER] WHERE [Client ID] = 'FCP' AND [Client ID] IS NOT NULL"

    I make that same change to every table and the refresh will run fine. The next time it locks up (in a week or so) I change it back and it will refresh again.

    I don't know if this information will help you or if you have found a solution to the problem but I thought I would share. If you do have a solution I would love to hear it.

    Regards,

    John

    0 comments No comments

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.