Memory is exactly what you would expect it to be: the amount of memory used vs. how much is available based on your pricing tier. The memory available per tier is listed here: https://learn.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/memory-concurrency-limits
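If you aren't sure which tier you're on, you can check the current service level objective and match it against that table. A minimal sketch, assuming you run it while connected to the dedicated SQL pool:

-- Returns the current service level objective (e.g. 'DW1000c'),
-- which you can look up in the memory/concurrency limits table
SELECT DATABASEPROPERTYEX(DB_NAME(), 'ServiceObjective') AS service_objective;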
Adaptive cache is made up of a set of specialized SSDs that are much faster to access than traditional storage. Because it would be prohibitively expensive to keep all data on these devices instead of on more traditional storage, they are implemented as a cache, keeping only a copy of the data that will best speed up query performance.
There's an article with the cache feature announcement that has some more details: https://azure.microsoft.com/en-us/blog/adaptive-caching-powers-azure-sql-data-warehouse-performance-gains/
--- EDIT for more info ---
Here are some more details for you: https://learn.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/performance-tuning-result-set-caching
That page and the rest of that section have a good bit of info on how it works, including the rules for what goes into the cache and when it is used to return results.
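A quick way to confirm whether the feature is currently on for a database, based on the check described in that doc (run from master):

SELECT name, is_result_set_caching_on
FROM sys.databases;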
Max cache size is 1TB per database, but the database will push things out of cache under certain circumstances even if that limit isn't hit, for example if a result set hasn't been used in the last 48 hours. You can also clear the cache by turning the feature off or by running DBCC DROPRESULTSETCACHE on the database.
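For example, to clear it manually (a sketch; [YourDW] is a placeholder name, and the DBCC command has to be run while connected to the dedicated SQL pool itself):

-- Turning the feature off from master clears the cache; turn it back on if you still want caching
ALTER DATABASE [YourDW] SET RESULT_SET_CACHING OFF;
ALTER DATABASE [YourDW] SET RESULT_SET_CACHING ON;

-- Or drop all cached result sets without toggling the feature
DBCC DROPRESULTSETCACHE;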
Storage and cache are related to each other more than either is related to memory usage. Memory gets used alongside CPU while processing the query. When a query needs to retrieve data it will try the cache first, since that is much faster, and only go to main storage if it has to. Once the query is finished the memory is released immediately, but the storage and any cached data stay put.
There are scenarios listed in the docs I linked where you want to make sure caching is turned off, since trying to work through the cache instead of directly with storage will actually hurt performance. For example, if you run a query once a week that returns 5GB of data, you would spend significant compute resources loading that result into cache, but since unused results are evicted after 48 hours you won't gain any benefit from it.
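In that kind of case you can leave the feature on for the database but switch it off just for the session that runs the big weekly query. A sketch (the query itself is a stand-in):

-- Turn result set caching off for this session only
SET RESULT_SET_CACHING OFF;

-- Hypothetical weekly 5GB extract
SELECT *
FROM dbo.WeeklyExtract;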
A few commands to get you started:
- ALTER DATABASE {database name} SET RESULT_SET_CACHING ON; (or OFF)
  Turns the feature on or off when run from the master database. Running SET RESULT_SET_CACHING ON on a specific database will change the setting for that session only.
- DBCC SHOWRESULTCACHESPACEUSED
  Shows you info on your current cache. This will throw an error if the feature is turned off.
- SELECT request_id, command, result_cache_hit FROM sys.dm_pdw_exec_requests WHERE request_id = 'Your_Query_Request_ID';
  Shows you whether the cache was utilized for a specific query.
- SELECT step_index, operation_type, location_type, status, total_elapsed_time, command FROM sys.dm_pdw_request_steps WHERE request_id = 'Your_Query_Request_ID';
  Gives you info on the performance cost of loading data into cache. In particular, it will help you determine whether turning caching off for a particular query would be helpful.
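An easy way to get the request_id for those last two queries is to tag the query with a label and then look it up. A sketch, assuming a table named dbo.FactSales and an arbitrary label text:

-- Run the query with a label so it is easy to find afterwards
SELECT COUNT(*) FROM dbo.FactSales
OPTION (LABEL = 'cache-check-demo');

-- Find the request and see whether the result came from cache
SELECT request_id, command, result_cache_hit
FROM sys.dm_pdw_exec_requests
WHERE [label] = 'cache-check-demo';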