An in-depth look at SQL Server Memory–Part 3
In part 1 and part 2 of the series, we talked about the memory architecture and the Procedure Cache respectively. In this third and final instalment of the SQL Server Memory series, I will look to focus on troubleshooting SQL Server Memory pressure issues.
Before we start on the troubleshooting part though, we need to determine the type of memory pressure that we're seeing here. I've tried to list those down here:
1. External Physical Memory pressure - Overall RAM pressure on the server. We need to find the largest consumers of memory (might be SQL), and try to reduce their consumption. It might also be that the system is provided with RAM inadequate for the workload it's running.
2. Internal Physical Memory pressure - Memory Pressure on specific components of SQL Server. Can be a result of External Physical Memory pressure, or of one of the components hogging too much memory.
3. Internal Virtual Memory pressure - VAS pressure on SQL server. Mostly seen only on 32 bit (X86) systems these days (X64 has 8 TB of VAS, whereas X86 only had 4 GB. Refer to Part 1 for details).
4. External Virtual Memory pressure - Page file pressure on the OS. SQL Server does not recognize or respond to this kind of pressure.
Troubleshooting
Now for getting our hands dirty. When you suspect memory pressure on a server, I would recommend checking the following things, in order:
1. Log in to the server, and take a look at the performance tab of the Task Manager. Do you see the overall memory usage on the server getting perilously close to the total RAM installed on the box? If so, it's probable that we're seeing External Physical Memory pressure.
2. Next, look at the Processes tab, and see which of the processes is using the maximum amount of RAM. Again, for SQL, the true usage might not reflect in the Working set if LPIM is enabled (i.e. SQL is using AWE API's to allocate memory). To check SQL's total memory consumption, you can run the following query from inside SQL (valid from SQL 2008 onwards):
select physical_memory_in_use_kb/(1024) as sql_physical_mem_in_use_mb,
locked_page_allocations_kb/(1024) as awe_memory_mb,
total_virtual_address_space_kb/(1024) as max_vas_mb,
virtual_address_space_committed_kb/(1024) as sql_committed_mb,
memory_utilization_percentage as working_set_percentage,
virtual_address_space_available_kb/(1024) as vas_available_mb,
process_physical_memory_low as is_there_external_pressure,
process_virtual_memory_low as is_there_vas_pressure
from sys.dm_os_process_memory
Go
For SQL installations prior to 2008 (valid for 2008 and 2008 R2 as well), you can run DBCC Memorystatus, and take the total of VM Committed and AWE Allocated from the memory manager section to get a rough idea of the amount of memory being used by SQL Server.
3. Next, compare this with the total amount of RAM installed on the server. If SQL seems to be taking most of the memory, or at least, much more than it should, then we need to focus our attentions on SQL Server. The exact specifics will vary according to the environment, and factors such as whether it is a dedicated SQL server box, number of instances of SQL Server running on the server, etc. In case you have multiple instances of SQL Server, it will be best to start with the instance consuming the maximum amount of memory (or the maximum deviation from "what it should be consuming"), tune it and then move on to the next one.
4. One of the first things to check should be the value of the "max server memory" setting for SQL Server. You can check this by turning on the 'show advanced options' setting of sp_configure, or by right clicking on the instance in Object Explorer in SSMS, selecting properties, and navigating to the "memory" tab. If the value is "2147483647", this means that the setting has been left to default, and has not been set since the instance was installed. It's absolutely vital to set the max server memory setting to an optimal value. A general rule of thumb that you can use to set a starting value is as follows:
Total server memory - (Memory for other applications/instances+ OS memory)
The recommendation for the OS memory value is around 3-4 GB on 64 bit systems, and 1-2 GB on 32 bit systems. Please note that this is only a recommendation for the starting value. You need to fine tune it based on observations w.r.t performance of both SQL and other applications (if any) on the server.
5. Once you've determined that the max server memory is set properly, the next step is to find out which component within SQL is consuming the most memory. The best place to start is, quite obviously, the good old "DBCC Memorystatus" command, unless you're using NUMA, in which case, it will be best to use perfmon counters to track page allocations across NUMA nodes, as outlined here.
I will try to break down most of the major components in the DBCC Memorystatus output here (I would recommend reading KB 907877 as a primer before this):I. First up is the memory manager section. As discussed earlier, this section contains details about the overall memory comsumption of SQL Server. An example:
Memory Manager KB
---------------------------------------- -----------
VM Reserved 4059416
VM Committed 43040
Locked Pages Allocated 41600
Reserved Memory 1024
Reserved Memory In Use 0
II. Next, we have the memory nodes, starting with 0. As I mentioned, because there is a known issue with the way dbcc memorystatus displays the distribution of allocations across memory nodes, it is best to study the distribution through the SQL Server performance counters. Here's a sample query:
select * from sys.dm_os_performance_counters
where object_name like '%Buffer Node%'
III. Next, we have the clerks. I've tried to outline the not so obvious ones in this table, along with their uses:
Clerk Name
Used for
MEMORYCLERK_SQLUTILITIES
Database mirroring, backups, etc.
MEMORYCLERK_SQLXP
Extended Stored Procedures (loaded into SQL Server)
MEMORYCLERK_XE, MEMORYCLERK_XE_BUFFER
Extended Events
If you see any of the clerks hogging memory, then you need to focus on that, and try and narrow down the possible causes.
Another thing to watch out for is high values for the multipage allocator. If you see any clerk with extremely high values for multipage allocator, it means that the non-Bpool area is growing due to one of the following:
i. CLR Code: Check the errorlog for appdomain messages
ii. COM Objects : Check the errorlog for sp_oacreate
iii. Linked servers: Can be checked using Object Explorer in SSMS
iv. Extended stored procedures : Check the errorlog for loading extended stored procedure messages.
Alternatively, you can query the sys.extended_procedures view as well.
v. Third party DLL's : Third party DLL's loaded into the SQL server process space. Run the following query to check:
select * from sys.dm_os_loaded_modules where company <> 'Microsoft Corporation'
Here's a query to check for the biggest multipage consumers:
select type, name, sum(multi_pages_kb)/1024 as multi_pages_mb
from sys.dm_os_memory_clerks
where multi_pages_kb > 0
group by type, name
order by multi_pages_mb desc
Yet another symptom to watch out for is a high ratio of stolen pages from the Buffer Pool. You can check this in the 'Buffer Pool' section of the MEMORYSTATUS output. A sample:
Buffer Pool Value
---------------------------------------- -----------
Committed 4448
Target 25600
Database 2075
Dirty 50
In IO 0
Latched 0
Free 791
Stolen 1582
Reserved 0
Visible 25600
Stolen Potential 22738
Limiting Factor 17
Last OOM Factor 0
Last OS Error 0
Page Life Expectancy 87529
What this means is that Buffer Pool pages are being utilized for "other" uses, and not for holding data and index pages in the BPool. This can lead to performance issues and a crunch on the Bpool, thereby slowing down overall query performance (please refer to part 1 for consumers that "Steal" pages from the BPool). You can use the following query to check for the highest "Steal" consumers:
select type, name, sum((single_pages_kb*1024)/8192) as stolen_pages
from sys.dm_os_memory_clerks
where single_pages_kb > 0
group by type, name
order by stolen_pages desc
IV. Next, we have the stores namely, Cachestore, Userstore and Objectstore. Please refer to part 1 for how and by which component these clerks are used. You can use the following queries to check for the biggest Cachestores, Userstores and Objectstores respectively:
select name, type, (SUM(single_pages_kb)+SUM(multi_pages_kb))/1024
as store_size_mb
from sys.dm_os_memory_cache_counters
where type like 'CACHESTORE%'
group by name, type
order by store_size_mb desc
go
select name, type, (SUM(single_pages_kb)+SUM(multi_pages_kb))/1024
as store_size_mb
from sys.dm_os_memory_cache_counters
where type like 'USERSTORE%'
group by name, type
order by store_size_mb desc
go
select name, type, (SUM(single_pages_kb)+SUM(multi_pages_kb))/1024
as store_size_mb
from sys.dm_os_memory_clerks
where type like 'OBJECTSTORE%'
group by name, type
order by store_size_mb desc
go
V. Next, we have the gateways. The concept of gateways was introduced to throttle the use of query compilation memory. In plain english, this means that we did not want to allow too many queries with a high requirement for compilation memory to be running at the same time, as this would lead to consequences like internal memory pressure (i.e. one of the components of the buffer pool growing and creating pressure on other components).
The concept basically works like this: When a query starts execution, it will start with a small amount of memory. As its consumption grows, it will cross the threshold for the small gateway, and must wait to acquire it. The gateway is basically implemented through a semaphore, which means that it will allow upto a certain number of threads to acquire it, and make threads beyond the limit wait. As the memory consumption for the query grows, it must acquire the medium and big gateways before being allowed to continue execution. The exact thresholds depend on factors like total memory on the server, SQL Max server memory sitting, memory architecture (x86 or x64), load on the server, etc.
The number of queries allowed at each of the gateways described in the following table:
Gateway |
Dynamic/Static |
Config Value |
Small |
Dynamic |
Default is (no. of CPU's SQL sees * 4) |
Medium |
Static |
Number of CPU's SQL sees. |
Large |
Static |
1 per instance |
So if you see a large number of queries waiting on the large gateway, it means that you need to see why there are so many queries requiring large amounts of memory, and try to tune those queries. Such queries will show up with RESOURCE_SEMAPHORE_QUERY_COMPILE or RESOURCE_SEMAPHORE wait types in sysprocesses, sys.dm_exec_requests, etc.
I am listing down some DMV's that might come in handy for SQL Server Memory Troubleshooting:
Sysprocesses
Sys.dm_exec_requests
Sys.dm_os_process_memory: Usage above.
Sys.dm_os_sys_memory: Will give you the overall memory picture for the server
Sys.dm_os_sys_info: Can be used to check OS level information like hyperthread ratio, CPU Ticks, OS Quantum, etc.
Sys.dm_os_virtual_address_dump: Used to check for VAS usage (reservations). The following query will give you VAS usage in descending order of reservations:
with vasummary(Size,reserved,free) as (select size = vadump.size,
reserved = SUM(case(convert(int, vadump.base) ^ 0) when 0 then 0 else 1 end),
free = SUM(case(convert(int, vadump.base) ^ 0x0) when 0 then 1 else 0 end)
from
(select CONVERT(varbinary, sum(region_size_in_bytes)) as size,
region_allocation_base_address as base
from sys.dm_os_virtual_address_dump
where region_allocation_base_address<> 0x0
group by region_allocation_base_address
UNION(
select CONVERT(varbinary, region_size_in_bytes),
region_allocation_base_address
from sys.dm_os_virtual_address_dump
where region_allocation_base_address = 0x0)
)
as vadump
group by size)
select * from vasummary order by reserved desc
go
Sys.dm_os_memory_clerks (Usage above)
Sys.dm_os_memory_nodes: Just a select * would suffice. This DMV has one row for each memory node.
Sys.dm_os_memory_cache_counters: Used above to find the size of the cachestores. Another sample query would be
select (single_pages_kb+multi_pages_kb) as memusage,* from Sys.dm_os_memory_cache_counters order by memusage desc
Once you have narrowed down the primary consumer and the specific component which is causing a memory bottleneck, the resolution steps should be fairly simple. For example, if you see some poorly written code, you can hound the developers to tune it. For other processes hogging memory at the OS Level, you will need to investigate them. For high consumption by a particular clerk, check the corresponding components. An example would be, say, in case of high usage by the SQLUtilities clerk, one of the first things you need to check if there is any Mirroring set up on the instance, and if it’s working properly.
Another thing I would strongly recommend would be to watch out for memory related KB articles, and make sure you have the relevant fixes applied.
Hope this helps. Any feedback, questions or comments are welcome.
Comments
Anonymous
May 11, 2013
GOOD WORKAnonymous
May 12, 2013
Thanks Nishant. :)Anonymous
May 19, 2013
Hi HarshDeep, Great Article for understanding SQL Memory Architecture. It would be more useful, if you share some more lights on III Section ( Non BPool Area).Anonymous
May 20, 2013
The comment has been removedAnonymous
June 11, 2013
Nice Article..Thanks!Anonymous
June 11, 2013
Thanks for appreciating Suresh.Anonymous
November 01, 2013
Nice sample. Thanks for sharing.Anonymous
November 02, 2013
You're welcome Durval...!!!Anonymous
November 12, 2013
Awesome !Anonymous
November 12, 2013
Thanks Samson...!!!Anonymous
December 02, 2013
Hi Singh Thanks for share all this knowledge, it's a very clean explanation and helps to understand those concepts better. I would call it "Gold work" :-)Anonymous
December 03, 2013
Thanks a lot Rene_Ville for your kind words. I'm glad that my blog helped you understand those concepts better. Regards, HarshAnonymous
February 16, 2014
Nice articles on memory HarshDeep. Any idea how internal memory pressure can be manually created on different memory regions such as plan cache, Bpool and workspace memory on SQL Server for further study?Anonymous
April 07, 2014
Great!!!Anonymous
April 22, 2014
First off let me say that this is an excellent write up. Very much appreciate what you have done here. I would like to comment though that I wish someone would take a look at how profoundly complex this structure is and come up with a way to answer a simple question in one query "Does my database server have a memory resource problem?" This appears to be beyond the capabilities of Microsoft because all the individuals are far too close to the problem and don't understand the needs of system admins who want to quickly report the health and status of their systems.Anonymous
May 06, 2014
Well written, good job.Anonymous
May 06, 2014
Hi Will...thanks for your kind words. To simulate pressure on the different areas, you will need different kinds of code. Unfortunately, this is something that is best done by oneself, as it will further help your understanding of the concepts.Anonymous
May 06, 2014
Hi Bill, Thanks for showing interest in the blog. Unfortunately, there is no "yes" or "no" flag with regards to memory pressure, simply because "it depends". The same numbers can mean different things for 2 different environments, which is why it's crucial to have an understanding of the environment, and use the approach outlined in this series to determine whether or not there is a problem, and if there is, how to fix it. Hope this helps. Thanks, HarshAnonymous
May 06, 2014
Thanks a lot Jay for appreciating...!!!Anonymous
May 27, 2014
Great work Harsh.. Very well explained.Anonymous
May 27, 2014
Thanks a lot Suresh. Glad you liked the blog.Anonymous
July 09, 2014
The comment has been removedAnonymous
July 27, 2014
You have worked on SQL Server memory. Appreciate of articles. Thanks Jayant DassAnonymous
March 02, 2015
good article....thanks a lot for sharing.Anonymous
March 13, 2015
Excellent! I really like the way you explain things.Anonymous
April 20, 2015
The comment has been removedAnonymous
May 26, 2015
very informative.. appreciate your work...Anonymous
June 10, 2015
Hi Harshdeep, Good articles. The basic memory are made clear. Good job.Keep it up. Regards, Rahul Mahimkar (SQL DBA)Anonymous
September 10, 2015
Hi Harshdeep, Thanks for putting this together. Few questions on input and output buffer caches 1.IO buffers get memory allocation from buffer pool pages?
- if a client requests large output, does it fill up output buffer cache? Does it impact the other clients and server as a whole ?
- If a client on slow network is unable to process the output quickly enough causing Asynchronous_IO can this affect server performance issue ? Assuming the slow client holds no locks or uses nolock hints to prevent blocking?
- Anonymous
September 17, 2015
Nice article.