SQL Server suddenly becomes not responding.
Question
Monday, June 18, 2012 10:15 AM
We have installed clustered Microsoft SQL Server Enterprise Edition (64-bit) (10.50.3720.0) on Windows Server 2003 x64 EE SP2 (two HP servers ProLiant DL580 G7 with 4 Xeon X7560 2.27GHz and 64Gb of RAM).
The SQL Server works with ~12 high perfomance bases (24/7, about 800 users online + replication using another server as distributor + server jobs etc.)
And we have a big problem with this installation.
Some time the SQL Server working fine. Then CPU Usage on server suddenly drops down and SQL Server almost becomes not responding. In this case only one thing helps - restart the SQL Server service.
We can't use Profiler much, because that causes big problems with using applications.
When we try to limit the RAM usage (set MIN and MAX server memory to value 20480 for example) the SQL Server works slowly, it's difficult to use applications , but server still works.
I know that it's not a good description of a problem, but we still can't find what causes that kind of behavior (CPU Usage drops down and SQL Server becomes not responding).
So maybe you can help us with that? Ask any information you need.
Thanks.
All replies (15)
Tuesday, June 19, 2012 3:46 PM âś…Answered
Do you know what is running when this happens?
It does look like some code executes and consumes all your CPU - I've had it happen when some BI packages kick off.
There is an article on the SOS_Scheduler_yield wait type here:
http://blog.sqlauthority.com/2011/02/08/sql-server-sos_scheduler_yield-wait-type-day-8-of-28/
Seth
Monday, June 18, 2012 10:37 AM | 1 vote
Hi,
There may be many potential reasons, you can check the following:
- Does it happened suddenly or persists for a long time?
- Did you mean several directories have this issue while others don't? If so, I guess it may caused by the files in the directory have been locked or too many there to lag the process.
- When it hangs, open the task manager can you see devenv.exe occupy all the cpu or memory?
- Also check the event viewer to see whether error logs there.
- Sometimes, low disk space may also cause time lag, try to clean up the disk to see if it helps.
Hope it helps you.
Monday, June 18, 2012 10:40 AM
There could be a lot things going on here but I would first start be inspecting the Windows Event Logs to ensure you are not seeing any issues being reported there.
Also, a good check is to ensure that your server Power Plans are running in the High Performance mode and NOT the Balanced Power Plan. I have seen this cause CPU performance problems far too many times and so it's always a good thing to check for when looking at these types of scenarios.
John Sansom | SQL Server DBA Blog | @SQLBrit on Twitter
Monday, June 18, 2012 11:21 AM
- Happens suddenly and persist under workload until SQL Server service will be restarted.
- Didn't understand the question. What directories?
- When server will hang next time I'll check. What does it means if answer will be 'yes'?
- Event logs (Windows, SQL Server, SQL Server Agent) are clean.
- There are more than enough disk space (>100Gb on each disk).
Monday, June 18, 2012 11:27 AM
The Event logs (Windows, SQL Server, SQL Server Agent) are clean. There are no errors or warnings. I didn't mention it at my first message, my bad.
We checked Power Plans at both servers, it's "Always On".
Monday, June 18, 2012 11:46 AM
Set up a Windows Performance Monitor trace to record the server hardware metrics at the time of your issue. See here for a video tutorial on how to do this. Right now you don't have enough information about your problem scenario in order to perform concrete analysis.
Why have you set the max memory to 20GB on a 64GB server?
Do you have other applications running on this server in addition to SQL Server?
I would also suggest setting the min server memory setting back to the default value of 0 or at least reducing it away from the same value of the max server memory to say 10GB.
John Sansom | SQL Server DBA Blog | @SQLBrit on Twitter
Monday, June 18, 2012 12:56 PM
As others have said, there's not much to go on here (that and I'm only on my first cup of coffee /grins). By your description, it sound like the SQL Server is under memory pressure. If the server runs at 20Gb, but slowly, that sounds like the server is spending a lot of time paging data pages to and from the disk. Not only will this slow things down, but that means your cache will also be minimized so the server will spend a lot of time recreating execution plans that might otherwise have been preserved.
If these are physical boxes (no VM's on them) and presuming that SQL Server is the only application on the server, I would ensure the service accounts have the permission 'lock pages in memory' and then configure SQL Server with MIN memory of 25Gb and MAX memory of 58 - 60Gb. VM's have different rules regarding memory.
If you do this, you'll notice that SQL 2008 doesn't immediately carve out all that RAM; this is by design. When the server needs the RAM, it will take it and keep it.
HTH.
Tuesday, June 19, 2012 9:31 AM
Hi again.
We set max memory to 20Gb as an experiment.
Now we continue these experiments with memory amount. Right now we set min-max as 40-60 and we had SQL Server hangs again..
The SQL Server is the only one application running on the server. We have only physical servers (two clustered servers) and no VM's . All service accounts have permissions to 'lock pages in memory'.
Here is a link to download xls file with values of counters.
The values right before the hang at the left side of file. And normal state at the right.
Tuesday, June 19, 2012 9:57 AM
Hi,
There may be many potential reasons, you can check the following:
- Does it happened suddenly or persists for a long time?
- Did you mean several directories have this issue while others don't? If so, I guess it may caused by the files in the directory have been locked or too many there to lag the process.
- When it hangs, open the task manager can you see devenv.exe occupy all the cpu or memory?
- Also check the event viewer to see whether error logs there.
- Sometimes, low disk space may also cause time lag, try to clean up the disk to see if it helps.
Hope it helps you.
Tuesday, June 19, 2012 1:10 PM
Please clarify something for me - you mentioned that you've configured a cluster. When the SQL Server service becomes non-responsive. does it fail over to the passive node?
Tuesday, June 19, 2012 1:53 PM
No, it doesn't.
Sometimes we still able to connect to SQL Server via SSMS but it takes more than 3 minutes.
Tuesday, June 19, 2012 2:00 PM
what do your waitstats tell you?
Geert Vanhove
Tuesday, June 19, 2012 2:04 PM
Hi,
Can you view your server wait stats - to you get a lot of CXPacket waits - in which case it could be a query consuming all your CPU.
This script will show the top 10 wait types by time:
SELECT TOP 10
wait_type ,
max_wait_time_ms wait_time_ms ,
signal_wait_time_ms ,
wait_time_ms - signal_wait_time_ms AS resource_wait_time_ms ,
100.0 * wait_time_ms / SUM(wait_time_ms) OVER ( )
AS percent_total_waits ,
100.0 * signal_wait_time_ms / SUM(signal_wait_time_ms) OVER ( )
AS percent_total_signal_waits ,
100.0 * ( wait_time_ms - signal_wait_time_ms )
/ SUM(wait_time_ms) OVER ( ) AS percent_total_resource_waits
FROM sys.dm_os_wait_stats
WHERE wait_time_ms > 0 -- remove zero wait_time
AND wait_type NOT IN -- filter out additional irrelevant waits
( 'SLEEP_TASK', 'BROKER_TASK_STOP', 'BROKER_TO_FLUSH',
'SQLTRACE_BUFFER_FLUSH','CLR_AUTO_EVENT', 'CLR_MANUAL_EVENT',
'LAZYWRITER_SLEEP', 'SLEEP_SYSTEMTASK', 'SLEEP_BPOOL_FLUSH',
'BROKER_EVENTHANDLER', 'XE_DISPATCHER_WAIT', 'FT_IFTSHC_MUTEX',
'CHECKPOINT_QUEUE', 'FT_IFTS_SCHEDULER_IDLE_WAIT',
'BROKER_TRANSMITTER', 'FT_IFTSHC_MUTEX', 'KSOURCE_WAKEUP',
'LAZYWRITER_SLEEP', 'LOGMGR_QUEUE', 'ONDEMAND_TASK_QUEUE',
'REQUEST_FOR_DEADLOCK_SEARCH', 'XE_TIMER_EVENT', 'BAD_PAGE_PROCESS',
'DBMIRROR_EVENTS_QUEUE', 'BROKER_RECEIVE_WAITFOR',
'PREEMPTIVE_OS_GETPROCADDRESS', 'PREEMPTIVE_OS_AUTHENTICATIONOPS',
'WAITFOR', 'DISPATCHER_QUEUE_SEMAPHORE', 'XE_DISPATCHER_JOIN',
'RESOURCE_QUEUE' )
ORDER BY wait_time_ms DESC
Tuesday, June 19, 2012 2:46 PM
Here is top 10 waits while SQL Server is at normal state:
And here is top 10 waits when SQL Server becomes really slow and CPU Usage drops down to minimum.
Tuesday, June 19, 2012 2:57 PM
you have a CPU bottleneck, scheduler yield means your thread can not be handled within its 4ms quanta.
ASYNCC_NETWORK_IO is related with sending data to a client. Are there cursors involved?
Geert Vanhove