Note
Access to this page requires authorization. You can try signing in or changing directories.
Access to this page requires authorization. You can try changing directories.
Here are some notes on “SQL Server 2008 SQLOS” I took while attending an advanced class on SQL Server taught by Gert Drapers (from https://dbproj.com/ and https://blogs.msdn.com/gertd/).
Please note that, although these notes were taken during the class, I might have added some of my own (mis)interpretation :-). Always check your facts on Books Online (I try to provide links when applicable). As with anything else you get from a blog, never use any of this in production before you thoroughly validate it a test environment. Please keep in mind that some of those will be hard to follow without some pre-requisite knowledge and the right context. Reading the post from top to bottom will help.
Processes
- Processes (context, access token, virtual address space descriptors, handle table)
- SQL is a user mode process
- Threads (unit of execution, access token, shares process address space, can synchronize)
- Threads can have the same access token as process, or another one (impersonation)
- Every process starts with one thread - “main”
- Context switches
- Fibers (lightweight thread, need a thread to schedule fibers, fibers share same thread context)
Scheduling
- Pre-emptive scheduler (Windows decides when and how long you run)
- See https://windowsitpro.com/Articles/Index.cfm?IssueID=22&ArticleID=302
- Windows Internals Book
- See https://technet.microsoft.com/en-us/sysinternals/bb963901.aspx
Hardware
- Defining: Processor, CPU, Socket, Core, Logical Processor, NUMA, NUMA node
- Example: Notebook, dual-core, hyper-threaded = 2 cores, 4 logical processor
- Windows has 64 threads of execution in Windows Server 2008 – R2 will change that.
- NUMA node – group of logical processors and cache that are “near each other”
- SMP – Symmetric Multi-Processing, Front Bus to access memory, contention
- SMP layout – CPU, Northbridge (memory, graphics), Shouthbridge (PCI)
NUMA
- NUMA = Non-Uniform Memory Access / Architecture, ccNUMA
- NUMA nodes – contain own RAM, CPUs, memory controller, talk to other nodes
- Goal: minimize/eliminate front-bus contention
- Performance penalty to access memory in other nodes (foreign node)
- Applications need to be NUMA-aware to take advantage
- Soft-NUMA – try out how things behave – only performs on NUMA hardware
- Support for NUMA – BIOS setting: Node interleaving disabled = NUMA
- How does the OS knows? ACPI and SRAT – Static Resource Affinity Table
- If Multiple nodes > 1 – You have NUMA, Number of procs – actual sockets populated
- Thread scheduling with NUMA (how “ideal processor” is chosen)
- Important for the app to know the nodes and available memory on each
- https://www.microsoft.com/whdc/archive/numa_isv.mspx
Windows Server 2008 and NUMA
- Optimizations for NUMA, like dividing kernel’s non-paged memory across nodes
- Hardware has evolved and the OS added constructs to help apps
- See https://msdn.microsoft.com/en-us/library/aa363804.aspx
Windows Server 2008 R2
- Support for greater than 64 logical processors
- See https://code.msdn.microsoft.com/64plusLP
- Groups of CPUs, backward compatible with 64-bit affinity
- New APIs to use groups, set the affinity mask and use more than 64 LPs
- Nehalem System Architecture – See Intel QuickPath Architecture
- See https://www.intel.com/technology/quickpath/demo/demo.htm
- Example: HP – 64 dual-core Itanium
- Example: Unisys – 32 dual core, hyper-threaded Xeon
SQL Server Scheduling
- SQL Server 6.x scheduling – multi-threaded – worker thread pool – OS pre-empts threads
- SQL Server 7/2000 – User mode schedulers, 1 per LP – UMS work queues – fewer content switches
- - Users sessions assigned to UMS scheduler, round robin, no re-balancing
- - Bulk Insert is single threaded, not in parallel within the same scheduler
- - Memory: memtoleave, threads, bpool – simple calculations to divide the memory
- SQL Server 2005/2008 – SQLOS
SQLOS
- User mode layer between SQL Server and the Windows OS.
- Thread mode or fiber mode. Default is thread.
- See https://msdn.microsoft.com/en-us/library/ms189248.aspx
- Scheduling/memory/buffer pool/hosting/exceptions/locks/latching
- SQLOS Diagram – Protocols/Query Compilation and Execution Engine/Storage Engine/SQLOS
- SQLCLR, Reporting Services are hosted process in SQL OS
- NUMA – SQLOS directly maps to the hardware architecture, understands NUMA
- NUMA – SQLOS (only one) / SOS_MemoryNode / SOS_CPUNode / SOS_Scheduler / SOS_Task
- NUMA – Performance counters on per-node basis
- SMP – Only one SOS_MemoryNode
- See https://blogs.msdn.com/slavao/archive/2005/02/05/367816.aspx
- See https://blogs.msdn.com/slavao/articles/441058.aspx
- See https://www.redbooks.ibm.com/redpapers/pdfs/redp4093.pdf
- Hot-add CPU? Depends – Windows Server 2003 or 2008, SQL Server 2005 or 2008
- I/O Port – Per SOS_CPUNode
- See https://blogs.msdn.com/slavao/archive/2006/04/12/575185.aspx
SQLOS Architecture
- SOS_OS – Singleton – Manages nodes, system level information
- SOS_Node – 1 per NUMA node (or 1 in SMP) – Memory node, Scheduler Management
- SOS_Scheduler – CPU abstraction, binds tasks and worker threads, manages affinity
- Affinity. See https://msdn.microsoft.com/en-us/library/ms187104.aspx
- SOS_Tasks – Executes request from user, executes on a worker, abortable, timer tasks
Scheduler
- Queue : Runnable / Pending / Timer / IO
- Worker Pool, Current Work, Idle Worker, Wake Event, Abort Tasks List
- States – New, Pending, Runnable, Running, Suspended, Preemptive, Done, Monitor
- Non-preemptive – task runs until – yield, wait on sync. object, quantum expires
- Preemptive – threads not in SQLOS control – external code – wait stat skewed
- I/O completion – Worker on I/O completion port
- See https://msdn.microsoft.com/en-us/library/aa365198.aspx
- Scheduler Monitor – Preemptive/long running tasks, non-yielding, deadlock, maintenance
- Dedicated Admin Connection – DAC – Only one, reserved memory at startup, own scheduler
- DAC: Connection using sqlcmd.exe –a or server name = admin:servername
- Backup– also has its scheduler
- Ring buffer – Key construct for communication between multiple producers/consumers
DMVs
- SELECT * FROM sys.dm_os_sys_info
- SELECT * FROM sys.dm_os_schedulers
- SELECT * FROM sys.dm_os_workers
- SELECT * FROM sys.dm_os_tasks
- SELECT * FROM sys.dm_os_waiting_tasks
- SELECT * FROM sys.dm_os_wait_stats
- - Look at wait_time_ms – signal_wait_time_ms, not just wait_time_ms
- - Clear waits with dbcc sqlperf('sys.dm_os_wait_stats',clear)
- - More detailed wait stats – In SQL Server 2008 with XEvents
- SELECT * FROM sys.dm_os_threads
- SELECT * FROM sys.dm_io_pending_io_requests
- See https://msdn.microsoft.com/en-us/library/ms176083.aspx
OS Memory Management
- Windows provide virtual memory services (except for AWE and PAE)
- 32-bit gives you up to 4GB – 64-bit, in theory, up to 16 EB
- Three buckets: committed memory, reserved memory (not physically used) and free memory
- Committed: working set (RAM), paged out (pagefile) and mapped (shared components, DLLs)
- If you try to access reserved or free memory: exception
Memory Concepts
- Allocation: Reserve with VirtualAlloc() in 64KB chunks, allocation in 4KB/8KB chunks
- See https://msdn.microsoft.com/en-us/library/aa366887(VS.85).aspx
- Memory Limits for Windows - See https://msdn.microsoft.com/en-us/library/aa366778(VS.85).aspx
- /3GB – Instead of a 2/2 split for the 4GB, uses a 3/1 split – more pressure on kernel memory
- /PAE – Access to memory between 4GB and 64GB – 36 bits to address memory
- AWE – How an application can reach beyond the 4GB limit, when using /PAE
- See https://msdn.microsoft.com/en-us/library/aa366796(VS.85).aspx
- Case: Comparing 32bit, 32-bit (AWE), 32-bit (/3GB), 32-bit (AWE/3GB) and 64-bit
- WOW – Running 32-bit applications on 64-bit Windows
- See https://msdn.microsoft.com/en-us/library/aa384209(VS.85).aspx
- 64-bit – 16TB total, 8TB for kernel, 8TB application.
SQL Server Memory
- 32-bit: MemToLeave, thread stacks, page cache, plan cache, query workspace, locks, SQL Server, OS
- 64-bit: No MemToLeave, more thread stacks space
- Lock pages in memory – Only EE – See https://support.microsoft.com/kb/918483
- See https://blogs.msdn.com/psssql/archive/2007/10/18/do-i-have-to-assign-the-lock-privilege-for-local-system.aspx
- Large pages in 64-bit – Only EE – T834 – See https://support.microsoft.com/kb/920093
- Case: Running 64-bit, EE and 16 GB – By default, you are not running with large pages
- Careful – Large pages can lead to increased memory fragmentation, set SQL memory to a fixed size
- See https://msdn.microsoft.com/en-us/library/ms143432.aspx
- Calculating SQL Server max memory
- See https://msdn.microsoft.com/en-us/library/ms178067.aspx
- Configuring Windows pagefile – Minimum size in x64 is 8GB. Physical memory + 4GB
- SQL Server will swap to page file if – Not 64-bit, Not using EE, not setting “lock pages in memory”
- See https://support.microsoft.com/kb/889654
SQL Server Memory Management
- Memory Cycle: Memory broker / resource monitor / cache / heap / memory allocation
- Components: SQLSO / Memory Node / SOS_Node / Memory Clerk / Memory Object
- Components: Buffer Pool / Resource Monitor / Caching Framework / Memory Pools and Brokers
- Allocators: Single Page / Multi-Page / Large Page / …
- See https://support.microsoft.com/kb/907877
- Buffer pool: shared, every node has a piece of it.
- Buffer pool: reserved up front, committed/mapped on demand. Has a clerk, but not an allocator
- Stolen pages: piece of buffer pool “borrowed” for other purposes, like procedure cache
- Memory clerk: The actual page allocator, the one that actually gives you the memory
- Allows us to know where the memory is going, caches are memory clerks too
- Memory objects: memory allocators for arbitrary size
- Caching framework – common way to build caches
- Cache Store – User Store – Clock Algorithm – Clock Hands – Clock Entry Info
DMVs
- DBCC MEMORYSTATUS – A good starting point – Overview on a per node and per clerk details
- See https://support.microsoft.com/kb/271624
- sys.dm_os_memory_clerks – multiple types, per node – By default not fully populated
- See https://technet.microsoft.com/en-us/library/ms175019.aspx
- sys.dm_os_memory_objects
- See https://msdn.microsoft.com/en-us/library/ms179875.aspx
- sys.dm_os_memory_pools
- See https://msdn.microsoft.com/en-us/library/ms175022.aspx
- sys.dm_os_memory_cache%
- sys.dm_os_ring_buffers – Documented via KB articles
- See https://support.microsoft.com/kb/920093
Resource Monitor
- Monitors a set of processes and indicators, using a simple state machine
- Also Important when you change things like max memory, affinity mask, etc.
- Runs per node, on its own hidden scheduler, non-preemptive.
- Sends notifications to clerks – “Hey! We’re running low on resources!”
- Monitors: Low physical resources / Low Virtual Address Space / High physical resources
- Reactive or Proactive: Memory node fails to allocate 4MB / Resource monitor itself tries to allocate 4MB
- Internal pressure: Example: SHRINK notification from broker / Cache over 75% of target
- External pressure: Example: Signaled by OS via events – LowMemoryCondition
- See https://msdn.microsoft.com/en-us/library/aa490194.aspx
- Careful – Trend to consolidate multiple instances of SQL on a single, large box
- See https://blogs.msdn.com/slavao/archive/2005/02/19/376714.aspx
- Memory pressure:
- See https://blogs.msdn.com/sqlprogrammability/archive/2007/01/16/9-0-memory-pressure-limits.aspx
Memory Broker
- Dynamic memory distribution: buffer pool / optimizer / query execution / caches
- Always running, tunes memory consumption
- DMV: sys.dm_os_memory_brokers
- See https://msdn.microsoft.com/en-us/library/bb522548.aspx
Books of interest
- SQL Server 2005 Practical Troubleshooting
- Ken Henderson & others
- https://www.amazon.com/SQL-Server-2005-Practical-Troubleshooting/dp/0321447743
- SQL Server 2008 Internals
- Kalen Delaney & others
- https://www.microsoft.com/learning/en/us/Books/12967.aspx
Bonus topic: Extended Events
- Captures event information, like SQL Server Profiler, but more lightweight on capture
- Managed using CREATE/ALTER/DROP EVENT SESSION
- Very flexible way to determine which events are capture.
- Can save to another database, file., later use ETW tools to view, correlate with Windows events
- See https://msdn.microsoft.com/en-us/library/bb630319.aspx
- See https://www.microsoft.com/whdc/DevTools/tools/EventTracing.mspx