Server slowness

Sam 1,476 Reputation points
2022-04-04T11:28:32.2+00:00

Hi All,

We have a QA server on which a bunch of databases are hosted.
It hosts like 20-25 databases. All these databases are not active all the team.
At set of databases indicates each environment like QA1 , QA2 , QA3 etc... In each set , there is 1 big db which is around 4.15TB and others are 500GB each.
Multiple testing team uses multiple environments. Sometimes only 1-2 environments are performing testing.
During the time, the workload queries gets executed fine. But whenever, multiple teams starts testing 4-5 environments
at same time , the performance is very slow. a query which takes 2-3 mins to finish would take more than 30 mins.
During these times, testing complains a lot of query performance issues. We dont find any blocking and mostly see PAGEIOLatches on all databases.

The server has 16 CPUs and 256GB RAM. Max server memory set to 80% i.e 205GB.

Questions

Q) This is kind of non-technical question. More of dealing with a pressure situation when servers are running slow.

I usually check number of connections to the users which shows the activity in each database and share it with users.
Sometimes we try bumping the max server memory and gets relief but still we see slowness in the execution of app queries.
I typically use, sp_whoisactive to see current acitivity. for blocking and waits...
All db are isolated to its own separate disks, autogrowths are set to 500mb fixed growth, tempdb has it own separate disks, instant file initialization is on, presize data and log files etc...
Really sometimes I am not finding answers on how can explain it to the end users/testers that multiple environments are running and that's reason for slowness.
They just keep sending escalation mails that db performance is slow. How can we tell / convey them in layman terms.
Most of the business users are non-technical folks.
What information do we need to gather so that we can show them or tell them so they understand?
for instance, If we show the I/O warnings from errorlog , then they will says its something wrong with sql server.
some people just look at PLE and say, there is no memory.
Even if we provide the list of long running queries, high CPU queries no action taken. They say, it was running fine on development server but on this server it takes long long time.
Lots of email noise goes on.. can't say in words. This continues especially when multiple releases happen in same month.
I feel somewhere I am not able explain. How to deal in such situations. How to set the clear expectations and at the same time I feel somewhere I am missing a systematic approach.
Please share your thoughts. Looking for some candid and detailed explanation.

Q) For such server, how much memory is needed is actually needed?
Lets take an example )

If I have a 4TB database(db1) and another 1TB whole set of databases(db2,db3,db4) which makes 1 env.
How does the calculation go for how much memory we need for the SQL VM?
What factors to be considered in determining that this server must need this much minimum memory and this much disk throughput.

Thanks,
Sam

SQL Server
SQL Server
A family of Microsoft relational database management and analysis systems for e-commerce, line-of-business, and data warehousing solutions.
13,811 questions
{count} votes

6 answers

Sort by: Most helpful
  1. Erland Sommarskog 111.4K Reputation points MVP
    2022-04-04T21:26:14.387+00:00

    In addition to the other posts: If you have a single DB on the server of some size, let's say 4.5 TB, you may be able to serve that database on 256 GB of RAM, if the application is well-written and mainly access current data. That is, most data stay on disk.

    But if you slap 20 other database each 500 GB and they are active at the same time, they will all need their memory for their active part, so they will end up fighting with each other.

    So it does sound that your server is underpowered, but to say that with authority, we would need to access to the server to collect hard data to back it up.

    Alternatively, you will need to set up some booking system for the server, so that only one database is active at the time.

    2 people found this answer helpful.

  2. Bjoern Peters 8,876 Reputation points
    2022-04-04T14:32:58.653+00:00

    Regarding your overall question... how to size that server... there is no general answer to this

    Remember SQL Server tries to cache data in memory to speed up some queries which run regularly. So the optimal situation would be to store all data in memory (which doesn't work in real life), but SQL Server tries to... So if you have several environments on that SQL server that in total are more than 20TB the optimal SQL Server should have a RAM of ~20TB...

    Your server has - for the time of heavy testing - absolutely not enough memory!

    From my experience and from your numbers I would suggest, that - for that time of heavy testing - you should provide at least 1TB of RAM

    And you should investigate those situations in your storage, maybe you can optimize your database structure and file structure to improve io speed eg partitioning or separate files for indexes, and invest in some better storage controllers.

    Edit:
    Maybe trying to setup a fresh separate environment in Azure or your on-prem Hypervisor for each application and testing phase => separating workload.

    1 person found this answer helpful.
    0 comments No comments

  3. Tom Phillips 17,731 Reputation points
    2022-04-04T15:40:31.597+00:00

    The size and number of databases is not relevant to the size of the server. The USAGE of those databases determines the size of the server. No one can answer "how fast is fast enough" for your situation.

    You didn't tell us what version of SQL Server you are using.

    Also, you mention VM. VMs are very complicated. The FIRST thing you need to check is the HOST, not the SQL Server VM. If the HOST is not performing, the VM will never perform. If your HOST is swapping RAM or has slow disk IO, your VM will not perform. You will not see any indication of a problem in your VM. If you have over committed the host RAM or CPU, your VM will not perform.

    From the VM side, you should check the page swapping and CPU usage. If those are both ok, then look at disk IO. Does the SQL Server log indicate "xxx took longer than 15 seconds"? That indicates a huge disk IO problem, but 15 seconds is a very long time. If the disk is taking 12 seconds per request * 10,000 requests, you will never see that message.

    You said you do not see lock waits. What wait types are you seeing (other than CXPACKET, that is not a problem)?

    1 person found this answer helpful.

  4. Seeya Xi-MSFT 16,471 Reputation points
    2022-04-05T09:34:56.507+00:00

    Hi @Sam ,

    Hope everything goes well.
    I recommend you read this: https://www.sqlshack.com/dba-guide-sql-server-performance-troubleshooting-part-1-problems-performance-metrics/

    Best regards,
    Seeya


    If the answer is the right solution, please click "Accept Answer" and kindly upvote it. If you have extra questions about this answer, please click "Comment".
    Note: Please follow the steps in our documentation to enable e-mail notifications if you want to receive the related email notification for this thread.

    1 person found this answer helpful.
    0 comments No comments

  5. Bjoern Peters 8,876 Reputation points
    2022-04-05T12:24:08.687+00:00

    Please find two good HowTos here:

    https://www.mssqltips.com/sqlservertip/6337/sql-server-memory-pressure-troubleshooting/
    https://www.mssqltips.com/sqlservertip/2304/how-to-identify-microsoft-sql-server-memory-bottlenecks/

    They are explaining everything, how to monitor and explain memory pressure in SQL Server

    1 person found this answer helpful.
    0 comments No comments

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.