SharePoint 2019 on-prem Search and overall performance

Rumi 156 Reputation points
2020-09-22T05:52:52.627+00:00

Hello,

We went live on a medium-sized farm with dedicated DC (Distributed Cache) and Search servers (2 of each). I see no errors in Central Administration or in any of the search configuration. Everything worked fine until we started adding load to the farm, with a few thousand users hitting the sites. We then started seeing very slow performance and high latency on the sites. The first thing we did was remove all our security agents to rule them out. However, after much observation, we discovered that as soon as Search starts crawling, every WFE is impacted: the sites load extremely slowly and eventually return a "server too busy" error.

I reproduced the issue by running an incremental crawl: as soon as the crawl starts, we experience this issue. We'd like to enable continuous crawl, but as soon as we do, the performance issues appear. So far, Search has about 15 million searchable items, and when users search for items, results come back quickly. But as soon as we start a crawl, it all goes downhill.

The farm is at:
Configuration database version: 16.0.10364.20001
Running on Windows Server 2019
SQL Server 2017 (14.0.3294.2) - dedicated to Search

Also, I'm noticing ULS showing a ton of these errors: Error encountered when creating uri from baseUrl /_layouts/15/next/odspnext/.
Any info or insight and help would be greatly appreciated.

Regards,

Rumi


5 answers

  1. Rumi 156 Reputation points
    2021-01-15T14:47:49.173+00:00

    Hello,

    Apologies for the delayed feedback on this issue. Here is what was happening to our farm:

    I wanted to share this info hoping it will help others. The main SharePoint performance issue once we went live was related to Microsoft SQL Server. The production SQL servers were all configured with multiple sockets, each socket having 1 core. The server was allocated 16 sockets with 1 core each to meet the 16 cores requested for this server. The problem is that the Standard edition of SQL Server can only address 4 sockets. Since each socket had 1 core, SQL Server could only use 4 cores in total. This was not a problem when we had a few dozen people on the new sites for testing. However, once we went live with a few thousand users, we very soon started to see the impact on SharePoint.

    The lesson here is that any product licensed by CPU must be taken into account when configuring VMs. In this case, any SQL VM running the Standard edition of SQL Server should not exceed 4 sockets. Other products may have different licensing requirements, which should be considered as well. In our case, to achieve 16 cores, the VM was reconfigured from 16 sockets to 4 sockets, each socket having 4 cores.
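To make the arithmetic concrete, here is a minimal Python sketch of how an edition's compute limit interacts with VM topology. The `EditionLimit` type and `usable_cores` function are hypothetical helpers written for illustration. The limit used (the lesser of 4 sockets or 24 cores for SQL Server 2017 Standard) is an assumption that should be checked against the documentation for your version and edition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EditionLimit:
    """Compute capacity limit for a database edition (illustrative only)."""
    max_sockets: int
    max_cores: int


# Assumption: SQL Server 2017 Standard uses the lesser of 4 sockets or 24 cores.
# Verify this against the documentation for your exact version and edition.
STANDARD_2017 = EditionLimit(max_sockets=4, max_cores=24)


def usable_cores(sockets: int, cores_per_socket: int, limit: EditionLimit) -> int:
    """Return how many cores the engine can use for a given VM topology."""
    usable_sockets = min(sockets, limit.max_sockets)
    return min(usable_sockets * cores_per_socket, limit.max_cores)


if __name__ == "__main__":
    print(usable_cores(16, 1, STANDARD_2017))  # 16 sockets x 1 core  -> 4
    print(usable_cores(4, 4, STANDARD_2017))   # 4 sockets x 4 cores  -> 16
```

Under these assumptions, the original 16 x 1 layout exposes only 4 usable cores, while the reconfigured 4 x 4 layout exposes all 16.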

    Below are some screenshots that illustrate this issue.

    (The screenshots are for a VM with 8 CPU cores, but the idea is the same.)

    Thank you,
    [Screenshots: 57251-2021-01-15-8-45-42-sql-error.jpg, 57096-2021-01-15-8-42-06-cpu.jpg]


  2. Elsie Lu_MSFT 9,621 Reputation points
    2020-09-22T10:14:34.827+00:00

    Hi @Rumi ,

    Based on your description, search queries perform well and the issue only occurs when you start a crawl, so I would like to confirm some information about your farm.

    1. Did you observe many new files being created right after you added the users? A large increase in new documents may cause performance issues during crawling.
    2. Please check the crawl log for errors.
    3. Could you provide more detailed information about the error messages in your ULS log?
    4. What are the specs of the Search servers, and how much RAM do they have?
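For item 3, one way to see which messages dominate a ULS log is to tally them by severity. The following is a minimal Python sketch, not an official tool. It assumes the ULS file is tab-delimited with the column order Timestamp, Process, TID, Area, Category, EventID, Level, Message, Correlation. The function name `tally_messages` and the sample lines are hypothetical.

```python
from collections import Counter
from typing import Iterable

# Assumed ULS column order (tab-delimited): Timestamp, Process, TID, Area,
# Category, EventID, Level, Message, Correlation.
LEVEL_COL, MESSAGE_COL = 6, 7


def tally_messages(lines: Iterable[str],
                   levels=("Critical", "Unexpected", "High")) -> Counter:
    """Count messages whose Level column is one of the given severities."""
    counts: Counter = Counter()
    for line in lines:
        fields = [f.strip() for f in line.rstrip("\n").split("\t")]
        if len(fields) <= MESSAGE_COL:
            continue  # continuation or malformed line: skip it
        if fields[LEVEL_COL] in levels:
            # Truncate long messages so near-identical entries group together.
            counts[fields[MESSAGE_COL][:120]] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample lines in the assumed layout, for demonstration only.
    sample = [
        "09/22/2020 05:00:00.00\tw3wp.exe\t0x0001\tArea\tCategory\tid01\tHigh\tSample error A\tc1",
        "09/22/2020 05:00:01.00\tw3wp.exe\t0x0001\tArea\tCategory\tid02\tMedium\tSample info\tc2",
        "09/22/2020 05:00:02.00\tw3wp.exe\t0x0001\tArea\tCategory\tid01\tHigh\tSample error A\tc3",
    ]
    for message, count in tally_messages(sample).most_common(10):
        print(count, message)
```

Running the same function over log lines captured while a crawl is active and while it is idle gives a rough comparison of which errors appear only under crawl load.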


  3. Trevor Seward 11,496 Reputation points Microsoft MVP
    2020-09-22T14:58:56.227+00:00

    The ULS error you listed is benign and normal to see. You didn't list the specifications of the FEs; it sounds like they're not spec'ed high enough for the load they're taking on.

    What you might also try is to have your Search servers crawl a non-end-user-facing FE (such as a server running the Application role) by using a hosts file entry on the local Search server.
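As an illustration of the hosts file approach, the Python sketch below builds the entries that would map each crawled host name to a single non-user-facing server. The IP address and host names are hypothetical placeholders, and `hosts_entries` is a helper written for this example. On Windows the hosts file is located at `C:\Windows\System32\drivers\etc\hosts`, and each entry is an IP address followed by a host name.

```python
import ipaddress


def hosts_entries(target_ip: str, hostnames: list[str]) -> list[str]:
    """Return hosts-file lines mapping each host name to target_ip."""
    ipaddress.ip_address(target_ip)  # raises ValueError if the IP is malformed
    return [
        f"{target_ip}\t{name.strip().lower()}"
        for name in hostnames
        if name.strip()  # ignore empty entries
    ]


if __name__ == "__main__":
    # Hypothetical values: replace with the IP of the server that should
    # receive crawl traffic and the host names your content sources use.
    for line in hosts_entries("10.0.0.21",
                              ["intranet.contoso.com", "teams.contoso.com"]):
        print(line)
```

The generated lines would be added to the hosts file on each Search server only, so that end users continue to resolve the same host names through the load balancer.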


  4. Rumi 156 Reputation points
    2020-09-23T02:57:17.2+00:00

    I'm sorry for the late reply on this; I've just been addressing some other issues. Here is what my farm looks like. It is nearly identical to my SP 2016 farm, which has been running since 2016 without any issues:

    4 WFEs (8 cores / 24GB RAM)
    2 Search (16 cores / 48GB RAM)
    2 DCcache (8 cores / 26GB RAM)
    2 App (8 cores / 24GB RAM)
    1 SQL server dedicated to content (16 CPUs and 64GB RAM)
    1 SQL dedicated to App and Search (16 CPU and 64 GB RAM)

    The VIP is behind a load balancer (NetScaler). We also have a good-sized OOS farm and a 3-server WFM farm, with no issues there. Also, please note that I have removed the AV products.

    The Search crawlers are pointed at the app servers using a hosts file. This is exactly how I did it in SP 2016. I have used your book for both the 2016 and 2019 deployments, and I have read them page by page.

    I just tested this again. As soon as I start the crawl, performance drops tremendously. And a few minutes after I stop the crawl, it's all good and snappy and happy again. One thing to note is that I have host-named site collections (HNSCs) in this farm. MySites is set up on a dedicated web/app, but we are not even using that yet; I have removed access to MySites for all users. So, really, it is only crawling the HNSCs.

    Any help would be greatly appreciated.

    Thank you,

    Rumi


  5. Rumi 156 Reputation points
    2020-09-24T16:13:23.377+00:00

    Hello - wondering if anyone has any updates?

    thanks,

    Rumi