To anybody else looking at this post, we didn't have to do an OS reinstall, but we did have to reinstall Exchange and everything is back to normal now, even after all that troubleshooting listed above.
Database copy content indices keep failing, service not started, but it is
I’ve run into an issue where a customer’s content indices for database copies on a particular server go back and forth between “Unknown” and “Failed.” The error on the database copy says the Host Controller service isn’t started, but it clearly is. Restarting the service doesn’t make anything happen. Updating the copy with the -CatalogOnly switch doesn’t do anything. I’ve also completely removed the copy, files and all, then re-added the copy and the content indices are still not in a healthy state. Looking at the event log while the content index is in a failed state, I see a couple of seemingly relevant errors:
1) Event 1010 – MSExchangeFastSearch - “Could not connect to net.tcp://localhost:3803…TCP Error Code 10061 - No connection could be made because target machine actively refused it…”
2) Event 1009 – MSExchangeFastSearch – “Indexing of [mailbox database] encountered an unexpected exception…The component operation failed…error code: ‘FastError’…”
I checked IIS bindings for both front end and back end and the correct certs are there and assigned as well. Checking port “3803” to the source servers is also fine. The server with the issue is actually in its own subnet and the DAG network shows that subnet as “Up” as well. This has been happening since 31 AUG. I think there was some sort of network blip that caused this particular server to take control of the PAM, activate its database copies, etc. DAC mode isn't on yet, so we're looking at that as well. It should be noted that when the content index status is failed, I see the errors and events above. When it is in an "Unknown" status, there is no error indicated from the Get-MailboxDatabaseCopyStatus cmdlet, but in the event log, there are a couple of relevant events:
1) Event ID 1026 - .NET Runtime - Application: NodeRunner.exe, Framework Version: v4.0.30319, Description: The process was terminated due to an unhandled exception. ExceptionInfo: SystemAccessViolation at Microsoft.Ceres.SearchCore.FastServer.Plugin.CreateIndexer()....
2) Event ID 1000 - Application Error - Faulting Application Name: Noderunner.exe; version: 16.0.1497.0; timestamp: 0x5cb8eb2d; ExceptionCode: 0xc0000005....FaultingModulePath: %EXCHANGEINSTALLPATH%\Bin\Search\Ceres\Native\Microsoft.Ceres.SearchCore.FastServer.Native.dll; Report Id: <long hex string>...
Exchange | Exchange Server | Management
-
Joseph Larrew 341 Reputation points Microsoft Employee
2020-09-21T17:12:34.603+00:00
2 additional answers
-
Lydia Zhou - MSFT 2,386 Reputation points Microsoft Employee
2020-09-10T02:38:30.193+00:00 anonymous user
What is the exact Exchange version (build) of your DAG members? How many DAG members do you have? Please provide more details about your environment.
Did you install any software or updates on the server before this issue started? Please use the following commands to make sure the search index is enabled and to check the content index state for all database copies:
Get-MailboxDatabase | Format-Table Name,IndexEnabled,ServerName
Get-MailboxDatabaseCopyStatus * | Select-Object Name,Status,ContentIndexState | Sort-Object Name
Besides the Microsoft Exchange Search Host Controller service, please also make sure that the Microsoft Exchange Search service is running properly.
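As a minimal sketch of that check, run locally on the affected server. It assumes the default service names, MSExchangeFastSearch (Microsoft Exchange Search) and HostControllerService (Microsoft Exchange Search Host Controller). Adjust them if your installation differs.

```powershell
# Assumption: default Exchange service names; run on the affected DAG member.
$services = 'MSExchangeFastSearch', 'HostControllerService'

# Show the current state of both search-related services.
Get-Service -Name $services |
    Format-Table Name, DisplayName, Status -AutoSize
```

A status of Running only shows that the service process is up. It does not by itself show that the search components inside it are healthy.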
When this database copy is mounted on another DAG member, does the search index issue still occur?
Please also try mounting other database copies on this server and check whether the search index issue can be reproduced with other databases. We need to determine whether this issue is related to a specific DAG member server or to a particular database copy. You can also try removing the database copy on this server and re-adding it, then mount the new database copy and check again.
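One possible way to compare the two cases from the Exchange Management Shell is sketched below. EX01 and DB01 are hypothetical placeholder names for the affected server and one of its databases.

```powershell
# Assumption: run in the Exchange Management Shell.
# EX01 and DB01 are placeholder names, not taken from this thread.

# All database copies hosted on the suspect server.
Get-MailboxDatabaseCopyStatus -Server EX01 |
    Select-Object Name, Status, ContentIndexState, ContentIndexErrorMessage |
    Sort-Object Name

# All copies of one database across the DAG members.
Get-MailboxDatabaseCopyStatus -Identity DB01 |
    Select-Object Name, Status, ContentIndexState, ContentIndexErrorMessage |
    Sort-Object Name
```

If every copy on EX01 is unhealthy while copies of the same databases on other members are healthy, that points toward the server rather than a particular database.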
Additionally, you can post screenshots of error events 1010 and 1009 for further analysis. Remember to redact any personal information.
If the response is helpful, please click "Accept Answer" and upvote it.
Note: Please follow the steps in our documentation to enable e-mail notifications if you want to receive the related email notification for this thread.
-
Joseph Larrew 341 Reputation points Microsoft Employee
2020-09-15T12:23:09.143+00:00
This is what comes up when I try to do an "Update-MailboxDatabaseCopy -CatalogOnly":
WARNING: Seeding of content index catalog for database <database name> failed. Please verify that the Microsoft Search (Exchange) and the Host Controller service for Exchange services are running and try the operation again. Error: There was no endpoint listening at net.tcp://localhost:3863/Management/SeedingAgent-0870A2FD-AD68-4B14-9CD6-AAB4FF4793FF12/Single that could accept the message. This is often caused by an incorrect address or SOAP action. See InnerException, if present, for more details..
Those services are indeed running on all servers. I ran Test-NetConnection from each of my servers to all of the others over ports 3800-3803, 3863, and a couple of others, and every test succeeded. I also checked netstat -a and found that the server was actually listening on those ports (on both 0.0.0.0:[port] and 127.0.0.1:[port]), so it doesn't look like it's that. I don't know what it means by "incorrect address or SOAP action," and I don't know where to find an "InnerException." This sounds like IIS, though.
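For anyone repeating the port checks described above, here is a rough sketch. The server names EX01 through EX03 are hypothetical placeholders, and the port list is the one mentioned in this post.

```powershell
# Assumptions: EX01..EX03 are placeholder server names;
# the ports are the ones tested in this thread.
$servers = 'EX01', 'EX02', 'EX03'
$ports   = @(3800..3803) + 3863

# Remote reachability: one result object per server/port pair.
foreach ($server in $servers) {
    foreach ($port in $ports) {
        $result = Test-NetConnection -ComputerName $server -Port $port -WarningAction SilentlyContinue
        [pscustomobject]@{
            Server    = $server
            Port      = $port
            Reachable = $result.TcpTestSucceeded
        }
    }
}

# Local listeners on the same ports (run on the affected server).
Get-NetTCPConnection -State Listen -LocalPort $ports -ErrorAction SilentlyContinue |
    Select-Object LocalAddress, LocalPort, OwningProcess
```

Keep in mind that a successful TCP connection only shows that something is listening on the port. It does not confirm that the specific net.tcp endpoint named in the warning is registered behind it.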