mailbox databases dismounting frequently without followable reason

Question

SCTF 20

I have a 3-node DAG (EX2016, current patch level), with 2 nodes in the "main" AD site and 1 in the backup site. About 400 users are currently homed in 9 mailbox databases. For a few days now, one of the nodes in the main AD site has no longer kept databases mounted.

When I do a manual switchover, for example, the db mounts properly for a few seconds. Then a store transient error occurs, which disables the "readfrompassive" feature, dismounts the db again, and fails it over to node 1 or node 3, depending on the load.

The error pattern, that is, the sequence of events when active copies are switched to this server, is always the same. I can reproduce it with every database on this server.

A manual or automatic switchover to "node2" occurs, starting with event 2090 announcing it.
Followed by 5 ESE informational events reporting the log replay and db attach.
Then "MSExchangeIS" reports that the feature component "readfrompassive" (1061) has been re-enabled.
Then the first error occurs "1001 - MSExchangeIS - MSEXCH info store encountered an internal logic error"
Followed by an unhandled exception "1002 - processid perf counter (0) does not match actual process id.."
Followed by a watson report error (4999) of the M.E.Store.Worker
Only then does the next error provide a bit (!) more information: "489 ESE - attempt to open edb-file for read only access failed with system error 32 (used by another process)"
This is then followed by some Windows Error Reporting events (informational, 1001), some informational MSExchangeIS events (1021,40008,40036) ...
Followed by another batch of ESE events about log replays of the db that is about to be mounted ... ??
Funny enough ...: Only then does "MSExchangeIS - 40008" report that the db has been mounted successfully.
Stopping worker process (1021)
Starting MapiAddressBookAppPool (2000,2001)
"MSExchange Mid-Tier Storage" informs about "PICW Core feature not enabled" (11000)
Now another 4999 error is logged about "MSExchangeDelivery, M.ExchangeStoreProvider, M.E.M.S.DeliveryItem..." crashing > stacktrace result "MS.Mapi.CrossServerConnectionPolicy.Apply"
Followed by 2 Windows Error Reporting informational events
And finally followed by an ExchangeStoreDB event (126) stating that a db copy on this server caused an error which resulted in dismounts. The reason should be looked up in previous events (see above ^^)

Judging by this, it looks like the server tries to mount a db 3 times by default ...? Every time, it results in more or less the same error events.
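For anyone reading along: the "system error 32" in ESE event 489 is a plain Win32 error code. A minimal sketch of a lookup for it and a few neighbouring codes is below; the table is deliberately tiny, and the function name `describe_system_error` is made up for this illustration.

```python
# Small lookup of well-known Win32 system error codes.
# Only a handful of codes are listed; this is not a complete table.
WIN32_ERRORS = {
    2: ("ERROR_FILE_NOT_FOUND", "The system cannot find the file specified."),
    3: ("ERROR_PATH_NOT_FOUND", "The system cannot find the path specified."),
    5: ("ERROR_ACCESS_DENIED", "Access is denied."),
    32: ("ERROR_SHARING_VIOLATION",
         "The process cannot access the file because it is being used "
         "by another process."),
    33: ("ERROR_LOCK_VIOLATION",
         "The process cannot access the file because another process "
         "has locked a portion of the file."),
}


def describe_system_error(code: int) -> str:
    """Return a readable description for a Win32 system error code."""
    name, text = WIN32_ERRORS.get(code, ("UNKNOWN", "Not in this small table."))
    return f"{code} ({name}): {text}"


if __name__ == "__main__":
    # The code reported in ESE event 489 in this thread.
    print(describe_system_error(32))
```

So event 489 is a sharing violation rather than an access-denied error: something already has the edb file open when the store worker tries to open it.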

What I have already tried:

Purged one of the passive copies on the problematic node, deleted the related files, and recreated the passive copy from scratch.
- Tried to reproduce: still failing as described above.
Reviewed file/folder permissions on the db and log storage locations (separate disks/volumes).
- There actually was an issue with partially incorrect permissions -> corrected all ownership back to the Administrators group and re-applied default permissions
  - Unfortunately, this had no effect either - still the same failure when activating a db copy on this node
Put the server into maintenance mode and shut it down, then checked the dbs with eseutil: all in dirty shutdown state (but still mounting, even if only for a short time??)
Read tons of KB articles and forum discussions related to this and similar behavior - no further insights, unfortunately.

Any help, ideas, tips & tricks are highly appreciated. :)

Answer 1

According to the information you have given, it appears that you are experiencing a problem with the "readfrompassive" capability on one of the DAG nodes. This feature enables faster database mounting by allowing Exchange to read from a passive copy of the database. The database may not mount correctly if there is a problem with the "readfrompassive" capability.

The problems you're seeing in Event Viewer point to a potential issue with the permissions on the node's database and log files. The permissions have already been fixed, yet the issue persists. This could indicate that there is another permissions problem, a problem with the database, or a problem with the log files themselves.

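One way to troubleshoot this is to examine the state of the database and log files with eseutil. Running "eseutil /mh" against the database file dumps its header, including the shutdown state (clean or dirty) and the log files required for recovery; "eseutil /ml" can be used to verify the log files. If a database is in dirty shutdown, soft recovery with "eseutil /r" replays the outstanding logs to bring it back to a consistent state.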
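If there are several databases to check, the header dump can be summarised with a small script. The following is only a sketch: it assumes the dump has been saved as text, the two-line sample is an illustrative excerpt rather than real output from this environment, and the helper names (`parse_header`, `summarize`) are invented for this example.

```python
# Illustrative excerpt only; a real "eseutil /mh" dump has many more fields.
SAMPLE_HEADER = """\
            State: Dirty Shutdown
     Log Required: 4711-4715 (0x1267-0x126b)
"""


def parse_header(text: str) -> dict:
    """Split 'Key: value' lines of a header dump into a dictionary."""
    fields = {}
    for line in text.splitlines():
        # Split at the first colon only, so values containing colons survive.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def summarize(text: str) -> dict:
    """Pick out the shutdown state and the log range needed for recovery."""
    fields = parse_header(text)
    state = fields.get("State", "unknown")
    return {
        "state": state,
        "log_required": fields.get("Log Required"),
        # Anything other than a clean shutdown still depends on log files.
        "needs_logs": state != "Clean Shutdown",
    }


if __name__ == "__main__":
    print(summarize(SAMPLE_HEADER))
```

Note that "Dirty Shutdown" on its own only says the database was not detached cleanly and still needs the listed logs; it does not by itself prove corruption.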

If the "eseutil" command does not reveal any issues, you might need to get in touch with Microsoft Support for additional help. They will be able to assist you in troubleshooting the problem and identifying its underlying cause.

SCTF 20 Reputation points

2023-07-20T07:38:48.28+00:00

I've already checked the dbs with eseutil /mh. All of 'em are in dirty shutdown state. Soft recovery hasn't helped so far.

In the meantime, I'm thinking about suspending the DAG, removing all db copies, recreating the volume (just in case), and recreating the DAG. What do you think?
At the same time, I'm looking for some guidance on what to take care of while recreating a DAG. Especially without disturbing the users ;)

Answer 2

Hi @SCTF

Do you have any antivirus, backup, or other third-party software installed and running on the node2 server?

And is there a firewall between the Exchange servers?

If so, I would suggest uninstalling or disabling that software, or the firewall between the Exchange servers, to see whether it helps with this issue.

If it doesn't help, please check whether NIC Teaming is enabled on the Exchange servers.

If NIC Teaming is enabled, please disable it to see whether that helps.

SCTF 20 Reputation points

2023-07-20T07:40:54.0733333+00:00

Luckily, it's not about such blatant rookie mistakes. ;)
