I have a 3-node DAG (EX2016, current patch level): 2 nodes in the "main" AD site and 1 in the backup site. There are currently about 400 users homed in 9 mailbox databases. For a few days now, one of the nodes in the main AD site no longer keeps its databases mounted.
For example, when I do a manual switchover, the database mounts properly for a few seconds. Then a transient store error occurs, which disables the "readfrompassive" feature, dismounts the database again and fails it back over to node 1 or node 3, depending on load.
The sequence of errors when activating copies on this server is always the same, and I can reproduce it with every database on this server:
- A manual or automatic switchover to "node2" occurs, starting with event 2090 reporting it.
- This is followed by 5 informational ESE events about log replay and database attachment.
- Then MSExchangeIS reports that the feature component "readfrompassive" has been re-enabled (1061).
- Then the first error occurs: "1001 - MSExchangeIS - MSEXCH info store encountered an internal logic error"
- This is followed by an unhandled exception: "1002 - processid perf counter (0) does not match actual process id.."
- This is followed by a Watson report error (4999) for M.E.Store.Worker.
- Only then does the next error provide a bit (!) more information: "489 ESE - attempt to open edb-file for read only access failed with system error 32 (used by another process)"
- This is followed by some Windows Error Reporting events (informational, 1001) and some informational MSExchangeIS events (1021, 40008, 40036) ...
- Then comes another batch of ESE events about log replay for the database that is about to be mounted ... ??
- Funnily enough, only then does "MSExchangeIS - 40008" report that the database has been mounted successfully.
- The worker process is stopped (1021).
- MapiAddressBookAppPool is started (2000, 2001).
- "MSExchange Mid-Tier Storage" reports "PICW Core feature not enabled" (11000).
- Now another 4999 error is logged about "MSExchangeDelivery, M.ExchangeStoreProvider, M.E.M.S.DeliveryItem..." crashing; the stack trace ends in "MS.Mapi.CrossServerConnectionPolicy.Apply".
- This is followed by 2 informational Windows Error Reporting events.
- Finally, an ExchangeStoreDB event (126) reports that a database copy on this server caused an error which resulted in dismounts, and that the reason should be looked up in previous events (see above ^^).
Looking at this, it seems the server tries to mount a database 3 times by default ...? Every attempt results in more or less the same error events.
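One way to check the "3 attempts" theory is to export the Application log around a single switchover to CSV (Event Viewer, "Save All Events As...") and count the relevant events. The Python sketch below is a hypothetical helper, not an Exchange tool. It assumes the export has `Source` and `Event ID` columns (adjust the keys if your export differs), and the (source, event ID) pairs are simply the ones from the sequence above.

```python
import csv
import io
from collections import Counter

# (source, event id) pairs taken from the failure sequence described above.
WATCHED = {
    ("MSExchangeIS", 1001): "internal logic error",
    ("ESE", 489): "edb open failed (error 32)",
    ("MSExchangeIS", 40008): "40008 (mount reported)",
    ("ExchangeStoreDB", 126): "copy caused dismount",
}

def summarize(csv_text):
    """Count watched events in an Event Viewer CSV export.

    Assumes (hypothetically) that the export has 'Source' and 'Event ID'
    columns; rows for any other source/ID combination are ignored.
    """
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["Source"].strip(), int(row["Event ID"]))
        if key in WATCHED:
            counts[WATCHED[key]] += 1
    return counts
```

If the failure markers (1001, 489) show up about three times per switchover, that would support the idea of a fixed number of mount retries. It says nothing about the root cause, though. Note that 40008 also appears among the informational events earlier in the sequence, so its count is only a rough indicator.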
What I have already tried:
- Removed one of the passive copies on the problematic node, deleted the related files and recreated the passive copy from scratch.
- Tried to reproduce: still failing as described above.
- Reviewed file/folder permissions on the database and log storage locations (separate disks/volumes).
- There actually was an issue with partially incorrect permissions, so I reset all ownership to the Administrators group and re-applied the default permissions.
- Unfortunately, this had no effect either: activating a database copy on this node still fails the same way.
- Put the server into maintenance mode, shut it down and checked the databases with eseutil: all are in dirty shutdown state (yet they still mount, even if only for a short time??).
- Read tons of KB articles and forum discussions about this and similar behavior; unfortunately, no further insights.
Any help, ideas, tips & tricks are highly appreciated. :)