Hello,
I am in the process of researching an Exchange outage that occurred.
We are using 2 CAS/HT servers and 2 other servers installed as a 2-node DAG. All are running Exchange 2010 SP2 RU3.
All of a sudden, no users could access their mailboxes. The CAS servers were rebooted and all was OK again.
After researching, I found an error on the MBX DAG server:
Event ID 10025, MSExchangeIS:
There are 20 RPC requests that have taken an abnormally long time to complete. This may be indicative of performance issues with your server.
I have found some info:
and
http://technet.microsoft.com/en-us/library/ff477616
Also this one is helpful: http://johanveldhuis.nl/?tag=exchange-2010&lang=en
So here's what I think happened:
At some point there were 20 RPCs not making progress. This caused the affected mail server to stop working.
The reboot of the CAS servers solved the issue because the RPCs were stopped.
Is my theory correct? The links only mention mailbox quarantine when a threshold of 5 is reached for a particular mailbox. In that case the mailbox of the offending user is quarantined, meaning it is not accessible.
They don't explicitly mention what happens when the threshold of 20 is reached for a server (this happened to my DAG member).
Anyone?