Quarantine is the system's way of recognizing that "an overwhelmingly high number of the operations being attempted are failing" and of backing off without wasting resources (compute, network traffic). The job self-recovers and leaves quarantine on the first cycle in which the criteria that caused it to enter quarantine are no longer met. If the cause is temporary, such as an outage or maintenance, you can manually restart the provisioning job during testing. For customers using this at scale (e.g., a multi-tenant gallery app), the job will recover on its own, and there is no action the SCIM server can take to expedite that.
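To make the "re-evaluated every cycle" behavior concrete, here is a minimal Python sketch. It is not the provisioning service's actual logic: the real quarantine criteria are not published, so `FAILURE_RATIO_THRESHOLD`, `CycleResult`, `should_quarantine` and `simulate` are hypothetical names chosen for illustration. The only point it shows is that quarantine is a per-cycle decision, so the job drops out of it as soon as a cycle stops meeting the criteria.

```python
from dataclasses import dataclass

# Hypothetical threshold for illustration only; the real criteria are not published.
FAILURE_RATIO_THRESHOLD = 0.9


@dataclass
class CycleResult:
    attempted: int
    failed: int
    global_failure: bool = False  # e.g. SCIM server down or startup test calls failing


def should_quarantine(result: CycleResult) -> bool:
    """Return True if this cycle meets the (assumed) quarantine criteria."""
    if result.global_failure:
        return True
    if result.attempted == 0:
        return False
    return result.failed / result.attempted >= FAILURE_RATIO_THRESHOLD


def simulate(cycles: list) -> list:
    """Evaluate each cycle independently: no manual step is needed to leave quarantine."""
    return ["quarantine" if should_quarantine(c) else "active" for c in cycles]


if __name__ == "__main__":
    history = [
        CycleResult(attempted=100, failed=98),                   # mostly failing
        CycleResult(attempted=0, failed=0, global_failure=True),  # endpoint down
        CycleResult(attempted=100, failed=2),                    # issue resolved
    ]
    print(simulate(history))
```

Running it prints `['quarantine', 'quarantine', 'active']`: the third cycle no longer meets the criteria, so the job is active again without any intervention.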
Specific answers to your questions:
- Yes.
- Yes.
- Immediately. The exact interval isn't documented, but it's an exponential backoff that can roughly be thought of as 30m -> 1h -> 2h -> 4h -> 8h, and so on. So if a temporary outage lasts an hour or less, everything should recover within 1-2 hours, since the job would most likely come back around the 2nd or 3rd retry.
- Per-object failures are held for retry and generate what can be referred to as retries or "escrows". If you accumulate enough of those, you'll hit quarantine. You'll also hit it if there's a global failure (server down, initial startup test calls failing, etc.). With a global issue there is no avoiding quarantine, nor should there be.
- No.
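The backoff arithmetic from the third answer can be checked with a short Python sketch. It assumes, purely for illustration, that quarantine starts when the outage starts and that the intervals begin at 30 minutes and double each time; the real schedule is undocumented, and `retry_times` and `first_successful_retry` are hypothetical helpers.

```python
from __future__ import annotations

from datetime import timedelta

# Assumed schedule: 30m -> 1h -> 2h -> 4h -> 8h ... (the real intervals are undocumented).
BASE_INTERVAL = timedelta(minutes=30)


def retry_times(count: int, base: timedelta = BASE_INTERVAL) -> list[timedelta]:
    """Cumulative time of each retry, measured from the moment quarantine starts."""
    times = []
    elapsed = timedelta(0)
    for i in range(count):
        elapsed += base * (2 ** i)  # each interval doubles the previous one
        times.append(elapsed)
    return times


def first_successful_retry(outage: timedelta, count: int = 6) -> tuple[int, timedelta]:
    """Return (retry number, elapsed time) of the first retry at or after the outage ends."""
    for n, t in enumerate(retry_times(count), start=1):
        if t >= outage:
            return n, t
    raise ValueError("outage outlasts the simulated retries")


if __name__ == "__main__":
    print([str(t) for t in retry_times(5)])
    print(first_successful_retry(timedelta(hours=1)))
```

Under these assumptions the retries land at 0:30, 1:30, 3:30, 7:30 and 15:30 after quarantine begins, so a one-hour outage is picked up by the 2nd retry at the 1.5-hour mark, consistent with the "within 1-2 hours" estimate above.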