How many HSMs do we need?

NOBLE Robert 1 Reputation point
2022-03-14T12:13:34.457+00:00

Hi,
We are building a new PKI, and will use HSMs for the root and issuing CAs.
We are seeing advice (on forums, and from Microsoft support and Thales) that the HSMs need high availability, and so will need at least two, and that we should use at least two for back-up as well.
Q1 - Do we need two HSMs for high availability? If the most frequent use is for issuing certificates, then will we lose the ability to issue and renew certificates for a long time if a solo HSM goes down?
Q2 - Do we have to use a Thales HSM for backup if we have a Thales HSM in live service supporting our CAs, or can we use a USB key for backup?

Options: I would like to know if we need

  • 4 HSMs (2 in Azure, 2 in on-prem backup locations),
  • 3 (2 in Azure, 1 on-prem backup),
  • 2 (2 in Azure, USB backup),
  • 2 (1 in Azure, 1 backup), or
  • 1 (in Azure, USB backup).

Any recommendations?

Thanks and regards,
Rob

Azure Dedicated HSM
An Azure service that provides hardware security module management.

3 answers

  1. Thameur-BOURBITA 32,496 Reputation points
    2022-03-15T07:32:31.9+00:00

    Hi,

    This link has an example of a high-availability setup: high-availability

    Please don't forget to mark a helpful reply as the answer.


  2. Vadims Podāns 8,866 Reputation points MVP
    2022-03-16T12:35:11.47+00:00

    Do we need two HSMs for high availability?

    Yes, you should.

    If the most frequent use is for issuing certificates

    There is CRL signing as well. If you lose the HSM, you lose the ability to sign CRLs. Once the last published CRL expires, revocation checking fails as offline, which effectively invalidates previously issued certificates.

    I would like to know if we need

    No one here can answer this for you. It depends on your budget and recovery options. For example, how easily can you fall back from Azure to on-prem if the cloud HSM fails?


  3. Vadims Podāns 8,866 Reputation points MVP
    2022-03-22T08:13:14.747+00:00

    if we make sure our CRL publishing process always releases a new CRL in at least the time it takes to recover an HSM and don't use the HSM for any other part of the validation process, the OCSP responder service should always have a valid CRL to validate certificates against, right?

    That's correct.

    I would plan the CRL schedule in conjunction with HSM DR (disaster recovery). You can make your CRL valid for any reasonable period and use the maximum DR time as the overlap period. For example, suppose the CRL is valid for 3 days (72 hours) and your DR period is 36 hours. You would then set CRLPeriod to 3 days and CRLOverlapPeriod to 36 hours. This results in:

    This Update: Monday 00:00
    Next CRL Publish: Thursday 00:00
    Next Update: Friday 12:00

    This means that your CRL is effectively valid from Monday 00:00 until Friday 12:00. The CA will attempt to publish the next CRL on Thursday at 00:00. If that publish fails, you have an extra 36 hours to recover the HSM and publish a new CRL before any cached copy of the previous CRL expires. For more information, see my blog post: How ThisUpdate, NextUpdate and NextCRLPublish are calculated (v2)
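
    The arithmetic in the example above can be checked with a short Python sketch. It is a simplification: the function name crl_schedule and the chosen start date are hypothetical, and it ignores any clock-skew adjustment the CA may apply to the real fields.

```python
from datetime import datetime, timedelta

def crl_schedule(this_update, crl_period, overlap):
    """Return (next_publish, next_update) for a simplified CRL schedule.

    next_publish = this_update + crl_period
    next_update  = next_publish + overlap
    Clock-skew adjustments are deliberately ignored (assumption).
    """
    next_publish = this_update + crl_period
    next_update = next_publish + overlap
    return next_publish, next_update

# Hypothetical start: Monday 2022-03-14 00:00
this_update = datetime(2022, 3, 14, 0, 0)
crl_period = timedelta(days=3)    # CRLPeriod = 3 days
overlap = timedelta(hours=36)     # CRLOverlapPeriod = 36 hours

next_publish, next_update = crl_schedule(this_update, crl_period, overlap)

# Time left to recover the HSM if the scheduled publish fails
recovery_window = next_update - next_publish

print("This Update:     ", this_update.strftime("%A %H:%M"))
print("Next CRL Publish:", next_publish.strftime("%A %H:%M"))
print("Next Update:     ", next_update.strftime("%A %H:%M"))
print("Recovery window: ", recovery_window)
```

    If your measured DR time is longer than the recovery window, increase the overlap rather than the CRL period, so that revocations still reach clients on the same publishing schedule.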
