Step 1: Prepare for your deployment
The first step in deploying your HPC cluster is to make important decisions, such as how many head nodes to use and which network topology to choose for your cluster. The following tasks will help you prepare for your cluster deployment.
1.1: Review system requirements
If you have not done so already, review the System Requirements for Microsoft HPC Pack 2019. Note that HPC Pack has different requirements for different node roles and deployment options. You might want to review the system requirements again after you have finalized the decisions for your deployment.
1.2: Decide if you want to configure your head node for high availability
If you will need to continue running HPC jobs during a planned or unplanned disruption in services on a head node computer, you can plan to configure the head node for high availability. To do this, you will need to install HPC Pack on at least two head node computers.
1.3: Decide if you want to deploy your cluster with remote databases
HPC Pack 2019 requires Microsoft SQL Server 2014 or a later version. HPC Pack uses five different SQL Server databases to store cluster management, job scheduling, reporting, diagnostics, and monitoring data. You can install one or more of these five HPC databases on one or more remote servers instead of on the head node of your cluster. By default, if you choose a single head node, HPC Pack installs SQL Server 2019 Express on the head node and creates the HPC databases there. If you choose to deploy three head nodes, installing the HPC databases on one or more remote servers has the advantage of saving resources on the head nodes, helping ensure that they can efficiently manage the cluster.
Important
Use of SQL Server 2019 Express on the head node is recommended for proof-of-concept or development clusters, and for smaller production clusters. You should consider installing the HPC databases on one or more remote servers if your cluster will have more than 256 nodes, you plan to configure the head node for high availability, or your job throughput and reporting requirements could exceed the capabilities of SQL Server 2019 Express.
To install the HPC databases on a remote server, that server must be running the Standard or Enterprise edition of SQL Server 2014 or later and must be configured to work with HPC Pack. Before you install HPC Pack with remote databases, ask the database administrator to run the SetupHpcDatabase.ps1 script in the Setup folder, or to perform or modify the tasks in the script manually. The script automatically creates the necessary databases, as well as the SQL Server instance logins and database users for the account that will install HPC Pack and for the machine account for HPC services. For detailed information, see Deploying a Windows HPC Cluster with Remote Databases Step-by-Step Guide.
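The recommendation in the Important note above can be expressed as a simple check. This is a minimal Python sketch; `recommend_remote_databases` and its parameters are hypothetical planning aids, not part of HPC Pack, and whether your job throughput and reporting needs exceed SQL Server 2019 Express is a judgment you supply as a boolean.

```python
def recommend_remote_databases(node_count: int,
                               high_availability: bool,
                               exceeds_express_capacity: bool) -> bool:
    """Return True if the HPC databases should go on remote servers.

    Mirrors the three conditions in the note: more than 256 nodes, a
    highly available head node, or job throughput and reporting needs
    that could exceed SQL Server 2019 Express.
    """
    return node_count > 256 or high_availability or exceeds_express_capacity


if __name__ == "__main__":
    # A 64-node development cluster with a single head node.
    print(recommend_remote_databases(64, False, False))   # False
    # A 300-node production cluster.
    print(recommend_remote_databases(300, False, False))  # True
```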
1.4: Decide what type of nodes you want to add to your cluster and how many
You can add the following types of nodes to your on-premises cluster:
- Compute nodes - Compute nodes are used for running jobs. This type of node cannot become a different type of node (that is, change roles) without being redeployed.
- Broker nodes - Windows Communication Foundation (WCF) broker nodes are used for routing WCF calls from the Service-Oriented Architecture (SOA) clients to the SOA services running on nodes in your cluster. This type of node can change roles to become a compute node without being redeployed.
- Workstation nodes and unmanaged server nodes - Workstation nodes and unmanaged server nodes are computers in your organization that can also run jobs, but they are not dedicated cluster resources. They can be scheduled to become available to run jobs at specific times, or can be made available on demand. This type of node cannot change roles.
- Microsoft Azure nodes - If you have a Microsoft Azure subscription, you can add Azure nodes on demand to increase your cluster capacity when you need it. Like compute nodes, workstation nodes, and unmanaged server nodes, Azure nodes can run jobs. When you add Azure nodes, you also configure a fixed or variable number of proxy nodes in your Azure deployment to facilitate communication between the on-premises head node and the Azure nodes.
- Microsoft Azure IaaS nodes - If you have a Microsoft Azure subscription, you can add Microsoft Azure IaaS nodes on demand to increase your cluster capacity when you need it.
For more information about node roles in a Windows HPC cluster, see Understanding Node Roles in Microsoft HPC Pack.
When HPC Pack is installed, the features that are installed depend on the type of node being created. These features determine the role that the node will perform in the cluster. In some cases, a node can change roles because it already has the features needed to perform a different role. The ability to change roles is an important aspect to consider when deciding which types of nodes to add to your cluster.
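To make the role-change rules from the node list concrete, the following minimal Python sketch records which roles each node type can switch to without being redeployed. The `NodeRole` enum and `can_change_role` helper are hypothetical planning aids, not part of HPC Pack. Azure and Azure IaaS nodes are left out because the list does not describe role changes for them.

```python
from enum import Enum


class NodeRole(Enum):
    COMPUTE = "compute"
    BROKER = "broker"
    WORKSTATION = "workstation"
    UNMANAGED_SERVER = "unmanaged server"


# Roles each node type can switch to without redeployment, per the list above:
# only broker nodes can become compute nodes in place.
ROLE_CHANGES_WITHOUT_REDEPLOY = {
    NodeRole.COMPUTE: frozenset(),
    NodeRole.BROKER: frozenset({NodeRole.COMPUTE}),
    NodeRole.WORKSTATION: frozenset(),
    NodeRole.UNMANAGED_SERVER: frozenset(),
}


def can_change_role(current: NodeRole, target: NodeRole) -> bool:
    """Return True if a node in role `current` can become `target` in place."""
    return target in ROLE_CHANGES_WITHOUT_REDEPLOY[current]


if __name__ == "__main__":
    print(can_change_role(NodeRole.BROKER, NodeRole.COMPUTE))  # True
    print(can_change_role(NodeRole.COMPUTE, NodeRole.BROKER))  # False
```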
Another important decision that you have to make is the number of nodes that you want to add. If you are adding broker nodes, you also need to decide how many compute nodes you will add for each broker node that is available on the cluster. The ratio of broker nodes to compute nodes can affect cluster performance.
If you plan to add Azure nodes, you should consider the number of proxy nodes that is optimal for the number of nodes deployed in Azure and the jobs that will run on those nodes. The proxy nodes are required for communication with the on-premises head node and can be a bottleneck for certain cluster sizes and workloads.
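Both sizing questions reduce to choosing a target ratio and rounding up. The Python sketch below assumes you have picked a compute-nodes-per-broker ratio and an Azure-nodes-per-proxy ratio from your own testing; HPC Pack does not prescribe these values, and `supporting_node_counts` is a hypothetical helper.

```python
import math
from typing import Tuple


def supporting_node_counts(compute_nodes: int,
                           compute_per_broker: int,
                           azure_nodes: int,
                           azure_per_proxy: int) -> Tuple[int, int]:
    """Return (broker_nodes, proxy_nodes) for the chosen target ratios.

    Counts are rounded up so that no broker or proxy serves more nodes
    than the ratio allows. The ratios are assumptions you supply.
    """
    if compute_per_broker <= 0 or azure_per_proxy <= 0:
        raise ValueError("ratios must be positive integers")
    brokers = math.ceil(compute_nodes / compute_per_broker)
    proxies = math.ceil(azure_nodes / azure_per_proxy)
    return brokers, proxies


if __name__ == "__main__":
    # Illustrative ratios only: 50 compute nodes per broker, 100 Azure nodes per proxy.
    print(supporting_node_counts(200, 50, 150, 100))  # (4, 2)
```

Re-run the calculation with different ratios as you measure broker and proxy load under your actual SOA and Azure workloads.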
Finally, if you want to configure the head node or a broker node in a failover cluster, you need one additional computer for each failover cluster node that you configure, which might reduce the number of compute nodes that you can add to your cluster.
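The effect of failover clustering on your compute capacity is simple subtraction. In this Python sketch, `compute_node_budget` is a hypothetical helper, and `failover_extra_computers` is the number of additional computers your failover cluster configuration requires (one per failover cluster node, as described above).

```python
def compute_node_budget(total_machines: int,
                        head_nodes: int,
                        broker_nodes: int,
                        failover_extra_computers: int) -> int:
    """Return how many machines remain available as compute nodes."""
    remaining = total_machines - head_nodes - broker_nodes - failover_extra_computers
    if remaining < 0:
        raise ValueError("not enough machines for the planned roles")
    return remaining


if __name__ == "__main__":
    # 64 machines, 1 head node, 2 broker nodes, 1 extra failover computer.
    print(compute_node_budget(64, 1, 2, 1))  # 60
```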
1.5: Choose the Active Directory domain for your cluster
Starting with HPC Pack 2016, HPC Pack can be installed on a computer that is not domain-joined; however, this feature is designed only for HPC clusters in Azure. For an on-premises HPC cluster, you should create the cluster in an Active Directory domain.
The nodes in your on-premises HPC cluster will be members of an Active Directory domain. Before deploying your on-premises cluster, choose the Active Directory domain that you will use for your HPC cluster.
Depending on the Active Directory environment in your organization, it may be helpful to configure a separate organizational unit (OU) for the computers that will be members of the HPC cluster. With a separate OU, if necessary, different policies and settings can be applied to the cluster nodes than to the other computers in your organization.
If you do not have an Active Directory domain to which you can join your cluster, or if you prefer not to join an existing domain, you can create a new Active Directory domain. For more information about installing the Active Directory Domain Services role, see Deploying Active Directory Domain Services (AD DS) in Your Enterprise.
Additional considerations
The HPC Pack 2019 head node cannot be installed on a domain controller if you plan to install a highly available cluster with Microsoft Service Fabric, because a Service Fabric cluster cannot be deployed on a domain controller.
If you plan to add workstation nodes or unmanaged server nodes to the HPC cluster, those computers can be joined to any Active Directory domain that has an established trust relationship with the domain to which the head node is joined.
1.6: Choose a domain account for adding nodes
To install HPC Pack on the head node, you must be logged on with a domain user account that is a member of the Administrators group on the head node computer. Additionally, when you configure the head node after installing HPC Pack, you must provide credentials for a domain user account that will be used for adding on-premises nodes and for the system configuration of those nodes. You must choose an existing account or create a new account before starting your cluster deployment.
Considerations for choosing a user account
- The user account that you choose must be a domain account with enough privileges to create Active Directory computer accounts for the nodes and to join the nodes to the domain.
- If the policies of your organization restrict you from using a domain account that can add new computers to the domain, you will need to ask your domain administrator to pre-create the computer objects for you in Active Directory Domain Services before you deploy your nodes. For more information, see Deploy Nodes with Pre-created Computer Objects in Active Directory.
- If part of your deployment requires access to resources on the enterprise network, the user account must have the necessary permissions to access those resources — for example, installation files that are available on a network server.
- If you want to restart nodes remotely by using HPC Cluster Manager, the account must be a member of the local Administrators group on the head node. This requirement applies only if you do not have scripted power control tools that you can use to remotely restart the nodes.
1.7: Choose a network topology for your cluster
HPC Pack supports five cluster topologies. These topologies are distinguished by how the nodes in the cluster are connected to each other and to the enterprise network. The five supported cluster topologies are:
- Topology 1: Compute nodes isolated on a private network
- Topology 2: All nodes on enterprise and private networks
- Topology 3: Compute nodes isolated on private and application networks
- Topology 4: All nodes on enterprise, private, and application networks
- Topology 5: All nodes on an enterprise network
For more information about each network topology and each HPC cluster network, see Appendix 1: HPC Cluster Networking, later in this guide.
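The five topology names encode which networks the compute nodes use, so choosing a topology can be sketched as a small decision function. This Python sketch is derived only from the topology names listed above; `choose_topology` and its boolean parameters are hypothetical, and the head node is assumed to connect to every network in the chosen topology.

```python
def choose_topology(compute_on_enterprise: bool,
                    has_private_network: bool,
                    has_application_network: bool) -> int:
    """Return the HPC Pack topology number (1-5) for a network layout."""
    if not has_private_network:
        # Only topology 5 lacks a private network, and in it every node,
        # including compute nodes, sits on the enterprise network alone.
        if has_application_network or not compute_on_enterprise:
            raise ValueError("layout does not match a supported topology")
        return 5
    if compute_on_enterprise:
        # All nodes on enterprise + private (2), plus application network (4).
        return 4 if has_application_network else 2
    # Compute nodes isolated on private (1), or private + application (3).
    return 3 if has_application_network else 1


if __name__ == "__main__":
    print(choose_topology(False, True, False))  # 1
    print(choose_topology(True, False, False))  # 5
```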
When you are choosing a network topology, you must take into consideration your existing network infrastructure and the type of nodes that you will be adding to your cluster:
- Decide which network in the topology that you have chosen will serve as the enterprise network, the private network, and the application network.
- On the head node, the network adapter that is connected to the enterprise network must not be in automatic configuration; that is, its IP address must not start with 169.254. That adapter must have a valid IP address, assigned either dynamically or manually (static).
- If you choose a topology that includes a private network, and you are planning to add nodes to your cluster from bare metal, do the following:
  - Ensure that there are no Pre-Boot Execution Environment (PXE) servers on the private network.
  - If you want to use an existing DHCP server for your private network, ensure that it is configured to recognize the head node as the PXE server in the network.
- If you want to enable DHCP server on your head node for the private or application networks and there are other DHCP servers connected to those networks, you must disable those DHCP servers.
- If you have an existing Domain Name System (DNS) server connected to the same network as the nodes in your cluster, no action is necessary, but the nodes will be automatically deregistered from that DNS server.
- Contact your system administrator to determine if Internet Protocol security (IPsec) is enforced on your domain through Group Policy. If IPsec is enforced on your domain through Group Policy, you may experience issues during deployment. A workaround is to make your head node an IPsec boundary server so that the other nodes in your cluster can communicate with the head node during PXE boot.
- If you want to add workstation nodes or unmanaged server nodes to your cluster, topology 5 (all nodes on an enterprise network) is the recommended topology, but other topologies are supported. If you want to add workstation nodes on other topologies, see the content on Adding Workstation Nodes to a Windows HPC cluster.
- If you want to add broker nodes to your cluster, they must be connected to the network where the clients that are starting SOA sessions are connected (usually the enterprise network) and to the network where the nodes that are running the SOA services are connected (if different from the network where the clients are connected).
- If you want to add Azure nodes to your cluster, your HPC cluster can be configured in any cluster network topology that is supported by HPC Pack. The head node and any client computer that is used to manage the cluster and that needs a connection to Azure must be able to connect over the Internet to Azure services.
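A quick way to catch the automatic-configuration case on the head node's enterprise adapter is to test its address against the 169.254.0.0/16 link-local range. This Python sketch uses the standard `ipaddress` module; `is_valid_enterprise_address` is a hypothetical helper, and reading the address from the adapter (for example, from ipconfig output) is left to you.

```python
import ipaddress

# IPv4 link-local range that Windows uses for automatic private addressing.
APIPA_NETWORK = ipaddress.ip_network("169.254.0.0/16")


def is_valid_enterprise_address(address: str) -> bool:
    """Return False if `address` is an automatic (169.254.x.x) address.

    Raises ValueError if `address` is not a well-formed IP address.
    """
    ip = ipaddress.ip_address(address)
    return ip not in APIPA_NETWORK


if __name__ == "__main__":
    for addr in ("169.254.12.7", "10.0.0.5"):
        print(addr, is_valid_enterprise_address(addr))
```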
1.8: Prepare certificates used to secure the communication between HPC nodes
An HPC Pack 2016 (or later) cluster uses X.509 certificates to secure the communication between HPC nodes. You can use the same certificate on all HPC nodes, or use two different certificates:
- Certificate for the head node - This certificate is installed on the head node (or head nodes) to secure the Service Fabric cluster (if used for high availability) and the communication between HPC nodes. If the certificate is self-signed and you plan to deploy Azure IaaS compute nodes with the Burst to Azure IaaS VM feature, you must also import it into Azure Key Vault as a certificate.
- Certificate for other nodes - This certificate is installed on the HPC nodes other than the head node (or head nodes) to secure the communication between HPC nodes. If you use the same certificate on all HPC nodes, this is the same certificate as the certificate for the head node.
The certificate(s) must meet the following requirements:
- Have a private key capable of key exchange;
- Key usage includes Digital Signature, Key Encipherment, Key Agreement and Certificate Signing;
- Enhanced key usage includes Client Authentication and Server Authentication;
- If two different certificates are used, they must have the same subject name.
If the certificate is used to secure Service Fabric Cluster as well, it must meet the following additional requirements:
- The certificate's provider must be Microsoft Enhanced RSA and AES Cryptographic Provider;
- The RSA key length must be 2048 bits.
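The requirements above can be captured as a checklist. This Python sketch assumes you have already read the certificate's properties (for example, from the CertUtil dump described in this step) into the hypothetical `CertificateInfo` record; it does not parse certificates itself, and the property strings are labels you transcribe, not an HPC Pack or CertUtil API.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

REQUIRED_KEY_USAGES = {"Digital Signature", "Key Encipherment",
                       "Key Agreement", "Certificate Signing"}
REQUIRED_EKUS = {"Client Authentication", "Server Authentication"}
SF_PROVIDER = "Microsoft Enhanced RSA and AES Cryptographic Provider"


@dataclass
class CertificateInfo:
    """Hypothetical summary of properties read from a certificate."""
    subject: str
    key_spec_exchange: bool  # True if KeySpec is "1 -- AT_KEYEXCHANGE"
    key_usages: Set[str] = field(default_factory=set)
    enhanced_key_usages: Set[str] = field(default_factory=set)
    provider: str = ""
    rsa_key_bits: int = 0


def failed_requirements(cert: CertificateInfo,
                        for_service_fabric: bool = False,
                        other_cert: Optional[CertificateInfo] = None) -> List[str]:
    """Return the names of the requirements that `cert` does not meet."""
    failures = []
    if not cert.key_spec_exchange:
        failures.append("KeySpec")
    if not REQUIRED_KEY_USAGES <= cert.key_usages:
        failures.append("Key Usage")
    if not REQUIRED_EKUS <= cert.enhanced_key_usages:
        failures.append("Enhanced Key Usage")
    # When two certificates are used, their subject names must match.
    if other_cert is not None and other_cert.subject != cert.subject:
        failures.append("Subject")
    # Extra requirements when the certificate also secures Service Fabric.
    if for_service_fabric:
        if cert.provider != SF_PROVIDER:
            failures.append("Provider")
        if cert.rsa_key_bits != 2048:
            failures.append("Public Key Length")
    return failures
```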
If you don't already have certificates that meet these requirements, you can request certificates from a certification authority, or you can use self-signed certificates. The Setup folder of the HPC Pack installation media includes a PowerShell script, CreateHpcCertificate.ps1, that generates a self-signed certificate. For example:
.\CreateHpcCertificate.ps1 -CommonName "HPCPackNodeCommunication" -Path "d:\hpccomm.pfx" -Password (ConvertTo-SecureString "P@ssw0rd" -AsPlainText -Force)
If you are using a certificate signed by a certification authority (CA) or an existing self-signed certificate, you can run the following command and check the values of KeySpec, Subject, Key Usage, Enhanced Key Usage, Public Key Length, and Provider.
CertUtil.exe -p "<password>" -v -dump <path-of-pfxFile>
If the value of Subject, Key Usage, Enhanced Key Usage, or Public Key Length doesn't match, you must regenerate the certificate.
If only the value of KeySpec (which should be "1 -- AT_KEYEXCHANGE") or Provider doesn't match, you don't need to regenerate the certificate. Instead, run the following command to import the certificate with the modified KeySpec and Provider values, and then run certlm.msc to export the certificate (including its private key) to a new PFX file that meets the requirements.
CertUtil.exe -f -p "<password>" -csp "Microsoft Enhanced RSA and AES Cryptographic Provider" -importpfx "<path-of-pfxFile>" AT_KEYEXCHANGE
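The rule for choosing between regenerating and re-importing a certificate can be summarized as a small lookup. In this Python sketch, `remediation` is a hypothetical helper whose input is the set of field names that did not match in the CertUtil output; "reimport" stands for the CertUtil import followed by the certlm.msc export described above.

```python
from typing import Iterable

# Mismatches that can only be fixed by creating a new certificate.
REGENERATE_FIELDS = {"Subject", "Key Usage", "Enhanced Key Usage", "Public Key Length"}
# Mismatches that can be fixed by re-importing with new KeySpec/Provider values.
REIMPORT_FIELDS = {"KeySpec", "Provider"}


def remediation(mismatched_fields: Iterable[str]) -> str:
    """Return "ok", "reimport", or "regenerate" for the mismatched fields."""
    mismatched = set(mismatched_fields)
    unknown = mismatched - REGENERATE_FIELDS - REIMPORT_FIELDS
    if unknown:
        raise ValueError(f"unexpected fields: {sorted(unknown)}")
    if mismatched & REGENERATE_FIELDS:
        # Any regenerate-class mismatch wins, even if KeySpec is also wrong.
        return "regenerate"
    if mismatched & REIMPORT_FIELDS:
        return "reimport"
    return "ok"
```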
If you decided to use a single head node in Step 1.2 and want to use a self-signed certificate, you can also generate a self-signed certificate in the Setup wizard during installation of the head node.
If you decide to use a self-signed certificate for other nodes, you can generate a self-signed certificate in HPC Cluster Manager in Step 3.4, later in this guide.
1.9: Prepare for the integration of scripted power control tools (optional)
The cluster administration console (HPC Cluster Manager) includes actions to start, shut down, and reboot nodes remotely. These actions are linked to a script file (CcpPower.cmd) that performs these power control operations by using operating system commands. You can replace the default operating system commands in that script file with your own power control scripts, such as Intelligent Platform Management Interface (IPMI) scripts provided by your vendor of cluster solutions.
In preparation for this integration, you must obtain all the necessary scripts, .dll files, and other components of your power control tools. After you have obtained all the necessary components, test them independently and ensure that they work as intended on the computers that you will be deploying as nodes in your cluster.
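Before wiring a vendor tool into CcpPower.cmd, it can help to build and review the exact commands it will run. This Python sketch assumes your nodes have IPMI-capable baseboard management controllers reachable with the open-source `ipmitool` utility; the action names, the mapping (including "soft" for a graceful ACPI shutdown), and `build_ipmi_command` are hypothetical and not part of HPC Pack. It only builds the argument list, so you can test it without touching hardware.

```python
import shlex
from typing import List

# Hypothetical mapping from power actions to ipmitool "chassis power" subcommands.
ACTION_TO_IPMI = {"start": "on", "shutdown": "soft", "reboot": "cycle"}


def build_ipmi_command(action: str, bmc_host: str, user: str) -> List[str]:
    """Return an ipmitool argument list for the given power action.

    -E makes ipmitool read the password from the IPMI_PASSWORD
    environment variable, so it does not appear on the command line.
    """
    try:
        subcommand = ACTION_TO_IPMI[action]
    except KeyError:
        raise ValueError(f"unsupported action: {action}") from None
    return ["ipmitool", "-I", "lanplus", "-H", bmc_host, "-U", user, "-E",
            "chassis", "power", subcommand]


if __name__ == "__main__":
    # Dry run: print the command instead of executing it.
    print(shlex.join(build_ipmi_command("reboot", "node01-bmc", "admin")))
```

Once the printed commands behave as expected when run by hand against a test node, you can call them from your power control scripts.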
For information about modifying CcpPower.cmd to integrate your own scripted power control tools, see Appendix 5: Scripted Power Control Tools, later in this guide.