Private Cloud Simulator for Windows Server 2016

Introduction

The current industry trend is for private cloud solutions to comprise tightly integrated software and hardware components in order to deliver a resilient private cloud with high performance. Issues in any of the components (software, hardware, drivers, firmware, and so forth) can compromise the solution and undermine the promises made regarding a Service Level Agreement (SLA) for the private cloud.

Some of these issues are surfaced only under a high-stress, cloud-scale deployment, and are potentially hard to find using traditional standalone, component-focused tests. The Private Cloud Simulator is a cloud validation test suite that enables you to validate your component in a cloud scenario and identify these types of issues.

Target Audience

This document is intended for those working towards validating their hardware for Windows Server Logo, Microsoft Azure Stack solutions, and Microsoft Azure Stack HCI solutions.

Test Overview

Private Cloud Simulator (PCS) simulates a live datacenter/private cloud by creating VM workloads, simulating datacenter operations (load balancing, software/hardware maintenance), and injecting compute/storage faults (unplanned hardware/software failures). PCS uses a Microsoft SQL Server database to record test and solution data during the run. It then presents a report that includes operation pass/fail rates and logs, which make it possible to correlate data for pass/fail determination and failure diagnosis (as applicable).

The table below lists the files that you need to download to run PCS tests.

Name | Location
HLK Kit | HLK version 1607
HLK Update Package | Install the latest version available at the Microsoft Collaborate site. File name format: HlkUpdatePackage14393.buildnumber.datetime.zip
HLK Playlist | HLK Version 1607 CompatPlaylist.xml
PCSFiles.vhd | PCSFiles.vhd (see below for its hash value)
dotNet 3.5 for Windows 10 | Microsoft-Windows-NetFx3-OnDemand-Package.cab
Windows Server 2016 Update | Install the latest version available at the Windows Update site

The PCSFiles.vhd file contains two virtual disk files. The hash values are listed below. You can use the Get-FileHash PowerShell cmdlet to compute the hash value for a file.

File Name | SHA256 Hash Value
PCSFiles.vhd | 8AE4F86D0F40B4304CA4DC8CBCFA989885E3507FCB0FFDBF969DDF10542F0035
Files\BaseVHDX\14393.0.amd64fre.rs1_release.160715-1616_server_serverdatacentereval_en-us.vhdx | 1B8AAC473B97725DD339624A4D1C894A7492ABBF26824156C9EF41F954836F84
Files\BaseVHDX\PcsBaseVhd.vhd | EC449434544B383DC1AD65D93EB004DBFBB91D4476CFCC32E93EE769866626FF
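
The hash comparison can be scripted. The following is a minimal sketch, not part of the PCS tooling: Test-PcsFileHash is a hypothetical helper name, and the download path in the usage comment is an assumption.

```powershell
# Hypothetical helper: compares a file's SHA256 hash with an expected value.
function Test-PcsFileHash {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(Mandatory)] [string] $ExpectedHash
    )
    # Get-FileHash exposes the hash as a hex string in the Hash property.
    $actual = (Get-FileHash -Path $Path -Algorithm SHA256).Hash
    # String comparison with -eq is case-insensitive in PowerShell.
    return ($actual -eq $ExpectedHash.Trim())
}

# Example usage (the download location is an assumption):
# Test-PcsFileHash -Path 'C:\Downloads\PCSFiles.vhd' `
#     -ExpectedHash '8AE4F86D0F40B4304CA4DC8CBCFA989885E3507FCB0FFDBF969DDF10542F0035'
```

A result of False suggests an incomplete or altered download, in which case downloading the file again is the safest option.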

Common Lab Infrastructure Setup

Topology

PCS lab environment contains the following elements:

Supporting Documents:

Notes:

  • All the above machines must be joined to the same test domain.
  • All PCS tests need to be run as the same user in the 'Domain Admins' group for the test domain.
  • Use the same user with Domain Admin credentials to install the HLK controller.

HLK Controller System Requirements

Minimum system requirements are as shown in the table below.

Resource | Minimum requirement
CPU (or vCPU) | 4 cores
Memory | 12 GB RAM
Available disk space | 200 GB
Operating system | Windows Server 2016 Datacenter
Active Directory domain | Join it to the test domain

HLK Controller Setup

Get IOMeter files

  • IOMeter is a workload that must be installed on the HLK controller.

  • Download the i386 Windows version of IOMeter release dated 2006.07.27 from the IOMeter website.

  • Run the setup (or unzip the package) to unpack the files.

  • Copy IOMeter.exe and Dynamo.exe to the Tests\amd64\pcs\GuestScenarioManager\IOMeter folder on the HLK controller. The default path for an HLK installation is:

    C:\Program Files (x86)\Windows Kits\10\Hardware Lab Kit\Tests\amd64\pcs\GuestScenarioManager\IOMeter
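
The copy step can be scripted as follows. This is a sketch under stated assumptions: Copy-IometerBinaries is a hypothetical helper name, the source folder in the usage comment is wherever you unpacked IOMeter, and the default destination is the default HLK path shown above.

```powershell
# Hypothetical helper: copies the two IOMeter binaries into the PCS test folder.
function Copy-IometerBinaries {
    param(
        [Parameter(Mandatory)] [string] $SourceFolder,
        [string] $DestinationFolder = 'C:\Program Files (x86)\Windows Kits\10\Hardware Lab Kit\Tests\amd64\pcs\GuestScenarioManager\IOMeter'
    )
    # Create the destination folder if it does not exist yet.
    New-Item -ItemType Directory -Path $DestinationFolder -Force | Out-Null
    foreach ($name in 'IOMeter.exe', 'Dynamo.exe') {
        # Stop on the first missing file instead of continuing silently.
        Copy-Item -Path (Join-Path $SourceFolder $name) -Destination $DestinationFolder -ErrorAction Stop
    }
}

# Example usage (the source folder is an assumption):
# Copy-IometerBinaries -SourceFolder 'C:\Temp\IOMeter'
```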

PCS Controller System Requirements

Minimum system requirements are as shown in the table below.

Resource | Minimum requirement
CPU (or vCPU) | 4 cores
Memory | 12 GB RAM
Free space on the boot drive | 200 GB
Operating system | Windows Server 2016 Datacenter
Active Directory domain | Join it to the test domain

PCS Controller Setup

  • The PCS controller MUST be a Generation 2 VM or a physical machine.
  • Secure Boot and BitLocker MUST be disabled. This is required because PCS enables the TestSigning boot configuration. If you are using a Generation 2 Hyper-V VM as the PCS controller, stop the VM to disable Secure Boot in the VM's settings.
  • Install the HLK Client using the Windows HLK Getting Started guide and open the requisite ports.
  • Install .NET Framework 3.5 (This feature is not included by default in Windows Server 2016).
    • Generic Installation Instructions can be found at the following locations:
    • For builds released via Microsoft Connect, see details below:
      • Mount the ISO supplied with the build and find the file at MountedDriveLetter:\sources\sxs\microsoft-windows-netfx3-ondemand-package.cab

      • Copy the file to a local folder on the PCS controller

      • Install the package by executing this command line using admin privileges

        Add-WindowsFeature Net-Framework-Features -source <Local Folder>
        

PCS Tests

This section discusses how to find an appropriate PCS test for your device/solution, configure the lab, and kick off PCS execution.

  • You need to use the same domain admin user account to set up the lab and run the tests.
  • Secure Boot must be OFF on all nodes and on the PCS controller.
  • The HLK update package MUST be downloaded and installed on the HLK controller and clients. The HLK update package is available for download at the Microsoft Collaborate site.

PCS Test Selection

PCS jobs are used to certify multiple categories of devices and solutions. The table below maps each category and certification program to the appropriate PCS job.

Target | Certification Program | Job Name in HLK
NIC | Windows Server Logo | PrivateCloudSimulator-Device.Network.LAN.10GbOrGreater
NIC | SDDC Standard | PrivateCloudSimulator-Device.Network.LAN.10GbOrGreater
NIC | SDDC Premium | PrivateCloudSimulator-Device.Network.LAN.AzureStack
NIC | AZURESTACK | PrivateCloudSimulator-Device.Network.LAN.AzureStack
SAS HBA | SDDC Standard | PrivateCloudSimulator-Device.Storage.Controller.AzureStack
SAS HBA | SDDC Premium | PrivateCloudSimulator-Device.Storage.Controller.AzureStack
SAS HBA | AZURESTACK | PrivateCloudSimulator-Device.Storage.Controller.AzureStack
Disk(HDD/SSD/NVMe) | SDDC Standard | PrivateCloudSimulator-Device.Storage.HD.AzureStack
Disk(HDD/SSD/NVMe) | SDDC Premium | PrivateCloudSimulator-Device.Storage.HD.AzureStack
Disk(HDD/SSD/NVMe) | AZURESTACK | PrivateCloudSimulator-Device.Storage.HD.AzureStack
Solution | SDDC Standard | PrivateCloudSimulator-System.Solutions.StorageSpacesDirect (MIN) & (MAX)
Solution | SDDC Premium | PrivateCloudSimulator-System.Solutions.StorageSpacesDirect (MIN) & (MAX)
Solution | AZURESTACK | PrivateCloudSimulator-System.Solutions.AzureStack (MIN) & (MAX)
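
If you script your lab planning, the mapping above can be kept as a simple lookup. The sketch below only restates the table; Get-PcsJobName and the shortened target keys ('NIC', 'SAS HBA', 'Disk', 'Solution') are hypothetical names chosen for illustration and are not part of HLK.

```powershell
# Lookup built from the table above. Keys use the form "Target|Program".
$PcsJobMap = @{
    'NIC|Windows Server Logo' = 'PrivateCloudSimulator-Device.Network.LAN.10GbOrGreater'
    'NIC|SDDC Standard'       = 'PrivateCloudSimulator-Device.Network.LAN.10GbOrGreater'
    'NIC|SDDC Premium'        = 'PrivateCloudSimulator-Device.Network.LAN.AzureStack'
    'NIC|AZURESTACK'          = 'PrivateCloudSimulator-Device.Network.LAN.AzureStack'
    'SAS HBA|SDDC Standard'   = 'PrivateCloudSimulator-Device.Storage.Controller.AzureStack'
    'SAS HBA|SDDC Premium'    = 'PrivateCloudSimulator-Device.Storage.Controller.AzureStack'
    'SAS HBA|AZURESTACK'      = 'PrivateCloudSimulator-Device.Storage.Controller.AzureStack'
    'Disk|SDDC Standard'      = 'PrivateCloudSimulator-Device.Storage.HD.AzureStack'
    'Disk|SDDC Premium'       = 'PrivateCloudSimulator-Device.Storage.HD.AzureStack'
    'Disk|AZURESTACK'         = 'PrivateCloudSimulator-Device.Storage.HD.AzureStack'
    'Solution|SDDC Standard'  = 'PrivateCloudSimulator-System.Solutions.StorageSpacesDirect (MIN) & (MAX)'
    'Solution|SDDC Premium'   = 'PrivateCloudSimulator-System.Solutions.StorageSpacesDirect (MIN) & (MAX)'
    'Solution|AZURESTACK'     = 'PrivateCloudSimulator-System.Solutions.AzureStack (MIN) & (MAX)'
}

# Hypothetical helper: returns the HLK job name for a target and program.
function Get-PcsJobName {
    param(
        [Parameter(Mandatory)] [string] $Target,
        [Parameter(Mandatory)] [string] $Program
    )
    $key = "$Target|$Program"
    if (-not $PcsJobMap.ContainsKey($key)) {
        throw "No PCS job is listed for target '$Target' and program '$Program'."
    }
    return $PcsJobMap[$key]
}

# Example usage:
# Get-PcsJobName -Target 'NIC' -Program 'SDDC Premium'
```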

PCS jobs are summarized below:

  • PrivateCloudSimulator - Device.Network.LAN.10GbOrGreater
    This test contains a set of actions that specifically target the network adapter device, along with VM and compute cluster actions.
  • PrivateCloudSimulator - Device.Network.LAN.AzureStack
    This test contains an extended set of actions that verify network adapter support for the new 'Software Defined Networking' feature in Windows Server, along with VM and compute cluster actions.
  • PrivateCloudSimulator - Device.Storage.Controller.AzureStack
    This test contains an extended set of actions that specifically target the storage controller, along with VM and compute cluster actions.
  • PrivateCloudSimulator - Device.Storage.Enclosure.AzureStack
    This test contains an extended set of actions that specifically target the JBOD enclosure, along with VM, compute cluster, and storage cluster actions.
  • PrivateCloudSimulator - Device.Storage.HD.AzureStack
    This test contains an extended set of actions that specifically target the disk, along with VM and compute cluster actions.
  • PrivateCloudSimulator - System.Solutions.StorageSpacesDirect (MIN)/(MAX)
    This test contains an extended set of actions that target the entire solution built on a hyper-converged Storage Spaces Direct cluster. The (MIN) test should be run on a cluster with the minimum number of supported nodes for the solution. The (MAX) test should be run on a cluster with the maximum number of supported nodes for the solution.
  • PrivateCloudSimulator - System.Solutions.AzureStack (MIN)/(MAX)
    This test contains an extended set of actions that target the entire AzureStack solution. The (MIN) test should be run on a cluster with the minimum number of supported nodes for the solution. The (MAX) test should be run on a cluster with the maximum number of supported nodes for the solution.

PCS Job Execution Flow

Each PCS job contains the following tasks.

  • Initialize PCS Controller
    • In this stage, the PCS execution engine sets up a SQL server and IIS on the PCS controller machine
    • It also copies content (e.g. evaluation OS VHD files) to enable VM creation in the next stage
  • Create VMs
    • This stage sees the PCS engine start creating VMs on each node of the cluster
    • VM creation stops when the target number of VMs/node has been reached.
    • This step is part of the PCS setup phase. The test run duration timer starts after this stage.
  • Run PCS Actions
    • Now, PCS initiates various types of actions (VM, Cluster, Storage, Network) on each node of the cluster.
    • Actions run in parallel and co-ordinate among themselves to exercise the device (storage, network) and the solution through the private cloud/datacenter lifecycle
    • Actions run periodically and stop once the target execution time (defined by the profile/job) of the test has been reached.
    • Test execution time is defined per profile and can vary based on the profile you are running. Test execution timer kicks in after all the VMs are created.
    • The steps in each action and the corresponding result of each step are stored in the SQL server.
  • Cleanup Run
    • In this stage, the VMs created in the Create VMs stage are cleaned up and the cluster is restored to a clean state (as far as possible).
    • It generates a report file (PcsReport.htm) and a ZIP file that contains test logs.
  • Report result in HLK Studio
    • In this stage, the HLK studio reports the result of the PCS run.
    • The result can be packaged into an HLKX file for submission to Microsoft.

Execute PCS Tests

PrivateCloudSimulator - Device.Network.LAN.10GbOrGreater

System Requirements (Device.Network.LAN.10GbOrGreater)

Requirement | Description
Component being certified | NIC
Setup Type | Hyper-converged setup with S2D storage. Note: An SDDC certified HBA is required.
Minimum Number of Server Nodes | 3 identical machines
Server Spec | CPU: 16 physical cores (e.g. 2 sockets with 8 cores), Memory: 128 GB, 64 GB free space on boot drive
Storage Overall | 4 TB free space per node on HDD, 800 GB free space per node on SSD
Disk | If there are drives used as cache, there must be at least 2 per server. There must be at least 4 capacity (non-cache) drives per server. See S2D hardware requirements for more information.
Network Card | NIC being certified
Switch | Switch supporting all NIC features

Setup (Device.Network.LAN.10GbOrGreater)

  • Follow the Windows HLK Getting Started guide to install HLK client software on all cluster nodes.
  • Follow the Windows Server 2016 Storage Spaces Direct cluster guide to deploy a cluster.
  • All nodes must be connected to the same physical switches.
  • A networking bitrate of 10 GbE or better must be used. Create a virtual switch with the same name on each node.
  • Virtual machines created by PCS connect to the virtual switch to send network traffic to each other. These VMs get their IP addresses via DHCP. Make sure your DHCP server assigns valid IP addresses to these VMs. If a DHCP server is not available or fails, the VMs use Automatic Private IP Addressing (APIPA) to self-configure an IP address and subnet. Each VM must have a valid IP address to send network traffic to other VMs.
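
To spot VMs that fell back to APIPA instead of receiving a DHCP lease, you can check whether an address lies in the 169.254.0.0/16 range. The helper below is an illustrative sketch; Test-ApipaAddress is a hypothetical name and is not part of PCS.

```powershell
# Hypothetical helper: returns $true when an IPv4 address is in the
# APIPA (link-local) range 169.254.0.0/16.
function Test-ApipaAddress {
    param([Parameter(Mandatory)] [string] $IPAddress)
    # Parse throws on malformed input, which surfaces typos early.
    $bytes = [System.Net.IPAddress]::Parse($IPAddress).GetAddressBytes()
    # Only 4-byte (IPv4) addresses can be APIPA addresses.
    return ($bytes.Length -eq 4 -and $bytes[0] -eq 169 -and $bytes[1] -eq 254)
}

# Example usage inside a VM (Get-NetIPAddress is a standard Windows cmdlet):
# Get-NetIPAddress -AddressFamily IPv4 |
#     Where-Object { Test-ApipaAddress $_.IPAddress } |
#     Select-Object InterfaceAlias, IPAddress
```

Any address reported by the usage example indicates an interface that did not get a DHCP lease, so the DHCP scope and the virtual switch connectivity are the first things to check.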

Execute

  • Open HLK Studio

  • Follow the Windows HLK Getting Started guide to create a machine pool

  • Navigate to the Project tab and click Create Project

  • Enter a project name and press Enter

  • Navigate to the Selection tab

  • Select the machine pool containing the network adapter device

  • Select device manager

  • Select the device. It should be ok to select any relevant NIC device (does not matter which member of the virtual switch team) on any of the compute nodes that is targeted for certification.

    [Screenshot: HLK Studio showing the 10GbOrGreater test with a device selected]

  • Right-click on the selected device and select Add/Modify Features

  • In the features dialog, select Device.Network.LAN.10GbOrGreater and then click OK. For most NICs (with speeds of 10 GbE or higher), this feature should already be selected automatically.

  • Navigate to the Tests tab

  • Select PrivateCloudSimulator - Device.Network.LAN.10GbOrGreater

  • Click Run Selected

  • In the Schedule dialog,

    • Enter values for the required test parameters
      • DomainName: Test user's domain name
      • UserName: Test user's user name
      • Password: Test user's password
      • ComputeCluster: Name of compute cluster
      • StoragePath: Default value is "". It uses all the available CSVs from compute cluster. You can use different paths by entering comma separated paths. Example: "C:\ClusterStorage\Volume1,C:\ClusterStorage\Volume2"
      • VmSwitchName: Name of virtual switch on all nodes
      • FreeDriveLetter: Default value is R. During setup, PcsFiles.vhd file is mounted to this drive letter on PCS controller. Make sure this drive letter is available.
      • IsCreateCluster: Use default value
      • IsRemoveCluster: Use default value
      • IsConfigureHyperV: Use default value
    • Map machines to roles
      • PrimaryNode: This is the node with the selected device
      • Test Controller: Select PCS test controller machine
      • OtherNodes: Select other cluster nodes
  • Click OK to schedule the test

  • Please refer to View PCS report in real-time through SQL Server Reporting Services to view the real-time results for the test run.

Duration

  • PCS actions (listed below) run for about 24 hours.
  • The complete run may take an additional 24-36 hours (including time for setup and cleanup).

PCS Actions

The table below lists the actions that are included in this test.

Action Name | Description
VmCloneAction | Creates a new VM.
VmLiveMigrationAction | Live-migrates the VM to another cluster node.
VmSnapshotAction | Takes a snapshot of the VM.
VmStateChangeAction | Changes the VM state (for example, to Paused).
VmStorageMigrationAction | Migrates VM storage (the VHD(s)) between cluster nodes.
VmGuestRestartAction | Restarts the VM.
VmStartWorkloadAction | Starts a user-simulated workload.
VmGuestFullPowerCycleAction | Power-cycles the VM.

PrivateCloudSimulator - Device.Network.LAN.AzureStack

System Requirements (Device.Network.LAN.AzureStack)

Requirement | Description
Component being certified | NIC (with RDMA)
Setup Type | Hyper-converged setup with S2D storage. Note: An SDDC certified HBA is required.
Minimum Number of Server Nodes | 3 identical machines
Server Spec | CPU: 16 physical cores (e.g. 2 sockets with 8 cores), Memory: 128 GB, 64 GB free space on boot drive
Storage Overall | 4 TB free space per node on HDD, 800 GB free space per node on SSD
Disk | If there are drives used as cache, there must be at least 2 per server. There must be at least 4 capacity (non-cache) drives per server. See S2D hardware requirements for more information.
Network Card | NIC being certified
Switch | Switch supporting all NIC features

Setup (Device.Network.LAN.AzureStack)

  • Hyper-V host that contains PCS Controller VM must be Windows Server 2016 or later.

  • Follow the Windows HLK Getting Started guide to install HLK client software on all cluster nodes

  • Follow the Windows Server 2016 Storage Spaces Direct cluster guide to deploy a cluster

  • For instructions to set up networking for Storage Spaces Direct, see Windows Server 2016 Converged NIC and Guest RDMA Deployment Guide.

  • PCS Controller VM should be built as a generation 2 VM and have 2 network interfaces, one for the management network and the other for SDN (PA address space) topology. The interface for SDN topology will be assigned an IP address from the IP address space passed in as the AddressPrefixes parameter.

    [Diagram: software-defined networking with S2D]

  • All the nodes must be able to communicate with the PCS Controller VM at all times through a management interface. For this purpose, each server should have one additional NIC for management interface, which does not need to meet strict bitrate requirements.

  • All the nodes and PCS Controller must have the same most recent KB installed.

  • A networking bitrate of 10 GbE or better is required for the NICs under test. Each server should have two identical NICs of 10 GbE or greater.

  • If RDMA capable NICs are used, the physical switch must meet the associated RDMA requirements.

  • Set the NIC properties that are specific to AzureStack deployments to make sure that the NICs being certified support them. You can use the Get-NetAdapterAdvancedProperty PowerShell cmdlet to verify NIC properties.

    • VXLAN Encapsulated Task Offload == Enabled
    • Encapsulation Overhead == 160
    • Jumbo Packet >= 1500
    • MtuSize == 1660
  • Make sure that every node contains a teaming enabled virtual switch with the same name.

    New-VMSwitch -Name SdnSwitch -NetAdapterName "Name 1","Name 2" -AllowManagementOS $true -EnableEmbeddedTeaming $true
    
  • Configure Nested Virtualization: Nested virtualization for the PCS Controller VM must be enabled. While the PCS VM is in the OFF state, run the following command on the Hyper-V host.

    Set-VMProcessor -VMName <VMName> -ExposeVirtualizationExtensions $true
    
  • Make sure that RDMA is set up on all nodes and that this is reflected when queried through Get-SmbClientNetworkInterface and Get-SmbServerNetworkInterface.

  • Live Migration settings (Failover Cluster Manager->Networks->Live Migration Settings) must be set appropriately to use storage network for live migrations.

  • This test creates virtual machines and sends traffic between them using the virtual switch created. The vNICs (virtual NICs) of the PCS virtual machines are assigned IP addresses from the IP address space passed in as the AddressPrefixes parameter.
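
The AzureStack-specific NIC property values listed earlier in this section can be checked with a small script. The sketch below is illustrative only: Test-PcsNicSettings is a hypothetical helper, it assumes the property display names match the names used in this document (vendors may name them differently), and it assumes the numeric values are reported as plain numbers.

```powershell
# Hypothetical helper: compares observed NIC property values against the
# values listed in this section and returns a list of problems found.
function Test-PcsNicSettings {
    param([Parameter(Mandatory)] [hashtable] $Observed)
    $problems = @()
    if ($Observed['VXLAN Encapsulated Task Offload'] -ne 'Enabled') {
        $problems += 'VXLAN Encapsulated Task Offload must be Enabled'
    }
    # A missing key converts to 0, so it is reported as a problem as well.
    if ([int]$Observed['Encapsulation Overhead'] -ne 160) {
        $problems += 'Encapsulation Overhead must be 160'
    }
    if ([int]$Observed['Jumbo Packet'] -lt 1500) {
        $problems += 'Jumbo Packet must be at least 1500'
    }
    if ([int]$Observed['MtuSize'] -ne 1660) {
        $problems += 'MtuSize must be 1660'
    }
    # The leading comma keeps the result an array even when it is empty.
    return ,$problems
}

# Example usage on a cluster node (the adapter name is an assumption):
# $observed = @{}
# Get-NetAdapterAdvancedProperty -Name 'Name 1' |
#     ForEach-Object { $observed[$_.DisplayName] = $_.DisplayValue }
# Test-PcsNicSettings -Observed $observed
```

An empty result means all four values match. If your driver reports values with units or under different display names, adjust the keys and conversions accordingly.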

Execute (Device.Network.LAN.AzureStack)

  • Open HLK Studio

  • Navigate to the Project tab and click Create Project

  • Enter a project name and press Enter

  • Navigate to the Selection tab

  • Select the machine pool containing the network adapter device

  • Select device manager

  • Select the device. It should be OK to select any relevant NIC device (does not matter which member of the virtual switch team) on any of the compute nodes that is targeted for certification.

    [Screenshot: HLK Studio showing the Device.Network.LAN test with a device selected]

  • Right-click on the selected device and select Add/Modify Features

  • In the features dialog, select Device.Network.LAN.AzureStack and click OK.

  • Navigate to the Tests tab

  • Select PrivateCloudSimulator - Device.Network.LAN.AzureStack

  • Click Run Selected

  • In the Schedule dialog,

    • Enter values for the required test parameters
      • DomainName: Test user's fully qualified domain name (FQDN).
      • UserName: Test user's user name
      • Password: Test user's password
      • ComputeCluster: compute cluster name
      • StorageCluster: Use default value ''
      • StoragePath: Default value is ''. It uses all the available CSVs from compute cluster. You can use different paths by entering comma separated paths. Example: "C:\ClusterStorage\Volume1,C:\ClusterStorage\Volume2"
      • VmSwitchName: Name of virtual switch to be used for SDN. Example: SdnSwitch
      • FreeDriveLetter: Default value is R. During setup, the PcsFiles.vhd file is mounted to this drive letter on the PCS controller. Make sure this drive letter is available.
      • AdapterNames: Comma-separated list of adapter names that are part of the vmSwitch. Use the format "'Name 1', 'Name 2'" (both the double quotes and the single quotes are needed) for multiple adapters. Names must be taken from the output of the Get-NetAdapter cmdlet.
      • VLan: Vlan ID set on vmSwitch. Only required if your physical switch is configured for Vlan. Enter '0' to indicate that there is no Vlan tagging.
      • AddressPrefixes: The IP address range to be used by Tenant VMs and Hosts. These addresses will be used for SDN datacenter management.
      • ClientAddressPrefix: The IP address range used by Client VMs.
      • RDMAEnabled: Enter 1 if NIC supports RDMA
      • SetEnabled: Enter 1 if NIC supports Switch Embedded Teaming
      • HnvEnabled: Enter 1 if NIC supports Hyper-V Network Virtualization
      • TaskOffloadEnabled: Enter 1 if NIC supports Encapsulate Task Offload
      • TestControllerNetAdapterName: Adapter name on PCS Controller that can be assigned a static IP in the AddressPrefixes range to communicate with SDN Network Controller virtual machines.
      • IsCreateCluster: Use default value
      • IsRemoveCluster: Use default value
      • VHDSourcePath: A VHDX file for Windows Server 2016 Datacenter. This VHDX file is used to create Network Controller VMs. The default value is c:\pcs\BaseVHDX\14393.0.amd64fre.rs1_release.160715-1616_server_serverdatacentereval_en-us.vhdx. Do not change the default value unless you have to use your own VHDX file. Cloned VHDX files have the same disk signatures. To avoid a disk signature collision, this VHDX file cannot be the same as the one used by the PCS controller.
      • KBPackagePath: Comma-separated list of Windows Update packages that should be applied to the VHDX file specified in the VHDSourcePath parameter. These update packages should match the ones installed on all cluster nodes and on the PCS controller machine.
        • You should install the latest version or a recent version of the Windows Update packages. You can use the Get-Hotfix cmdlet to find out what is installed on your machines.
        • Most Windows Update packages require you to install a 'servicing stack update (SSU)' first. In other words, you should enter at least two KBs in this parameter.
        • Example:
          • KB4503294 (June 18, 2019)
          • In "How to get this update" section, it says 'servicing stack update (SSU)' KB4503537 is required.
          • In this parameter, you should enter 'c:\KB\Windows-KB4503537-x64.msu,c:\KB\Windows-KB4503294-x64.msu'. (single quote is required, KB4503537 will be installed before installing KB4503294.)
          • You need to download the MSU files from Windows Update site and copy them to c:\KB folder on the PCS controller machine.
          • Important: The file name format MUST be "Windows-KBNumber-x64.msu". A dash (-) is required before and after KBNumber.
    • Map machines to roles
      • PrimaryNode: This is the node with the selected device, automatically selected by HLK.
      • Test Controller: Select PCS test controller machine
      • OtherNodes: Select other cluster nodes
  • Click OK to schedule the test

  • Please refer to View PCS report in real-time through SQL Server Reporting Services to view the real-time results for the test run.
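
The KBPackagePath value described in the parameter list above follows a fixed pattern, so it can be assembled by a script. The following is a minimal sketch: New-KbPackagePath is a hypothetical helper name, and the c:\KB folder default simply mirrors the example in this section.

```powershell
# Hypothetical helper: builds the KBPackagePath value from KB identifiers
# given in installation order (servicing stack update first).
function New-KbPackagePath {
    param(
        [Parameter(Mandatory)] [string[]] $KbNumber,
        [string] $Folder = 'c:\KB'
    )
    $paths = foreach ($kb in $KbNumber) {
        if ($kb -notmatch '^KB\d+$') {
            throw "Unexpected KB identifier: $kb"
        }
        # Required file name format: Windows-KBNumber-x64.msu
        "$Folder\Windows-$kb-x64.msu"
    }
    # The value is wrapped in single quotes, as in the example above.
    return "'" + ($paths -join ',') + "'"
}

# Example usage with the KBs from the example above:
# New-KbPackagePath -KbNumber 'KB4503537', 'KB4503294'
```

The helper only builds the string. You still need to download the MSU files and copy them to the folder on the PCS controller yourself.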

Cleanup

If a test ends abruptly, use the C:\Pcs\ReRunPcsCleanup.cmd script on the PCS controller to clean up the state of the setup. It is very important that stale VMs and SDN infrastructure are cleaned up before starting a new run.

Please make sure the following items are cleaned up before starting a new run:

  • Clustered VM roles (FailoverClusterManager->Cluster->Roles)

    Get-ClusterGroup -Cluster $clusterName
    
  • All the VMs created by PCS

    Get-ClusterNode -Cluster $clusterName | % { Get-VM -ComputerName $_.Name }
    
  • vNics created by PCS/SDN

    Get-ClusterNode -Cluster $clusterName | % { Get-VMNetworkAdapter -ComputerName $_.Name -ManagementOS | Select-Object ComputerName,Name,SwitchName }
    

    [Screenshot: PowerShell output showing a vNIC that needs to be cleaned up]

  • Storage/CSV-volumes on the cluster do not have any entries pertaining to PCS (C:\ClusterStorage\Volume1\PCS)
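
The last check in the list above can be scripted by looking for a PCS folder under each CSV volume. This is a sketch under stated assumptions: Find-PcsLeftoverFolder is a hypothetical helper, and it assumes that leftovers live in a folder named PCS directly under each volume, as in the example path above.

```powershell
# Hypothetical helper: lists PCS folders left behind on CSV volumes.
function Find-PcsLeftoverFolder {
    param([string] $ClusterStorageRoot = 'C:\ClusterStorage')
    # Each child directory of the root corresponds to one CSV volume.
    Get-ChildItem -Path $ClusterStorageRoot -Directory |
        ForEach-Object { Join-Path $_.FullName 'PCS' } |
        Where-Object { Test-Path -Path $_ -PathType Container }
}

# Example usage on a cluster node:
# Find-PcsLeftoverFolder
```

No output means that no PCS folders were found under the given root.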

Duration (Device.Network.LAN.AzureStack)

  • PCS actions (listed below) run for about 24 hours.
  • The complete run may take an additional 36-48 hours (including time for setup and cleanup).

PCS Actions (Device.Network.LAN.AzureStack)

The table below lists the actions that are included in this test.

Action Name | Description
NetRunEastWestCrossSubnetTrafficAction | Runs traffic between two tenant VMs in the same VNetwork but in different VSubnets.
NetRunEastWestSameSubnetTrafficAction | Runs traffic between two tenant VMs in the same VSubnet.
NetLoadBalancerEastWestInterTenantTrafficAction | Runs traffic between load-balanced tenants and another VM in a different app tier. Simulates load-balanced traffic among frontend application (website) VMs.
NetLoadBalancerEastWestIntraTenantTrafficAction | Runs traffic between load-balanced tenants and a VM in the same app tier. Simulates load-balanced traffic from the backend application (DB) to the frontend application (website).
NetLoadBalancerInboundTrafficAction | Runs traffic from outside the tenant network to load-balanced VMs (website).
NetLoadBalancerNorthSouthTrafficAction | Runs traffic from inside the tenant network to load-balanced VMs.
NetLoadBalancerOutboundTrafficAction | Runs traffic from load-balanced VMs inside the tenant network to a VM outside.
NetAddInboundVipToLoadBalancerAction | Creates virtual IPs for tenant VMs dynamically, mainly for other traffic actions to use.
VmCloneAction | Creates a new VM.
VmLiveMigrationAction | Live-migrates the VM to another cluster node.
VmStateChangeAction | Changes the VM state (for example, to Paused).
VmStorageMigrationAction | Migrates VM storage (the VHD(s)) between cluster nodes.
VmGuestRestartAction | Restarts the VM.
VmGuestFullPowerCycleAction | Power-cycles the VM.

PrivateCloudSimulator - Device.Storage.HD.AzureStack

System Requirements for Solid State Drives

When certifying SSDs for use in Azure Stack, the following is the minimum required hardware test harness, which must be running a Windows Server 2016 Storage Spaces Direct cluster.

Requirement | Description
Component being certified | SSD
Setup Type | Hyper-converged setup with S2D storage. Note: An SDDC certified HBA is required.
Minimum Number of Server Nodes | 4 identical machines
Server Spec | CPU: 16 physical cores (e.g. 2 sockets with 8 cores), Memory: 128 GB, 64 GB free space on boot drive
Storage Overall | 4 TB free space per node on SSD
Storage SSD | Total SSD storage capacity on each node = 4 TB. Minimum of 2 SSDs per node, but more may be needed to meet the 4 TB free space requirement and to have enough spare disks for the repair test case. To certify multiple SSD disk families in the same setup concurrently (that is, with a single PCS run), you need 1 SSD of each family on each of the 4 nodes in the same enclosure slot.
Storage HDD | None
Network Card | 10 GbE NIC with WS2016 certification
Switch | Switch supporting all NIC features

System Requirements for Hard Disk Drives

When certifying HDDs for use in Azure Stack, the following is the minimum required hardware test harness, which must be running a Windows Server 2016 Storage Spaces Direct cluster.

Requirement | Description
Component being certified | HDD
Setup Type | Hyper-converged setup with S2D storage. Note: An SDDC certified HBA is required.
Minimum Number of Server Nodes | 4 identical machines
Server Spec | CPU: 16 physical cores (e.g. 2 sockets with 8 cores), Memory: 128 GB, 64 GB free space on boot drive
Storage Overall | 4 TB free space per node on HDD
Storage SSD | None
Storage HDD | Total HDD storage capacity on each node = 4 TB. Minimum of 2 HDDs per node, but more may be needed to meet the 4 TB free space requirement and to have enough spare disks for the repair test case. To certify multiple HDD disk families in the same setup concurrently (that is, with a single PCS run), you need 1 HDD of each family on each of the 4 nodes in the same enclosure slot.
Network Card | 10 GbE NIC with WS2016 certification
Switch | Switch supporting all NIC features

Setup

Execute

  • Open HLK Studio

  • Follow the Windows HLK Getting Started guide to create a machine pool

  • Navigate to the Project tab and click Create Project

  • Enter a project name and press Enter

  • Navigate to the Selection tab

  • Select the machine pool containing the disk device

  • Select device manager

  • Select the disk device that needs to be certified.

    [Screenshot: HLK Studio showing the Device.Storage.HD test with a device selected]

  • Right-click on the selected device and select Add/Modify Features

    [Screenshot: HLK Studio showing the Device.Storage.HD test with the Add/Modify Features context menu]

  • In the features dialog, select Device.Storage.HD.AzureStack and click OK.

    [Screenshot: HLK Studio showing the Device.Storage.HD.AzureStack feature selected]

  • Navigate to the Tests tab

  • Select PrivateCloudSimulator - Device.Storage.HD.AzureStack

  • Click Run Selected

  • In the Schedule dialog,

    • Enter values for the required test parameters
      • DomainName: Test user's fully qualified domain name (FQDN).
      • UserName: Test user's user name
      • Password: Test user's password
      • ComputeCluster: compute cluster name
      • StoragePath: The location(s) must be on the disk device under test. Default value is "". It uses all the available CSVs from the compute cluster. You can use different paths by entering comma-separated paths. Example: "C:\ClusterStorage\Volume1,C:\ClusterStorage\Volume2"
      • FreeDriveLetter: Default value is R. During setup, PcsFiles.vhd file is mounted to this drive letter on PCS controller. Make sure this drive letter is available.
    • Map machines to roles
      • PrimaryNode: This is the node with the selected device
      • Test Controller: Select PCS test controller machine
      • OtherNodes: Select other cluster nodes
  • Click OK to schedule the test.

  • Please refer to View PCS report in real-time through SQL Server Reporting Services to view the real-time results for the test run.

Duration

  • PCS actions (listed below) run for about 48 hours.
  • The complete run may take an additional 24-36 hours (including time for setup and cleanup).

PCS Actions

The profile defines the actions to execute to validate the disk drives for Microsoft AzureStack. The table below lists the actions that are included in this profile.

Action Name | Description
VmCloneAction | Creates a new VM.
VmLiveMigrationAction | Live-migrates the VM to another cluster node.
VmSnapshotAction | Takes a snapshot of the VM.
VmStateChangeAction | Changes the VM state (for example, to Paused).
VmStorageMigrationAction | Migrates VM storage (the VHD(s)) between cluster nodes.
VmGuestRestartAction | Restarts the VM.
VmStartWorkloadAction | Starts a user-simulated workload.
VmGuestFullPowerCycleAction | Power-cycles the VM.
ClusterCSVMoveAction | Moves the CSV disks to the best available node.
StorageNodePoolMove | Moves a storage pool (created in Storage Spaces) to a different owner node in the storage cluster.
StorageNodeRestart | Restarts a node in the storage cluster.
StorageNodeBugcheck | Bug checks one node of the storage cluster.
StorageNodeDiskReadTimeoutAction | Goes through disks that tolerate errors (not read-only, clustered, no simple spaces) and waits for read I/O. Once an I/O is intercepted, the action causes it to time out. If a single timeout is detected on any disk, the action is considered successful.
StorageNodeDiskWriteTimeoutAction | Goes through disks that tolerate errors (not read-only, clustered, no simple spaces) and waits for write I/O. Once an I/O is intercepted, the action causes it to time out. If a single timeout is detected on any disk, the action is considered successful.
StorageNodeBusResetAction | Attempts to inject a bus reset to any of the physical disks backing the pool. First, a timeout of a read or write I/O is attempted. If that is successful, the corresponding abort, reset LUN, and reset target commands are failed. If any of these succeed, a bus reset is triggered. If any disk issues a bus reset, the action is considered successful.
StorageNodeUpdateStorageProviderCacheAction | Calls the Update-StorageProviderCache cmdlet in PowerShell.

PrivateCloudSimulator - Device.Storage.Controller.AzureStack

System Requirements

To certify a SAS HBA for use in Azure Stack, the test environment must meet the following minimum hardware requirements and must run a Windows Server 2016 Storage Spaces Direct cluster.

Requirement Description
Component Being certified SAS HBA (for S2D)
Setup Type Hyper-converged setup with S2D storage. HBA under test has to be separate from the Boot HBA
Minimum Number of Server Nodes 3 identical machines
Server Spec CPU: 16 physical cores (e.g. 2 sockets with 8 cores), MEMORY: 128 GB, 64GB free space on boot drive
Storage Overall 4 TB free space per node on HDD, 800 GB free space per node on SSD
Storage SSD Minimum of 1 SSD per node
Storage HDD Minimum of 2 HDD per node
Network Card 10 GbE NIC with WS2016 certification
Switch Switch supporting all NIC features

Setup

Execute

  • Open HLK Studio

  • Follow the Windows HLK Getting Started guide to create a machine pool

  • Navigate to the Project tab and click Create Project

  • Enter a project name and press Enter

  • Navigate to the Selection tab

  • Select the machine pool containing the storage controller device

  • Select device manager

  • Select the storage controller device that needs to be certified.

    hlk studio with lsi adapter storage device selected

  • Right-click on the selected device and select Add/Modify Features

    hlk studio with the add/modify features context menu

  • In the features dialog, select Device.Storage.Controller.AzureStack and click OK.

    hlk studio with the features dialog

  • Navigate to the Tests tab

  • Select PrivateCloudSimulator - Device.Storage.Controller.AzureStack

  • Click Run Selected

  • In the Schedule dialog,

    • Enter values for the required test parameters
      • DomainName: Test user's fully qualified domain name (FQDN).
      • UserName: Test user's user name
      • Password: Test user's password
      • ComputeCluster: compute cluster name
      • StoragePath: Location(s) on the disk device under test. The default value is "", which uses all the available CSVs from the compute cluster. To use different paths, enter a comma-separated list. Example: "C:\ClusterStorage\Volume1,C:\ClusterStorage\Volume2"
      • FreeDriveLetter: Default value is R. During setup, PcsFiles.vhd file is mounted to this drive letter on PCS controller. Make sure this drive letter is available.
    • Map machines to roles
      • PrimaryNode: This is the node with the selected device
      • Test Controller: Select PCS test controller machine
      • OtherNodes: Select other cluster nodes
  • Click OK to schedule the test.

  • Please refer to View PCS report in real-time through SQL Server Reporting Services to view the real-time results for the test run.
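
The StoragePath parameter above takes either an empty string (use all available CSVs) or a quoted, comma-separated list of paths. The following is a minimal sketch of how such a value splits into individual paths. The helper name parse_storage_path is hypothetical and is not part of PCS or HLK; it only illustrates the expected format.

```python
def parse_storage_path(value):
    """Split a StoragePath value into a list of individual paths.

    An empty result stands for the default ("" = use all available CSVs).
    Hypothetical helper for illustration; PCS does its own parsing.
    """
    cleaned = value.strip().strip('"')  # drop the surrounding double quotes
    return [part.strip() for part in cleaned.split(",") if part.strip()]


# Example value taken from the parameter description above.
example = r'"C:\ClusterStorage\Volume1,C:\ClusterStorage\Volume2"'
for path in parse_storage_path(example):
    print(path)

# The default value yields no explicit paths.
print(len(parse_storage_path('""')))
```

Checking the value this way before scheduling the job can catch a stray space or a missing quote, which is cheaper than discovering it after Setup has started.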

Duration

  • PCS Actions (listed below) will run for 48 hours.
  • The complete run may take an additional 24-36 hours (including time for setup and cleanup).

PCS Actions

The profile defines the actions to execute to validate the storage controller device for Microsoft Azure Stack. The table below lists the actions that are included in this profile.

Action Name Description
VmCloneAction Creates a new VM.
VmLiveMigrationAction Live-migrates the VM to another cluster node.
VmSnapshotAction Takes a snapshot of the VM.
VmStateChangeAction Changes the VM state (for example, to Paused).
VmStorageMigrationAction Migrates VM storage (the VHD(s)) between cluster nodes.
VmGuestRestartAction Restarts the VM.
VmStartWorkloadAction Starts a user-simulated workload.
VmGuestFullPowerCycleAction Power-cycles the VM.
ClusterCSVMoveAction Moves the CSV disks to the best available node.
StorageNodePoolMove Moves a storage pool (created in Storage Spaces) to a different owner node in the storage cluster.
StorageNodeBusResetAction This action attempts to inject a bus reset to any of the physical disks backing the pool. First, a timeout to a read or write IO is attempted. If that is successful, the corresponding abort, reset LUN, and reset target commands are failed. If any of these fault injections succeeds, a bus reset is triggered. If any disk issues a bus reset, the action is considered successful.
StorageNodePortDisableAllAction This action disables all the storage controllers in the node. All of the SCSI controllers are disabled; if at least one is successfully disabled, the action is considered passed. After the specified time, all of the controllers are re-enabled. This action is disabled for boot controllers.

PrivateCloudSimulator - Device.Storage.Enclosure.AzureStack

System Requirements

Requirement Description
Component Being certified Enclosure
Setup Type Hyper-converged setup with S2D storage. HBA under test has to be separate from the Boot HBA.
Minimum Number of Server Nodes 3 identical machines
Server Spec CPU: 16 physical cores (e.g. 2 sockets with 8 cores), MEMORY: 128 GB, 64GB free space on boot drive
Storage Overall 4 TB free space per node on HDD, 800 GB free space per node on SSD
Storage SSD Minimum of 1 SSD per node
Storage HDD Minimum of 2 HDD per node
Network Card 10 GbE NIC with WS2016 certification
Switch Switch supporting all NIC features

Setup

Execute

  • Open HLK Studio

  • Follow the Windows HLK Getting Started guide to create a machine pool

  • Navigate to the Project tab and click Create Project

  • Enter a project name and press Enter

  • Navigate to the Selection tab

  • Select the machine pool containing the storage enclosure device

  • Select device manager

  • Select the storage enclosure device that needs to be certified.

    hlk studio showing selected storage enclosure device.

  • Right-click on the selected device and select Add/Modify Features

    hlk studio showing add/modify features context menu

  • In the features dialog, select Device.Storage.Enclosure.AzureStack and click OK.

    hlk studio showing features dialog

  • Navigate to the Tests tab

  • Select PrivateCloudSimulator - Device.Storage.Enclosure.AzureStack

  • Click Run Selected

  • In the Schedule dialog,

    • Enter values for the required test parameters
      • DomainName: Test user's fully qualified domain name (FQDN).
      • UserName: Test user's user name
      • Password: Test user's password
      • ComputeCluster: compute cluster name
      • StoragePath: Location(s) on the disk device under test. The default value is "", which uses all the available CSVs from the compute cluster. To use different paths, enter a comma-separated list. Example: "C:\ClusterStorage\Volume1,C:\ClusterStorage\Volume2"
      • FreeDriveLetter: Default value is R. During setup, PcsFiles.vhd file is mounted to this drive letter on PCS controller. Make sure this drive letter is available.
    • Map machines to roles
      • PrimaryNode: This is the node with the selected device
      • Test Controller: Select PCS test controller machine
      • OtherNodes: Select other cluster nodes
  • Click OK to schedule the test.

  • Please refer to View PCS report in real-time through SQL Server Reporting Services to view the real-time results for the test run.

Duration

  • PCS Actions (listed below) will run for 48 hours.
  • The complete run may take an additional 24-36 hours (including time for setup and cleanup).

PCS Actions

The profile defines the actions to execute to validate the storage enclosure for Microsoft Azure Stack. The table below lists the actions that are included in this profile.

Action Name Description
VmCloneAction Creates a new VM.
VmLiveMigrationAction Live-migrates the VM to another cluster node.
VmSnapshotAction Takes a snapshot of the VM.
VmStateChangeAction Changes the VM state (for example, to Paused).
VmStorageMigrationAction Migrates VM storage (the VHD(s)) between cluster nodes.
VmGuestRestartAction Restarts the VM.
VmStartWorkloadAction Starts a user-simulated workload.
VmGuestFullPowerCycleAction Power-cycles the VM.
ClusterCSVMoveAction Moves the CSV disks to the best available node.
StorageNodePoolMove Moves a storage pool (created in Storage Spaces) to a different owner node in the storage cluster.
StorageNodeBusResetAction This action attempts to inject a bus reset to any of the physical disks backing the pool. First, a timeout to a read or write IO is attempted. If that is successful, the corresponding abort, reset LUN, and reset target commands are failed. If any of these fault injections succeeds, a bus reset is triggered. If any disk issues a bus reset, the action is considered successful.
StorageRetireAndRepairAction This action retires a disk from a pool and starts a repair. If the storage spaces do not return to a healthy state, the action fails. The action randomly picks a pool and tries to retire a disk in the pool. If the disk is set as read-only, is part of a simple space, or is used for cluster purposes (for example, a quorum resource), the action is skipped.

PrivateCloudSimulator - System.Solutions.StorageSpacesDirect

Setup

  • Set up a hyper-converged solution. See here for an example.
  • We recommend making the number of volumes a multiple of the number of servers in your cluster. For example, if you have 4 servers, you will experience more consistent performance with 4 total volumes than with 3 or 5. This allows the cluster to distribute volume "ownership" (one server handles metadata orchestration for each volume) evenly among servers.
  • We recommend using Resilient File System (ReFS) for Storage Spaces Direct.
  • By default, the test creates 20 VMs per cluster node. The estimated average size of a VM's VHD file is 40 GB. To run this test in a 4-node cluster environment, your total virtual disk size should be at least 20 * 40 * 4 = 3200 GB.
  • Minimum Configuration
    • This configuration contains the minimum number of cluster nodes, the slowest supported processor, the least memory, and the lowest storage capacity supported by the solution family.
    • Please use the PrivateCloudSimulator - System.Solutions.StorageSpacesDirect (MIN) job to validate this setup
  • Maximum Configuration
    • This configuration contains the maximum number of cluster nodes and the maximum storage supported by the solution family.
    • Processor and memory should be equal to or higher than the lowest supported value for the solution, but need not be the maximum possible supported value. The processor and memory values should be representative of the most common SKUs for the solution.
    • Please use the PrivateCloudSimulator - System.Solutions.StorageSpacesDirect (MAX) job to validate this setup
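
The capacity estimate in the setup notes above (VMs per node, times average VHD size, times node count) can be written as a small calculation. This is only a sketch of that arithmetic; the function name is illustrative, and the defaults are the figures quoted in the text (20 VMs per node, 40 GB per VHD).

```python
def required_capacity_gb(node_count, vms_per_node=20, avg_vhd_gb=40):
    """Estimate the total virtual disk capacity (GB) needed for the PCS VMs."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return vms_per_node * avg_vhd_gb * node_count


# 4-node example from the text: 20 * 40 * 4
print(required_capacity_gb(4))  # 3200
```

The same function gives 2400 GB for the 3-node minimum used in the device-level jobs, under the same assumed defaults.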

Execute

  • Open HLK Studio

  • Follow the Windows HLK Getting Started guide to create a machine pool

  • Navigate to the Project tab and click Create Project

  • Enter a project name and press Enter

  • Navigate to the Selection tab

  • Select the machine pool containing the system under test and PCS controller machine.

  • Select systems on the left panel and then select the PCS test controller (NOTE: NOT the machine that needs to be certified).

    hlk studio showing systems tab with pcs test controller selected

  • Right-click on the selected PCS controller machine and select Add/Modify Features

  • In the features dialog, select System.Solutions.StorageSpacesDirect and click OK

  • Navigate to the Tests tab

  • Select PrivateCloudSimulator - System.Solutions.StorageSpacesDirect (MAX) or PrivateCloudSimulator - System.Solutions.StorageSpacesDirect (MIN) (based on the solution size you are testing)

  • Click Run Selected

  • In the Schedule dialog,

    • Enter values for the required test parameters
      • DomainName: Test user's fully qualified domain name (FQDN).
      • UserName: Test user's user name
      • Password: Test user's password
      • ComputeCluster: compute cluster name
      • StoragePath: The default value is "", which uses all the available CSVs from the compute cluster. To use different paths, enter a comma-separated list. Example: "C:\ClusterStorage\Volume1,C:\ClusterStorage\Volume2" (the double quotes are required)
      • VmSwitchName: Enter the name of the virtual switch. This name must be the same on all nodes
      • FreeDriveLetter: Default value is R. During setup, PcsFiles.vhd file is mounted to this drive letter on PCS controller. Make sure this drive letter is available.
      • IsCreateCluster: Use default value
      • IsRemoveCluster: Use default value
    • Map machines to roles
      • Test Controller: Select PCS test controller machine
  • Click OK to schedule the test.

  • Please refer to View PCS report in real-time through SQL Server Reporting Services to view the real-time results for the test run.

Duration

  • PCS Actions (listed below) will run for 96 hours.
  • The complete run may take an additional 24-36 hours (including time for setup and cleanup).
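
For lab scheduling, the total wall-clock time is the action run time plus the setup and cleanup overhead quoted above. A minimal sketch of that arithmetic follows; the function name is illustrative, and the 24 to 36 hour overhead is the range stated in this document.

```python
def total_run_hours(action_hours, overhead_hours=(24, 36)):
    """Return the (minimum, maximum) total hours for a complete PCS run."""
    low, high = overhead_hours
    return action_hours + low, action_hours + high


print(total_run_hours(96))  # solution-level jobs: (120, 132)
print(total_run_hours(48))  # device-level jobs: (72, 84)
```

In other words, a solution-level run should be planned for roughly five to five and a half days of uninterrupted lab time.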

PCS Actions

The profile defines the actions to execute to validate the Storage Spaces Direct solution. The table below lists the actions that are included in this profile.

Action Name Description
VmCloneAction Creates a new VM.
VmLiveMigrationAction Live-migrates the VM to another cluster node.
VmSnapshotAction Takes a snapshot of the VM.
VmStateChangeAction Changes the VM state (for example, to Paused).
VmStorageMigrationAction Migrates VM storage (the VHD(s)) between cluster nodes.
VmGuestRestartAction Restarts the VM.
VmStartWorkloadAction Starts a user-simulated workload.
VmGuestFullPowerCycleAction Power-cycles the VM.
ComputeNodeEvacuation Drains all resources from one cluster node.
ClusterCSVMoveAction Moves the CSV disks to the best available node.
StorageNodePoolMove Moves a storage pool (created in Storage Spaces) to a different owner node in the storage cluster.
StorageNodeRestart Restarts a node in the storage cluster.
StorageNodeBugcheck Bug checks one node of the storage cluster.
StorageNodeDiskReadTimeoutAction This action goes through disks that tolerate errors (not read-only, clustered, no simple spaces) and waits for read IO. Once an IO is intercepted, the action causes the IO to time out. If a single timeout is detected on any disk, the action is considered successful. This action is invoked on storage nodes every 15 minutes.
StorageNodeDiskWriteTimeoutAction This action goes through disks that tolerate errors (not read-only, clustered, no simple spaces) and waits for write IO. Once an IO is intercepted, the action causes the IO to time out. If a single timeout is detected on any disk, the action is considered successful. This action is invoked on storage nodes every 15 minutes.
StorageNodeBusResetAction This action attempts to inject a bus reset to any of the physical disks backing the pool. First, a timeout to a read or write IO is attempted. If that is successful, the corresponding abort, reset LUN, and reset target commands are failed. If any of these fault injections succeeds, a bus reset is triggered. If any disk issues a bus reset, the action is considered successful.
StorageNodePortDisableAllAction This action disables all the storage controllers in the node. All of the SCSI controllers are disabled; if at least one is successfully disabled, the action is considered passed. After the specified time, all of the controllers are re-enabled.
StorageRetireAndRepairAction This action retires a disk from a pool and starts a repair. If the storage spaces do not return to a healthy state, the action fails. The action randomly picks a pool and tries to retire a disk in the pool. If the disk is set as read-only, is part of a simple space, or is used for cluster purposes (for example, a quorum resource), the action is skipped.
DisableNetworkAdapters Disables one of the network adapters that carries the storage traffic.
StorageNodeNetworkDisconnectAction Disables one of the network adapters that carries the storage traffic.
StorageNodeDiskIoTimeoutOnceAction Times out a single read or write across the storage node. This does not time out the retry attempt for this IO, so the disk will not go unresponsive.
StorageNodeUpdateStorageProviderCacheAction Calls update-storageprovidercache command in PowerShell.

PrivateCloudSimulator - System.Solutions.AzureStack

Setup

  • Set up a hyper-converged solution. See here for an example.
  • We recommend making the number of volumes a multiple of the number of servers in your cluster. For example, if you have 4 servers, you will experience more consistent performance with 4 total volumes than with 3 or 5. This allows the cluster to distribute volume "ownership" (one server handles metadata orchestration for each volume) evenly among servers.
  • You must use Resilient File System (ReFS) for Storage Spaces Direct. Otherwise, the job fails.
  • By default, the test creates 20 VMs per cluster node. The estimated average size of a VM's VHD file is 40 GB. To run this test in a 4-node cluster environment, your total virtual disk size should be at least 20 * 40 * 4 = 3200 GB.
  • Minimum Configuration
    • This configuration contains the minimum number of cluster nodes, the slowest processor, the least memory, and the lowest storage capacity supported by the solution family.
    • Please use the PrivateCloudSimulator - System.Solutions.AzureStack (MIN) job to validate this setup
  • Maximum Configuration
    • This configuration contains the maximum number of cluster nodes and the maximum storage supported by the solution family.
    • Processor and memory should be equal to or higher than the lowest supported value for the solution, but need not be the maximum possible supported value. The processor and memory values should be representative of the most common SKUs for the solution.
    • Please use the PrivateCloudSimulator - System.Solutions.AzureStack (MAX) job to validate this setup
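
The recommendation above to make the volume count a multiple of the server count can be illustrated with a short calculation. This sketch assumes the best case, where the cluster spreads volume ownership as evenly as it can; the function names are illustrative and do not correspond to any cluster API.

```python
def volumes_per_node(volume_count, node_count):
    """Return the number of volumes each node would own, most loaded first."""
    base, extra = divmod(volume_count, node_count)
    return [base + 1] * extra + [base] * (node_count - extra)


def is_balanced(volume_count, node_count):
    """True when every node can own the same number of volumes."""
    return volume_count > 0 and volume_count % node_count == 0


# The 4-server example from the text, with 3, 4, and 5 volumes.
for volumes in (3, 4, 5):
    print(volumes, volumes_per_node(volumes, 4), is_balanced(volumes, 4))
```

With 3 volumes one server owns nothing, and with 5 volumes one server owns twice as many as the others; only 4 volumes gives every server the same share.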

Execute

  • Open HLK Studio

  • Follow the Windows HLK Getting Started guide to create a machine pool

  • Navigate to the Project tab and click Create Project

  • Enter a project name and press Enter

  • Navigate to the Selection tab

  • Select the machine pool containing the system under test

  • Select systems on the left panel and then select the PCS test controller (NOTE: Not the machine that needs to be certified).

    hlk studio with pcs test controller selected

  • Right-click on the selected device and select Add/Modify Features

  • In the features dialog, select System.Solutions.AzureStack and click OK

  • Navigate to the Tests tab

  • Select PrivateCloudSimulator - System.Solutions.AzureStack

  • Click Run Selected

  • In the Schedule dialog,

    • Enter values for the required test parameters
      • DomainName: Test user's fully qualified domain name (FQDN).
      • UserName: Test user's user name
      • Password: Test user's password
      • ComputeCluster: compute cluster name
      • StoragePath: Location(s) on the disk device under test. The default value is "", which uses all the available CSVs from the compute cluster. To use different paths, enter a comma-separated list. Example: "C:\ClusterStorage\Volume1,C:\ClusterStorage\Volume2"
      • VmSwitchName: Name of virtual switch to be used by VMs. Default value is "".
      • FreeDriveLetter: Default value is R. During setup, PcsFiles.vhd file is mounted to this drive letter on PCS controller. Make sure this drive letter is available.
      • IsCreateCluster: Use default value
      • IsRemoveCluster: Use default value
    • Map machines to roles
      • Test Controller: Select PCS test controller machine
  • Click OK to schedule the test.

  • Please refer to View PCS report in real-time through SQL Server Reporting Services to view the real-time results for the test run.

Duration

  • PCS Actions (listed below) will run for 96 hours.
  • The complete run may take an additional 24-36 hours (including time for setup and cleanup).

PCS Actions

The profile defines the actions to execute to validate the Microsoft Azure Stack solution. The table below lists the actions that are included in this profile.

Action Name Description
VmCloneAction Creates a new VM.
VmLiveMigrationAction Live-migrates the VM to another cluster node.
VmSnapshotAction Takes a snapshot of the VM.
VmStateChangeAction Changes the VM state (for example, to Paused).
VmStorageMigrationAction Migrates VM storage (the VHD(s)) between cluster nodes.
VmGuestRestartAction Restarts the VM.
VmStartWorkloadAction Starts a user-simulated workload.
VmGuestFullPowerCycleAction Power-cycles the VM.
ClusterCSVMoveAction Moves the CSV disks to the best available node.
StorageNodePoolMove Moves a storage pool (created in Storage Spaces) to a different owner node in the storage cluster.
StorageNodeRestart Restarts a node in the storage cluster.
StorageNodeBugcheck Bug checks one node of the storage cluster.
StorageNodeDiskReadTimeoutAction This action goes through disks that tolerate errors (not read-only, clustered, no simple spaces) and waits for read IO. Once an IO is intercepted, the action causes the IO to time out. If a single timeout is detected on any disk, the action is considered successful. This action is invoked on storage nodes every 15 minutes.
StorageNodeDiskWriteTimeoutAction This action goes through disks that tolerate errors (not read-only, clustered, no simple spaces) and waits for write IO. Once an IO is intercepted, the action causes the IO to time out. If a single timeout is detected on any disk, the action is considered successful. This action is invoked on storage nodes every 15 minutes.
StorageNodeBusResetAction This action attempts to inject a bus reset to any of the physical disks backing the pool. First, a timeout to a read or write IO is attempted. If that is successful, the corresponding abort, reset LUN, and reset target commands are failed. If any of these fault injections succeeds, a bus reset is triggered. If any disk issues a bus reset, the action is considered successful.
StorageNodePortDisableAllAction This action disables all the storage controllers in the node. All of the SCSI controllers are disabled; if at least one is successfully disabled, the action is considered passed. After the specified time, all of the controllers are re-enabled.
StorageRetireAndRepairAction This action retires a disk from a pool and starts a repair. If the storage spaces do not return to a healthy state, the action fails. The action randomly picks a pool and tries to retire a disk in the pool. If the disk is set as read-only, is part of a simple space, or is used for cluster purposes (for example, a quorum resource), the action is skipped.
DisableNetworkAdapters Disables one of the network adapters that carries the storage traffic.
StorageNodeNetworkDisconnectAction Disables one of the network adapters that carries the storage traffic.
StorageNodeDiskIoTimeoutOnceAction Times out a single read or write across the storage node. This does not time out the retry attempt for this IO, so the disk will not go unresponsive.
StorageNodeUpdateStorageProviderCacheAction Calls update-storageprovidercache command in PowerShell.

View PCS report in real-time through SQL Server Reporting Services

While PCS operations are running, reports are saved in a SQL database on the PCS Controller. Each report lists all operations that were performed, their pass percentages, and all resources that were acquired and released during the test. A new database is created for each test run to enable you to review data from previous test runs at any time.

To view the report, follow these steps:

  • By default, Internet Explorer Enhanced Security Configuration is enabled on Windows Server. You need to disable it to view the report.

    Open Server Manager => Local Server => Click IE Enhanced Security Configuration to turn it off for administrators and users.

  • Open Internet Explorer on the PCS controller and go to http://PcsControllerMachineName/Reports

    pcs reporting page in internet explorer

  • Click PCS Reports => PCSRuns.

  • Each PCS run is identified by a unique Pass Run ID.

    ie reporting showing pass run ids

  • Click a Pass Run ID (for example, click f44b3f88-3dbf-476e-9294-9d479ca0a369) to open the report for that PCS run. The data in these reports is live, so you can monitor the progress of a test run in real time. The report includes the following:

    • An overview of all resources (nodes, cluster, and VMs) that participated in the test run.
    • All actions that were performed on each resource. The Pass and Fail columns report the number of actions that passed and failed.

    ie reporting showing run information

  • In the Overall Operation Information table, you can click links in the Action/Pass/Fail column to open detail pages, which give you more information about the action's results. For example, if you clicked the failure number 9 by the VMLiveMigrationAction entry, you would see the summary shown in the following illustration.

    ie reporting showing vmlivemigrationaction

  • The first entry above provides the following information:

    • Failure ID: When PCS encounters a failure, it generalizes the failure message and generates a unique hash for it. In the example above, the failure ID is 97c12afd-23a8-3982-e304-a5dc6793950d

    • Failure Hash: Generalized failure message. In the example above, the failure hash is

      Virtual Machine <VIRTUAL MACHINE> live migration failed at progress <PERCENTAGE> (migration state: Migrating)
      Error: Virtual machine migration operation for '<VIRTUAL MACHINE>' failed at migration destination '<COMPUTE NODE>'. (Virtual machine ID <GUID>)
      Failed to receive data for a Virtual Machine migration: This operation returned because the timeout period expired. (0x800705B4).

    • Count Current Run: The number of actions of a particular type that failed with this particular error message during this run. In the example above, VMLiveMigrationAction failed with this error 3 times.

    • Count All Runs: A count of actions that failed because of this particular failure across all PCS runs. For the VMLiveMigrationAction, this count was 3.

    • PCS Runs Affected: Tells how many runs have been affected by this failure. For VMLiveMigrationAction, only 1 PCS run was affected.

  • To look further into the error, click a failure ID on that screen to drill down to a global history of the failure type across all PCS runs. For example, click 97c12afd-23a8-3982-e304-a5dc6793950d to display the following page. The page lists all failed operations, grouped by failure type, which highlights the key areas that you might need to investigate.

    ie reporting showing failing actions by cause

  • If you click the Action ID, you can drill down further to see an Action Log Report. Errors are shown in red; warnings are shown in yellow.

    ie reporting showing action log report
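
The report groups failures by a generalized message, so that the same error on different VMs or at different progress points counts as one failure type. This document does not describe how PCS generalizes messages or derives the failure ID, so the sketch below is purely illustrative: the regular expressions, the placeholder handling, and the use of a name-based UUID are all assumptions, chosen only to show the idea.

```python
import re
import uuid

# Assumed patterns for run-specific details; PCS's real rules are not documented here.
GUID_PATTERN = re.compile(r"[0-9a-fA-F]{8}-(?:[0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}")
PERCENT_PATTERN = re.compile(r"\b\d{1,3}%")


def generalize(message):
    """Replace run-specific details with placeholders such as <GUID>."""
    message = GUID_PATTERN.sub("<GUID>", message)
    return PERCENT_PATTERN.sub("<PERCENTAGE>", message)


def failure_id(message):
    """Derive a stable identifier from the generalized message (assumed scheme)."""
    return uuid.uuid5(uuid.NAMESPACE_OID, generalize(message))


first = "live migration failed at progress 45% (Virtual machine ID 11111111-2222-3333-4444-555555555555)"
second = "live migration failed at progress 80% (Virtual machine ID aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee)"
print(generalize(first))
print(failure_id(first) == failure_id(second))
```

Because both messages reduce to the same generalized text, they map to the same identifier, which is what lets the report count one failure type across a run and across all runs.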

Troubleshoot a PCS run from the HLK Controller

The PCS execution flow has multiple stages. The example below shows a result viewed from HLK Manager => Explorers => Job Monitor => select Machine Pool => select the job in Job Execution Status.

pcs controller showing task execution status

If PCS fails at the Setup, Execute, or Cleanup stage, you can browse the job logs by right-clicking the job name (or a child task name) and then clicking Browse Job Logs. The log file names are PCS-E2ELaunch_Setup.log, PCS-E2ELaunch_Execute.log, and PCS-E2ELaunch_Cleanup.log. The log files should contain information about the failures. Search for unexpected exceptions near the end of each log file.
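
Searching the end of a long log by hand is tedious, so a short script can help. The sketch below scans the last lines of a log's text for a keyword and returns the matching line numbers. The function name, the default of 200 tail lines, and the keyword "exception" are assumptions for illustration; the actual wording of PCS log entries may differ.

```python
def find_exceptions_near_end(log_text, tail_lines=200, keyword="exception"):
    """Return (line_number, line) pairs from the log tail that mention the keyword."""
    lines = log_text.splitlines()
    start = max(0, len(lines) - tail_lines)
    needle = keyword.lower()
    return [
        (number + 1, line)  # report 1-based line numbers
        for number, line in enumerate(lines[start:], start=start)
        if needle in line.lower()
    ]


# Tiny in-memory example; in practice, read the text of a log file
# such as PCS-E2ELaunch_Execute.log.
sample = "Starting stage\nSystem.Exception: something failed\nStage finished"
for number, line in find_exceptions_near_end(sample):
    print(number, line)
```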

Troubleshoot a PCS run from the PCS Controller

When a PCS job fails at the Setup, Execute, or Cleanup stage, you can rerun that stage directly from the PCS controller. This method is useful for troubleshooting problems in these stages.

  • Open an elevated command prompt
  • Run the ReRunPcsSetup.cmd, ReRunPcsExecute.cmd, or ReRunPcsCleanup.cmd script

Logs and Diagnose

PCS has three main stages: Setup, Execute, and Cleanup. A PCS job uses the PCS-E2ELaunch.ps1 script to launch these three stages. The corresponding log files are PCS-E2ELaunch_Setup.log, PCS-E2ELaunch_Execute.log, and PCS-E2ELaunch_Cleanup.log.

When a PCS run completes, PCS analyzes the logs during the Cleanup stage and saves the analysis as PcsReport.htm. A run succeeds when the following criteria are met:

  • Every PCS action has a pass rate of at least 90%
  • No cluster node crashes unexpectedly, except for crashes that PCS initiates for testing purposes
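
The two success criteria above can be expressed as a simple check. This is a minimal sketch, not the logic PCS itself runs: the function name and the input shape (a mapping from action name to passed and failed counts) are assumptions, and actions that never ran are ignored here, which is also an assumption.

```python
def run_succeeded(action_results, unexpected_crashes, min_pass_percent=90):
    """Apply the two criteria: per-action pass rate and no unexpected crashes.

    action_results maps an action name to a (passed, failed) tuple.
    """
    for passed, failed in action_results.values():
        total = passed + failed
        if total == 0:
            continue  # assumption: an action that never ran does not fail the run
        # Integer comparison avoids floating-point rounding at exactly 90%.
        if passed * 100 < min_pass_percent * total:
            return False
    return unexpected_crashes == 0


results = {"VmCloneAction": (95, 5), "VmLiveMigrationAction": (91, 9)}
print(run_succeeded(results, unexpected_crashes=0))
print(run_succeeded({"VmLiveMigrationAction": (89, 11)}, unexpected_crashes=0))
```

Note that the threshold applies to each action separately, so a high overall pass rate does not compensate for one action that falls below 90%.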

The following files are generated on PCS Controller during Cleanup stage.

  • PcsReport.htm: summary of the run.
  • ClusterName-PRE.mht.html: report from the cluster validation test that runs before the Execute stage
  • ClusterName-POST.mht.html: report from the cluster validation test that runs after the Execute stage
  • PcsLog-DateTime.zip: contains logs and is copied to the HLK Controller when the test finishes.
    • MHTML folder: contains PCS SQL logs
    • SDDCDiagnosticInfo folder: contains cluster logs and event logs

Many issues seen during a PCS certification run turn out not to be caused by PCS itself. The following basic guide can help narrow down such issues.

  • Run the cluster validation test and check the report for errors.
  • In Failover Cluster Manager, check whether all the nodes, virtual disks, and pools are healthy. If they are not, invest time in checking the logs and debugging before contacting Microsoft.
  • Open Hyper-V Manager and make sure the VMs and vSwitches are enumerated (you can also run Get-VM or Get-VMSwitch).
  • Make sure you can create a vSwitch outside of PCS tests on one or all of the compute nodes.
  • Make sure you can create a VM on one or all of the nodes and can attach a VM network adapter to a vSwitch.
  • Look for dump files generated by bug checks by running "dir /s *.dmp" from %systemdrive% on the compute nodes.
  • If you do not have a kernel debugger attached, consider using LiveKD to look at kernel modules or threads that are stuck.
  • Check that the license on the compute nodes is active, because evaluation-version licenses expire after 180 days.
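
The dump-file search in the checklist above can also be scripted, which is convenient when checking several nodes. The following sketch is a rough Python equivalent of "dir /s *.dmp"; the function name is illustrative, and the root folder to scan is whatever you pass in.

```python
from pathlib import Path


def find_dump_files(root):
    """Recursively list *.dmp files under root, similar to `dir /s *.dmp`."""
    return sorted(path for path in Path(root).rglob("*.dmp") if path.is_file())
```

For example, calling find_dump_files("C:\\") on a compute node returns the dump files found on that drive; a full-drive scan can take a while.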

Generate a ZIP file that contains PCS logs

You can run the following script from the PCS controller to generate a ZIP file that contains the required logs. This method is useful when a job is cancelled or while a test is running.

C:\pcs\PCS-E2ELaunch.ps1 -DomainName <string> -UserName <string> -Password <string> -ComputeCluster <string> [-StorageCluster <string>] -CollectLog [-CollectLogLevel <int>]

Parameters

  • DomainName: Test user's fully qualified domain name (FQDN).
  • UserName: Test user's user name
  • Password: Test user's password
  • ComputeCluster: Name of the compute cluster
  • StorageCluster: optional, name of the storage cluster. Don't specify this parameter if the compute and storage clusters are the same.
  • CollectLog: Required
  • CollectLogLevel: optional, default is 1. Enter 3 to collect verbose logs.
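
To make the required and optional parameters concrete, the sketch below assembles the argument list for the log-collection command shown above. It uses only the script path and switches given in this section; the function name and the example values (contoso.com, pcsuser, cluster1) are hypothetical.

```python
def build_collect_log_args(domain, user, password, compute_cluster,
                           storage_cluster=None, level=None):
    """Build the PCS-E2ELaunch.ps1 arguments for log collection."""
    args = [
        r"C:\pcs\PCS-E2ELaunch.ps1",
        "-DomainName", domain,
        "-UserName", user,
        "-Password", password,
        "-ComputeCluster", compute_cluster,
    ]
    # StorageCluster is omitted when the compute and storage clusters are the same.
    if storage_cluster and storage_cluster != compute_cluster:
        args += ["-StorageCluster", storage_cluster]
    args.append("-CollectLog")  # required switch
    if level is not None:
        args += ["-CollectLogLevel", str(level)]  # 3 collects verbose logs
    return args


# Hypothetical values, for illustration only.
print(" ".join(build_collect_log_args("contoso.com", "pcsuser", "<password>", "cluster1", level=3)))
```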

Generate PcsReport.htm file manually

While PCS is running, you can run the following cmdlets on the PCS controller to generate an HTML report that lists unexpected bug checks from all nodes.

Import-Module C:\PCS\PrivateCloudSimulator-Manager.psm1
Get-PCSReport

Customize PCS actions

  • Each PCS job has its own xml files that define its actions.

  • Each job could contain up to 3 xml files: PrivateCloudSimulator.xml, PrivateCloudSimulator_Create.xml, PrivateCloudSimulator_Storage.xml

  • These XML files can be found on the HLK Controller. Below is an example for the PrivateCloudSimulator - System.Solutions.AzureStack job. The folder name System.Solutions.AzureStack in the path is the name of the HLK job.

    C:\Program Files (x86)\Windows Kits\10\Hardware Lab Kit\Tests\amd64\PCS\System.Solutions.AzureStack\PrivateCloudSimulator_Create.xml

Example 1: Enable/Disable an action

<ConfigurableType Type="Microsoft.PrivateCloudSimulator.VM.Actions.HyperV.VmCloneAction, Microsoft.PrivateCloudSimulator.VM.Actions.HyperV">
  <ConfigurableTypeField FieldName="Interval" ValueType="System.TimeSpan" Value="00:01:00" />
  <ConfigurableTypeField FieldName="StartupNumber" ValueType="System.Int32" Value="2" />
  <ConfigurableTypeField FieldName="InjectVMRTInGuest" ValueType="System.Boolean" Value="true" />
  <ConfigurableTypeField FieldName="BaseVHDPath" ValueType="System.String" Value="%BASEVHD%" />
</ConfigurableType>
  • The test action name is VmCloneAction.
  • The Interval field sets how often the action runs. Use the format hh:mm:ss. For example, the value 02:00:00 repeats the action every 2 hours.
  • The StartupNumber field defines the number of instances of the action to start on each node of the compute cluster. To disable an action, set this field to zero.
  • Don't modify the other fields.
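The edit described in Example 1 can also be scripted. The following is a minimal sketch that uses Python's standard xml.etree.ElementTree module on the fragment shown above. The set_field helper is hypothetical and not part of PCS; for a real job file you would load the XML with ET.parse and write it back after the change.

```python
import xml.etree.ElementTree as ET

# The fragment from Example 1, embedded so that the sketch is self-contained.
FRAGMENT = """
<ConfigurableType Type="Microsoft.PrivateCloudSimulator.VM.Actions.HyperV.VmCloneAction, Microsoft.PrivateCloudSimulator.VM.Actions.HyperV">
  <ConfigurableTypeField FieldName="Interval" ValueType="System.TimeSpan" Value="00:01:00" />
  <ConfigurableTypeField FieldName="StartupNumber" ValueType="System.Int32" Value="2" />
  <ConfigurableTypeField FieldName="InjectVMRTInGuest" ValueType="System.Boolean" Value="true" />
  <ConfigurableTypeField FieldName="BaseVHDPath" ValueType="System.String" Value="%BASEVHD%" />
</ConfigurableType>
"""

def set_field(root, action_name, field_name, value):
    """Set one field of the named action; return how many fields were changed."""
    changed = 0
    for ctype in root.iter("ConfigurableType"):
        # The Type attribute is "<full class name>, <assembly name>".
        class_name = ctype.get("Type", "").split(",")[0].strip()
        if class_name.rsplit(".", 1)[-1] != action_name:
            continue
        for field in ctype.iter("ConfigurableTypeField"):
            if field.get("FieldName") == field_name:
                field.set("Value", value)
                changed += 1
    return changed

root = ET.fromstring(FRAGMENT)
# Disable VmCloneAction by setting StartupNumber to zero.
set_field(root, "VmCloneAction", "StartupNumber", "0")
```

Only the targeted field is touched, which keeps the remaining fields at their shipped values.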

Example 2: Change VMs to use differencing disks

<ConfigurableType Type="Microsoft.PrivateCloudSimulator.VM.Actions.HyperV.VmCloneBase, Microsoft.PrivateCloudSimulator.VM.Actions.HyperV">
  <ConfigurableTypeField FieldName="VmClusteringPercentage" ValueType="System.Int32" Value="100" />
  <ConfigurableTypeField FieldName="UseDiffDisks" ValueType="System.Boolean" Value="false" />
</ConfigurableType>
  • By default, PCS makes a copy of the provided guest OS VHD to create VMs that have dynamic virtual disks. To create VMs that have differencing disks instead, set the UseDiffDisks value to true.

Example 3: Change the number of created VMs per node

<ConfigurableType Type="Microsoft.PrivateCloudSimulator.VM.Actions.HyperV.VmCreationBase, Microsoft.PrivateCloudSimulator.VM.Actions.HyperV">
  <!-- MaxVmCount is Max Number of VMs on any one node -->
  <ConfigurableTypeField FieldName="MaxVmCount" ValueType="System.Int32" Value="20" />    
</ConfigurableType>
  • By default, PCS creates 20 VMs per cluster node. The average VM size could be 40 GB, so a 4-node cluster could need 20 * 4 * 40 = 3200 GB of disk space. If you are trying to certify your hardware, don't change the default value. Consider adding more disks instead of reducing the number of VMs.
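The disk space estimate above can be repeated for other cluster sizes with a small helper. This is only a sketch; the function name is hypothetical, and the 40 GB average VM size is the rough figure quoted above, not a measured value.

```python
def estimated_vm_disk_gb(node_count, max_vm_count=20, avg_vm_size_gb=40):
    """Estimate the disk space, in GB, used by PCS test VMs across the cluster."""
    return max_vm_count * node_count * avg_vm_size_gb

# The 4-node example from the text: 20 * 4 * 40.
four_node_estimate = estimated_vm_disk_gb(4)
```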

Customize Action Logs

Each PCS run has a RunId, and each PCS action has an action ID. When a PCS action fails, PCS removes the variant (that is, the VM name) from the failure message and generates a hash value for the remaining text. Similar failures therefore have the same hash value, and PCS groups them together on the SQL report site.
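The grouping idea can be illustrated with a short sketch. The actual normalization and hash algorithm used by PCS are not documented here, so the placeholder token, the use of SHA-256, and the function names below are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict

def failure_hash(message, variants):
    """Hash a failure message after replacing run-specific variants (such as VM names)."""
    normalized = message
    for variant in variants:
        normalized = normalized.replace(variant, "<variant>")
    # SHA-256 is an assumption of this sketch; PCS may use a different hash.
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]

def group_failures(failures):
    """Group (message, vm_name) pairs by the hash of the normalized message."""
    groups = defaultdict(list)
    for message, vm_name in failures:
        groups[failure_hash(message, [vm_name])].append(message)
    return dict(groups)
```

Two failures that differ only in the VM name normalize to the same text and therefore land in the same group.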

PCS uses .NET Trace Listeners to collect test results. These listeners are defined in Microsoft.PrivateCloudSimulator.exe.config.

  • SQLOnline: This listener logs the results to the SQL database.
  • AnalyticalLogGather: This listener collects extra information when an action fails.

When a particular action fails or a particular hash value is seen, you can configure the AnalyticalLogGather listener to collect event logs, collect cluster logs, or call a script. These rules are defined in ActionFailureReactionPolicy.xml.

In ActionFailureReactionPolicy.xml, PCS supports two types of triggers and three types of reactions. Using this XML, you can define rules like "when trigger X is seen, take reactions Y and Z". Most actions would have NodeScope set to ReservedOnly and MaxLevel set to 3 (Critical, Error, and Warning events).

Trigger:

Type          Data
ActionFail    ActionFullName
KnownFailure  FailureHash

Reaction:

Type                  Data
ETWCollection         Channel, NodeScope, StorageLocation, MaxLevel
ClusterLogCollection  UseLocalTime, NodeScope, StorageLocation, MaxTimeDuration (optional)
CustomPS              ScriptFullPath, NodeScope, Argument

Valid NodeScope values are the following:

  • AllNodes
  • ComputeOnly
  • StorageOnly
  • EdgeOnly
  • NCOnly
  • ReservedOnly

Valid MaxLevel values are the following:

  • 0 (logs at all levels)
  • 1 (Critical)
  • 2 (Error)
  • 3 (Warning)
  • 4 (Information)
  • 5 (Verbose)
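One way to read the MaxLevel values, consistent with the statement that 3 covers Critical, Error, and Warning events, is as an upper bound on the event level. The sketch below encodes that reading; the function name is hypothetical and the behavior is an interpretation of the list above, not confirmed PCS logic.

```python
# Level numbers and names as listed above.
LEVEL_NAMES = {1: "Critical", 2: "Error", 3: "Warning", 4: "Information", 5: "Verbose"}

def levels_collected(max_level):
    """Return the event level names collected for a given MaxLevel value."""
    if max_level not in range(0, 6):
        raise ValueError("MaxLevel must be between 0 and 5")
    if max_level == 0:
        # 0 means logs at all levels.
        return list(LEVEL_NAMES.values())
    return [name for level, name in LEVEL_NAMES.items() if level <= max_level]
```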

Examples:

<Trigger>
  <Type>ActionFail</Type>
  <Data Name="ActionFullName" Value="Microsoft.HyperV.Test.Stress.PrivateCloud.ComputeNode.Action.StorageNodeRestartAction">
  </Data>
  <ReactionMatchList>
    <!-- Details of Reaction are Defined Below and are referenced using the ID attribute-->
    <MatchingReaction ID ="1"></MatchingReaction>
    <MatchingReaction ID ="2"></MatchingReaction>
  </ReactionMatchList>
</Trigger>
<Reaction ID="1">
  <Type>ETWCollection</Type>
  <Data Name="Channel" Value="Microsoft-Windows-Hyper-V-VMMS-Analytic"></Data>
  <Data Name="NodeScope" Value="ReservedOnly"></Data>
  <Data Name="StorageLocation" Value="C:\PCS\PCSEventData\%NODE%\%ActionId%\EventLogs"></Data>
  <Data Name="MaxLevel" Value="3"></Data>
</Reaction>
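Before you deploy an edited policy, it can help to check that every MatchingReaction ID refers to a Reaction that is actually defined. The following is a minimal sketch of such a check. The <Policy> wrapper element is hypothetical (the real root element of ActionFailureReactionPolicy.xml is not shown here), and the helper name is an assumption.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the example above, wrapped in a hypothetical <Policy> root.
POLICY = r"""
<Policy>
  <Trigger>
    <Type>ActionFail</Type>
    <Data Name="ActionFullName" Value="Microsoft.HyperV.Test.Stress.PrivateCloud.ComputeNode.Action.StorageNodeRestartAction"></Data>
    <ReactionMatchList>
      <MatchingReaction ID="1"></MatchingReaction>
      <MatchingReaction ID="2"></MatchingReaction>
    </ReactionMatchList>
  </Trigger>
  <Reaction ID="1">
    <Type>ETWCollection</Type>
    <Data Name="Channel" Value="Microsoft-Windows-Hyper-V-VMMS-Analytic"></Data>
    <Data Name="NodeScope" Value="ReservedOnly"></Data>
    <Data Name="MaxLevel" Value="3"></Data>
  </Reaction>
</Policy>
"""

def resolve_reactions(root):
    """For each trigger, list the matched reaction types and any undefined IDs."""
    reactions = {r.get("ID"): r for r in root.iter("Reaction")}
    results = []
    for trigger in root.iter("Trigger"):
        ids = [m.get("ID") for m in trigger.iter("MatchingReaction")]
        results.append({
            "trigger_type": trigger.findtext("Type"),
            "reaction_types": [reactions[i].findtext("Type") for i in ids if i in reactions],
            "missing_ids": [i for i in ids if i not in reactions],
        })
    return results

report = resolve_reactions(ET.fromstring(POLICY))
```

In this shortened copy only Reaction 1 is defined, so ID 2 is reported under missing_ids.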

Action log files are saved to the 'FORENSICLOGLOCATION' folder on the PCS controller. By default, this folder is C:\PCS\PCSEventData.

For each failed action, the following information is collected from the reserved node(s). This log location can be seen in the action's SQL report page.

  • %MachineName%\%RunId%\ClusterLogs\%ActionId%
  • %MachineName%\%RunId%\EventLogs\%ActionId%
  • %MachineName%\%RunId%\CustomResponse\%ActionId%
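To locate the logs for one failed action, you can expand the patterns above. The sketch below assumes that the patterns are relative to the default C:\PCS\PCSEventData folder; the function name and the sample values in the usage note are hypothetical.

```python
from pathlib import PureWindowsPath

def action_log_dirs(machine_name, run_id, action_id, root=r"C:\PCS\PCSEventData"):
    """Return the cluster log, event log, and custom response folders for one action."""
    # Assumption: the documented patterns are relative to the forensic log root.
    base = PureWindowsPath(root) / machine_name / str(run_id)
    return [str(base / kind / str(action_id))
            for kind in ("ClusterLogs", "EventLogs", "CustomResponse")]
```

For example, action_log_dirs("NODE1", 7, 42) returns the three folders for action 42 of run 7 on a node named NODE1.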

FAQ

See Private Cloud Simulator FAQ

Appendix: Software Defined Datacenter (SDDC) Additional Qualifiers (AQs)

All server systems and components used in Windows Server 2016 WSSD offers must be certified for the Windows Server 2016 logo and meet the Windows Server 2016 Software-Defined Data Center (SDDC) additional qualifiers (AQs). The required HLK feature names are listed in the tables below.

COMPONENT TYPE: NIC

Required HLK Features                SDDC Standard AQ                  SDDC Premium and AzureStack AQ
Device.Network.LAN.10GbOrGreater     X                                 X
Device.Network.LAN.VMQ               X                                 X
Device.Network.LAN.RSS               X                                 X
Device.Network.LAN.LargeSendOffload  X                                 X
Device.Network.LAN.ChecksumOffload   X                                 X
Device.Network.LAN.Base              X                                 X
Device.Network.LAN.VXLAN                                               X
Device.Network.LAN.VMMQ                                                X
Device.Network.LAN.MTUSize           Required if using Encap offloads  X
Device.Network.LAN.KRDMA                                               X
Device.Network.LAN.GRE                                                 X
Device.Network.LAN.DCB               Required if using Encap offloads  X
Device.Network.LAN.AzureStack                                          X

COMPONENT TYPE: SAS HBA

Required HLK Features                         SDDC Standard AQ  SDDC Premium and AzureStack AQ
Device.Storage.Controller                     X                 X
Device.Storage.Controller.Flush               X                 X
Device.Storage.Controller.PassThroughSupport  X                 X
Device.Storage.Controller.Sas                 X                 X
Device.Storage.Controller.AzureStack          X                 X

COMPONENT TYPE: NVMe Storage Devices

Required HLK Features                SDDC Standard AQ  SDDC Premium and AzureStack AQ
Device.Storage.ControllerDrive.NVMe  X                 X
Device.Storage.Hd.AzureStack         X                 X

COMPONENT TYPE: HDD (SAS)

Required HLK Features                       SDDC Standard AQ  SDDC Premium and AzureStack AQ
Device.Storage.Hd                           X                 X
Device.Storage.Hd.DataVerification          X                 X
Device.Storage.Hd.Flush                     X                 X
Device.Storage.Hd.PortAssociation           X                 X
Device.Storage.Hd.Sas                       X                 X
Device.Storage.Hd.Scsi.ReliabilityCounters  X                 X
Device.Storage.Hd.AzureStack                X                 X
Device.Storage.Hd.FirmwareUpgrade           X                 X

COMPONENT TYPE: HDD (SATA)

Required HLK Features               SDDC Standard AQ  SDDC Premium and AzureStack AQ
Device.Storage.Hd.Sata              X                 X
Device.Storage.Hd                   X                 X
Device.Storage.Hd.DataVerification  X                 X
Device.Storage.Hd.Flush             X                 X
Device.Storage.Hd.PortAssociation   X                 X
Device.Storage.Hd.AzureStack        X                 X
Device.Storage.Hd.FirmwareUpgrade   X                 X

COMPONENT TYPE: SSD (SAS)

Required HLK Features               SDDC Standard AQ  SDDC Premium and AzureStack AQ
Device.Storage.Hd                   X                 X
Device.Storage.Hd.DataVerification  X                 X
Device.Storage.Hd.PortAssociation   X                 X
Device.Storage.Hd.Sas               X                 X
Device.Storage.Hd.AzureStack        X                 X
Device.Storage.Hd.FirmwareUpgrade   X                 X

COMPONENT TYPE: Server

Required HLK Features              SDDC Standard AQ  SDDC Premium and AzureStack AQ
System.Fundamentals.Firmware       X                 X
System.Server.Virtualization       X                 X
System.Server.AzureStack.Security  X                 X
System.Server.Assurance                              X
System.Server.AzureStack.BMC                         X