Virtualization Fabric Design Considerations Guide
Who is this guide intended for? Information technology (IT) professionals within medium to large organizations who are responsible for designing a virtualization fabric that supports many virtual machines. Through the remainder of this document, these individuals are referred to as fabric administrators. People who administer virtual machines hosted on the fabric are referred to as virtual machine administrators, but they are not a target audience for this document. Within your organization, you may have the responsibility of both roles.
How can this guide help you? You can use this guide to understand how to design a virtualization fabric that is able to host many virtual machines in your organization. In this document, the collection of servers and hypervisors, and the storage and networking hardware that are used to host the virtual machines within an organization is referred to as a virtualization fabric. The following graphic shows an example virtualization fabric.
Figure 1: Example virtualization fabric
Note: Each diagram in this document exists on a separate tab of the Virtualization Fabric Design Considerations Diagrams document, which you can download by clicking the figure name in each caption.
Although all virtualization fabrics contain servers for storage and hosting virtual machines, in addition to the networks that connect them, every organization’s virtualization fabric design will likely be different than the example illustrated in Figure 1 due to different requirements.
This guide details a series of steps and tasks that you can follow to assist you in designing a virtualization fabric that meets your organization’s unique requirements. Throughout the steps and tasks, the guide presents the relevant technologies and feature options available to you to meet functional and service quality (such as availability, scalability, performance, manageability, and security) level requirements.
Though this document can help you design a manageable virtualization fabric, it does not discuss design considerations and options for managing and operating the virtualization fabric with a product such as Microsoft System Center 2012 or System Center 2012 R2. For more information, see System Center 2012 in the TechNet library.
This guide helps you design a virtualization fabric by using Windows Server 2012 R2 and Windows Server 2012 and vendor-agnostic hardware. Some features discussed in the document are unique to Windows Server 2012 R2, and they are called out throughout the document.
Assumptions: You have some experience deploying Hyper-V, virtual machines, virtual networks, Windows Server file services, and Failover Clustering, and some experience deploying physical servers, storage, and network equipment.
Additional resources
Before designing a virtualization fabric, you may find the information in the following documents helpful:
Microsoft Cloud Services Foundation Reference Architecture – Reference Model
Microsoft Cloud Services Foundation Reference Architecture – Principles, Concepts, and Patterns
Both of these documents provide foundational concepts that are observed across multiple virtualization fabric designs and can serve as a basis for any virtualization fabric design.
Feedback: To provide feedback about this document, send e-mail to virtua@microsoft.com.
Did you know that Microsoft Azure provides similar functionality in the cloud? Learn more about Microsoft Azure virtualization solutions and how to create a hybrid virtualization solution in Microsoft Azure.
Design considerations overview
The remainder of this document provides a set of steps and tasks that you can follow to design a virtualization fabric that best meets your requirements. The steps are presented in an ordered sequence, but design considerations introduced in later steps may require you to change decisions you made in earlier steps due to conflicts. Every attempt is made to alert you to potential design conflicts throughout the document.
You will arrive at the design that best meets your requirements only after iterating through the steps as many times as necessary to incorporate all of the considerations within the document.
Step 1: Determine virtual machine resource requirements
Step 2: Plan for virtual machine configuration
Step 3: Plan for server virtualization host groups
Step 4: Plan for server virtualization hosts
Step 5: Plan for virtualization fabric architecture concepts
Step 6: Plan for initial capability characteristics
Step 1: Determine virtual machine resource requirements
The first step in designing a virtualization fabric is to determine the resource requirements of the virtual machines that the fabric will host. The fabric must include the physical hardware necessary to meet those requirements. The virtual machine resource requirements are dictated by the operating systems and applications that run within the virtual machines. For the remainder of this document, the combination of the operating system and applications that run within a virtual machine is referred to as a workload. The tasks in this step help you define the resource requirements for your workloads.
Tip: Rather than assessing the resource requirements of your existing workloads and then designing a virtualization fabric that is able to support each of them, you may decide to design a virtualization fabric that can meet the needs of most common workloads instead. Then separately address the workloads that have unique needs.
Examples of such virtualization fabrics are those offered by public cloud providers, such as Microsoft Azure (Azure). For more information, see Virtual Machine and Cloud Service Sizes for Azure.
Public cloud providers typically offer a selection of virtual machine configurations that meet the needs of most workloads. If you decide to take this approach, you can skip directly to Step 2: Plan for virtual machine configuration in this document. Additional benefits to using this approach are:
When you decide to migrate some of your on-premises virtual machines to a public cloud provider, if your on-premises virtual machine configuration types are similar to those of your public provider, migrating the virtual machines will be easier than if the configuration types are different.
It may allow you to more easily forecast capacity requirements and enable a self-service provisioning capability for your virtualization fabric. This means that virtual machine administrators within the organization can automatically self-provision new virtual machines without involvement from the fabric administrators.
Task 1: Determine workload resource requirements
Each workload has requirements for the following resources. Begin by answering the questions listed below for each of your workloads.
Processor: What processor speed, architecture (Intel or AMD), and number of processors are required?
Network: In gigabits per second (Gbps), what network bandwidth is required for inbound and outbound traffic? What’s the maximum amount of network latency the workload can tolerate to function properly?
Storage: How many gigabytes (GB) of storage do the application and operating system files of the workload require? How many GBs of storage does the workload require for its data? How many input/output operations per second (IOPS) does the workload require to its storage?
Memory: In gigabytes (GB), how much memory does the workload require? Is the workload non-uniform memory access (NUMA) aware?
In addition to understanding the previous resource requirements, it’s important to also determine:
Whether the resource requirements are minimum or recommended.
What the peak and average values are for each resource requirement on an hourly, daily, weekly, monthly, or annual basis.
The number of minutes of downtime per month that are acceptable for the workload and the workload’s data. In determining this, factor in the following:
Does the workload run on only one virtual machine, or does it run on a collection of virtual machines acting as one, such as a collection of network load-balanced servers all running the same workload? If you are using a collection of servers, the expressed downtime should be clear about whether it applies to each server in the collection, all servers in the collection, or at the collection level.
Working and non-working hours. For example, if nobody will use the workload between the hours of 9:00 P.M. and 6:00 A.M., but it’s critical that it is available as much as possible between the hours of 6:00 A.M. and 9:00 P.M., with an acceptable amount of downtime per month of only ten minutes, this requirement should be specified.
The amount of data loss that is acceptable in the event of an unexpected failure of the virtual infrastructure. This is expressed in minutes because virtual infrastructure replication strategies are typically time-based. Although no data loss is often expressed as a requirement, consider that achieving it often comes at a premium price, and it might also come with lower performance.
Whether the workload files and/or its data must be encrypted on disk and whether its data must be encrypted between the virtual machines and its end users.
You have the following options available for determining the previous resource requirements.
| Option | Advantages | Disadvantages |
| --- | --- | --- |
| Manually assess and log resource utilization | Able to report on whatever you choose | Can require significant manual effort |
| Use the Microsoft Assessment and Planning Toolkit to automatically assess and log resource utilization | | Reports may or may not provide all the data you require |
Note: If you choose to determine your resource requirements manually, you can download Virtualization Fabric Design Considerations Guide Worksheets and enter the information in the Workload resource req. worksheet. This guide references specific worksheets in that document.
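If you choose the manual option, sampling a few Performance Monitor counters over time can supply the processor, memory, storage, and network numbers for the worksheet. The following is a minimal sketch; the server name, counter list, sampling interval, and output path are illustrative assumptions, and the resource metering cmdlets apply only to workloads that already run on Hyper-V.

```powershell
# Minimal sketch: sample key utilization counters from an existing workload server.
# Server name, counters, interval, and output path are illustrative assumptions.
$server   = "AppServer01"
$counters = @(
    "\Processor(_Total)\% Processor Time",
    "\Memory\Available MBytes",
    "\PhysicalDisk(_Total)\Disk Transfers/sec",   # approximates IOPS
    "\Network Interface(*)\Bytes Total/sec"
)

# Collect 12 samples, 5 minutes apart (one hour), and save them for later analysis.
Get-Counter -ComputerName $server -Counter $counters -SampleInterval 300 -MaxSamples 12 |
    Export-Counter -Path "C:\Assessments\$server.blg" -FileFormat BLG

# If the workload already runs in a Hyper-V virtual machine, resource metering
# can report average CPU, memory, disk, and network usage instead.
Enable-VMResourceMetering -VMName "AppVM01"
Measure-VM -VMName "AppVM01"
```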
Task 2: Define workload characterizations
You can define any number of workload characterizations in your environment. The following examples were selected because each of them requires a different configuration of virtualization fabric components, which will be discussed further in later steps.
Stateless: Write no unique information to their local hard disk after they’re initially provisioned and assigned unique computer names and network addresses. They may, however, write unique information to separate storage, such as a database. Stateless workloads are optimal for running on a virtualization fabric because a “master” image can be created for the virtual machine. This image can be easily copied and booted on the virtualization fabric to add scale to the workload or to quickly replace a virtual machine that becomes unavailable in the event of a virtualization host failure. An example of a stateless workload is a web server running a front-end web application.
Stateful: Write unique information to their local hard disk after they’re initially provisioned and assigned unique computer names and network addresses. They may also write unique information to separate storage, such as a database. Stateful workloads typically require more complex provisioning and scaling strategies than stateless workloads. High availability strategies for stateful workloads might require shared state with other virtual machines. An example of a stateful workload is the SQL Server Database Engine.
Shared stateful: Stateful workloads that require some shared state with other virtual machines. These workloads often use Failover Clustering in Windows Server to achieve high availability, which requires access to shared storage. An example of a shared stateful workload is Microsoft System Center – Virtual Machine Manager.
Other: Characterizes workloads that may not run at all, or not run optimally, on a virtualization fabric. Attributes of such workloads are that they require:
Access to physical peripherals. An example of such an application is a telephony workload that communicates with a telephony network adapter in a physical host.
Resource requirements much higher than most of your other workloads. An example is a real-time application that requires less than one millisecond latency between application tiers.
These applications may or may not run on your virtualization fabric, or they may require very specific hardware or configuration that is not shared by most of your other workloads.
Note: You can define your workload characterizations in the Settings worksheet and then select the appropriate characterization for each workload in the Workload resource req. worksheet.
Step 2: Plan for virtual machine configuration
In this step, you’ll define the types of virtual machines you’ll need to meet the resource requirements and characterizations of the workloads you defined in Step 1.
Task 1: Define compute configuration
In this task, you’ll determine the amount of memory and processors that each virtual machine requires.
Task 1a: Define virtual machine generation type
Windows Server 2012 R2 introduced generation 2 virtual machines. Generation 2 virtual machines support hardware and virtualization features that are not supported in generation 1 virtual machines. It’s important to make the right decision for your requirements, because after a virtual machine has been created, its type cannot be changed.
A generation 2 virtual machine provides the following new functionality:
PXE boot by using a standard network adapter
Boot from a SCSI virtual hard disk
Boot from a SCSI virtual DVD
Secure Boot (enabled by default)
UEFI firmware support
Generation 2 virtual machines support the following guest operating systems:
Windows Server 2012 R2
Windows Server 2012
64-bit versions of Windows 8.1
64-bit versions of Windows 8
Specific versions of Linux. For a list of distribution and versions that support generation 2 virtual machines, see Linux Virtual Machines on Hyper-V.
The following table lists the advantages and disadvantages of generation 1 and generation 2 virtual machines.
| Option | Advantages | Disadvantages |
| --- | --- | --- |
| Generation 1 | | No access to new virtual machine functionality |
| Generation 2 | | |
Important: Linux generation 2 virtual machines do not support Secure Boot. When you create a virtual machine and you intend to install Linux, you must turn off Secure Boot in the virtual machine settings.
Additional information:
Generation 2 Virtual Machine Overview
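For example, a generation 2 virtual machine can be created with the New-VM cmdlet, and Secure Boot can be turned off for a Linux guest with Set-VMFirmware. This is a minimal sketch; the virtual machine name, paths, sizes, and switch name are illustrative assumptions.

```powershell
# Minimal sketch: create a generation 2 virtual machine and disable Secure Boot
# for a Linux guest. Names, paths, and sizes are illustrative assumptions.
New-VM -Name "Web01" -Generation 2 -MemoryStartupBytes 2GB `
    -NewVHDPath "D:\VHDs\Web01.vhdx" -NewVHDSizeBytes 60GB -SwitchName "External"

# Generation 2 Linux guests do not support Secure Boot, so turn it off
# before installing the distribution.
Set-VMFirmware -VMName "Web01" -EnableSecureBoot Off
```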
Task 1b: Define memory
You should plan the size of your virtual machine memory as you typically do for server applications on a physical computer. It should reasonably handle the expected load at ordinary times and at peak times. Insufficient memory can significantly increase response times and CPU or I/O usage.
Static Memory or Dynamic Memory
Static memory is the amount of memory assigned to the virtual machine. It is always allocated when the virtual machine is started and it does not change when the virtual machine is running. All of the memory is assigned to the virtual machine during startup and memory that is not being used by the virtual machine is not available to other virtual machines. If there is not enough memory available on the host to allocate to the virtual machine when it is started, the virtual machine will not start.
Static memory is good for workloads that are memory intensive and for workloads that have their own memory management systems, such as SQL Server. These types of workloads will perform better with static memory.
Note: There is no setting to enable static memory. Static memory is enabled when the Dynamic Memory setting is not enabled.
Dynamic Memory allows you to better use the physical memory on a system by balancing the total physical memory across multiple virtual machines, allocating more memory to virtual machines that are busy and removing memory from less-used virtual machines. This can lead to higher consolidation ratios, especially in dynamic environments such as Virtual Desktop Infrastructure (VDI) deployments or web servers.
When using static memory, if a virtual machine is assigned 10 GB of memory and it is only using 3 GB, the remaining 7 GB of memory is not available for use by other virtual machines. When a virtual machine has Dynamic Memory enabled, the virtual machine only uses the amount of memory that is required, but not below the minimum RAM that is configured. This frees up more memory for other virtual machines.
The following table lists the advantages and disadvantages for static memory and Dynamic Memory.
| Option | Advantages | Disadvantages |
| --- | --- | --- |
| Static memory | | |
| Dynamic Memory | | |
The following are the memory configuration settings:
Startup RAM: Specifies the amount of memory required to start the virtual machine. The value needs to be high enough to allow the guest operating system to start, but should be as low as possible to allow for optimal memory utilization and potentially higher consolidation ratios.
Minimum RAM: Specifies the minimum amount of memory that should be allocated to the virtual machine after the virtual machine has started. The value can be set as low as 32 MB to a maximum value equal to the Startup RAM value. This setting is only available when Dynamic Memory is enabled.
Maximum RAM: Specifies the maximum amount of memory that this virtual machine is allowed to use. The value can be set from as low as the value for Startup RAM to as high as 1 TB. However, a virtual machine can use only as much memory as the maximum amount supported by the guest operating system. For example, if you specify 64 GB for a virtual machine running a guest operating system that supports a maximum of 32 GB, the virtual machine cannot use more than 32 GB. This setting is only available when Dynamic Memory is enabled.
Memory weight: Provides Hyper-V with a way to determine how to distribute memory among virtual machines if there is not enough physical memory available in the host to give every virtual machine its requested amount of memory. Virtual machines with a higher memory weight take precedence over virtual machines with lower memory weights.
Notes:
Dynamic Memory and virtual NUMA features cannot be used at the same time. A virtual machine that has Dynamic Memory enabled effectively has only one virtual NUMA node, and no NUMA topology is presented to the virtual machine regardless of the virtual NUMA settings.
When installing or upgrading the operating system of a virtual machine, the amount of memory that is available to the virtual machine during the installation and upgrade process is the value specified as Startup RAM. Even if Dynamic Memory has been configured for the virtual machine, the virtual machine only uses the amount of memory that is configured in the Startup RAM setting. Ensure that the Startup RAM value meets the minimum memory requirements of the operating system during the installation or upgrade procedures.
The guest operating system running in the virtual machine must support Dynamic Memory.
Applications such as SQL Server and Exchange Server implement their own memory managers. Consult the workload’s documentation to determine whether the workload is compatible with Dynamic Memory.
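As a minimal sketch of how these settings map to the Set-VMMemory cmdlet (the virtual machine names and memory values are illustrative assumptions):

```powershell
# Minimal sketch: Dynamic Memory for one VM and static memory for another.
# VM names and memory values are illustrative assumptions.

# Dynamic Memory: startup, minimum, maximum, and weight (priority).
Set-VMMemory -VMName "Web01" -DynamicMemoryEnabled $true `
    -StartupBytes 1GB -MinimumBytes 512MB -MaximumBytes 4GB -Priority 80

# Static memory for a workload with its own memory manager (for example, SQL Server).
Set-VMMemory -VMName "Sql01" -DynamicMemoryEnabled $false -StartupBytes 16GB
```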
Additional information:
Task 1c: Define processor
The following configuration settings must be determined for configuring virtual machines:
Determine the number of processors required for each virtual machine. This will often be the same as the number of processors required by the workload. Hyper-V supports a maximum of 64 virtual processors per virtual machine.
Determine resource control for each virtual machine. Limits can be set to ensure that no virtual machine is able to monopolize the processor resources of the virtualization host.
Define a NUMA topology. For high-performance NUMA-aware workloads, you can specify the maximum number of processors, the memory amount allowed on a single virtual NUMA node, and the maximum number of nodes allowed on a single processor socket. For more information, read Hyper-V Virtual NUMA Overview.
Note: Virtual NUMA and Dynamic Memory cannot be used at the same time. When you are trying to decide whether to use Dynamic Memory or virtual NUMA, answer the following questions. If the answer to both is Yes, enable virtual NUMA and do not enable Dynamic Memory.
Is the workload running in the virtual machine NUMA-aware?
Will the virtual machine consume more resources, processors, or memory than are available on a single physical NUMA node?
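As a minimal sketch of how the processor count, resource controls, and virtual NUMA limits map to the Set-VMProcessor cmdlet (the virtual machine name and the values are illustrative assumptions):

```powershell
# Minimal sketch: assign virtual processors and resource controls to a VM.
# The VM name, processor count, and limits are illustrative assumptions.
Set-VMProcessor -VMName "Sql01" -Count 8 `
    -Reserve 10 -Maximum 75 -RelativeWeight 200   # reserve/limit in percent, plus weight

# For a NUMA-aware workload, the virtual NUMA topology can also be constrained
# (remember that virtual NUMA and Dynamic Memory are mutually exclusive).
Set-VMProcessor -VMName "Sql01" -MaximumCountPerNumaNode 8 -MaximumCountPerNumaSocket 1
```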
Task 1d: Define supported operating systems
You need to confirm that the operating system required by your workload is supported as a guest operating system. Consider the following:
Supported Windows Guest Operating Systems for Hyper-V in Windows Server 2012 R2 and Windows 8.1
Supported Windows Guest Operating Systems for Hyper-V in Windows Server 2012 and Windows 8
For Linux: For information about supported Linux distributions, see Linux Virtual Machines on Hyper-V.
Note: Hyper-V includes a software package for supported guest operating systems that improves performance and integration between the physical computer and the virtual machine. This collection of services and software drivers is referred to as integration services. For the best performance, your virtual machines should be running the latest integration services.
Licensing
You need to ensure that the guest operating systems are properly licensed. Please review the vendor’s documentation for any specific licensing requirements when you are running a virtualized environment.
Automatic Virtual Machine Activation (AVMA) is a feature that was introduced in Windows Server 2012 R2. AVMA binds the virtual machine activation to the licensed virtualization server and activates the virtual machine when it starts up. This eliminates the need to enter licensing information and activate each virtual machine individually.
AVMA requires that the host is running Windows Server 2012 R2 Datacenter and that the guest virtual machine operating system is Windows Server 2012 R2 Datacenter, Windows Server 2012 R2 Standard, or Windows Server 2012 R2 Essentials.
Note: You need to configure AVMA on each host deployed in your virtualization fabric.
Additional information:
Automatic Virtual Machine Activation
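As a minimal sketch, once the Datacenter host is in place, activation inside a supported guest only requires installing the published AVMA client key for the guest’s edition (the key placeholder below is intentionally generic):

```powershell
# Run inside the guest operating system (elevated). Replace the placeholder with
# the published AVMA client key for the guest edition; the guest then activates
# against the licensed Windows Server 2012 R2 Datacenter host.
slmgr /ipk <AVMA-client-key-for-guest-edition>
```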
Task 1e: Define virtual machine naming convention
Your existing computer naming strategy might indicate where the computer or server is physically located. Virtual machines can move from host to host, even to and from different datacenters, so the existing naming strategy might no longer be applicable. An update to the existing naming convention to indicate that the computer is running as a virtual machine can help locate where the virtual machine is running.
Task 2: Define network configuration
Each virtual machine will receive or send different types of network traffic. Each type of network traffic will have different performance, availability, and security requirements.
Generation 1 virtual machines can have a maximum of 12 network adapters—4 legacy network adapters and 8 virtual network adapters. Generation 2 virtual machines do not support legacy network adapters, so the maximum number of adapters that is supported is 8.
Task 2a: Determine network traffic types
Each virtual machine will send and receive different types of data, such as:
Application data
Data backup
Communications with client computers, servers, or services
Intracluster communication, if the workload is part of a guest virtual machine failover cluster
Support
Storage
If you already have existing networks that are dedicated to different types of network traffic, you may choose to use those for this task. If you’re defining new network designs to support your virtualization fabric, for each virtual machine, you can define which types of network traffic it will support.
Task 2b: Define network traffic performance options
Each network traffic type has maximum bandwidth and minimum latency requirements. The following table shows the strategies that can be used to meet different network performance requirements.
| Strategy | Advantages | Disadvantages |
| --- | --- | --- |
| Separation of traffic types to different physical network adapters | Separates traffic so it is not being shared by other traffic types | |
| Hyper-V bandwidth management (Hyper-V QoS) | | |
| SR-IOV | | |
| Jumbo frames | | |
Task 2c: Define network traffic availability options
NIC Teaming, also known as load balancing and failover (LBFO), allows multiple network adapters to be placed in a team for the purposes of bandwidth aggregation and traffic failover. This maintains connectivity in the event of a network component failure. NIC Teaming is typically configured on the host, and when you create the virtual switch, it is bound to the network adapter team.
The network switches that are deployed determine the NIC Teaming mode. The default settings in Windows Server 2012 R2 should be sufficient for the majority of deployments.
Note: SR-IOV is not compatible with NIC Teaming. For more information about SR-IOV, see Task 2b: Define network traffic performance options.
Additional information:
Task 2d: Define network traffic security options
Each network traffic type can have different security requirements, for example, requirements related to isolation and encryption. The following table explains strategies that can be used to meet various security requirements.
| Strategy | Advantages | Disadvantages |
| --- | --- | --- |
| Separation on different network adapters | Separate traffic from other network traffic | Does not scale well. The more networks you have, the more network adapters you need to install and manage on the host. |
| IPsec with IPsec Task Offloading | | |
| VLAN tagging | | |
| | Minimal impact on performance when enabled | |
| | Minimal impact on performance when enabled | |
Design decision - You can download Virtualization Fabric Design Considerations Guide Worksheets and change the sample data in the Virtual machine configs. worksheet to capture the decisions you make for all previous tasks in this step. For subsequent design decisions, this document references the specific worksheets where you can enter your data.
Task 2e: Define virtual network adapters
With an understanding of the types of traffic required by the virtual machines, in addition to the performance, availability, and security strategies for the traffic, you can determine how many virtual network adapters each virtual machine will require.
A virtual network adapter is connected to a virtual switch. There are three types of virtual switches:
External virtual switch
Internal virtual switch
Private virtual switch
The external virtual switch provides the virtual machine with access to the physical network through the network adapter that is associated with the virtual switch it is connected to. A physical network adapter in the host can only be associated with a single external switch.
A virtual network adapter can have one VLAN ID assigned to it, unless it is configured in trunk mode.
If you are going to assign virtual machine traffic to different VLANs, a network adapter that supports VLANs must be installed in the host and assigned to the virtual switch. You can set the VLAN ID for the virtual machine in the properties of the virtual machine. The VLAN ID that is set in the virtual switch is the VLAN ID that will be assigned to the virtual network adapter assigned to the host operating system.
Note: If you have a virtual machine that requires access to more networks than available adapters, you can enable VLAN trunk mode for a virtual machine network adapter by using the Set-VMNetworkAdapterVlan Windows PowerShell cmdlet.
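As a minimal sketch of both configurations (the virtual machine names and VLAN IDs are illustrative assumptions):

```powershell
# Minimal sketch: tag a VM's network adapter to a single VLAN, or enable trunk
# mode when the VM needs access to several VLANs. VM names and VLAN IDs are
# illustrative assumptions.
Set-VMNetworkAdapterVlan -VMName "Web01" -Access -VlanId 20

Set-VMNetworkAdapterVlan -VMName "Router01" -Trunk `
    -AllowedVlanIdList "10,20,30" -NativeVlanId 10
```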
Task 2f: Define IP addressing strategy
You need to determine how you will assign IP addresses to your virtual machines. If you don't, you can have IP address conflicts, which can have a negative impact on other virtual machines and physical devices on the network.
Additionally, unauthorized DHCP servers can cause havoc on your network infrastructure, and they can be especially difficult to track down when the server is running as a virtual machine. You can protect your network against unauthorized DHCP servers running on a virtual machine by enabling DHCPGuard in the settings of your virtual machines. DHCPGuard protects against a malicious virtual machine representing itself as a DHCP server for man-in-the-middle attacks.
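As a minimal sketch, DHCP guard (and the related router guard setting) can be enabled on a virtual machine's network adapters as follows; the virtual machine name is an illustrative assumption:

```powershell
# Minimal sketch: enable DHCP guard and router guard on a VM's network adapters.
# The VM name is an illustrative assumption.
Set-VMNetworkAdapter -VMName "Web01" -DhcpGuard On -RouterGuard On
```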
Additional information:
Dynamic Host Configuration Protocol (DHCP) Overview
IP Address Management (IPAM) Overview
Task 3: Define storage configuration
To determine your storage configuration, you need to define the data types that the virtual machines will store and the type of storage they need.
Task 3a: Define data types
The following table lists the types of data that a virtual machine may need to store and where that type of data is often stored.
| Data type | Storage location for data type |
| --- | --- |
| Operating system files | Within a virtual hard disk file that is stored by the virtualization host. Storage considerations for the virtualization host are addressed further in Step 4: Plan for server virtualization hosts below. |
| Windows page file | Often stored in the same location as the operating system files. |
| Application program files | Often stored in the same location as the operating system files. |
| Application configuration data | Often stored in the same location as the operating system files. |
| Application data | Often stored separately from the application and operating system files. For example, if the application is a database application, the database files are often stored on a highly available, efficient, network-based storage solution that is separate from the location where the operating system or application program files are stored. |
| Cluster Shared Volumes (CSV) and disk witness (required for guest virtual machine clustering) | Often stored separately from the application and operating system files. |
| Crash dump files | Often stored in the same location as the operating system files. |
Task 3b: Define storage types
The following table lists the types of storage that might be used for the data types defined in Step 2, Task 3a above.

| Storage type | Considerations |
| --- | --- |
| Virtual IDE disk | Available only in generation 1 virtual machines; generation 2 virtual machines do not support IDE devices. |
| Virtual SCSI | |
| iSCSI initiator in the virtual machine | |
| Virtual Fibre Channel | |
| SMB 3.0 | Access files stored on Server Message Block (SMB) 3.0 shares from within the virtual machine. |
Task 3c: Define virtual hard disk format and type
If you are using the virtual hard disk storage type, you must first select the VHD format that you’ll use from the options listed in the following table.
| Disk format | Advantages | Disadvantages |
| --- | --- | --- |
| VHD | | |
| VHDX | Used for shared storage for guest virtual machine clusters | |
Next, select the type of disk you will use from the options listed in the following table.
| Disk type | Advantages | Disadvantages |
| --- | --- | --- |
| Fixed | | |
| Dynamic | Only uses disk space as required, rather than using all that’s been provisioned | |
| Differencing | Can use less disk space if multiple differencing disks use the same parent | |
Consider the following when you are selecting a virtual hard disk file type and format:
When you use the VHDX format, a dynamic disk can be used because it offers resiliency guarantees in addition to space savings that are associated with allocating space only when there is a need to do so.
A fixed disk can also be used, irrespective of the format, when the storage on the hosting volume is not actively monitored. Because a fixed disk is fully allocated up front, sufficient disk space is guaranteed and the virtual hard disk file does not need to expand at run time.
Checkpoints (formerly known as snapshots) of a virtual machine create a differencing virtual hard disk to store writes to the disks. Having only a few checkpoints can elevate the CPU usage of storage I/O, but they might not noticeably affect performance (except in highly I/O-intensive server workloads).
However, having a large chain of checkpoints can noticeably affect performance because reading from the virtual hard disks can require checking for the requested blocks in many differencing disks. Keeping short checkpoint chains is important for maintaining good disk I/O performance.
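As a minimal sketch of creating the disk formats and types discussed above and attaching one to a virtual machine (the paths, sizes, and names are illustrative assumptions):

```powershell
# Minimal sketch: create a dynamically expanding VHDX and a fixed-size VHDX,
# then attach one to a VM's SCSI controller. Paths, sizes, and names are
# illustrative assumptions.
New-VHD -Path "D:\VHDs\AppData.vhdx" -SizeBytes 200GB -Dynamic
New-VHD -Path "D:\VHDs\SqlData.vhdx" -SizeBytes 500GB -Fixed

Add-VMHardDiskDrive -VMName "Sql01" -ControllerType SCSI -Path "D:\VHDs\SqlData.vhdx"
```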
Task 3d: Define which storage type to use for each data type
After you define the data types and storage types that virtual machines will store, you can determine which storage type and which virtual disk format and type you’ll use for each data type.
Task 4: Define virtual machine availability strategy
Though fabric administrators are responsible for the availability of the fabric, virtual machine administrators are ultimately responsible for the availability of their virtual machines. As a result, the virtual machine administrator must understand the capabilities of the fabric to design the appropriate availability strategy for their virtual machines.
The following tables analyze three availability strategies for virtual machines running workloads with the characterizations that are defined in Step 1, Task 2 above. Typically, the fabric administrator informs virtual machine administrators in advance when planned downtime activities are scheduled for the fabric so that virtual machine administrators can plan accordingly. The three availability strategies are:
Stateless
Stateful
Shared stateful
Stateless
| Option | Considerations |
| --- | --- |
| Virtual Machine Live Migration at the host level | |
| Load-balanced clusters (by using Windows Network Load Balancing) | |
| Load-balanced clusters (by using a hardware load balancer) | |
Stateful
| Option | Considerations |
| --- | --- |
| | |
Shared stateful
When running cluster-aware workloads, you can provide an additional layer of availability by enabling virtual machine guest clustering. Guest clustering supports high availability for workloads within the virtual machines, and it protects the workload even if the host where one of the virtual machines is running fails. Because the workload is protected by Failover Clustering, the virtual machine on another node can take over automatically.
| Option | Considerations |
| --- | --- |
| | |
Additional information:
Deploy a Guest Cluster Using a Shared Virtual Hard Disk
Using Guest Clustering for High Availability
Disaster Recovery
If there is a disaster, how quickly can you get the required workloads up and running so they can service clients? In some cases, the allotted time can be only a few minutes.
Replication of data from your main datacenters to your disaster recovery sites is required to ensure that reasonably up-to-date data is available there, with data loss limited to an acceptable amount determined by replication delays. By running workloads in virtual machines, you can replicate the virtual hard disks and the virtual machine configuration files from your primary site to a replica site.
The following table compares disaster recovery options.
| Option | Considerations |
| --- | --- |
| Hyper-V Replica | |
| Backup | |
Notes:
To centrally manage and automate replication when you are running System Center 2012 R2 - Virtual Machine Manager, you need to use Microsoft Azure Site Recovery.
You can also replicate virtual machines to Azure by using Microsoft Azure Site Recovery. Replication of virtual machines to Azure is currently in preview.
Additional information:
Important:
Use the Hyper-V Replica Capacity Planner to understand the impact Hyper-V Replica will have on your network infrastructure; processor utilization on the primary, replica, and extended replica servers; memory usage on the primary and replica servers; and disk IOPS on the primary, replica, and extended replica servers that are based on your existing virtual machines.
Your workload might have a built-in disaster recovery solution, such as AlwaysOn Availability Groups in SQL Server. Consult with the workload documentation to confirm if Hyper-V Replica is supported by the workload.
Additional information:
System Center Data Protection Manager
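As a minimal sketch, replication of a single virtual machine can be enabled with the Hyper-V Replica cmdlets; the server and virtual machine names are illustrative assumptions, and the replica server must already be configured to accept replication:

```powershell
# Minimal sketch: enable replication of a VM to a replica server and start the
# initial copy. Server and VM names are illustrative assumptions.
Enable-VMReplication -VMName "Sql01" -ReplicaServerName "replica01.contoso.com" `
    -ReplicaServerPort 80 -AuthenticationType Kerberos

Start-VMInitialReplication -VMName "Sql01"
```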
Task 5: Define virtual machine types
To support the workloads in your environment, you might create virtual machines with unique resource requirements to meet the needs of every workload. Alternatively, you might take a similar approach to public providers of virtual machine hosting services (also referred to as Infrastructure as a Service, or IaaS).
See Virtual Machine and Cloud Service Sizes for Azure for a description of the virtual machine configurations offered by Microsoft Azure Infrastructure Services. As of this writing, the service supported 13 virtual machine configurations, each with a different combination of processor, memory, storage capacity, and IOPS.
Design decision - The decisions you make in all tasks of this step can be entered in the Virtual machine configs. worksheets.
Step 3: Plan for server virtualization host groups
Before you define individual server hosts, you may want to first define host groups. Host groups are simply a named collection of servers that are grouped together to meet the common goals that are outlined in the remaining tasks of this step.
Task 1: Define physical locations
You’ll likely group and manage hardware resources by physical location, so you’ll want to first define the locations that will contain fabric resources within your organization.
Task 2: Define host group types
You may create host groups for any number of reasons, such as to host workloads with specific:
Workload characterizations
Resource requirements
Service quality requirements
The following image illustrates an organization that has created five host groups in two locations.
Figure 2: Host group example
The organization created the host groups for the reasons outlined in the following table.
| Host group | Reasons for creating it |
| --- | --- |
| Stateless and stateful workloads | |
| Accounting department stateful and stateless workloads | Though the hardware configuration of the servers in this host group is the same as other stateless and stateful workload host groups in their environment, the Accounting department has applications that have higher security requirements than other departments in the organization. As a result, a dedicated host group was created for them so it could be secured differently than the other host groups in the fabric. |
| Shared stateful workloads | The workloads hosted by this host group require shared storage because they rely on Failover Clustering in Windows Server to maintain their availability. These workloads are hosted by a dedicated host group because the configuration of these virtual machines is different than the other virtual machines in the organization. |
| High I/O stateful workloads | All the hosts in this host group are connected to higher speed networks than the hosts in the other host groups. |
Though the organization could have spanned locations with their host groups, they chose to keep all members of each host group within the same location to ease their management. As you can see from this example, host groups can be created for a variety of reasons, and those reasons will vary across organizations. The more types of host groups you create in your organization, the more complex the environment will be to manage, which ultimately adds to the cost of hosting virtual machines.
Tip: The more standardized the server hardware is within a host group, the easier it will be to scale and maintain the host group over time. If you determine that you want to standardize the hardware within a host group, you can add the standardized configuration data to the Host groups worksheet in Virtualization Fabric Design Considerations Worksheets. For more information about physical hardware considerations, see Step 4: Plan for server virtualization hosts.
Consider that currently, most public cloud providers that host virtual machines:
Only host virtual machines that don’t require shared state.
Often only have one set of service quality metrics that they provide to all customers.
Do not dedicate specific hardware to specific customers.
We recommend that you start with one host group type that contains identical hardware, and only add host group types when the benefit of doing so outweighs the cost.
Task 3: Determine whether to cluster host group members
In the past, Failover Clustering in Windows Server was only used to increase server availability, but it has grown to provide significantly more functionality. Consider the information in the following table to help you decide whether you’ll want to cluster your host group members.
| Option | Advantages | Disadvantages |
| --- | --- | --- |
| Host group members are part of a failover cluster | | |
| Host group members are not part of a failover cluster | | Virtual machines running on a host that fails must be moved to a surviving host and restarted manually (or by using some form of automation). |
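If you decide to cluster host group members, a minimal sketch of the cluster creation looks like the following; the host names, cluster name, and IP address are illustrative assumptions:

```powershell
# Minimal sketch: create a failover cluster from the hosts in a host group and
# add shared storage as a Cluster Shared Volume. Names and the IP address are
# illustrative assumptions.
Install-WindowsFeature -Name Failover-Clustering -IncludeManagementTools   # run on each host

Test-Cluster -Node "Host01","Host02","Host03","Host04"
New-Cluster -Name "HostGroup1" -Node "Host01","Host02","Host03","Host04" `
    -StaticAddress 10.0.1.50

Add-ClusterSharedVolume -Cluster "HostGroup1" -Name "Cluster Disk 1"
```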
Design decision - The decisions you make in all tasks of this step can be entered in the Settings worksheet.
Step 4: Plan for server virtualization hosts
In this step, you’ll define the types of hosts you’ll need to host the virtual machines you plan to run on your virtualization fabric. You will want to limit the number of host configurations, in some cases to a single configuration, to reduce procurement and support costs. Additionally, purchasing the wrong equipment will drive up the deployment costs.
Cloud Platform System
Microsoft brings its experience running some of the largest datacenters and cloud services into a factory-integrated and fully validated converged system. Cloud Platform System (CPS) combines Microsoft’s proven software stack of Windows Server 2012 R2, System Center 2012 R2, and Windows Azure Pack, with Dell’s cloud server, storage and networking hardware. As a scalable building block for your cloud, CPS shortens the time to value and enables a consistent cloud experience.
CPS provides a self-service, multi-tenant cloud environment for Platform-as-a-Service, Windows and Linux virtual machines, and includes optimized deployment packs for Microsoft applications like SQL Server, SharePoint, and Exchange. The factory integration decreases risk and complexity while accelerating deployment time from months to days. The simplified support process and automation of routine infrastructure tasks also frees up IT resources to focus on innovation.
For additional information, see the Cloud Platform System site.
Fast Track
Rather than designing your hardware (and software) configuration, you can purchase preconfigured hardware configurations from a variety of hardware partners through the Microsoft Private Cloud Fast Track program.
The Fast Track program is a joint effort between Microsoft and its hardware partners to deliver validated, preconfigured solutions that reduce the complexity and risk of implementing a virtualization fabric and the tools to manage it.
The Fast Track program provides flexibility of solutions and customer choice across hardware vendors’ technologies. It uses the core capabilities of the Windows Server operating system, Hyper-V technology, and Microsoft System Center to deliver the building blocks of a private cloud infrastructure as a service offering.
Additional information:
Microsoft Private Cloud Fast Track site
Task 1: Define compute configuration
In this task, you’ll determine the amount of memory, number of processors, and the version of Windows Server that are required for each host. The number of virtual machines to run on a host will be determined by the hardware components discussed in this section.
Note: To ensure that your solution is fully supported, all hardware that you purchase must carry the Certified for Windows Server logo for the version of Windows Server you are running.
The Certified for Windows Server logo demonstrates that a server system meets Microsoft’s highest technical bar for security, reliability and manageability. With other certified devices and drivers, it can support the roles, features, and interfaces for Cloud and Enterprise workloads and for business critical applications.
For the latest list of Certified for Windows Server hardware, see the Windows Server Catalog.
Task 1a: Define processor
Hyper-V presents the logical processors to each active virtual machine as one or more virtual processors. You can achieve additional run-time efficiency by using processors that support Second Level Address Translation (SLAT) technologies such as Extended Page Tables (EPTs) or Nested Page Tables (NPTs). Hyper-V in Windows Server 2012 R2 supports a maximum of 320 logical processors.
Considerations:
Workloads that are not processor intensive should be configured to use one virtual processor. Monitor host processor utilization over time to ensure that you’ve allocated processors for maximum effectiveness.
Workloads that are CPU intensive should be assigned two or more virtual processors. You can assign a maximum of 64 virtual processors to a virtual machine. The number of virtual processors recognized by the virtual machine is dependent on the guest operating system. For example, Windows Server 2008 with Service Pack 2 recognizes only four virtual processors.
Additional information:
Performance Tuning for Hyper-V Servers
Task 1b: Define memory
The physical server requires sufficient memory for the host and running virtual machines. The host requires memory to efficiently perform I/O on behalf of the virtual machines and operations such as a virtual machine checkpoint. Hyper-V ensures that sufficient memory is available to the host, and it allows remaining memory to be assigned to the virtual machines. Virtual machines should be sized based on the needs of the expected load for each virtual machine.
The hypervisor virtualizes the guest physical memory to isolate virtual machines from each other and to provide a contiguous, zero-based memory space for each guest operating system, the same as on non-virtualized systems. To ensure that you get maximum performance, use SLAT-based hardware to minimize the performance cost of memory virtualization.
Size your virtual machine memory as you typically do for server applications on a physical computer. The amount of memory assigned to the virtual machine should allow the virtual machine to reasonably handle the expected load at ordinary and peak times because insufficient memory can significantly increase response times and CPU or I/O usage.
Memory that has been allocated for a virtual machine reduces the amount of memory that is available to other virtual machines. If there is not enough available memory on the host, the virtual machine will not start.
Dynamic Memory enables you to attain higher consolidation numbers with improved reliability for restart operations. This can lead to lower costs, especially in environments that have many idle or low-load virtual machines, such as pooled VDI environments. Dynamic Memory run-time configuration changes can reduce downtime and provide increased agility to respond to requirement changes.
For more information about Dynamic Memory, see Task 1b: Define memory, which discusses how to determine memory for a virtual machine.
Additional information:
Task 1c: Define Windows Server operating system edition
The feature sets in Windows Server Standard and Windows Server Datacenter are exactly the same. Windows Server Datacenter provides an unlimited number of virtual machines. With Windows Server Standard, you are limited to two virtual machines.
In Windows Server 2012 R2, the Automatic Virtual Machine Activation (AVMA) feature was added. AVMA lets you install virtual machines on a properly activated server without having to manage product keys for each virtual machine, even in disconnected environments.
AVMA requires that the guest operating systems are running Windows Server 2012 R2 Datacenter, Windows Server 2012 R2 Standard, or Windows Server 2012 R2 Essentials. The following table compares the editions.
| Edition | Advantages | Disadvantages |
| --- | --- | --- |
| Standard | | Limited to two virtual machines |
| Datacenter | | More expensive |
Hyper-V can be installed on a Server Core installation option of Windows Server. A Server Core installation reduces the space required on the disk, the potential attack surface, and especially the servicing requirements. A Server Core installation is managed by using the command line, Windows PowerShell, or by remote administration.
It is important to review the licensing terms of any software you are planning to use.
Microsoft Hyper-V Server
Microsoft Hyper-V Server provides a simple and reliable virtualization solution to help organizations improve their server utilization and reduce costs. It is a stand-alone product that contains only the Windows hypervisor, a Windows Server driver model, and virtualization components.
Hyper-V Server can fit into customers’ existing IT environments and leverage their existing provisioning, management processes, and support tools. It supports the same hardware compatibility list as the corresponding editions of Windows Server, and it integrates fully with Microsoft System Center and Windows technologies such as Windows Update, Active Directory, and Failover Clustering.
Hyper-V Server is a free download, and the installation is already activated. However, every operating system that is running on a hosted virtual machine requires a proper license.
Additional information:
Automatic Virtual Machine Activation
Manage Hyper-V Server Remotely
Task 2: Define network configuration
In Step 2, Task 2 above, we discussed the design considerations for virtual machine networking. Now we will discuss the networking considerations for the host. There are several types of network traffic that you must consider and plan for when you deploy Hyper-V. You should design your network configuration with the following goals in mind:
To ensure network QoS
To provide network redundancy
To isolate traffic to defined networks
Task 2a: Define network traffic types
When you deploy a Hyper-V cluster, you must plan for several types of network traffic. The following table summarizes the traffic types.
| Traffic type | Description |
| --- | --- |
| Management | |
| Cluster and CSVs | |
| Live migration | Used for virtual machine live migration and shared nothing live migration |
| Storage | Used for SMB traffic or for iSCSI traffic |
| Replica | Used for virtual machine replication traffic through the Hyper-V Replica feature |
| Virtual machine (tenant) traffic | Note: See Step 2: Plan for virtual machine configuration for a list of virtual machine traffic types. |
| Backup | Used to back up virtual hard disk files |
Task 2b: Define network traffic performance options
Each network traffic type will have maximum and minimum bandwidth requirements and minimum latency requirements. Following are the strategies that can be used to meet different network performance requirements.
Policy-based QoS
When you deploy a Hyper-V cluster, you need to plan for a minimum of six traffic patterns or networks, and each network requires redundancy. That means at least 12 network adapters in the host. It is possible to install multiple quad-port network adapters, but at some point you will run out of slots in your host.
Networking equipment is getting faster. Not long ago, 1 gigabit per second (Gbps) network adapters were top-of-the-line. Today, 10 Gbps adapters in servers are becoming more common, and the prices to support 10 Gbps infrastructures are becoming more reasonable.
Installing two teamed 10 Gbps network adapters provides more bandwidth than two quad-port 1 Gbps adapters, requires fewer switch ports, and simplifies your cabling needs. As you converge more of your network traffic types onto the teamed 10 Gbps network adapters, policy-based QoS allows you to manage the network traffic to properly meet the needs of your virtualization infrastructure.
Policy-based QoS enables you to specify network bandwidth control based on application type, users, and computers. QoS policies allow you to meet the service requirements of a workload or an application by measuring network bandwidth, detecting changing network conditions (such as congestion or availability of bandwidth), and prioritizing (or throttling) network traffic.
In addition to the ability to enforce maximum bandwidth, QoS policies in Windows Server 2012 R2 provide a new bandwidth management feature: minimum bandwidth. Unlike maximum bandwidth, which is a bandwidth cap, minimum bandwidth is a bandwidth floor, and it assigns a certain amount of bandwidth to a given type of traffic. You can simultaneously implement minimum and maximum bandwidth limits.
| Advantages | Disadvantages |
| --- | --- |
| | |
Additional information:
Quality of Service (QoS) Overview
Policy-based Quality of Service
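As a minimal sketch of policy-based QoS (the policy names, bandwidth weights, and the backup port number are illustrative assumptions):

```powershell
# Minimal sketch: define minimum-bandwidth weights for host traffic types with
# policy-based QoS. Names, weights, and the backup port are illustrative assumptions.
New-NetQosPolicy -Name "Live Migration" -LiveMigration -MinBandwidthWeightAction 30
New-NetQosPolicy -Name "SMB Storage"    -SMB           -MinBandwidthWeightAction 40

# A maximum (throttle) can be combined with a minimum, for example for backup
# traffic identified by its destination port.
New-NetQosPolicy -Name "Backup" -IPDstPortMatchCondition 10000 `
    -ThrottleRateActionBitsPerSecond 1GB
```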
Data Center Bridging
Data Center Bridging (DCB) provides hardware-based bandwidth allocation to a specific type of traffic and enhances Ethernet transport reliability with the use of priority-based flow control. DCB is recommended when you use FCoE or iSCSI.
| Advantages | Disadvantages |
| --- | --- |
| | |
Additional information:
Data Center Bridging (DCB) Overview
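As a minimal sketch of a DCB configuration for SMB Direct traffic (the priority value, bandwidth percentage, and adapter name are illustrative assumptions, and the physical switches must be configured to match):

```powershell
# Minimal sketch: reserve bandwidth for SMB Direct traffic with Data Center Bridging.
# Priority value, bandwidth percentage, and adapter name are illustrative assumptions.
Install-WindowsFeature -Name Data-Center-Bridging

New-NetQosPolicy "SMB Direct" -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3
New-NetQosTrafficClass "SMB Direct" -Priority 3 -Algorithm ETS -BandwidthPercentage 50
Enable-NetQosFlowControl -Priority 3
Enable-NetAdapterQos -Name "RDMA NIC 1"
```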
SMB Direct
SMB Direct (SMB over remote direct memory access or RDMA) is a storage protocol in Windows Server 2012 R2. It enables direct memory-to-memory data transfers between the server and storage. It requires minimal CPU usage, and it uses standard RDMA-capable network adapters. This provides extremely fast responses to network requests, and as a result, this makes remote file storage response times on par with directly attached block storage.
| Advantages | Disadvantages |
| --- | --- |
| | |
Receive Segment Coalescing
Receive segment coalescing (RSC) reduces CPU utilization for inbound network processing by offloading tasks from the CPU to an RSC-capable network adapter.
| Advantages | Disadvantages |
| --- | --- |
| | |
Receive Side Scaling
Receive-side scaling (RSS) enables network adapters to distribute the kernel-mode network processing load across multiple processor cores in multiple core computers. The distribution of this processing makes it possible to support higher network traffic loads than would be possible if only a single core is used. RSS achieves this by spreading the network processing load across many processors and actively load balancing traffic that is terminated by the Transmission Control Protocol (TCP).
| Advantages | Disadvantages |
| --- | --- |
| | |
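As a minimal sketch, RSC and RSS can be verified and enabled per physical adapter; the adapter name is an illustrative assumption:

```powershell
# Minimal sketch: check and enable RSC and RSS on a physical network adapter.
# The adapter name is an illustrative assumption.
Get-NetAdapterRsc -Name "Ethernet 10G"
Enable-NetAdapterRsc -Name "Ethernet 10G"

Get-NetAdapterRss -Name "Ethernet 10G"
Enable-NetAdapterRss -Name "Ethernet 10G"
```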
SR-IOV
Hyper-V supports SR-IOV-capable network devices and allows the direct assignment of an SR-IOV virtual function of a physical network adapter to a virtual machine. This increases network throughput, reduces network latency, and reduces the host CPU overhead that is required for processing network traffic.
For additional information about SR-IOV, see Task 2b: Define network traffic performance options above.
Task 2c: Define network traffic high availability and bandwidth aggregation strategy
NIC Teaming, also known as load balancing and failover (LBFO), allows multiple network adapters to be placed into a team for the purposes of bandwidth aggregation and traffic failover. This helps maintain connectivity in the event of a network component failure.
Previously, this capability was available only from network adapter vendors. Beginning with Windows Server 2012, NIC Teaming is included as a feature in the Windows Server operating system.
NIC Teaming is compatible with all networking capabilities in Windows Server 2012 R2 with three exceptions:
SR-IOV
RDMA
802.1X authentication
From a scalability perspective, in Windows Server 2012 R2, a minimum of 1 and a maximum of 32 network adapters can be added to a single team. An unlimited number of teams can be created on a single host.
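As a minimal sketch of creating a team (the team and adapter names are illustrative assumptions; the switch-independent teaming mode and dynamic load-balancing algorithm shown suit most deployments):

```powershell
# Minimal sketch: create a two-member team for converged traffic.
# Adapter and team names are illustrative assumptions.
New-NetLbfoTeam -Name "ConvergedTeam" -TeamMembers "NIC1","NIC2" `
    -TeamingMode SwitchIndependent -LoadBalancingAlgorithm Dynamic

Get-NetLbfoTeam -Name "ConvergedTeam"
```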
Additional information:
Microsoft Virtual Academy: NIC Teaming in Windows Server 2012
NIC Teaming (NetLBFO) Cmdlets in Windows PowerShell
Windows Server 2012 R2 NIC Teaming (LBFO) Deployment and Management
Converged Data Center with File Server Storage
Task 2d: Define network traffic isolation and security strategy
Each network traffic type may have different security requirements for functions such as isolation and encryption. The following table lists the strategies that can be used to meet various security requirements.
| Strategy | Advantages | Disadvantages |
| --- | --- | --- |
| Encryption (IPsec) | Traffic is secured while traversing the wire | |
| Separate physical networks | Network is physically separated | |
| Virtual local area network (VLAN) | | |
Task 2e: Define physical network adapters
With an understanding of the types of traffic required by the virtualization server hosts, and the performance, availability, and security strategies for the traffic, you can determine how many physical network adapters are required for each host and the types of network traffic that will be transmitted over each adapter.
Task 2f: Define virtual switches
To connect a virtual machine to a network, you connect its virtual network adapter to a Hyper-V virtual switch.
There are three types of virtual switches that can be created in Hyper-V:
External virtual switch
Use an external virtual switch when you want to provide virtual machines with access to a physical network to communicate with externally located servers and clients. This type of virtual switch also allows virtual machines on the same host to communicate with each other. This type of network may also be available for use by the host operating system, depending on how you configure the networking.
Important: A physical network adapter can only be bound to one virtual switch at a time.
Internal virtual switch
Use an internal virtual switch when you want to allow communication between virtual machines on the same host and between virtual machines and the host operating system. This type of virtual switch is commonly used to build a test environment in which you need to connect to the virtual machines from the host operating system. An internal virtual switch is not bound to a physical network adapter. As a result, an internal virtual network is isolated from external network traffic.
Private virtual switch
Use a private virtual switch when you want to allow communication only between virtual machines on the same host. A private virtual switch is not bound to a physical network adapter. A private virtual switch is isolated from all external network traffic on the virtualization server, and from any network traffic between the host operating system and the external network. This type of network is useful when you need to create an isolated networking environment, such as an isolated test domain.
Note: Private and internal virtual switches do not benefit from the hardware acceleration features that are available to a virtual machine that is connected to an external virtual switch.
Design decision - The decisions you make in all the tasks of this step can be entered in the Virtualization hosts worksheets.
Tip: Virtual switches on different hosts that connect to the same network should have the same name. This eliminates confusion about which virtual switch a virtual machine should connect to, and it simplifies moving a virtual machine from one host to another. The Move-VM Windows PowerShell cmdlet will fail if the same virtual switch name is not found on the destination host.
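The following sketch creates one switch of each type and connects a virtual machine to the external switch. All names are placeholders; in line with the preceding tip, the external switch name would be reused on every host.

```powershell
# Create one virtual switch of each type. Reusing the external switch name
# on every host keeps virtual machine moves between hosts working.
New-VMSwitch -Name "External-Datacenter" -NetAdapterName "HostTeam" -AllowManagementOS $false
New-VMSwitch -Name "Internal-Test" -SwitchType Internal
New-VMSwitch -Name "Private-Test"  -SwitchType Private

# Connect a virtual machine's network adapter to the external switch.
Connect-VMNetworkAdapter -VMName "VM01" -SwitchName "External-Datacenter"
```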
Task 3: Define storage configuration
In addition to the storage required for the host operating system, each host requires access to storage where the virtual machine configuration files and virtual hard disks are stored. This task will focus on the virtual machine storage.
Task 3a: Define data types
The following table lists sample data types to consider when you define your storage requirements.
| Data type | Storage location of data type |
|---|---|
| Host operating system files | Typically on a local hard drive |
| Host page file and crash dumps in Windows | Typically on a local hard drive |
| Failover cluster shared state | Shared network storage or cluster shared volume |
| Virtual hard disk files and virtual machine configuration files | Typically on shared network storage or cluster shared volume |
The remainder of this step is focused on the storage required for the virtual machines.
Task 3b: Storage options
The following options are available for storing the virtual machine configuration files and virtual hard disks.
Option 1: Direct-attached storage
Direct-attached storage refers to a computer storage system that is directly attached to your server, instead of being attached directly to a network. Direct-attached storage is not limited to only internal storage. It can also use an external disk enclosure that contains hard disk drives, including just-a-bunch-of-disks (JBOD) enclosures and enclosures that are connected through SAS or another disk controller.
| Advantages | Disadvantages |
|---|---|
|  |  |
Option 2: Network-attached storage
Network-attached storage devices connect storage to a network where they are accessed through file shares. Unlike direct-attached storage, they are not directly attached to the computer.
Network-attached storage devices support Ethernet connections, and they typically allow an administrator to manage disk space, set disk quotas, provide security, and use checkpoint technologies. Network-attached storage devices support multiple protocols, including Network File System (NFS), Common Internet File System (CIFS), and Server Message Block (SMB).
| Advantages | Disadvantages |
|---|---|
|  |  |
Option 3: Storage area network
A storage area network (SAN) is a dedicated network that allows you to share storage. A SAN consists of a storage device, the interconnecting network infrastructure (switches, host bus adapters, and cabling), and servers that are connected to this network. SAN devices provide continuous and fast access to large amounts of data. The communication and data transfer mechanism for a given deployment is commonly known as a storage fabric.
A SAN uses a separate network, and it is generally not accessible by other devices through the local area network. A SAN can be managed by using Storage Management Initiative Specification (SMI-S), Simple Network Management Protocol (SNMP), or a proprietary management protocol.
A SAN does not provide file abstraction, only block-level operations. The most common SAN protocols are iSCSI, Fibre Channel, and Fibre Channel over Ethernet (FCoE). An SMI-S or a proprietary management protocol can deliver additional capabilities, such as disk zoning, disk mapping, LUN masking, and fault management.
| Advantages | Disadvantages |
|---|---|
|  |  |
Option 4: Server Message Block 3.0 file shares
Hyper-V can store virtual machine files, such as configuration files, virtual hard disk files, and checkpoints, on file shares that use the Server Message Block (SMB) 3.0 protocol. The file shares are typically hosted on a scale-out file server to provide redundancy: if one node is down, the file shares remain available from the other nodes in the scale-out file server.
| Advantages | Disadvantages |
|---|---|
|  |  |
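To illustrate the SMB 3.0 option, the following sketch creates a virtual machine whose configuration files and virtual hard disk live on an SMB file share. The UNC path, virtual machine name, and sizes are placeholders.

```powershell
# Create a virtual machine that stores its configuration and virtual hard
# disk on an SMB 3.0 file share (path and names are placeholders).
New-VM -Name "VM01" `
       -MemoryStartupBytes 4GB `
       -Path "\\SOFS01\VMs" `
       -NewVHDPath "\\SOFS01\VMs\VM01\VM01.vhdx" `
       -NewVHDSizeBytes 60GB
```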
SMB Direct
SMB Direct works in combination with SMB file shares. It requires network adapters and switches that support RDMA to provide full-speed, low-latency storage access, which enables remote file servers to resemble local, direct-attached storage. In addition to the benefits of SMB, SMB Direct has the following advantages and disadvantages.
| Advantages | Disadvantages |
|---|---|
| Provides full-speed, low-latency storage access that makes remote file servers resemble local, direct-attached storage | Requires network adapters and switches that support RDMA |
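As a quick check, based on the standard SmbShare module cmdlets, the following sketch verifies from the Hyper-V host (the SMB client) whether its interfaces are RDMA capable and whether SMB Multichannel connections are in use.

```powershell
# From the Hyper-V host, list the interfaces SMB can use and whether each
# one is RDMA capable.
Get-SmbClientNetworkInterface

# Show the SMB Multichannel connections currently established to file servers.
Get-SmbMultichannelConnection
```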
Figure SEQ Figure \* ARABIC 3: Sample scale-out file server that uses converged networking with RDMA
Additional information:
Provide cost-effective storage for Hyper-V workloads by using Windows Server
Converged Data Center with File Server Storage
Task 3c: Define physical drive architecture types
The type of physical drive architecture that you select for your storage will impact the performance of your storage solution. For additional information about disk types, see Section 7.1 of Infrastructure-as-a-Service Product Line Architecture.
Task 3d: Define storage networking type
The storage controller or storage networking controller types that you use are determined by the storage option that you select for each host group. For more information, see Task 3b: Storage options.
Task 3e: Determine which storage type to use for each data type
With an understanding of your data types, you can now determine which storage option, storage controller, storage networking controller, and physical disk architectures best meet your requirements.
Design decision - The decisions you make in this task can be entered in the Virtualization hosts worksheet.
Additional information:
Networking configurations for Hyper-V over SMB in Windows Server 2012 and Windows Server 2012 R2
Windows Server 2012 Hyper-V Component Architecture Poster and Companion References
Task 4: Define server virtualization host scale units
Purchasing individual servers requires procurement, installation, and configuration for each server. Scale units enable you to purchase collections of servers (typically with identical hardware) that are preconfigured, so you can add capacity to the datacenter by adding scale units rather than individual servers.
The following image illustrates a scale unit that could have been purchased preconfigured from any number of hardware vendors. It includes a rack, an uninterruptable power supply (UPS), a pair of redundant network switches for the servers contained within the rack, and ten servers.
Figure SEQ Figure \* ARABIC 4: Example of a virtualization server host scale unit
The scale unit comes preconfigured and pre-cabled to the UPS and network switches. The unit simply needs to be added to a datacenter, plugged into electrical power, and connected to the network and storage. Then it is ready to be used. If the individual components were not purchased as a scale unit, the purchaser would need to rack and wire all of the components.
Design decision - If you decide to use server virtualization host scale units, you can define the hardware for your virtualization host scale units in the Host scale units worksheet.
Tip: You can purchase preconfigured scale units from a variety of Microsoft hardware partners through the Microsoft Private Cloud Fast Track program.
Task 5: Define server virtualization host availability strategy
Virtualization server hosts may become unavailable for planned reasons (such as maintenance) or unplanned reasons. Following are some strategies that can be used for both.
Planned
You can use live migration to move the virtual machines from one host to another host. This requires no downtime for virtual machines.
Unplanned
The strategy for unplanned downtime depends on the types of workloads that the host is running.
For shared stateful workloads, use Failover Clustering within the virtual machines.
For stateful workloads, run the workload as a highly available virtual machine on a Hyper-V cluster (see the sketch after this list).
For stateless workloads, start new instances manually or through automated means.
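As a minimal sketch of the highly available virtual machine approach, the following commands add an existing virtual machine to a Hyper-V failover cluster and then live migrate it to another node. The virtual machine, cluster, and node names are placeholders.

```powershell
# Make an existing virtual machine highly available on a Hyper-V cluster.
Add-ClusterVirtualMachineRole -VMName "VM01" -Cluster "HVCluster01"

# Live migrate the clustered virtual machine to another node with no downtime.
Move-ClusterVirtualMachineRole -Name "VM01" -Node "Host02" -MigrationType Live
```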
If you are using Failover Clustering in Windows Server with Hyper-V, consider whether to use the features listed in the following table. For additional information about each feature, click the hyperlink.
| Functionality | Considerations |
|---|---|
| Virtual machine monitoring | Monitor a virtual machine for failures in networking and storage that are not monitored by the Failover Clustering service. |
| Virtual machine anti-affinity | Set anti-affinity for virtual machines that you do not want to run on the same node in a Hyper-V cluster, such as virtual machines that provide a redundant service or that are part of a guest virtual machine cluster. Note: Anti-affinity settings are configured by using Windows PowerShell (a sketch follows this table). |
| Virtual machine priority | Set priority so that the cluster starts the most important virtual machines first. The cluster service can also take a lower priority virtual machine offline when a high-priority virtual machine does not have the necessary memory and other resources to start. |
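The anti-affinity row above notes that these settings are configured with Windows PowerShell. The following sketch keeps two guest-cluster virtual machines on different Hyper-V cluster nodes by assigning the same anti-affinity class name to their cluster groups; the group and class names are placeholders.

```powershell
# Assign the same anti-affinity class name to two virtual machine cluster
# groups so the cluster avoids placing them on the same node.
$class = New-Object System.Collections.Specialized.StringCollection
$class.Add("GuestClusterSQL") | Out-Null

(Get-ClusterGroup -Name "SQL-VM-1").AntiAffinityClassNames = $class
(Get-ClusterGroup -Name "SQL-VM-2").AntiAffinityClassNames = $class
```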
Note: Hyper-V clusters can have a maximum of 64 nodes and 8,000 virtual machines.
Step 5: Plan for virtualization fabric architecture concepts
This step requires defining logical concepts to which the fabric architecture will align.
Task 1: Define maintenance domains
Maintenance domains are logical collections of servers that are serviced together. Servicing may include hardware or software upgrades or configuration changes. Maintenance domains typically span host groups of each type or within each location, though they don’t have to. The purpose is to prevent server maintenance from adversely impacting any consumers’ workloads.
Note: This concept applies to physical network and storage components.
Task 2: Define physical fault domains
Groups of virtualization server hosts often fail together as the result of a failed shared infrastructure component, such as a network switch or uninterruptable power supply (UPS). Physical fault domains help support resiliency within the virtualization fabric. It is important to understand how a fault domain impacts each of the host groups you defined for your fabric.
Note: This concept applies to physical network and storage components.
Consider the example in the following image, which overlays maintenance and physical fault domains over a collection of host groups within a datacenter.
Figure SEQ Figure \* ARABIC 5: Example of a maintenance and physical fault domain definition
In this example, each rack of servers is defined as a separate, numbered physical fault domain. This is because each rack contains a network switch at the top and a UPS at the bottom. All servers within the rack rely on these two components, and if either fails, all servers in the rack effectively fail.
Because all servers within a rack are also members of unique host groups, this design provides no mitigation if any of the physical fault domains fails. To mitigate this, you could spread each host group across additional physical fault domains. In smaller-scale environments, you could potentially add redundant switches and power supplies in each rack, or use Failover Clustering for virtualization server hosts across physical fault domains.
In Figure 5, each of the colored, dashed-line boxes defines a maintenance domain (they are labeled MD 1 through 5). Note how each of the servers in the load-balanced cluster of virtual machines is hosted on a server virtualization host that is contained within a separate maintenance domain and a separate physical fault domain.
This enables the fabric administrator to take down all virtualization server hosts within a maintenance domain without significantly impacting applications that have multiple servers spread across maintenance domains. It also means that the application running on the load-balanced cluster is not completely unavailable if a physical fault domain fails.
Design decision - The decisions you make for Tasks 1 and 2 can be entered in the Settings worksheet.
Task 3: Define reserve capacity
The failure of individual servers in the fabric is inevitable. The fabric design needs to accommodate individual server failure, just as it accommodates failures of collections of servers in fault and maintenance domains. The following illustration is the same as Figure 5, but it uses red to identify three failed servers.
Figure SEQ Figure \* ARABIC 6: Failed servers
In Figure 6, server virtualization hosts have failed in the following host groups, maintenance domains, and physical fault domains.
| Host group | Physical fault domain | Maintenance domain |
|---|---|---|
| 2 | 2 | 3 |
| 3 | 3 | 2 |
| 4 | 4 | 2 |
The application running on the load-balanced cluster is still available, even though the host in Physical fault domain 2 has failed, but the application is operating with one-third less capacity.
Consider what would happen if the server virtualization host that hosts one of the virtual machines in Physical fault domain 3 also failed, or if Maintenance domain 2 were taken down for maintenance. In either case, the capacity available to the application would decrease by two-thirds.
You may decide that is unacceptable for your virtualization fabric. To mitigate the impact of failed servers, ensure that each of your physical fault domains and maintenance domains has enough reserve capacity that capacity never drops below the acceptable level that you define.
For more information about calculating reserve capacity, see Reserve Capacity in Cloud Services Foundation Reference Architecture – Principles, Concepts, and Patterns.
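As a rough worked example, assuming an illustrative host group of ten hosts in which two hosts form a maintenance domain, the following sketch computes how many virtual machines can be placed while keeping enough reserve capacity to survive the loss of one maintenance domain. All numbers are assumptions, not recommendations.

```powershell
# Illustrative reserve-capacity calculation for one host group.
$hostsTotal          = 10   # hosts in the host group (assumed)
$vmsPerHost          = 30   # virtual machines each host can run (assumed)
$hostsPerMaintDomain = 2    # hosts serviced together in one maintenance domain (assumed)

# Cap placement at what the remaining hosts can run while a whole
# maintenance domain is out of service.
$maxPlacedVms = ($hostsTotal - $hostsPerMaintDomain) * $vmsPerHost
"Place no more than $maxPlacedVms virtual machines in this host group."   # 240
```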
Step 6: Plan for initial capability characteristics
After completing all of the tasks in this document, you will be able to determine the initial costs to host virtual machines and storage on the fabric, in addition to the initial service quality levels that the fabric can meet. You won’t be able to finalize either of these tasks, however, until you implement your fabric management tools and human resources, which are discussed in the Next Steps section of this document.
Task 1: Define initial SLA metrics for storage and virtual machines
As a fabric administrator, you’ll probably define a service level agreement (SLA) that details the service quality metrics that the fabric will meet. Your virtual machine administrators will need to know this to plan how they’ll use the fabric.
At a minimum, this will likely include an availability metric, but it may also include other metrics. To get an idea of a baseline for virtualization fabric SLA metrics, you can review those offered by public cloud providers such as Microsoft Azure. For virtual machine hosting, that SLA guarantees that when a customer deploys two or more instances of a virtual machine running the same workload, and deploys those instances in different fault and upgrade domains (referred to as "maintenance domains" in this document), at least one of those virtual machines will be available 99.95% of the time.
For a full description of the Azure SLA, please see Service Level Agreements. Optimally, your virtualization fabric will meet or exceed those of public cloud providers.
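To put an availability metric in concrete terms, the following sketch converts a 99.95% target into allowed downtime per month, assuming a 30-day month for the rough calculation.

```powershell
# Translate an availability target into allowed downtime per month.
$availability    = 0.9995
$minutesPerMonth = 30 * 24 * 60            # 43,200 minutes in a 30-day month
$allowedDowntime = (1 - $availability) * $minutesPerMonth

"{0:N1} minutes of allowed downtime per month at {1:P2} availability" -f $allowedDowntime, $availability
```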
Task 2: Define initial costs to host storage and virtual machines
With your fabric designed, you’ll also be able to calculate:
The hardware, space, power, and cooling costs of the fabric
The hosting capacity of the fabric
This information, combined with your other costs, such as the cost of your fabric management tools and human resources, will enable you to determine your final costs to host virtual machines and storage.
To get an idea of the baseline costs for virtual machines and storage, you can review the hosting costs of public cloud providers such as Microsoft Azure. For more information, see Virtual Machine Pricing Details.
Although not always the case, you will typically find that your hosting costs are higher than those of public providers because your fabric will be much smaller than the fabrics of large public providers who are able to attain volume discounts on hardware, datacenter space, and power.
Next steps
After you complete all the tasks in this document, you’ll have a fabric design that meets your organization’s requirements. You’ll also have an initial service characteristic definition that includes the costs and service-level metrics. You won’t be able to determine your final service-level metrics and costs until you determine the human resources costs and the management tools and processes that you’ll use for your fabric.
Microsoft System Center 2012 provides a comprehensive set of functionality to enable you to provision, monitor, and maintain your virtualization fabric. You can learn more about how to use System Center for fabric management by reading the following resources: