Rethinking Enterprise Storage - Putting All the Pieces Together
Recently, Microsoft published a book titled Rethinking Enterprise Storage: A Hybrid Cloud Model, which takes a close look at an innovative storage infrastructure architecture called hybrid cloud storage.
Last week we published excerpts from Chapter 5. This week we provide an excerpt from Chapter 6, Putting All the Pieces Together. Over the next several weeks on this blog, we will continue to publish excerpts from each chapter of this book via a series of posts. We think this is valuable information for all IT professionals, from executives responsible for determining IT strategies to administrators who manage systems and storage. We would love to hear from you, and we encourage your comments, questions, and suggestions.
As you read this material, we also want to remind you that the Microsoft StorSimple 8000 series provides customers with an innovative, game-changing hybrid cloud storage architecture, and it is quickly becoming a standard for many global corporations deploying hybrid cloud storage. You can learn more about the StorSimple 8000 series here: www.microsoft.com/storsimple
Here are the chapters we will excerpt in this blog series:
Chapter 1 Rethinking enterprise storage
Chapter 2 Leapfrogging backup with cloud snapshots
Chapter 3 Accelerating and broadening disaster recovery protection
Chapter 4 Taming the capacity monster
Chapter 5 Archiving data with the hybrid cloud
Chapter 6 Putting all the pieces together
Chapter 7 Imagining the possibilities with hybrid cloud storage
That’s a Wrap! Summary and glossary of terms for hybrid cloud storage
So, without further ado, here is an excerpt from Chapter 6 of Rethinking Enterprise Storage: A Hybrid Cloud Model.
Chapters 2-5 discussed how the Microsoft hybrid cloud storage (HCS) solution protects data, provides disaster recovery (DR) protection, manages storage capacity, and archives data. This chapter explains how it does all of this by leveraging fingerprints as data objects that are accessed directly on Microsoft Azure Storage and on-premises.
It’s important to have realistic performance expectations for enterprise storage, and the Microsoft HCS solution is no exception. Its scale-across architecture delivers excellent performance while accommodating the enormously high latencies of cloud data transfers. But, like all other storage solutions, it has strengths and weaknesses that need to be understood. The performance discussion in this chapter will give readers a way to understand the types of workloads that work best with the solution. To close the chapter, a number of use cases are discussed that show the versatility of the Microsoft HCS solution.
The comprehensive storage functionality of the Microsoft HCS solution is enabled by a data structure that follows the hybrid cloud management model discussed in Chapter 1. The Cloud-integrated Storage (CiS) system on-premises is much more than an edge device or gateway that transfers data between two dissimilar environments; it divides common storage tasks between on-premises and cloud storage. The CiS system and Microsoft Azure Storage assume complementary roles as they exchange data and management information across the hybrid cloud boundary. The CiS system manages the elements and operations that are pertinent to on-premises storage and Microsoft Azure Storage provides the application programming interfaces (APIs) and services for storing and protecting data off-site in the cloud. To get a better understanding of how this works, we return to an examination of fingerprints.
FIGURE 6-1 Servers access the Microsoft HCS solution through an iSCSI network. Primary storage, secondary archive storage, and local snapshots are located on the storage system. Cloud snapshots, DR, and tiered primary and secondary archive storage are located in Azure Storage. The data on Microsoft Azure Storage is also stored in a remote Azure data center through the use of geo-replication services in Azure.
Implementing any storage solution requires a solid understanding of its performance capabilities, strengths, and limitations. The Microsoft HCS solution is no different and its scale-across design results in some unique performance characteristics.
Even though the CiS system contains high-performance components, such as NV-RAM and flash SSDs, the Microsoft HCS solution was designed primarily to address data growth problems. The cloud-as-a-tier storage function emphasizes this design goal by putting dormant data in the cloud, where access times are many orders of magnitude slower than in the flash SSD tier. In addition to storing data, the CiS system uses NV-RAM and SSDs for its dedupe processes, leveraging performance components to create a capacity benefit.
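To make fingerprints concrete, the sketch below shows content-addressed deduplication in miniature. It is a minimal illustration of the general technique, not the StorSimple implementation: the SHA-256 hash, the fixed chunk size, and the Python dict standing in for cloud object storage (where the real system would keep its dedupe index on NV-RAM and SSD) are all assumptions.

```python
import hashlib

CHUNK_SIZE = 64 * 1024  # assumed fixed chunk size; real chunking schemes vary

# Stand-in for cloud object storage: fingerprint -> chunk bytes. In the
# real solution, unique chunks are stored on Microsoft Azure Storage.
cloud_store: dict[str, bytes] = {}

def write_volume(data: bytes) -> list[str]:
    """Split data into chunks, fingerprint each chunk, and store only the
    chunks whose fingerprints have not been seen before. Returns the
    ordered fingerprint list (the volume's metadata map)."""
    fingerprints = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        fp = hashlib.sha256(chunk).hexdigest()  # the chunk's fingerprint
        if fp not in cloud_store:               # dedupe: store each chunk once
            cloud_store[fp] = chunk
        fingerprints.append(fp)
    return fingerprints

def read_volume(fingerprints: list[str]) -> bytes:
    """Rebuild the volume by resolving each fingerprint in order."""
    return b"".join(cloud_store[fp] for fp in fingerprints)
```

Because identical chunks always produce the same fingerprint, second and subsequent copies of a chunk cost nothing to store, which is how performance components are leveraged into a capacity benefit.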
The best performance with the Microsoft HCS solution is achieved when the least amount of data needs to be downloaded from cloud storage. In other words, the working set of the data easily fits within the capacity resources of the system.
The versatility of the Microsoft HCS solution allows IT teams to use it in many different ways. The sections that follow describe a number of them.
A common scenario for implementing the Microsoft HCS solution occurs when an IT team is refreshing file-server storage that is running out of capacity. These older storage systems may be supporting legacy applications that are still needed by the organization, but are not heavily used and have diminishing value. They often contain a lot of dormant data, but the IT team doesn’t have the time to find it and archive it. Sometimes this storage consists of internal server disks and sometimes it is part of an older SAN architecture. The IT team knows they need to improve their ability to manage storage, but they want to reduce their storage costs instead of increasing them.
Instead of buying a traditional storage system that has more capacity than they need, the IT team can deploy a Microsoft HCS solution and migrate the storage volumes from their existing file server storage systems onto the CiS system. Migrations can be accomplished by copying files from one volume to another using server software utilities, or by using SVMotion in VMware environments or Storage Live Migration in Hyper-V environments.
Chapter 3, “Accelerating and broadening disaster recovery protection,” was largely devoted to the problems IT teams have with DR and how the Microsoft HCS solution improves the situation. To summarize briefly, data growth is making an already bad situation worse because the amount of data that needs to be protected and later restored is too large for existing technologies and processes. Backup processing can’t be counted on to complete within the backup window, which means data might not be restorable. Many IT teams are struggling with how to protect a swelling volume of data across their systems and applications.
IT teams recognize that their ability to recover is impaired. In many cases, they cannot test their recovery plans because the disruption to production operations would be too great, or if they do test them, they encounter too many problems and can’t finish. As a result, they can only guess what the recovery time objectives (RTOs) or recovery point objectives (RPOs) might be for their applications, and although they find this situation unacceptable, they don’t know what to do about it.
The Microsoft HCS solution gives the IT team a new, efficient DR tool. Cloud snapshots are much more efficient than other data protection technologies, completing the task in much less time and requiring far less administrative effort. Just as importantly, successful recovery tests can be conducted with a relatively small amount of hardware and minimal interruptions to production operations. The CiS system used for the test requires an adequate Internet connection to access fingerprints; from there, deterministic restores ensure that only the metadata map and working sets are downloaded, establishing realistic RPOs and RTOs.
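The thin-restore idea can be sketched in a few lines. The class and method names below are invented for illustration, and the fingerprint-and-metadata-map model follows the dedupe sketch earlier in this excerpt; this is not the product's actual recovery code.

```python
from typing import Callable

class RestoredVolume:
    """A volume 'restored' from a cloud snapshot. Only the metadata map
    (the ordered fingerprint list) is downloaded up front; chunk data is
    fetched from cloud storage on first read and cached locally, so the
    working set migrates on-premises as applications touch it."""

    def __init__(self, metadata_map: list[str],
                 cloud_fetch: Callable[[str], bytes]):
        self.metadata_map = metadata_map        # small, downloaded immediately
        self.cloud_fetch = cloud_fetch          # e.g., a GET against cloud storage
        self.local_tier: dict[str, bytes] = {}  # chunks cached after first access

    def read_chunk(self, index: int) -> bytes:
        fp = self.metadata_map[index]
        if fp not in self.local_tier:           # cold read: download once
            self.local_tier[fp] = self.cloud_fetch(fp)
        return self.local_tier[fp]              # warm reads are served locally
```

Applications can be brought online as soon as the metadata map is present, which is what keeps the RTO short; the RPO is set by the age of the most recent cloud snapshot.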
Microsoft SharePoint is used in many organizations as a way to share and exchange files. SharePoint files are stored in a Microsoft SQL Server database that becomes larger as more employees share data this way. Eventually, the database may become so large that backup and recovery of SharePoint data can become a problem for the IT team.
The Microsoft HCS solution addresses SharePoint backup and recovery by externalizing binary large object (BLOB) storage to the CiS system. This means the data files in the SharePoint database are relocated to the CiS system, where they are referenced by a link using a SharePoint API. With the database emptied of large data objects, backup and recovery are much faster.
Another benefit of externalizing BLOB storage with CiS systems is that BLOBs that become dormant will be tiered to Microsoft Azure Storage. Tiered BLOBs can be accessed again quickly and transparently the next time somebody wants to use them, but they no longer contribute to data growth problems on-premises.
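Conceptually, BLOB externalization is the linked-storage pattern sketched below. The table layout and helper functions are invented for illustration and are not SharePoint's actual externalization API; a dict stands in for the CiS system's BLOB store.

```python
import hashlib
import sqlite3

# Stand-in for external (CiS) BLOB storage: content hash -> file bytes.
external_blob_store: dict[str, bytes] = {}

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE documents (name TEXT, blob_link TEXT)")

def save_document(name: str, content: bytes) -> None:
    """Store the file body externally and keep only a link in the
    database row, so the database stays small and fast to back up."""
    link = hashlib.sha256(content).hexdigest()
    external_blob_store[link] = content  # the BLOB lives outside the database
    db.execute("INSERT INTO documents VALUES (?, ?)", (name, link))

def load_document(name: str) -> bytes:
    """Follow the stored link back out to the external BLOB store."""
    (link,) = db.execute(
        "SELECT blob_link FROM documents WHERE name = ?", (name,)
    ).fetchone()
    return external_blob_store[link]
```

Backing up the database now means backing up names and links only; the BLOBs themselves are protected by the CiS system and, as noted above, tiered to Microsoft Azure Storage when they go dormant.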
Enterprise document management software helps organizations manage projects by automating workflows and organizing large numbers of related documents on file servers. Document management repositories have a tendency to become quite large and require a great deal of storage capacity. Aging project data is rarely accessed, but the organization may be required to keep it online to comply with contract terms and government regulations. Off-site copies may also be required, which are made using time-consuming manual processes or, more expensively, with storage replication. The storage costs for document management can be high for a function that is largely historical record keeping.
Document management is a great example of a management function that can benefit from primary storage dedupe. Many of the files are derivatives of other files with slight modifications made for different aspects of the project. With so much commonality in the data, dedupe ratios can be very good. The Microsoft HCS solution dedupes primary storage and provides significant capacity relief for storing documents. The cloud-as-a-tier feature of the Microsoft HCS solution allows capacity to be expanded transparently to Microsoft Azure Storage. The scale-across architecture of the Microsoft HCS solution is an excellent match for the capacity needs of enterprise document management.
As discussed in Chapter 5, “Archiving data with the hybrid cloud,” all data stored on CiS systems can be configured to have long-term copies made periodically by cloud snapshots for archiving purposes. This establishes a blanket of coverage that the IT team can automate for all data stored in the Microsoft HCS solution.
The caveat for this is that data needs to be on the CiS system when a long-term cloud snapshot is run. Data that is deleted before then would only be held in Microsoft Azure Storage until the expiration of any cloud snapshots that reference it. The IT team might want to implement best practices that ensure data is not deleted prior to the next long-term cloud snapshot operation. For example, if the IT team sets up long-term cloud snapshots to run on the last day of the month, they would want to implement policies and take measures to keep data available for archiving purposes for a minimum of 31 days.
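The 31-day figure can be checked with a small date calculation. The sketch below is a hypothetical policy helper (Python standard library only), not part of the product, that tests whether data created on one date will still be present when the next month-end snapshot runs.

```python
import calendar
from datetime import date

def next_monthly_snapshot(d: date) -> date:
    """Date of the next month-end cloud snapshot on or after d."""
    last_day = calendar.monthrange(d.year, d.month)[1]
    return date(d.year, d.month, last_day)

def survives_until_snapshot(created: date, deleted: date) -> bool:
    """True if data created on `created` is still present when the next
    month-end snapshot runs, and so gets captured for archiving."""
    return deleted >= next_monthly_snapshot(created)

# Example: data created March 2 and deleted March 20 misses the
# March 31 snapshot; a 31-day minimum retention would prevent that.
assert not survives_until_snapshot(date(2014, 3, 2), date(2014, 3, 20))
assert survives_until_snapshot(date(2014, 3, 2), date(2014, 4, 2))
```

Holding everything for at least 31 days guarantees capture for any creation date, since the gap between a date and its own month-end never exceeds 30 days.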
Archiving software stores data for historical purposes and creates metadata and indices for locating data that might be needed in the future. The archived data is typically stored on primary storage that is protected by backup or remote replication technology. Some IT teams want to continue to use their existing primary storage for recently archived data, but want to migrate older archives to secondary storage that costs less to own and operate.
The Microsoft HCS solution provides additional online capacity for archived data through its cloud-as-a-tier feature. Just as document management files tend to be derivatives of other files, archived data can also contain a high number of derivative files, which means primary dedupe can effectively reduce the capacity consumed. Archived data that is migrated to the Microsoft HCS solution can be protected by cloud snapshots with long-term data retention periods that ensure availability on Microsoft Azure Storage for many years into the future. Archived data stored on Microsoft Azure Storage is replicated three times across different fault domains in an Azure data center and the option of using geo-replication exists for additional protection.
Organizations that develop technology tend to make extensive use of server virtualization. VMs allow developers to experiment with different ideas and to test them quickly and inexpensively. While VMs are virtually free to set up, they can have a very real cost in storage capacity long after they are no longer being used. VMs that are not in use consume no processor resources, but if their storage volumes are not deleted, they continue to consume storage capacity. The IT team that supports the development team might not know whether dormant volumes are important. Problems can occur when the primary storage they are on nears its capacity limits and the IT team has to make uninformed decisions about which volumes to remove.
The Microsoft HCS solution can be used to support this type of development environment, either as primary storage for the VMs or as secondary storage that the IT team migrates VM volumes to. Either way, there will likely be good dedupe ratios due to the commonality between operating environments and applications across VMs. If the IT team ever wanted to move a VM volume back to its original or another storage system, they could use SVMotion or Storage Live Migration to accomplish the task.
To learn more about Microsoft HCS use cases, visit www.microsoft.com/storsimple and be sure to download your copy of Rethinking Enterprise Storage: A Hybrid Cloud Model.