Before the Cloud
Now that we have defined what cloud computing is, let us look at examples of how computing was utilized in different domains such as business computing, scientific computing, and personal computing before the emergence of cloud computing.
Business computing: Business computing frequently involves the use of management information systems that drive logistics and operations, enterprise resource planning (ERP), customer relationship management (CRM), office productivity, and business intelligence (BI). Such tools have enabled more streamlined processes, leading to improved productivity and reduced costs across a variety of enterprises.
As an example, CRM software allows companies to collect, store, manage, and interpret a variety of data about past, current, and potential future customers, offering an integrated, real-time or near-real-time view of all of the organization's interactions with them. In a manufacturing company, a sales team could use the CRM to schedule meetings, tasks, and follow-ups with clients; a marketing team could target clients with campaigns based on specific patterns; and billing teams could track quotes and invoices. The CRM thus serves as a centralized repository for all of this information. To enable this functionality, the organization and its sales teams use a variety of hardware and software technologies to collect the data, which is then stored and analyzed using various database and analytics systems.
Scientific computing: Scientific computing uses mathematical models and analysis techniques implemented on computers to tackle scientific problems. A popular example is the computer simulation of physical phenomena -- for example, using numerical weather and climate models to forecast the weather. This field has disrupted traditional theoretical and laboratory-based experimental methods by enabling scientists and engineers to reconstruct known events or predict future situations by writing programs that simulate and study different systems under different circumstances. Such simulations typically require a very large number of calculations and are often run on expensive supercomputers or distributed computing platforms.
Personal computing: In personal computing, a user runs various applications on a general-purpose personal computer (PC). Examples of such applications include word processors and other office-productivity software, communication apps such as email clients, and entertainment software such as video games and media players. A PC user typically owns, installs, and maintains the software and hardware utilized to carry out such tasks.
Addressing Scale
One of the longstanding challenges in IT is scaling compute resources to meet demand -- for example, to increase a web site's capacity to meet the needs of a growing customer base or to handle "burst" loads during peak business hours or special events that draw massive audiences.
Increasing scale in computing has been an ongoing process, whether in the number of customers and events to capture, monitor, and analyze in CRM, the precision of numerical simulations in scientific computing, or the realism of video games. The need for larger scale has been driven by the growing adoption of technology across domains, the expansion of businesses and markets, and the ongoing increase in the number of users and their needs. Organizations must account for this growth as they plan and budget for the deployment of their solutions.
Organizations typically plan their IT infrastructure in a process called capacity planning. During the capacity planning process, the growth in usage of various IT services is gauged and used as a benchmark for future expansion. Organizations have to plan in advance to procure, set up, and maintain newer and better servers, storage devices, and networking equipment. Sometimes organizations are limited by software, because they may have procured a limited set of licenses and may require more to expand the infrastructure to cover a larger set of users.
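The arithmetic behind capacity planning can be made concrete with a small sketch. The figures below (user base, growth rate, per-server capacity, peak headroom) are hypothetical; the point is simply that organizations had to estimate future demand and procure hardware for it well in advance.

```python
import math

# Hypothetical capacity-planning estimate: how many servers will be needed
# next year if usage keeps growing at a steady rate? All figures are assumed.
current_users = 50_000        # current user base
annual_growth = 0.30          # expected 30% year-over-year growth
users_per_server = 2_000      # users a single server can support
peak_headroom = 1.5           # extra capacity reserved for peak loads

projected_users = current_users * (1 + annual_growth)
servers_needed = math.ceil(projected_users * peak_headroom / users_per_server)
print(f"Projected users: {projected_users:,.0f}; servers to provision: {servers_needed}")
```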
The most basic form of scaling is known as vertical scaling, whereby old systems are replaced with newer, better-performing systems with faster CPUs and more memory and disk space. In many cases, vertical scaling consists of upgrading or replacing servers and adding capacity to storage arrays. This process can take months to plan and execute, and frequently requires brief periods of downtime as upgrades are rolled out.
Scaling can also be done horizontally by increasing or decreasing the number of resources dedicated to the system. An example of this is in high-performance computing, where additional servers and storage capacity can be added to existing clusters, thereby increasing the number of calculations that can be performed per second. Another example involves Web farms -- clusters of servers that host web sites and web applications -- where additional servers can be brought online to handle increased traffic. Just like vertical scaling, this process can take months to plan and execute, with downtimes also a possibility.
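As a toy illustration of the difference, the sketch below meets the same (hypothetical) throughput target either vertically, by swapping in a more powerful server, or horizontally, by adding more servers of the existing type.

```python
import math

target_rps = 10_000        # assumed requests per second the site must sustain

# Vertical scaling: replace the existing server with a more powerful model.
old_server_rps = 2_500     # throughput of the current server (assumed)
new_server_rps = 12_000    # throughput of the upgraded server (assumed)
vertical_ok = new_server_rps >= target_rps

# Horizontal scaling: keep the same server model and add more of them.
servers_needed = math.ceil(target_rps / old_server_rps)

print(f"Vertical: single upgraded server meets the target: {vertical_ok}")
print(f"Horizontal: {servers_needed} servers of the original model are needed")
```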
Because companies owned and maintained their own IT equipment, the cost of addressing scale continued to rise, and companies looked for other ways to reduce it. Large companies consolidated the computing needs of different departments into a single large data center, pooling real estate, power, cooling, and networking to lower costs. Small and medium-sized companies, on the other hand, could lease real estate, networking, power, cooling, and physical security by placing their IT equipment in a shared data center. This arrangement, typically referred to as a co-location service, was adopted by small to medium-sized companies that did not want to build their own data centers in-house. Co-location services continue to be adopted in various domains as a cost-effective way to reduce operational expenses.
Business computing has addressed scale through vertical and horizontal scaling as well as through the consolidation of IT resources into data centers and co-location facilities. In scientific computing, parallel and distributed systems have been adopted to scale up the size of the problems and the precision of their numerical simulations. One definition of parallel processing is the use of multiple homogeneous computers that share state and function as a single large computer in order to run large-scale or high-precision calculations. Distributed computing is the use of multiple autonomous computing systems connected by a network to partition a large problem into subtasks that execute concurrently and communicate via messages over the network. The scientific community has continued to innovate in these areas to address scale. In personal computing, scale has manifested as increased demands brought on by richer content and memory-hungry applications; users replace their PCs with newer, faster models or upgrade existing ones to keep up.
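The divide-into-subtasks idea underlying both parallel and distributed computing can be sketched on a single machine. The toy example below partitions a sum of squares across worker processes; in a genuine distributed system the workers would be autonomous machines exchanging partial results as messages over a network rather than processes on one computer.

```python
from multiprocessing import Pool

def subtask(chunk):
    """Work on one partition of the overall problem (here, a sum of squares)."""
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    n_workers = 4
    # Partition the large problem into subtasks, one per worker.
    chunks = [data[i::n_workers] for i in range(n_workers)]
    with Pool(n_workers) as pool:
        partial_results = pool.map(subtask, chunks)  # subtasks run concurrently
    print("Total:", sum(partial_results))
```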
Rise of Internet Services
The late 1990s marked a steady increase in the adoption of these computing applications and platforms across domains. Soon, software was expected not only to be functional, but also to produce value and insight for business and personal needs. The use of these applications became collaborative; applications were mixed and matched to feed information to one another. IT was no longer just a cost center for a company, but a source of innovation and efficiency.
Figure 1.2: Comparing Traditional and Internet-Scale Computing.
The 21st century has been marked by an explosion in the volume and capacity of wireless communications, the World Wide Web, and the Internet. These changes have led to a network- and data-driven society, where producing, disseminating, and accessing digitized information is simplified. The Internet has created a global marketplace of more than 4 billion users. This rise in data and connections is valuable to businesses. Data creates value in several ways, including by enabling experimentation, segmenting populations, and supporting decision-making with automation [1]. By embracing digital technologies, the world's top 10 economies were expected to increase their output by more than a trillion dollars between 2015 and 2020.
The increasing number of connections enabled by the Internet has also driven its value. Researchers have hypothesized that the value of a network grows super-linearly with the number of users. Thus, at Internet scale, gaining and retaining customers is a priority, achieved by building reliable, responsive services and by making changes based on observed data patterns.
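One commonly cited formulation of this idea is Metcalfe's law, which values a network of n users in proportion to the number of possible pairwise connections, n(n - 1)/2. The value therefore grows roughly as n^2, so doubling the user base roughly quadruples the potential value of the network.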
Some examples of Internet-scale systems include:
- Search engines that crawl, store, index, and search large (up to petabyte-sized) data sets. For instance, Google started as a giant web index that crawled and analyzed the web once every few days and matched these indices to keywords. Now, it updates its indices in near-real time and is one of the most popular ways to access information on the Internet. Its index covers trillions of pages and occupies thousands of terabytes [3].
- Social networks like Facebook and LinkedIn that allow users to create personal and professional relationships and build communities based on similar interests. Facebook, for instance, now boasts more than 2 billion active users per month.
- Online retail services like Amazon maintain an inventory of millions of products and serve a global user base. In 2017, Amazon's online retail operation achieved net sales of $178 billion, up 31% from the year before.
- Rich, streaming multimedia applications allow people to watch and share videos and other forms of rich content. One such example, YouTube, serves 5 billion videos per day and receives around 300 hours of new video uploads every minute.
- Real-time communications systems for audio, video, and text chatting, such as Skype, which clocks more than 50 billion minutes of calls per month.
- Productivity and collaboration suites that serve millions of documents to many concurrent users, allowing real-time, persistent updates. For example, Microsoft 365 serves about 60 million active users each month.
- CRM applications from providers like Salesforce are deployed at over a hundred thousand organizations. Large CRMs now provide intuitive dashboards to track status, analytics to find the customers that generate the most business, and revenue forecasting to predict future growth.
- Data mining and business intelligence applications that analyze the usage of other services (like those above) to find inefficiencies and opportunities for monetization.
Clearly, these systems are expected to handle a high volume of concurrent users. This requires infrastructure with the capacity to absorb large amounts of network traffic and to generate and securely store data, all without noticeable delays. These services derive their value from providing a constant and reliable standard of quality. They also provide rich user interfaces for mobile devices and web browsers, making them easy to use but harder to build and maintain.
We summarize some of the requirements of Internet-scale systems here:
- Ubiquity --- Being accessible from anywhere at any time, from a multitude of devices. For instance, a salesperson will expect their CRM service to provide timely updates on a mobile device to make visits to clients shorter, faster, and more effective. The service should function smoothly under a variety of network connections.
- High availability --- The service must be "always up." Uptimes are measured in terms of the number of nines. Three nines, or 99.9%, implies that a service will be unavailable for roughly 9 hours a year. Five nines (roughly 5 minutes of downtime a year) is a typical threshold for a high-availability service; a quick calculation appears after this list. Even a few minutes of downtime in online retail applications can cost millions of dollars in sales.
- Low latency --- Fast and responsive access times. Even slightly slower page-load times have been shown to significantly reduce the usage of the affected web page. For instance, adding between 100 and 400 milliseconds of delay to search results decreases the number of searches per user by 0.2% to 0.6%, and the drop persists even after latency is restored to its original level.
- Scalability --- The ability to deal with variable loads resulting from seasonality and virality, which cause peaks and troughs in traffic over both long and short periods of time. On days such as "Black Friday" and "Cyber Monday," retailers such as Amazon and Walmart receive several times the network traffic of an average day.
- Cost effectiveness --- An Internet-scale service requires significantly more infrastructure than a traditional application, as well as better management [4]. One way to streamline costs is to make services easier to manage and reduce the number of administrators needed to run them. Smaller services can afford a low server-to-admin ratio (for example, 2:1, meaning a single administrator maintains two servers). To remain profitable, however, services like Microsoft Bing must operate at a high server-to-admin ratio (for example, 2500:1, with a single administrator maintaining 2,500 servers).
- Interoperability --- Many of these services are often used together and hence must provide an easy interface for reuse and support standardized mechanisms for importing and exporting data. For example, services such as Uber may integrate Google Maps into their products to provide simplified location and navigation information to users.
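As a quick sanity check on the availability figures above, the downtime budget implied by a given number of nines follows directly from the fraction of a year the service may be down; the sketch below is a minimal back-of-the-envelope calculation.

```python
# Back-of-the-envelope downtime budget per year for a given availability level.
MINUTES_PER_YEAR = 365 * 24 * 60

for label, availability in [("three nines", 0.999),
                            ("four nines", 0.9999),
                            ("five nines", 0.99999)]:
    downtime_minutes = (1 - availability) * MINUTES_PER_YEAR
    print(f"{label} ({availability:.3%}): about {downtime_minutes:,.1f} minutes "
          f"of downtime allowed per year")
```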
We will now explore some of the early solutions to the problems above. The first challenge to be tackled was the large round-trip time to early web services, which were mostly located in the United States. The earliest mechanisms for dealing with high latency (due to distant servers) and server failure relied simply on redundancy. One technique for achieving this was "mirroring" content, whereby copies of popular web pages were stored at different locations around the world. This reduced the load on the central server, lowered the latency experienced by end users, and allowed traffic to be switched over to another server in case of failures. The downside was the added complexity of dealing with inconsistencies whenever even one copy of the data was modified. Thus, the technique is most useful for static, read-heavy workloads, such as serving images, videos, or music. Because of its effectiveness, most Internet-scale services now use content delivery networks (CDNs) to maintain globally distributed caches of popular content. For example, Cable News Network (CNN) maintains replicas of its videos on multiple "edge" servers at different locations worldwide, with advertising personalized per location.
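The mirroring idea can be sketched as a simple selection policy: send each request to the closest healthy replica and fall back to another if it is down. The mirror names and latencies below are hypothetical.

```python
# Hypothetical set of mirrors holding copies of the same static content.
mirrors = {
    "us-east":  {"latency_ms": 120, "healthy": True},
    "eu-west":  {"latency_ms": 35,  "healthy": False},  # simulated failure
    "ap-south": {"latency_ms": 80,  "healthy": True},
}

# Pick the lowest-latency mirror among those that are still up.
healthy = {name: m for name, m in mirrors.items() if m["healthy"]}
best = min(healthy, key=lambda name: healthy[name]["latency_ms"])
print(f"Serving content from mirror '{best}' ({healthy[best]['latency_ms']} ms away)")
```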
Of course, it did not always make sense for individual companies to buy dozens of servers across the world. Cost efficiencies were often gained by using shared hosting services, in which shares of a single web server were leased out to multiple tenants, amortizing the cost of server maintenance. Shared hosting could be highly resource-efficient because the server could be over-provisioned under the assumption that not all tenants would operate at peak capacity at the same time. (An over-provisioned physical server is one where the aggregate capacity promised to all the tenants is greater than the actual capacity of the server.) The downside was that it was nearly impossible to isolate tenants' services from those of their neighbors, so a single overloaded or error-prone service could adversely impact all of its neighbors. Worse, a tenant could be malicious and try to exploit co-location to steal data from or deny service to other tenants.
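Over-provisioning can be illustrated with a small calculation: the memory promised to all tenants exceeds what the machine physically has, which works only as long as tenants do not all peak at once. The allocations below are hypothetical.

```python
# Hypothetical shared-hosting server whose tenants are collectively promised
# more memory than physically exists.
physical_memory_gb = 64
tenant_allocations_gb = [8, 8, 16, 8, 4, 16, 8, 8]  # memory promised to each tenant

promised_gb = sum(tenant_allocations_gb)
overcommit_ratio = promised_gb / physical_memory_gb
print(f"Promised {promised_gb} GB on a {physical_memory_gb} GB server "
      f"(overcommit ratio {overcommit_ratio:.2f}x)")
```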
To counter these isolation problems, virtual private servers were developed as a variant of the shared hosting model: each tenant was provided with a virtual machine (VM) on a shared server. These VMs were often statically allocated and tied to a single physical machine, which made them difficult to scale and often required manual recovery from failures. Although the underlying servers could no longer be over-provisioned, VMs offered better performance and security isolation between co-located services than simple resource sharing did.
Another problem with sharing public resources was that it required storing private data on third-party infrastructure. Some of the Internet-scale services described above could not afford to lose control over data storage, since any disclosure of their customers' private data would have disastrous consequences. Hence, these companies needed to build their own global infrastructure. Before the advent of the public cloud, such services could be deployed only by large corporations like Google and Amazon. Each of these companies would build large, homogeneous data centers across the globe using off-the-shelf components, where each data center could be thought of as a single, massive warehouse-scale computer (WSC). A WSC provided an easy abstraction for globally distributing applications and data while still maintaining ownership.
Due to economies of scale, the utilization of a data center could be optimized to reduce costs. Even though this was still not as efficient as publicly sharing resources (the cloud), these warehouse-scale computers had many desirable properties that served as foundations for building Internet-scale services. The scale of computing applications progressed from serving a fixed user base to serving a dynamic global population, and standardized WSCs allowed large companies to serve such audiences. An ideal infrastructure would combine the performance and reliability of a WSC with the shared hosting model, enabling even a small company to develop and launch a globally competitive application without the high overhead of building large data centers.
Another approach to sharing resources was grid computing, which enabled the sharing of autonomous computing systems across institutions and geographical locations. Several academic and scientific institutions would collaborate and pool their resources in pursuit of a common goal, with each institution joining a "virtual organization" by dedicating a specific set of resources under well-defined sharing rules. Resources were often heterogeneous and loosely coupled, requiring complex programming constructs to stitch together. Grids were geared towards supporting non-commercial research and academic projects and relied on existing open-source technologies.
The cloud was a logical successor that combined many features of the solutions above. For example, instead of universities contributing and sharing access to a pool of resources through a grid, the cloud allows them to lease computing infrastructure that is centrally administered by a cloud service provider (CSP). Because the central provider maintains a large resource pool that can satisfy all of its clients, the cloud makes it easy to scale capacity up and down on short notice as demand changes. Unlike the open standards of the grid, however, cloud computing relies on proprietary protocols and requires the user to place a certain level of trust in the CSP.
References
[1] IBM (2017). What is big data? https://www.ibm.com/analytics/hadoop/big-data-analytics
[3] Google Inc. (2015). How Search Works. https://www.google.com/insidesearch/howsearchworks/thestory/
[4] Hamilton, James R., et al. (2007). On Designing and Deploying Internet-Scale Services.