In the first part of this series on the Cloud, we looked at what “cloud computing” really means, explored some of the history of virtualisation and saw how it was the catalyst for the huge growth in SaaS companies. In this article, we will be looking in a bit more depth at data centres, which can be thought of as the home of the cloud.
A bit of history
During the dotcom boom 20 years ago, companies had only two choices for hosting their content: either they could lease or buy servers from a hosting company, or they could install their own rack of servers. Both of these options were expensive, inflexible and risky. For larger companies, there was no choice but to install their own server farm, with all the risks and costs that went with it.
With the emergence of virtualisation and commercially-available hypervisors, large companies began to use virtualisation to improve the performance of their server farms. Many hosting companies realised that this was the future and started to follow suit. Over time these server farms began to morph into what we now call data centres.
At the same time, companies such as Google began to exploit virtualisation to build huge data centres to run their online services. In 2009, two of Google’s data centre architects wrote a seminal paper entitled “The Datacenter as a Computer”. In this paper, they introduced the concept of warehouse-scale computers, where thousands of servers with access to terabytes of storage effectively act as a single huge computer.
The rise of the data centre
Essentially, a data centre is just a large warehouse full of rack after rack of servers. Some modern data centres host hundreds of thousands of multi-core servers connected by internal networks that run at terabits per second. As you can imagine, such large numbers of servers consume enormous amounts of power and can generate astonishing amounts of heat. Even a relatively small data centre can draw 50MW of power (enough to power a small town). From the outside, often the only clue that a warehouse is actually a data centre is the presence of extensive cooling equipment.
Over time, the internal design of data centres has been perpetually evolving. In their modern iterations, data centres tend to fall into two categories – those that are used for cloud services and those that are used by online services such as Facebook, Google or Bing – or, to use the current terminology: multi-tenant data centres and single-tenant data centres. While they are split into these two categories, there are relatively few differences in terms of architecture. Most data centres are divided internally into pods (both for efficient cooling and to simplify the networking and cabling). Each pod hosts several racks of servers and is often organised internally with hot and cold aisles for efficient cooling.
The benefits of massive scale
One of the interesting effects of this sort of scale is a shift in how operators view reliability. In the days when a company might have a few dozen servers, they would often pay a huge premium to get the highest possible quality components, especially storage and cooling fans. However, even the best quality server components only have a lifespan of perhaps 10 years. Put another way, they are expected to fail roughly once every 90,000 hours. If you have a data centre with 100,000 servers, then you actually start to expect at least one failure an hour.
There are two direct consequences to this. First, it pays to design your data centre to make access and repair as easy as possible. For instance, if all servers are mounted on rails, all components simply clip into place and any machine with a hardware failure can be specifically lit up. Second, it’s probably worth just buying cheaper components and expecting to replace them more often if you know failures are going to happen regardless. With this in mind, a data centre might buy components with a mean time to failure of 20,000 hours.
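The arithmetic behind this trade-off is simple enough to sketch. A minimal illustration, using the figures quoted above (a hypothetical 100,000-server fleet, with 90,000-hour and 20,000-hour mean times to failure) and assuming failures are independent and spread evenly over each component's lifetime:

```python
def expected_failures_per_hour(fleet_size: int, mttf_hours: float) -> float:
    """Expected number of failures per hour across a fleet, assuming
    independent failures spread uniformly over the mean time to failure."""
    return fleet_size / mttf_hours

# Premium components: ~10-year lifespan, roughly 90,000 hours MTTF
premium = expected_failures_per_hour(100_000, 90_000)

# Cheaper components: 20,000 hours MTTF
cheap = expected_failures_per_hour(100_000, 20_000)

print(f"premium: {premium:.1f} failures/hour")  # ~1.1 failures/hour
print(f"cheap:   {cheap:.1f} failures/hour")    # 5.0 failures/hour
```

So even with the cheaper parts, the operator only faces around five swap-outs an hour across the whole building – a manageable workload once servers are designed for quick replacement.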
However, even allowing for an increased expectation of failure, data centre operators will carefully monitor all failures to try to spot patterns. A few years ago, one major operator spotted that they were getting a far greater rate of failure of their hard drives than expected. When they investigated, they found that the physical design of the servers meant that the hard drives were overheating as they weren’t placed in the path of the airflow through the server. A simple redesign to the airflow increased reliability hugely.
While the quality of servers and data centres is continually improving, there is still considerable room for further optimisation.
In the next part of our series on Cloud Computing, we will be taking a look at how cloud services are sold and used, and how changes in technology have affected businesses.