Saturday, May 31, 2014

Optimizing Data Centers Through Machine Learning

Google has published a paper outlining their approach to using machine learning, a neural network to be specific, to reduce energy consumption in their data centers. Joe Kava, VP of Data Centers at Google, also has a blog post explaining the background and their approach. Google has one of the best data center designs in the industry and takes their PUE (power usage effectiveness) numbers quite seriously. I blogged about Google's approach to optimizing PUE almost five years ago! Google has come a long way, and I hope they continue to publish such valuable information in the public domain.



There are a couple of key takeaways.

In his presentation at Data Centers Europe 2014, Joe said:
As for hardware, the machine learning doesn’t require unusual computing horsepower, according to Kava, who says it runs on a single server and could even work on a high-end desktop.
This is a great example of a small data Big Data problem. The neural network is a supervised learning approach: you create a model from certain attributes and fine-tune the collective impact of those attributes to predict a desired outcome. Unlike an expert system, which emphasizes an upfront, logic-driven approach, a neural network continuously learns from the underlying data and is tested on its predicted outcome. The outcome has no dependency on how large your data set is, as long as it is large enough to include the relevant data points with a good history. The "Big" part of Big Data misleads people into believing they need a fairly large data set to get started. This optimization debunks that myth.
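To make that concrete, here is a minimal sketch of such a supervised model. The attribute names, the synthetic data, and the use of scikit-learn's MLPRegressor are my own illustration and assumptions, not Google's actual implementation; the point is simply that a small feed-forward network on modest telemetry is enough.

```python
# Minimal sketch of a supervised PUE model (hypothetical attributes and data,
# not Google's actual implementation).
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# Hypothetical historical telemetry: each row is one snapshot of the data center.
# Columns: server load (kW), outside air temp (C), chiller setpoint (C),
# cooling towers running, pump speed (%).
rng = np.random.default_rng(0)
X = rng.uniform([200, 5, 6, 1, 40], [800, 35, 12, 6, 100], size=(5000, 5))
# Synthetic PUE target just to make the example runnable.
y = 1.05 + 0.0003 * X[:, 0] + 0.004 * X[:, 1] - 0.01 * X[:, 3] + rng.normal(0, 0.01, 5000)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
scaler = StandardScaler().fit(X_train)

# A small feed-forward network trained on scaled attributes; this is the
# "runs on a single server" point from the quote above.
model = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0)
model.fit(scaler.transform(X_train), y_train)

print("held-out R^2:", model.score(scaler.transform(X_test), y_test))
```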

The other fascinating part of Google's approach is that they are not only using machine learning to optimize the PUE of current data centers but also planning to use it to design future data centers more effectively.

Like many other physical systems, a data center has certain attributes that you have operational control over and can change fairly easily, such as cooling setpoints and server load, but quite a few attributes that you only control during the design phase, such as the physical layout of the data center and its climate zone. If you decide to build a data center in Oregon, you can't simply move it to Colorado. These neural networks can significantly help with those upfront, irreversible decisions that cannot be tuned later on.
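Once a model like this is trained, the same prediction can be turned around to evaluate attributes that are fixed at design time. Below is a hedged continuation of the sketch above (reusing its hypothetical model, scaler, and rng) that compares the expected PUE of two candidate sites with different temperature profiles before anything is built; the sites and numbers are made up.

```python
# Hypothetical what-if comparison of two candidate sites (a design-time decision),
# continuing from the sketch above and reusing its model, scaler, and rng.
def expected_pue(model, scaler, site_temps, rng, n=2000):
    """Predict average PUE under a site's outside-air temperature profile,
    holding the operational attributes at randomized typical values."""
    X_sim = rng.uniform([200, 5, 6, 1, 40], [800, 35, 12, 6, 100], size=(n, 5))
    X_sim[:, 1] = rng.choice(site_temps, size=n)  # swap in the site's temperature profile
    return model.predict(scaler.transform(X_sim)).mean()

site_a_temps = rng.normal(12, 6, 365)   # hypothetical cooler site
site_b_temps = rng.normal(22, 8, 365)   # hypothetical warmer site
print("site A expected PUE:", expected_pue(model, scaler, site_a_temps, rng))
print("site B expected PUE:", expected_pue(model, scaler, site_b_temps, rng))
```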

One of the challenges with neural networks, and for that matter many other supervised learning methods, is that it takes significant time and precision to perfect (train) the model. Joe's describing it as "nothing more than a series of differential calculus equations" downplays the model. Neural networks are useful when you know what you are looking for - in this case, a lower PUE. In many cases you don't even know what you are looking for.

Google mentions identifying 19 attributes that have some impact on PUE. I wonder how they shortlisted these attributes. In my experience, unsupervised machine learning is a good place to shortlist attributes before moving on to supervised machine learning to fine-tune them. Unsupervised machine learning combined with supervised machine learning can yield even better results, if used correctly.
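One way to do that shortlisting (my speculation, not Google's described method) is an unsupervised first pass. The rough sketch below uses PCA loadings over hypothetical telemetry to rank candidate attributes before handing the survivors to a supervised model like the one sketched earlier.

```python
# Hypothetical attribute shortlisting: an unsupervised step (PCA here) ranks raw
# attributes by how much variance they carry, and only the top ones go on to the
# supervised PUE model. This is speculation, not Google's documented method.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
attribute_names = ["server_load", "outside_temp", "chiller_setpoint",
                   "cooling_towers", "pump_speed", "humidity", "fan_speed"]
X = rng.normal(size=(5000, len(attribute_names)))          # stand-in telemetry

pca = PCA(n_components=3).fit(StandardScaler().fit_transform(X))
# Rank attributes by how strongly they load on the leading components.
loadings = np.abs(pca.components_).sum(axis=0)
shortlist = [attribute_names[i] for i in np.argsort(loadings)[::-1][:5]]
print("shortlisted attributes:", shortlist)
```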