My PUE is bigger than yours
There seems to be an endless comparison of the size of different data centres’ PUE. Whilst on the one hand this is encouraging, as it demonstrates that the industry is facing up to the energy cost and environmental issues facing data centres, it is also a little depressing, as the numbers are in many cases pretty meaningless and not comparable. Why is this so, and what might we be able to do about it?
The claims typically take one of these forms:
- Our PUE is X
- Our PUE is better than (competitor)
- Our Design / Target PUE is X
- Our data centre can achieve a PUE as low as X
- Our Average PUE is X
Whilst all these claims may appear to be talking the same language there are some complex issues lurking underneath which compromise our ability to understand or use these numbers.
The Green Grid has started trying to address some of these problems, particularly the many creative ways in which operators and designers decide what PUE to claim. Efforts so far largely deal with the measurement method and duration, but still allow a broad range of options for the measurement period. It might be preferable to mandate a minimum of one year of data for any PUE claim not specifically identified as being something else.
Some large operators, such as Google, have published quite a lot of material about their PUE and how they have achieved it. One issue is that this only seems to cover Google-designed data centres and not the legacy estate, which they seem less keen to show off. This is perhaps not surprising given the frequency with which bizarre “research” estimates of the carbon cost of a Google search are published by the mainstream press.
Design PUE is an often-heard claim and probably the least useful or informative. It means “this data centre, which I have only just turned on and which has a terrible PUE, might in the future, at full load and on a really cold day, get close to this design PUE number”. Design PUE is the data centre equivalent of Peak Music Power Output for cheap audio goods.
(from Wikipedia, “Peak Music Power Output (PMPO) is a much more dubious figure of merit, of interest more to advertising copy-writers than to consumers”).
What would be of far more use is a realistic projection of “with our planned fill out rate we expect to average X PUE in year 1, Y PUE in year 2 etc”.
PUE varies with load
The big issue with a real data centre and PUE measurements is that the PUE varies with the IT electrical load. The data centre has a certain fixed load which it would draw even if you turned off all the IT equipment. This means that for most data centres the PUE is pretty poor at low load and then as they fill up the PUE tends toward an optimal conditions value. Most people now recognise this but very few have the full range of measurement data or the predictive capability to state how their data centre will perform.
So, when we install the first server it will be the least efficient from a total energy perspective. As we add more servers the data centre fixed overhead can be split over more machines and the apparent efficiency improves. This is much like running a new bus or train service. When we first run the service and there is only one passenger it is very inefficient because much of the fuel consumption is due to the vehicle and not the passenger. As the service slowly fills up with passengers over time our per passenger efficiency will improve substantially.
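The effect can be sketched with a very simple model. The figures below are illustrative assumptions and not measurements from any real site: a fixed overhead of 200 kW that is drawn regardless of IT load, plus a further overhead of 15% of the IT load (cooling and distribution losses that scale with it).

```python
def pue(it_load_kw, fixed_overhead_kw=200.0, proportional_overhead=0.15):
    """PUE for a toy model: fixed overhead plus overhead that scales with IT load.

    Both default values are hypothetical and chosen only for illustration.
    """
    if it_load_kw <= 0:
        raise ValueError("PUE is undefined when there is no IT load")
    total_kw = it_load_kw + fixed_overhead_kw + proportional_overhead * it_load_kw
    return total_kw / it_load_kw


# PUE at increasing levels of fill-out
for it_load_kw in (50, 100, 250, 500, 1000):
    print(f"{it_load_kw:>5} kW IT load -> PUE {pue(it_load_kw):.2f}")
```

Under these assumptions the PUE falls from about 5.15 at 50 kW of IT load to about 1.35 at 1000 kW, and can never drop below 1.15 however full the site gets.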
Not all IT load is good
The second problem is that if we assess a data centre based only on its PUE we are assuming that all IT load is good and that all IT equipment is of equal efficiency and value. PUE only measures the achieved efficiency up to the IT equipment. If I have two data centres, one with a good PUE but old and inefficient IT equipment and another with a worse PUE but newer and better IT equipment, I need to know more to work out which is the more efficient overall.
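A small worked example makes the point. All numbers here are hypothetical, and the sketch assumes both sites run the same workload, with the newer IT equipment needing half the energy to do it.

```python
def facility_energy_kwh(it_energy_kwh, pue):
    """Total facility energy implied by an IT energy figure and a PUE."""
    return it_energy_kwh * pue


# Site A: good PUE, but old IT equipment that needs 1000 kWh for the workload
site_a_kwh = facility_energy_kwh(it_energy_kwh=1000.0, pue=1.2)

# Site B: worse PUE, but newer IT equipment that needs only 500 kWh
site_b_kwh = facility_energy_kwh(it_energy_kwh=500.0, pue=1.6)

print(f"Site A (PUE 1.2): {site_a_kwh:.0f} kWh")
print(f"Site B (PUE 1.6): {site_b_kwh:.0f} kWh")
```

With these assumed figures the site with the worse PUE uses about 800 kWh against about 1200 kWh for the same work, which PUE alone would never reveal.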
Save energy and get a worse PUE
The third problem is that if I implement effective measures to manage IT power based on workload such as configuring power management and shutting down unnecessary servers overnight, then my IT electrical load is going to vary as I save energy. The kicker is that as my IT electrical load falls my PUE goes up. If this data centre were being targeted or assessed on the PUE then this would be a good incentive to disable those power management capabilities or leave some decommissioned servers turned on to keep the PUE down.
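To see the perverse incentive in numbers, here is a minimal sketch using assumed values: a fixed overhead of 200 kW, a further overhead of 15% of IT load, and an IT load of 500 kW that power management reduces to 300 kW for the 12 overnight hours.

```python
FIXED_OVERHEAD_KW = 200.0      # hypothetical: drawn regardless of IT load
PROPORTIONAL_OVERHEAD = 0.15   # hypothetical: overhead that scales with IT load


def daily_figures(day_it_kw, night_it_kw, hours_each=12):
    """Return (total facility kWh, PUE) for one day split into day and night."""
    it_kwh = (day_it_kw + night_it_kw) * hours_each
    overhead_kwh = sum(
        (FIXED_OVERHEAD_KW + PROPORTIONAL_OVERHEAD * it_kw) * hours_each
        for it_kw in (day_it_kw, night_it_kw)
    )
    total_kwh = it_kwh + overhead_kwh
    return total_kwh, total_kwh / it_kwh


always_on = daily_figures(day_it_kw=500, night_it_kw=500)
managed = daily_figures(day_it_kw=500, night_it_kw=300)

print(f"No power management: {always_on[0]:.0f} kWh/day, PUE {always_on[1]:.2f}")
print(f"Power management:    {managed[0]:.0f} kWh/day, PUE {managed[1]:.2f}")
```

In this model the managed site uses about 15,840 kWh a day instead of 18,600 kWh, yet its PUE rises from 1.55 to 1.65.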
Can’t we measure the output?
We could get around this problem if we could measure the IT work or output of the data centre. Unfortunately, there is no credible measure for data centre output. Do you count building the Google search index or just serving the queries; if you count the search index then how do you divide it across searches? How many Google map viewings are worth one YouTube video?
We can only usefully compare on the carbon required to deliver the service. Tell me how much carbon a Google search and a Bing search each create and I can choose between them if I want to. Tell me how much carbon a Gmail box creates and I can consider whether to junk my Exchange server.
What I don’t care about
There are a few things I really don’t care about which some parties seem to be getting unnecessarily hung up on. These are:
- Details of what you included in your PUE.
- Whether your ops room is on the IT UPS, or which side of the PDU transformer the power meter is on, as the seasonal or load-based drift in PUE is bigger anyway.
- The PUE of just a few of your data centres.
Please don’t claim you are really green because some of your data centres show a good number. Which one(s) does my service come from? Either tell us about the whole estate or nothing.
What should we do?
We need to standardise as an industry on an annual measurement of PUE. In other words, take the total energy supplied over a calendar year, divide it by the total delivered to the IT equipment, and this is your PUE. The Green Grid has gone some way towards this with their reported / registered / certified measurements, but we need to go further in eradicating the use of PUE as a meaningless marketing number before it loses all meaning as a marker and we are left with no metrics.
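As a rough sketch of what that annual figure looks like next to the alternatives, the following uses made-up monthly meter readings (in MWh) for a site that doubles its IT load halfway through the year. The function name and the data are illustrative assumptions only.

```python
def annual_pue(monthly_total_mwh, monthly_it_mwh):
    """Annual PUE: total energy supplied over the year divided by total IT energy."""
    if len(monthly_total_mwh) != 12 or len(monthly_it_mwh) != 12:
        raise ValueError("expected 12 monthly readings for a full calendar year")
    it_mwh = sum(monthly_it_mwh)
    if it_mwh <= 0:
        raise ValueError("PUE is undefined when there is no IT energy")
    return sum(monthly_total_mwh) / it_mwh


# Hypothetical readings: six months lightly loaded, six months at double the IT load
it_mwh = [100] * 6 + [200] * 6
total_mwh = [220] * 6 + [340] * 6

annual = annual_pue(total_mwh, it_mwh)
monthly = [total / it for total, it in zip(total_mwh, it_mwh)]
best_month = min(monthly)
mean_monthly = sum(monthly) / len(monthly)

print(f"Annual PUE:             {annual:.2f}")
print(f"Best single month:      {best_month:.2f}")
print(f"Mean of monthly values: {mean_monthly:.2f}")
```

For this invented data the annual figure is roughly 1.87, the best month is 1.70 and the simple mean of the monthly values is 1.95, so three “honest” numbers for the same site differ noticeably depending on which one is quoted.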