Andrew Jones (@hpcnotes) presents an interesting question regarding power concerns for supercomputers, particularly at exascale:
Are we really saying, with our concerns over power, that we simply don’t have a good enough case for supercomputing — the science case, business case, track record of innovation delivery, and so on? Surely if supercomputing is that essential, as we keep arguing, then the cost of the power is worth it.
He goes on to ask why the public is willing to fund the massive expenditures required for specialized experimental facilities (e.g. ITER, LHC, SKA, etc.), but isn’t willing to fund an exascale computer unless the annual power bill can be kept to a “reasonable” number. Just to put these numbers in perspective, Wikipedia estimates the total project cost for LHC to be $9B, and targets for exascale power costs are $30M/year (for a computer operated for maybe five years). This $30M/year figure leaves out a lot of factors: R&D and acquisition costs for the computer hardware, non-power operational costs (personnel, facilities), and supporting infrastructure (e.g. storage and networks). My guess is that it will cost $1.5B-$2B to build a 30MW exaflop machine by 2019 and operate it for five years. Note that this cost estimate includes “in-kind” costs throughout the supply chain: no one party is actually going to write a check for $2B for an exascale machine. And, as an aside, I assume a power cost of approximately $1M per megawatt year, so a 30MW machine costs $30M/year to operate.
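For anyone who wants to check the arithmetic, here is a minimal back-of-envelope sketch in Python using the assumptions above (roughly $1M per megawatt-year, a 30MW machine, a five-year service life). The cost constant and the little helper function are mine, purely for illustration; none of this is a real budget.

```python
# Back-of-envelope power-bill arithmetic (illustrative only).
# Assumption from the text: electricity costs roughly $1M per megawatt-year.

COST_PER_MW_YEAR = 1.0e6  # dollars per megawatt-year (assumed)

def power_bill(megawatts, years):
    """Total electricity cost, in dollars, over the machine's service life."""
    return megawatts * years * COST_PER_MW_YEAR

annual = power_bill(30, 1)    # a 30MW machine: $30M per year
lifetime = power_bill(30, 5)  # ...and $150M over a five-year life

print(f"Annual power bill:    ${annual / 1e6:.0f}M")
print(f"Five-year power bill: ${lifetime / 1e6:.0f}M")
```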
As Jones pointed out, this analysis misses the most costly and transformational part of moving to a 30MW exaflop machine: how do we create a new parallel programming model, build a production-quality software development ecosystem around it, AND get a new generation of programmers trained on that toolset, all in the next seven years? My answer? We don’t. Not going to happen. But… go back and look carefully at what I said: “a 30MW exaflop machine.” While I agree with Jones that “power is not the problem,” the rush to solve this “problem” is creating a real one: by driving a radical change in computer hardware to meet artificial power constraints, we have made the software intractable. And without software, there is certainly no reason to buy the hardware.
If, however, you relax that power constraint, you pay more in operational costs, but you greatly reduce the hardware and software R&D costs. On balance, you probably end up with the same $1.5B-$2B total cost for a 150MW exaflop machine, but it is usable by 2019. The big data centers of the world (think Google, Microsoft, Amazon, etc.) already operate facilities topping 100MW, so it is physically possible and, apparently, worth the cost to those companies.
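To make that trade-off concrete, here is a rough comparison of the two scenarios under the same assumptions (about $1M per megawatt-year, a five-year life, and a total envelope at the $2B end of my guess). The “everything else” figure is simply whatever remains of that assumed envelope after the power bill; it is not a real breakdown of R&D, acquisition, and operations costs.

```python
# Rough comparison of the power-capped and relaxed scenarios (illustrative only).
# Assumptions from the text: ~$1M per megawatt-year, a five-year life, and a total
# cost envelope of roughly $2B either way. "Everything else" is just the remainder
# of that assumed envelope, not an actual budget breakdown.

COST_PER_MW_YEAR = 1.0e6  # dollars per megawatt-year (assumed)
TOTAL_ENVELOPE = 2.0e9    # dollars, the upper end of the $1.5B-$2B guess
YEARS = 5

def scenario(name, megawatts):
    power = megawatts * YEARS * COST_PER_MW_YEAR
    everything_else = TOTAL_ENVELOPE - power
    print(f"{name}: power ${power / 1e9:.2f}B, "
          f"everything else ${everything_else / 1e9:.2f}B")

scenario("30MW (power-capped)", 30)    # tiny power bill, radical hardware and software R&D
scenario("150MW (relaxed cap)", 150)   # a 5x larger power bill, far more conventional machine
```

The point of the comparison is that the extra ~$600M in electricity over five years buys back a machine whose hardware, and therefore software, stays close enough to today’s to be usable by 2019.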
So, circling back to the original question, why doesn’t this make sense to the funding agencies? Why are they so hung up on a power bill?
My opinion is the title of this post: “familiarity breeds complacency,” or, to be more blunt, “supercomputing is boring.” Putting a man on the moon or finding the Higgs boson may not have direct practical benefit to the average citizen, but they involve big chunks of equipment doing things that haven’t been done before. Remember what happened to public enthusiasm after the second moon shot, shuttle launch, whatever? By launch twenty, it didn’t even make the news. To make things worse, computers, game consoles, and smartphones are so commonplace that people cannot grasp why we need to spend massive amounts of money on them. Don’t we have enough?
I find it interesting that the (re-)emergent supercomputing nations (e.g. China & Russia) think of supercomputing as infrastructure for economic development, not solely as a platform for scientific experiments, as we tend to in the US. Those countries are providing (or planning to provide) large supercomputing facilities to commercial entities that may not have the wherewithal to build such resources on their own. In the US, the big, publicly visible machines all belong to the DOE in one way or another and are mostly used by DOE researchers. Non-computer companies with large computational needs (oil & gas, financials, etc.) build their own, and often keep them secret to maintain a competitive advantage: I’ve heard of a couple of examples where a company’s private system dwarfed the top entries of the Top500 list of its time. Sure, HPC cloud computing could address the needs of smaller companies, if a startup pharma were willing to put all of its data and algorithms in the hands of a third party.
I don’t know if we can make supercomputing “sexy” again. I feel that the era of “We’re #1 in the Top500” as a reason to build a new machine has drawn to a close, so what application could make that investment worthwhile? Unfortunately, when I read the DOE Exascale Scientific Grand Challenges reports, I just don’t see a single project that, on its own, would convince Congress to fund an exascale machine. As a collection, those projects have merit and value, but no individual project is a moon shot, and it is difficult to turn a portfolio of projects into a compelling elevator speech.
Maybe it’s a case of “No one believes the CFD results except the one who performed the calculation, and everyone believes the experimental results except the one who performed the experiment.” –P.J. Roache
We need more examples of real discoveries that have been made by computation and later confirmed experimentally. I think there are some — but not many.