News

Petaflop Imperative

Published June 20, 2004

Aaron Ricadela, Information Week
Copyright 2004 CMP Media LLC

Breaking the petaflop barrier has all the symbolism of the four-minute mile. But businesses want progress on a more-important measurement: Time to insight.

On a spring night in Silicon Valley, IBM executive Nick Donofrio is poised to host a 40th birthday party at the Computer History Museum for the company's world-changing mainframe. That computer, lionized alongside industrial achievements like the Boeing 707 and Model T Ford, helped usher in the U.S. moon shot, automated airline reservations, and Medicare. But an hour before party time, Donofrio, an IBM lifer responsible for commercializing new technology, wants to talk about breaching a technical barrier looming over this decade: Construction of a "petaflop" computer, capable of a staggering 1,000 trillion computations per second, and perhaps equally capable of upending business and science.

IBM, he says, is closer than most people think. "We know a petaflop is a high-water mark, and I want it done," says Donofrio, a senior VP who joined the company in 1967. "We'll achieve a petaflop in 2005 or 2006. We're that close."

A petaflop machine, with a top speed of 1 quadrillion mathematical computations per second (the "flops" stands for floating-point operations), could help engineers virtually prototype entire cars and planes without building costly models, weather forecasters zero in on storms with crackerjack accuracy, traders instantly predict the ripple effect of a stock-price change throughout its sector, and doctors biopsy tissue samples for fast decisions while a patient still lies on the operating-room table.

"If a petaflop was delivered tomorrow, we probably wouldn't know what to do with it. By the end of the decade, we'd probably consume it," says Tom Tecco, director of computer-aided engineering and testing at General Motors Corp. New safety requirements in the United States and overseas prompted GM to install a 9-teraflop IBM supercomputer in April, and the company's teraflop requirements have been increasing at 35% to 50% a year. "I don't see that changing any time soon," Tecco says.

To get a perspective on how far we are from a petaflop machine, consider that the world's fastest supercomputer today, the climate-modeling Earth Simulator in Yokohama, Japan, has a top speed of 40 trillion operations a second. So don't expect petaflops tomorrow. Most computer scientists, Donofrio's enthusiasm aside, don't believe they'll break the boundary until the decade's end. But many believe that doing so is critical to U.S. business competitiveness. "U.S. industry, government, and scientific enterprise have to have access to the next generation of tools to ask questions that may not be on their radar today," says Deborah Wince-Smith, president of the Council on Competitiveness, a government-funded research institute that advocates construction of a petaflop system. "The only driver of productivity growth for the U.S. is our innovation capacity. We're not going to compete on standardized products and services."
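For scale, a back-of-envelope comparison using the figures above (the exact ratio depends on which benchmark is quoted):

\[
1~\text{petaflop} = 10^{15}~\text{flops} = 1{,}000~\text{teraflops},
\qquad
\frac{1{,}000~\text{teraflops}}{40~\text{teraflops}} = 25,
\]

so a petaflop machine would run roughly 25 times faster than the Earth Simulator at its peak.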

Being first to break the petaflop barrier also would provide a huge practical and psychological boost for American scientists and engineers, arming them with the world's most powerful tools at a time when the country's lead in science and technology is perceived to be slipping away, and indeed is by some measurements, such as patent awards and published papers. "Why was the four-minute mile more important than four minutes, one second?" asks Steve Wallach, VP at Chiaro Networks and a longtime supercomputer designer and consultant. "If you look at our whole society, we like milestones."

Here's one: Later this year, IBM plans to deliver a version of its experimental Blue Gene/L supercomputer capable of 360 teraflops, more than a third of the way to the petaflop mark, to Lawrence Livermore National Laboratory in California. A follow-on machine called Blue Gene/P could reach a petaflop in three years, people at Livermore say. Blue Gene, a line of supercomputers IBM has been researching since 1999, was designed from the get-go to break the petaflop plateau, and the machines are shooting up the list of the world's 500 fastest computers. On the new list published this week by a group of academics in Tennessee and Mannheim, Germany, an 11.68-teraflop Blue Gene/L system is No. 4 on the top 500, and an 8.65-teraflop Blue Gene clocks in at No. 8. With one eye on the history books and the other on commercial payback down the road, IBM has poured $100 million into Blue Gene's development.

What's still very much in doubt, though, is whether such a computer could be effectively programmed, managed, and linked with other technologies used by the very businesses it's supposed to help most. Or as Donofrio puts it: "Will customers be able to run something meaningful on a petaflop computer?"

The question isn't rhetorical, and it hints at a measurement of supercomputing that's far more relevant to businesses than raw performance: Time to insight. That's the time it takes to understand a problem, plan a solution, write the software, and run the job.

Bob Graybill, the Defense Advanced Research Projects Agency program manager in charge of its petaflop effort, has been talking up, and funding, the same idea. With a cluster of PCs, small science teams can run a job themselves instead of waiting for time at a National Science Foundation computing center. IBM recognizes this, and Blue Gene is supposed to be programmer-friendly because it runs industry standards such as Fortran, C++, and Linux, along with MPI, the message-passing interface widely used in technical computing. IBM hopes that if scientists can write supercomputer software with the same tools they use to program PCs and workstations, they'll get the best of both worlds. "Wall-clock time it takes to get the answer to the question matters," says Bill Pulleyblank, director of exploratory server systems at IBM.
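The portability pitch is easy to picture: a program written with standard MPI calls compiles and runs the same way on a desktop Linux box as on a massively parallel machine; only the number of processes changes. Here is a minimal sketch in C using the standard MPI library (illustrative only, not IBM's software):

    #include <mpi.h>
    #include <stdio.h>

    /* Each process reports its rank; the same source runs with a handful of
       processes on a workstation or thousands of nodes on a cluster. */
    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);               /* start the MPI runtime */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* this process's ID */
        MPI_Comm_size(MPI_COMM_WORLD, &size); /* total process count */
        printf("Hello from rank %d of %d\n", rank, size);
        MPI_Finalize();                       /* shut down cleanly */
        return 0;
    }

Typically such a program is compiled with mpicc and launched with mpirun, with the job size given on the command line; that is the sense in which scientists can use the same tools on a PC and a supercomputer.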

The U.S. government is betting big money that academic researchers and private-sector scientists will be able to write software that can take advantage of these theoretical supercomputing beasts. If they can't, the United States is going to end up with some very expensive, very smart white elephants.

The Energy Department last month said it's prepared to spend as much as $200 million in the next five years to install a Cray- and IBM-supplied supercomputer at Oak Ridge National Lab in Tennessee capable of 250 teraflops by 2007, and 1 petaflop by the decade's end. It would help research physics, biology, global warming, and nanoscience. Separate Pentagon-funded projects at Cray, IBM, and Sun Microsystems are trying to achieve petaflop performance by 2010. The president's IT Advisory Committee wants a petaflop supercomputer by 2010. And a Senate bill introduced in March proposes spending $800 million on supercomputing by 2009, largely to eclipse the Earth Simulator's performance. A House bill introduced in April seeks to make the White House responsible for ensuring that federal agencies with interests in supercomputing coordinate efforts. If these efforts succeed, the technologies in government-sponsored petaflop machines will likely trickle down, a few years after they debut, to private companies, which own about half of the current top 500 supercomputers.

"We'll reach a petaflop in the next couple of years. We have to-it's a national imperative," says Marshall Peterson. Peterson is best known for assembling the computing apparatus that helped his former company, Celera Genomics, win the race to sequence the human genome in 2000. Now he's chief technology officer for the J. Craig Venter Science Foundation Inc., which owns four nonprofit biotech companies headed by Venter. Peterson says the biotech industry needs the kind of performance that IBM and Cray are building into their next-generation supercomputers to deliver useful products. Peterson expects to get time on the machine at Oak Ridge once it's live. "We want to be able to model cells, tissues, and organisms," he says. "A petaflop won't do that, but it's a start."

Still unresolved is how well biotech-industry software can run on both smaller computers and teraflop behemoths, and whether centralized supercomputers can yield useful results for an industry that wants to analyze unrelated data on patients' genetic makeup, hometown climate, and lifestyle choices: anything that could explain why different groups of people respond differently to the same drugs. "Just programming these monsters is problematic enough," Peterson says. "Then we have to deal with data access and the grid. All the data we need is not going to sit around that big computer. We're still trying to find the killer app."

IBM is trying to simplify the search. Blue Gene was conceived of as a computer to simulate protein folding, a mysterious bodily process that could shed light on disease formation if it's well understood. That work is still under way; a small version of Blue Gene/L can run one protein simulation 12 times faster than IBM's popular SP supercomputers, Pulleyblank says. This summer, the company plans to start testing a broader palette of software with business appeal: Seismic imaging apps that help oil companies find petroleum reserves, computational chemistry software for biotech, and business-intelligence software for analyzing sales data. IBM this summer also plans to bring a Blue Gene/L machine online at its computing center in Poughkeepsie, N.Y., making time available to select customers over the Internet. Yet another pilot involves testing derivatives-pricing software with an unnamed bank. "IBM's a big company. They don't make a living on things you just sell to universities," says Phil Andrews, director of high-end computing at the San Diego Supercomputer Center.

More companies are incorporating high-performance workloads into their everyday computing chores. Businesses have been able to deploy mini-supers on the cheap by linking hundreds or thousands of PCs over standard Ethernet networks. But those clusters, and even more sophisticated "massively parallel processing" supercomputers such as IBM's SP, are tricky to program and spend lots of time waiting for data to show up from memory, slowing performance. And some users question whether Linux clusters' performance gains can last as systems grow bigger.

Derek Budworth, director of enterprise storage and servers at Boeing Co., says the PC clusters Boeing uses for military contracts compute more cheaply than the 112-processor X1 supercomputer from Cray that the company installed this year, largely to design Boeing's 7E7 Dreamliner, a super-efficient midsize jet scheduled to fly in '08. But getting answers takes longer with the lower-cost approach. "You have a trade-off between how soon you need your results and how much you want to pay," he says.

With Blue Gene, IBM says it can deliver world-beating performance at cluster prices: about $1 per million calculations per second (a megaflop), versus several dollars a megaflop for a specialized supercomputer like Cray's new X1. Pulleyblank says Blue Gene is meant to tackle the big issues holding back performance and affordability of supercomputers: slow data-shuttling, power-hungry and hot-running designs, and the tendency to consume lots of expensive real estate.
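Taking those quoted prices at face value, a rough back-of-envelope calculation (which ignores the price declines everyone expects by decade's end) shows why the gap matters at petaflop scale:

\[
1~\text{petaflop} = 10^{9}~\text{megaflops},
\qquad
10^{9}~\text{megaflops} \times \$1/\text{megaflop} \approx \$1~\text{billion},
\]

versus several times that at the per-megaflop prices of specialized machines.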

Still unsettled is whether the computer industry can develop new designs that overcome performance limits of off-the-shelf supercomputers, while pleasing big-science government users and the time-constrained private sector at the same time. David Shaw, chairman of the D.E. Shaw Group, an investment and technology-development company that applies computational techniques to financial trading, isn't positive it can. "If we have to develop novel architectures to achieve world leadership in supercomputing, is there enough commonality between national and commercial needs to support a common architecture for both?" he asks. "We don't fully know the answer." Shaw, a former computer-science professor and Clinton administration technology adviser, is involved in a supercomputing survey of big companies being fielded by the Council on Competitiveness. Regardless of whether science and business are aligned, Shaw says, both fields are reaching the point that "hooking commercial microprocessors together" isn't enough to solve all emerging problems in defense, intelligence, drug design, and materials science.

Include finance in the mix as well. One technologist on Wall Street says his company will probably manage 1 million networked computing devices within five years and is going to need high-performance computers that can grow to handle them. "Blue Gene is right on the money for us," he says. "We'd like to trade away from clock speed, power consumption, and cooling if we could. Heck, we'd take that in a heartbeat."

Not so fast, some experts say. The specialized architectures IBM and Cray say will reach a petaflop first are so hard to develop software for that there likely aren't many companies that will need that much juice. "If they could get certain kinds of database apps to run on [Blue Gene], that would be interesting to CIOs," says Larry Smarr, director of the California Institute for Telecommunications and Information Technology and founder of the National Science Foundation's supercomputing program. "The problem with specialized architectures is it takes too long" to develop software.

Merck & Co. is a big user of supercomputers from IBM, Sun, and others in its research division, which tests new molecular compounds for their efficacy as potential drugs. Company scientists have their eyes on Blue Gene, but historic milestones don't mean much to them. "The petaflop in and of itself isn't enough to gain our enthusiasm," says Irene Qualters, VP for research information systems. "It has to be a sustainable architecture we can invest in over the long haul." In many cases, traditional wet-lab instruments yield cheaper, better results than computer simulations, adds senior computing director Jeff Saltzman. "Blue Gene is a long-term investment with an uncertain probability of success," Saltzman says. "We're a moving target as far as our requirements. If we can't compute it, then we'll do experiments."

Yet it's not just biotech, aerospace, and finance companies that depend on superpowerful computers. Procter & Gamble Co., for example, used Silicon Graphics Inc.'s newest Altix 3000 supercomputer to design a new aroma-preserving container for Folgers coffee that costs about $7. Advocates of government support for the supercomputing effort contend it's that kind of tech-enabled innovation that will allow U.S. industry to compete with nations such as China, where wages are lower. "The worst thing we can do is think we can compete in advanced manufacturing with low wages," says Council on Competitiveness president Wince-Smith, a former technology policy adviser in the Reagan and first Bush administrations. "We'll compete in rapid prototyping using high-end computing."

Since Japan's Earth Simulator supercomputer shocked Washington two years ago, there's been a sense that the United States could lose its lead in other scientific disciplines, just as it did in climate science. The National Science Foundation last month reported that U.S. dominance in critical scientific fields is slipping, as measured in the number of patents awarded and papers published. The percentage of American Nobel Prize winners has fallen during the 2000s amid competition from Europe and Japan. And fewer American students are training to become scientists and engineers. Meanwhile, Japan, Taiwan, and South Korea have seen rapid growth in the number of patents awarded over the past 20 years. Europe is poised to take the lead in particle physics, with the world's largest supercollider in Switzerland scheduled to open in 2007. Spain is planning to build the second-most-powerful computer for general scientific use.

There's a business threat inherent in that shift. The Earth Simulator Center in Japan is reportedly negotiating deals with Japanese automakers to use time on the world's fastest computer to boost their quality and productivity. And NEC Corp. could disclose an even faster system next year. "We in the United States need to create new computer architectures that can boost computing power by many times over our current machines, and everybody else's," Energy Secretary Spencer Abraham said at a press conference last month.

In April, people from DuPont, BellSouth, GM, IBM, Lockheed Martin, Merck, and Morgan Stanley held the first meeting of the Council on Competitiveness' high-performance computing committee. Their goal: figure out how to square the interests of government scientists and policy makers with those of private-sector computer users. One area of work is trying to form new public-private partnerships so companies can access experimental computing architectures years before they become affordable enough for budget-conscious businesses.

Boeing director Budworth, who's active in the project, says the Dreamliner will be Boeing's first airplane whose assembly will be modeled more or less end to end on a supercomputer. Software that can plot the location of every part and tool on the factory floor is becoming so sophisticated that a petaflop computer may soon be necessary to run it.

That kind of talk can only please Donofrio, who admits the Blue Gene project carries a lot of risk. "Blue Gene is an incredibly bold adventure for us," he says. "There are a lot of people who like this idea of deep computing. And we'd have to say Blue Gene is a pretty deep computer." His greatest fear? That there won't be users bold enough to follow IBM's lead and try to run one big problem on the whole thing. "The new horizon is 'What can you do with this thing?' Not 'What can a thousand of you do with it?'" he says. "That's innovation. That will light up the board."

Write to Aaron Ricadela at aricadela@cmp.com. Visit our Hardware Tech Center: informationweek.com/TC/hw


Super Size Me
The world's five fastest supercomputers, according to a new survey

  • Earth Simulator: 35.9 teraflops, NEC, Earth Simulator Center, Japan
  • Thunder: 19.9 teraflops, California Digital Corp., Lawrence Livermore National Lab, California
  • ASCI Q: 13.9 teraflops, Hewlett-Packard, Los Alamos National Lab, New Mexico
  • Blue Gene/L: 11.7 teraflops, IBM, Watson Research Center, New York
  • Tungsten: 9.8 teraflops, Dell, National Center for Supercomputing Applications, Illinois

Data: Top500.org
