News
TeraShake: SDSC Simulates the 'Big One'
Published November 30, 2004
by Paul Tooby, SDSC Senior Science Writer
Everyone knows that the "big one" is coming - a major earthquake on the San Andreas fault. The southern part of the fault has not seen a major event since 1690, and the accumulated movement may amount to as much as six meters, setting the stage for an earthquake that could be as large as magnitude 7.7. But scientists and engineers want to know in more detail just how intensely the earth will shake during such an event - and what impact this will have on structures, particularly in the populated, sediment-filled basins of Southern California and northern Mexico.
Now, a collaboration of 33 earthquake scientists, computer scientists, and others from eight institutions has produced the largest and most detailed simulation yet of just what may happen during a major earthquake on the southern San Andreas fault. The simulation, known as TeraShake, used the new 10 teraflops DataStar supercomputer and large-scale data resources of the San Diego Supercomputer Center (SDSC) at UC San Diego (UCSD).
The collaboration is led by Tom Jordan, director of the Southern California Earthquake Center (SCEC) and professor of Earth Sciences at the University of Southern California (USC), under the SCEC Community Modeling Environment (SCEC/CME) NSF Information Technology Research grant. "In addition to enriching our understanding of the basic science of earthquakes," said Jordan, "the TeraShake simulation will contribute to estimating seismic risk, planning emergency preparation, and designing the next generation of earthquake-resistant structures, potentially saving lives and property." Professor J. Bernard Minster of the Institute of Geophysics and Planetary Physics (IGPP/SIO/UCSD), Reagan Moore, Distinguished Scientist and director of SDSC's SRB program, and Carl Kesselman, director of the Center for Grid Technologies at USC's Information Sciences Institute (ISI), are co-PIs of the project.
The TeraShake simulation is a good example of cyberinfrastructure, involving not only large computation but also massive data and visualization. "The simulation generated 47 TB of data in a little more than four days," said Moore. "This required archiving 10 TB of data per day, the highest rate ever sustained for a single simulation at SDSC." Forty-seven TB, or 47,000 GB, is equivalent to about 47 million books, or nearly five times the printed collection of the Library of Congress.
To carry out this complex simulation required sustained cooperation among many people. "TeraShake is an outstanding example of interdisciplinary collaboration between the SCEC earthquake scientists and the core groups at SDSC, as well as the other participants in this groundbreaking research," said Moore.
Big Earthquake Impacts
The TeraShake simulation modeled the earth shaking that would rattle Southern California if a 230-kilometer section of the San Andreas fault ruptured from north to south, beginning near Wrightwood, California, and producing a magnitude 7.7 earthquake.
The scientists emphasize that this research is not designed to predict when an earthquake will happen, but rather to predict in detail the resulting ground motion once the earthquake occurs. A key factor the TeraShake simulation will shed light on is the response of Southern California's deep, sediment-filled basins, from the Santa Clara Valley to the Los Angeles basin and down to the Coachella Valley. "In a major earthquake, a basin can jiggle like a bowl of jelly," said Minster. "The energy bounces off the boundaries and can produce unexpectedly large and long-lasting ground motions and resulting damage." Scientists have long known that the valley floors in these basins can experience extended shaking, but TeraShake filled in the details, with the southward-rupturing earthquake showing peak velocities of more than two meters per second and lower velocities lasting for more than three minutes in the Coachella Valley. For comparison, the strong motion in the 1906 San Francisco earthquake has been estimated by the USGS to have lasted in the range of 45 to 60 seconds.
In addition to information that will help scientists better understand the details of earthquakes, the TeraShake simulation will help answer questions such as which regions of Southern California will be hit hardest under various large-earthquake scenarios, and what ground velocities can be expected to shake buildings and infrastructure.
Big Simulation
"The TeraShake simulation is the fulfillment of a dream we've had for over ten years," said Minster. Previous simulations of Southern California have been limited to smaller domains and coarser resolutions, and advances in both supercomputers and related data technologies made the current simulation possible. "If we want to be able to understand big earthquakes and how they will impact sediment-filled basins, and finally structures, we need as much detail as possible," said Minster. "And this means massive amounts of data, produced by a high-resolution model running on the biggest supercomputer we can get, and this can only be done at a facility with the combined data and computing resources of SDSC."
The geographic region for the simulation was a large rectangular volume or box 600 km by 300 km by 80 km deep, spanning Southern California from the Ventura Basin, Tehachapi, and the southern San Joaquin Valley in the north, to Los Angeles, San Diego, out to Catalina Island, and down to the Mexican cities of Mexicali, Tijuana, and Ensenada in the south.
To model this region, the simulation used a 3,000 by 1,500 by 400 mesh, dividing the volume into 1.8 billion cubes 200 meters on a side and resolving frequencies up to 0.5 hertz - the biggest and most detailed simulation of this region to date. In such a large simulation, a key challenge is handling the enormous range of length scales, which extends from 200 meters - especially important near the ground surface and the rupturing fault - to hundreds of kilometers across the entire domain.
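For readers who want to check the arithmetic, a minimal Python sketch (an illustration built only from the figures quoted above, not part of the TeraShake software) reproduces the mesh dimensions and cell count:

# Back-of-the-envelope check of the TeraShake mesh described above.
domain_km = (600, 300, 80)          # domain extent in kilometers
spacing_m = 200                     # grid spacing in meters
cells_per_axis = [int(d * 1000 / spacing_m) for d in domain_km]
total_cells = cells_per_axis[0] * cells_per_axis[1] * cells_per_axis[2]
print(cells_per_axis)               # [3000, 1500, 400]
print(f"{total_cells:,} cells")     # 1,800,000,000, i.e. 1.8 billion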
Another task was to prepare accurate input data for the domain. These inputs included the San Andreas fault geometry and the subsurface 3-D crustal structure based on the SCEC Community Velocity Model. Seismologist Steven Day, professor of geological sciences at SDSU, provided the earthquake source, modeling the fault rupture as a slip lasting 60 seconds, scaled down from the 2002 magnitude 7.9 Alaska earthquake on the Denali fault. In the future, the researchers plan to integrate a physics-based spontaneous fault rupture model to initiate the simulation.
Using some 18,000 CPU hours on 240 processors of the new 10 teraflops IBM Power4+ DataStar supercomputer at SDSC, the model computed 20,000 time steps of about 1/100 second each for the first 220 seconds of the earthquake, producing a flood of data.
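A similarly small sketch, again using only the numbers quoted in the article, confirms the size of the time step those figures imply:

# 20,000 steps covering the first 220 seconds of the earthquake imply a
# time step of roughly one hundredth of a second.
n_steps = 20_000
simulated_seconds = 220
print(simulated_seconds / n_steps, "seconds per step")   # 0.011 s per step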
Data Challenges
"The TeraShake team faced unprecedented issues of data management," said Moore. "The simulation generated so much data-47 TB in some 150,000 files-and so rapidly that it pushed the envelope of SDSC's capabilities." Dealing with this data deluge required the efforts of the High- End Systems and Scientific Applications groups as well as the Data Grids Technologies group at SDSC to transfer the data, first to the disk-based Global Parallel file system, GPFS, and then to SDSC's archival tape storage-and moving it fast enough at 100 MB per second to keep up with the 10 TB per day of simulation output.
This massive data collection, a valuable resource for further research, was then registered into the SCEC Digital Library, which is managed by the SDSC Storage Resource Broker (SRB). The collection is being annotated with simulation metadata, which will allow powerful data discovery operations using metadata-based queries. In addition, each surface and volume velocity file was fingerprinted with MD5 checksums to preserve and validate data integrity. Data access, management, and data product derivation are provided through various interfaces to the SRB, including Web service and data grid workflow interfaces.
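The fingerprinting step itself is conceptually simple; a minimal Python sketch of MD5 checksumming (illustrative only, not the SRB's own implementation, and with a hypothetical file name) looks like this:

# Compute an MD5 checksum for an output file so its integrity can be
# re-verified after the file is archived or transferred.
import hashlib

def md5_fingerprint(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Hypothetical file name, for illustration only:
# print(md5_fingerprint("surface_velocity_t0001.bin"))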
The TeraShake simulation is also part of a larger SCEC scientific program with data collections currently totaling 80 TB. To support research on this scale, SDSC is working to provide efficient online access to the growing SCEC data collections archived at SDSC.
Computational Challenges
"The large TeraShake simulation stretched SDSC resources across the board, facing us with major computational as well as data challenges," said Nancy Wilkins-Diehr, Manager of Consulting and Training at SDSC.
To simulate the earthquake, the scientists used the Anelastic Wave Model (AWM), a fourth-order finite difference code developed by Kim B. Olsen, associate professor of geological sciences at SDSU, that models 3-D wave velocities throughout the volume and at the surface of the domain. To enhance the code so it could scale up to the very large mesh of 1.8 billion points, with its correspondingly large memory requirements, SDSC computational experts Yifeng Cui, Giri Chukkapalli, and others in the Scientific Applications Group worked closely with Olsen and the other scientists who developed the AWM. To successfully "build bridges" between the earthquake scientists and SDSC resources, the SDSC staff drew on their multidisciplinary expertise, which includes degrees in scientific and engineering disciplines combined with extensive experience in the intricacies of today's parallel supercomputers.
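To give a flavor of the numerical approach - and only a flavor, since the AWM itself is a far more sophisticated 3-D anelastic code - the sketch below solves a 1-D scalar wave equation with a fourth-order finite-difference stencil in space and second-order stepping in time; every parameter value is an illustrative assumption, not a TeraShake input.

# 1-D scalar wave equation, fourth-order finite differences in space and
# second-order leapfrog in time; an illustrative sketch, not the AWM code.
import numpy as np

nx = 3000              # grid points (echoes the 3,000-point long axis, in 1-D)
dx = 200.0             # grid spacing in meters, as in the TeraShake mesh
c = 3000.0             # assumed wave speed in m/s (illustrative value)
dt = 0.5 * dx / c      # time step well inside the scheme's stability limit

x = np.arange(nx) * dx
u = np.exp(-((x - x[nx // 2]) / 5000.0) ** 2)   # Gaussian pulse, initially at rest
u_prev = u.copy()

for step in range(2000):
    d2u = np.zeros(nx)
    # Fourth-order central difference for d2u/dx2 on interior points.
    d2u[2:-2] = (-u[:-4] + 16 * u[1:-3] - 30 * u[2:-2]
                 + 16 * u[3:-1] - u[4:]) / (12 * dx ** 2)
    u_next = 2 * u - u_prev + (c * dt) ** 2 * d2u   # leapfrog update
    u_prev, u = u, u_next

print("simulated %.0f s, peak amplitude %.2f" % (2000 * dt, np.abs(u).max()))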
For a large-scale run such as TeraShake, new problems tend to emerge that are not significant in smaller runs. It took months of effort by the SDSC researchers, and 30,000 allocation hours, to port the code to the DataStar platform, resolve parallel computing issues, and complete the testing, validation, and performance scaling needed for the large simulation.
SDSC's computational effort was supported through the NSF-funded SDSC Strategic Applications Collaborations (SAC) and Strategic Community Collaborations (SCC) programs. "TeraShake is a great example of why these programs are so important," said Wilkins-Diehr. "Allowing us to develop close collaborations between the computational scientists who use SDSC's supercomputers and our computational experts is crucial to achieving new science like TeraShake." The effort will also provide lasting value, with the enhanced AWM code now available to the earthquake community for future large-scale simulations.
Big Collaboration
"TeraShake owes its success to the enthusiastic teamwork over a number of months among groups with very different skills-seismologists, computer scientists, the computational experts in SDSC's Scientific Applications Group, the storage, HPC, and visualization groups at SDSC, and many others," said Marcio Faerman, a postdoctoral researcher in SDSC's Data Grids Technologies group who coordinated the team at SDSC. "These activities are not always visible, but they are essential."
For example, researchers from SIO provided the checkpoint restart capability, executed cross-validation runs, and helped define the metadata. SDSC's Scientific Applications Group and High-End Systems Group executed DataStar benchmarks to determine the best resource configuration for the run, and scheduled these resources for the simulation. The Data Grids Technologies group, which develops the SDSC SRB, designed and benchmarked the archival process. Steve Cutchin and Amit Chourasia of SDSC's visualization group labored long and hard to produce high-resolution visualizations, including movies, of how the earthquake waves propagated, even while the simulation was still running. This helped the scientists ensure that the simulation was producing valid data and produced dramatic views of the enormous energy that may strike areas near the San Andreas fault during the "big one."
Earthquake Science
The long-term goal of SCEC is to integrate information into a comprehensive, physics-based, and predictive understanding of earthquake phenomena. TeraShake is an important step forward in this process, and the researchers presented the simulation results at the recent SCEC Annual Meeting, attended by nearly 400 of the best earthquake seismologists from around the country and the world. "This is a very tough audience," said Minster, "and they positively loved the TeraShake results - many scientists who had been skeptical of large-scale simulations came to us using words like 'fantastic' and 'amazing.'"
Seismologists see the TeraShake results as very valuable. "Because the TeraShake simulation is such high resolution, we can see things we've never seen before," explained Minster. "For example, we were surprised to see that the strong shaking in the Coachella Valley made it behave like a secondary earthquake source, and despite the southward-moving rupture, it reflected waves back northward to shake Los Angeles."
The earthquake research community is enthusiastic about making use of the capabilities demonstrated in TeraShake. "Many want to participate, they want the movies of TeraShake on the Web, and many want to know how to get the archived output to use in further research," said Minster. "Others want to team up for new simulations."
In the near future, the researchers plan to run multiple scenarios at the same resolution, for example, having the fault rupture from south to north, instead of north to south as in the first TeraShake run. Eventually, the scientists would like to be able to extend the simulations to even higher resolution to more accurately model the intricate details and higher frequency shaking of earthquakes, which affects structures.
But even doubling the spatial resolution from 200 to 100 meters, for example, will produce eight times the spatial data, along with twice as many time steps, for a total of 16 times more information - in the range of 800 TB. This exceeds the current capabilities of even the large resources of SDSC. And scaling the code to run in larger simulations will require additional efforts from SDSC's computational experts. These challenges will drive future cyberinfrastructure growth to support such simulations with one to two PB of disk and 10 to 20 PB of tape, and with GB/sec parallel I/O so that researchers can access and compute with these massive and fast-growing collections.
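The arithmetic behind that estimate is straightforward; a minimal sketch using only the figures already quoted in this article:

# Halving the 200-meter grid spacing in each of three dimensions multiplies
# the cell count by 2**3 = 8 and roughly doubles the number of time steps,
# so the output grows by a factor of about 16.
current_output_tb = 47                    # output of the first TeraShake run
scaled_tb = current_output_tb * 2 ** 3 * 2
print(scaled_tb, "TB")                    # 752 TB, i.e. in the range of 800 TB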
"Beyond TeraShake, expanding our capability to handle large simulations and data at SDSC is useful for other large-data simulations such as ENZO, an astrophysics simulation of the early universe, as well as data-intensive analyses of observed data collections like the multi-TB all-sky image collections of the National Virtual Observatory," said Moore. TeraShake demonstrates SDSC's capabilities as a leading site for end-to-end data- intensive computing, and is expected to encourage more researchers to explore how far the capabilities have grown to support their own large-scale computational and data problems.
In addition to SDSC, IGPP/SIO, USC, and ISI, other institutions taking part include San Diego State University (SDSU), the University of California Santa Barbara (UCSB), and Carnegie Mellon University (CMU), along with the Incorporated Research Institutions for Seismology (IRIS) and the US Geological Survey (USGS), which participate in the SCEC/CME Project.
Project Leaders
J. Bernard Minster, IGPP/SIO/UCSD; Kim B. Olsen and Steven Day, SDSU; Tom Jordan and Phil Maechling, SCEC/USC; Reagan Moore and Marcio Faerman, SDSC/UCSD
Participants
Bryan Banister, Leesa Brieger, Amit Chourasia, Giridhar Chukkapalli, Yifeng Cui, Steve Cutchin, Larry Diegel, Yuanfang Hu, Arun Jagatheesan, Christopher Jordan, Patricia Kovatch, George Kremenek, Amit Majumdar, Richard Moore, Tom Sherwin, Donald Thorp, Nancy Wilkins-Diehr, and Qiao Xin, SDSC/UCSD
Jacobo Bielak and Julio Lopez, CMU; Marcus Thiebaux, ISI; Ralph Archuleta, UCSB; Geoffrey Ely and Boris Shkoller, UCSD; David Okaya, USC