SDSC’s Research Data Services Division Serves the Research Community with Heart
Published June 14, 2022
Kimberly Mann Bruch, SDSC External Relations
If high-performance and data-intensive computing and cyberinfrastructure make up the soul of the San Diego Supercomputer Center, then the Research Data Services Division might just be its heart. Known as RDS to the SDSC community, this division provides services that enable researchers to attain their research and computing goals.
According to Brian Balderston, director of infrastructure for RDS, the division's services include foundational needs for researchers—power, network and systems—as well as systems integration support.
“Our services also serve more variable needs of researchers, such as cloud computing, storage for active workloads or archival use cases, as well as backup storage for disaster recovery. We have the expertise to guide PIs to their research goals and have cultivated a vast network to grow partnerships more broadly,” said Balderston.
RDS collaborates with researchers to identify, build and serve their research computing and data needs, which include compute, storage maintenance and research support. The division also offers on-premises and public cloud computing resources and solutions, often tailor-made to academic research needs. It offers on-premises services in the 19,000-square-foot, 3.5 MW (with potential capability of 13 MW), high-speed, network-connected data center. RDS also supports education with year-round internship opportunities for undergraduate students interested in software development, project management and other research computing experiences.
According to Christine Kirkpatrick, director of the RDS Division, much of what happens in RDS is only seen when things go wrong.
“We quietly work behind the scenes to anticipate what infrastructure will be needed for tomorrow’s science and to deliver research computing services with a high degree of uptime and good customer service to researchers,” said Kirkpatrick. “All activities, especially the Data Center operations and the Help Desk, have carried on at full speed even with the uncertainty of our times. Our infrastructure teams, led by Brian Balderston, have been fortunate to grow during this period, due in large part to the spirit of fun and collegiality alive in RDS, as well as the flexibility afforded by leadership to retain and attract top talent.”
Kirkpatrick explained that the enterprise networking team is upgrading RDS’ backbone to 400Gb capacity. The storage service, USS/Qumulo, continues to be a runaway hit with contracts for 20 petabytes and growing in the three years since its inception. Additionally, the platforms and cloud integration teams continue to deliver excellent service to individual researchers, high-profile research partners in our region and internationally, UC San Diego departments and other UC partners.
Over the past few years, RDS has been steadily working on the intersection of artificial intelligence (AI)/machine learning (ML) and the FAIR Principles (findable, accessible, interoperable and reusable research objects, including data), as well as reproducibility.
“My own research is in data-centric AI, working at the intersection of ML and FAIR, with a focus on making AI more efficient to save time and power consumption—for costs and carbon footprint concerns. Our Senior Cloud Integration Engineer Kevin Coakley conducts research related to AI reproducibility,” said Kirkpatrick. “It can be easy to focus on employing techniques like ML, but many people don’t realize that ML processes may need to be run multiple times and that the results can vary between laboratories (the term that wraps up everything about a specific processing environment including the hardware and software versions). These differences in results can sometimes change the scientific inference meaning that is taken away. The Open Science Grid, led by our SDSC Director Frank Würthwein, has been a tremendous resource for re-running ML processes on several different types of clusters.”
RDS Strategist Melissa Cragin is forging ahead on AI readiness, with a particular focus on research organizations, which will benefit from a shared understanding of the state of AI/ML in order to respond to emerging policy, regulation and ethics standards.
“The aim is to consider, for example, what might be needed for tracking institutional research data assets in order to provide oversight related to responsible research. For repository owners, this might mean adding new techniques that could make their holdings easier for reuse with AI methods,” explained Cragin. “This requires engaging with many topics that relate to FAIR (data), such as including rich metadata, but also aspects specific to AI such as provenance descriptions and original purpose so that decisions can be made on fit-for-use.”
RDS also leads the EarthCube Office (ECO), which provides coordination and technical resources to NSF’s $85 million Geosciences and Cyberinfrastructure portfolio. For example, the EarthCube initiative just completed a second notebook competition that treats notebooks (a resource for documenting and sharing computational processes, data and code) as scholarly objects. Additionally, the RDS team earned an extension of its cooperative agreement, with transition funding from the NSF, to help implement sustainability recommendations that ensure the lasting impact of the overall initiative.
Another NSF-funded initiative co-led by RDS is the West Big Data Innovation Hub, which aims to build and strengthen partnerships across academia, industry, nonprofits and government, connecting research, education and practice to harness the data revolution. Most recently, RDS staff, led by Kim Bruch, worked with the Pala Native American Youth Council on a national DataJam project, and the team was awarded “Best New Team” at the final competition.
Other recent RDS accomplishments include the replacement of 20,000 pounds of toxic lead-acid batteries with a safer, environmentally friendly and cost-effective alternative. The project, in partnership with Urban Electric Power, will more than double available battery backup electricity.
“SDSC is the world's first enterprise application of this innovative rechargeable battery technology, and our partnership with Urban Electric Power has made our computing footprint greener,” said Kirkpatrick in a previous news article (April 20, 2022; SDSC/UC San Diego).
According to Balderston, RDS serves several of the core principles of the University—primarily research and education.
“Our work contributes to discoveries in the cosmos, under the oceans and novel healthcare advances. It supports efforts to combat natural disasters and to apply research for social good. And it serves the planet in the fight against climate change,” said Balderston. “We are also working to ensure that research efforts are achieved in an equitable and ultimately FAIR fashion. We provide stable services and functional structures that researchers can count on.”
RDS includes experts in the following areas: Platform Services, Cloud and Storage, Enterprise Network Services, Help Desk, Operations, Research Data Initiatives, Project Management Office and Student Interns. For more information about RDS, please visit the website.