COMPLECS: Data Storage and File Systems

Remote event

High-performance computing (HPC) systems often have multiple specialized data storage and file systems mounted to them, with different capabilities and tiered levels of performance. How you read, write, and store your data on them really matters. The correct use of a storage system will help optimize the performance and throughput of your research workload(s). But perhaps more importantly, and less obviously, the misuse and abuse of some types of file systems by a single end-user can negatively impact the collective performance of an entire HPC system for all users. Because of this storage social dilemma, it is critically important for you to know which use cases and input/output (I/O) access patterns are appropriate for the type(s) of data storage and file systems available to you.

In this first part of our series on Data Management, we introduce you to some of the more common data storage and file systems you’ll find mounted on HPC systems today. You will learn the basic hardware and software architecture of these file systems, their capabilities, and their typical use case(s) in HPC. We also provide an overview of relevant Linux command-line tools that will enable you to gather information about these file systems, measure your usage of their storage resources, and, if applicable, reconfigure them as needed for your specific research data and workload(s). Data backups, security, and file permissions are also highlighted. Additional topics about data storage and file systems will be covered as time permits.
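As a small preview of the kind of command-line tools discussed in the session, the commands below are a minimal sketch of how you might inspect the file systems mounted on an HPC login node and measure your own usage of them. The exact paths and the availability of Lustre's `lfs` utility vary from system to system, so treat these as illustrative examples rather than a recipe for any particular machine.

```shell
# List mounted filesystems with human-readable sizes and free space.
df -h

# Show the filesystem *type* (e.g., lustre, nfs, xfs) backing the
# current directory; the type tells you what I/O patterns are safe.
df -Th .

# Summarize the total disk usage of your home directory tree.
du -sh "$HOME"

# On Lustre filesystems, per-user quotas are typically checked with
# the lfs utility; the mount point shown here is hypothetical and
# the command is commented out since lfs may not exist on your system.
# lfs quota -h -u "$USER" /lustre
```

Running `df -Th .` from inside a scratch or project directory is a quick way to confirm whether you are on a parallel file system before launching an I/O-heavy job.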

--- 
COMPLECS (COMPrehensive Learning for end-users to Effectively utilize CyberinfraStructure) is a new SDSC program where training will cover non-programming skills needed to effectively use supercomputers. Topics include parallel computing concepts, Linux tools and bash scripting, security, batch computing, how to get help, data management and interactive computing. Each session offers 1 hour of instruction followed by a 30-minute Q&A. COMPLECS is supported by NSF award 2320934.

Instructor

Marty Kandes

Computational and Data Science Research Specialist, SDSC

Marty Kandes is a Senior Computational and Data Science Research Specialist at the San Diego Supercomputer Center (SDSC). As part of the High-Performance Computing (HPC) User Services Group within the Data-Enabled Scientific Computing Division, he provides technical user support and services to the national research community leveraging the Advanced Cyberinfrastructure (CI) and HPC resources designed, built, and operated by SDSC on behalf of the U.S. National Science Foundation (NSF). Marty is also a member of the National Artificial Intelligence (AI) Research Institute for Intelligent CI with Computational Learning in the Environment (ICICLE). His current research interests include problems in distributed AI inference over wireless networks, data privacy in natural language processing, and secure interactive computing. He also contributes to many of the education, outreach, and training initiatives at SDSC, including serving as a Co-PI for the COMPrehensive Learning for end-users to Effectively utilize CyberinfraStructure (COMPLECS) CyberTraining program and as a mentor for the Research Experience for High School Students (REHS) program. Marty received his Ph.D. in Computational Science from the Computational Science Research Center (CSRC) at San Diego State University (SDSU), where he studied quantum systems in rotating frames of reference through the use of numerical simulations. He also holds an M.S. in Physics from SDSU and dual B.S. degrees in Applied Mathematics and Physics from the University of Michigan, Ann Arbor.