High-performance computing (HPC) systems often have multiple specialized data storage and file systems mounted on them, each with different capabilities and tiered levels of performance. How you read, write, and store your data on them really matters. Using a storage system correctly will help optimize the performance and throughput of your research workload(s). But perhaps more important, and less obvious, is that misuse and abuse of some types of file systems by a single end user can degrade the collective performance of an entire HPC system for all users. Because of this storage social dilemma, it is critically important for you to know which use cases and input/output (I/O) access patterns are appropriate for the type(s) of data storage and file systems available to you.
In this second part of our series on Data Management, we introduce you to some of the more common data storage and file systems you’ll find mounted on HPC systems today. You will learn the basic hardware and software architecture of these file systems, their capabilities, and their typical use case(s) in HPC. We also provide an overview of relevant Linux command-line tools that will enable you to gather information about these file systems, measure your usage of their storage resources, and, if applicable, reconfigure them as needed for your specific research data and workload(s). Data backups, security, and file permissions are also highlighted. Additional topics on data storage and file systems will be covered as time permits.
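As a small taste of the command-line tools the session covers, the sketch below uses standard Linux utilities to inspect mounted file systems and measure your own storage usage. The mount point in the Lustre example is an illustrative assumption; substitute the paths and quota commands that apply on your own HPC system.

```shell
# List all mounted file systems with their type, total size, and free space
df -hT

# Summarize the total disk usage of your home directory
du -sh "$HOME"

# Break usage down one directory level deep to spot large subdirectories
du -h --max-depth=1 "$HOME" | sort -h | tail -n 5

# On Lustre scratch file systems, check your per-user quota
# (illustrative mount point; requires the Lustre client tools):
# lfs quota -u "$USER" /scratch
```

Running these before and after a large job is a quick way to see how much data your workload actually produced and where it landed.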
---
COMPLECS (COMPrehensive Learning for end-users to Effectively utilize CyberinfraStructure) is a new SDSC training program covering the non-programming skills needed to use supercomputers effectively. Topics include parallel computing concepts, Linux tools and bash scripting, security, batch computing, how to get help, data management, and interactive computing. Each session offers 1 hour of instruction followed by a 30-minute Q&A. COMPLECS is supported by NSF award 2320934.