High-performance computing (HPC) systems often have multiple specialized data storage and file systems mounted on them, each with different capabilities and tiered levels of performance. How you read, write, and store your data on them really matters. Using a storage system correctly will help optimize the performance and throughput of your research workload(s). But perhaps more important, and less obvious, is that misuse and abuse of some types of file systems by a single end user can degrade the collective performance of an entire HPC system for all users. Because of this storage social dilemma, it is critically important for you to know which use cases and input/output (I/O) access patterns are appropriate for the type(s) of data storage and file systems available to you.
In this second part of our series on Data Management, we introduce you to some of the more common data storage and file systems you’ll find mounted on HPC systems today. You will learn the basic hardware and software architecture of these file systems, their capabilities, and their typical use case(s) in HPC. We also provide an overview of relevant Linux command-line tools that will enable you to gather information about these file systems, measure your usage of their storage resources, and, if applicable, reconfigure them as needed for your specific research data and workload(s). Data backups, security, and file permissions are also highlighted. Additional topics on data storage and file systems will be covered as time permits.
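As a small taste of the command-line tools the session covers, the sketch below uses standard Linux utilities to inspect mounted file systems and measure your own storage usage. The mount point in the Lustre example is an illustrative assumption; substitute the paths and quota commands that apply on your own HPC system.

```shell
# List all mounted file systems with their type, total size, and free space
df -hT

# Summarize the total disk usage of your home directory
du -sh "$HOME"

# Break usage down one directory level deep to spot large subdirectories
du -h --max-depth=1 "$HOME" | sort -h | tail -n 5

# On Lustre scratch file systems, check your per-user quota
# (illustrative mount point; requires the Lustre client tools):
# lfs quota -u "$USER" /scratch
```

Running these before and after a large job is a quick way to see how much data your workload actually produced and where it landed.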
---
COMPLECS (COMPrehensive Learning for end-users to Effectively utilize CyberinfraStructure) is a new SDSC training program covering the non-programming skills needed to use supercomputers effectively. Topics include parallel computing concepts, Linux tools and bash scripting, security, batch computing, how to get help, data management, and interactive computing. Each session offers 1 hour of instruction followed by a 30-minute Q&A. COMPLECS is supported by NSF award 2320934.