Whether you are analyzing experimental data collected from field devices on your laptop or generating simulated data from large-scale numerical calculations on high-performance computing (HPC) systems, moving your data to where you need it, when you need it, is one of the most important aspects of designing your research workflows. There are many ways to transfer data between the storage and file systems you interact with, but the right transfer method for you will depend on the answers to a few key questions about the data: Where is the data located? How is it organized? How much of it is there? And where is it going?
In this first part of our series on Data Management, we introduce the essential concepts and command-line tools you should learn when you begin regularly transferring data to and from HPC (or any remote) systems. You will learn how to verify the integrity of your data after a transfer completes, how to use file compression, and how to choose the right data transfer tool for different situations. We also introduce the common data storage and file systems you may encounter, their advantages and limitations, and how their characteristics can affect data transfer performance at either end of a transfer. Additional topics on data transfer will be covered as time permits.
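As a taste of what the session covers, here is a minimal sketch of one common pattern: bundle and compress a directory, copy it to a remote system, and verify its integrity with a checksum. The host name `login.hpc.example.edu`, the remote path, and the directory name `results/` are placeholders, and `tar`, `rsync`, and `sha256sum` are one reasonable choice of tools among several discussed in the session.

```bash
# Bundle and compress the results directory into a single archive;
# one large file typically transfers faster than many small ones.
tar -czf results.tar.gz results/

# Record a checksum of the archive before it leaves the local machine.
sha256sum results.tar.gz > results.tar.gz.sha256

# Copy the archive and its checksum to the remote system; rsync can
# resume interrupted transfers, which matters for large files.
rsync -av --partial results.tar.gz results.tar.gz.sha256 \
    user@login.hpc.example.edu:/scratch/user/

# On the remote side, confirm the archive arrived intact, then unpack:
#   sha256sum -c results.tar.gz.sha256
#   tar -xzf results.tar.gz
```

Which tool fits best depends on the situation: for a handful of small files, plain `scp` may be simpler, while very large or recurring transfers are often better served by a managed transfer service such as Globus.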
---
COMPLECS (COMPrehensive Learning for end-users to Effectively utilize CyberinfraStructure) is a new SDSC program in which training covers the non-programming skills needed to use supercomputers effectively. Topics include parallel computing concepts, Linux tools and bash scripting, security, batch computing, how to get help, data management, and interactive computing. Each session offers one hour of instruction followed by a 30-minute Q&A. COMPLECS is supported by NSF award 2320934.