COMPLECS: Data Transfer

Remote event

Whether you are analyzing experimental data collected from devices in the field on a laptop or generating simulated data from large-scale numerical calculations performed on high-performance computing (HPC) systems, how you move your data to where you need it, when you need it, is one of the most important aspects of building your research workflows. There are many ways to transfer data between the storage and file systems you interact with. Which transfer method is right for you, however, depends on the answers to a few key questions about the data: Where is the data located? How is it organized? How much of it is there? And where is it going?

In this second part of our series on Data Management, we introduce you to the essential concepts and command-line tools you should learn when you first begin transferring data to and from HPC (or any remote) systems regularly. You will learn how to check the integrity of your data after a transfer completes, how to use file compression, and how to choose the right data transfer tool for a given situation. We also introduce you to the common data storage and file systems your data may encounter, their advantages and limitations, and how their different characteristics can affect data transfer performance on either end. Additional topics in data transfer will be covered as time permits.
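As a small preview, a typical command-line workflow combines these skills in sequence: compress, transfer, and verify. The sketch below uses standard Linux tools; the username, hostname, and paths (login.hpc.example.edu, /scratch/username/) are hypothetical placeholders, not a specific SDSC system.

    # Bundle a directory of many small files into one compressed archive;
    # a single large file typically transfers faster than many small ones.
    tar -czf results.tar.gz results/

    # Record a checksum of the archive before it leaves the source system.
    sha256sum results.tar.gz > results.tar.gz.sha256

    # Copy the archive and its checksum to the remote system; rsync can
    # resume a partially completed transfer, which a plain scp cannot.
    rsync -av --partial results.tar.gz results.tar.gz.sha256 \
        username@login.hpc.example.edu:/scratch/username/

    # On the remote system, confirm the data arrived intact.
    sha256sum -c results.tar.gz.sha256

If the final command reports OK, the bytes on the remote system match the bytes you sent; if not, the transfer should be repeated before the archive is unpacked.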

--- 
COMPLECS (COMPrehensive Learning for end-users to Effectively utilize CyberinfraStructure) is a new SDSC program where training will cover non-programming skills needed to effectively use supercomputers. Topics include parallel computing concepts, Linux tools and bash scripting, security, batch computing, how to get help, data management and interactive computing. Each session offers 1 hour of instruction followed by a 30-minute Q&A. COMPLECS is supported by NSF award 2320934.

Instructor

Marty Kandes

Computational & Data Science Research Specialist, SDSC

Marty Kandes is a Senior Computational and Data Science Research Specialist at the San Diego Supercomputer Center (SDSC). As part of the High-Performance Computing (HPC) User Services Group within the Data-Enabled Scientific Computing Division, he provides technical user support and services to the national research community leveraging the Advanced Cyberinfrastructure (CI) and HPC resources designed, built, and operated by SDSC on behalf of the U.S. National Science Foundation (NSF). Marty is also a member of the National Artificial Intelligence (AI) Research Institute for Intelligent CI with Computational Learning in the Environment (ICICLE). His current research interests include problems in distributed AI inference over wireless networks, data privacy in natural language processing, and secure interactive computing. He also contributes to many of the education, outreach, and training initiatives at SDSC, including serving as a Co-PI for the COMPrehensive Learning for end-users to Effectively utilize CyberinfraStructure (COMPLECS) CyberTraining program and as a mentor for the Research Experience for High School Students (REHS) program. Marty received his Ph.D. in Computational Science from the Computational Science Research Center (CSRC) at San Diego State University (SDSU), where he studied quantum systems in rotating frames of reference through the use of numerical simulations. He also holds an M.S. in Physics from SDSU and dual B.S. degrees in Applied Mathematics and Physics from the University of Michigan, Ann Arbor.