COMPLECS: Linux Tools for Text Processing

Many computational and data processing workloads require pre-processing of input files to get the data into a format that is compatible with the user’s application, post-processing of output files to extract key results for further analysis, or both. While these operations can be done by hand, doing so tends to be time-consuming, tedious and, worst of all, error-prone. In this session we cover the Linux tools awk, sed, grep, sort, head, tail, cut, paste, cat and split, which help users automate repetitive tasks. We conclude by showing how large language models (LLMs) such as ChatGPT can be used to write commands using these tools.
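
As a rough illustration of the kind of post-processing described above, the sketch below chains a few of these tools to pull one key result out of an output file. The log file, its `step=... energy=... status=...` layout and the variable names are hypothetical and made up for this example; they are not taken from the session materials.

```shell
# Hypothetical simulation output; the file contents and format are assumptions.
log=$(mktemp)
cat > "$log" <<'EOF'
step=1 energy=-10.52 status=ok
step=2 energy=-11.07 status=ok
step=3 energy=-9.88 status=failed
step=4 energy=-11.31 status=ok
EOF

# grep keeps only the successful steps, sed strips the field labels,
# awk selects the step number and energy columns, sort orders the rows
# numerically by energy (lowest first), and head keeps the first row.
lowest=$(grep 'status=ok' "$log" \
  | sed -e 's/step=//' -e 's/energy=//' \
  | awk '{print $1, $2}' \
  | LC_ALL=C sort -k2,2n \
  | head -n 1)

echo "$lowest"   # prints: 4 -11.31
rm -f "$log"
```

Setting `LC_ALL=C` for `sort` is a defensive choice so that the decimal point is interpreted the same way regardless of the user's locale.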

---
COMPLECS (COMPrehensive Learning for end-users to Effectively utilize CyberinfraStructure) is a new SDSC training program that covers the non-programming skills needed to use supercomputers effectively. Topics include parallel computing concepts, Linux tools and bash scripting, security, batch computing, how to get help, data management and interactive computing. Each session offers 1 hour of instruction followed by a 30-minute Q&A. COMPLECS is supported by NSF award 2320934.

Instructor

Robert Sinkovits

Director of Education and Training, SDSC

Dr. Sinkovits leads the education and training efforts at the San Diego Supercomputer Center, where he has been a computational scientist for more than 25 years. He has collaborated with researchers spanning many fields including physics, chemistry, astronomy, structural biology, finance, ecology, climate, immunology, and the social sciences, always with an emphasis on making the most effective use of high-performance computing resources. Dr. Sinkovits is the PI for the COMPLECS CyberTraining project and co-PI for the Voyager and Expanse supercomputer awards.

Questions?

Contact SDSC Events Coordinator