COMPLECS: Batch Computing: Working with the Linux Scheduler

Remote event

Understanding what a scheduler is and how it works is fundamental to learning how to run your batch computing workloads effectively on high-performance computing (HPC) systems. A scheduler manages all aspects of how your application will access and consume the compute, memory, storage, I/O, and network resources available to you on these systems. There are a number of different distributed batch job schedulers — also sometimes referred to as workload or resource managers — that you might encounter on an HPC system. The Slurm Workload Manager, for example, is the most popular one in use today. However, at the core of every such system sits the Linux scheduler.
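
A quick way to see the Linux scheduler at work is to inspect the scheduling attributes the kernel tracks for every process. The commands below are a minimal sketch; the exact ps column names shown are those of the Linux procps version of ps, and chrt is part of util-linux, so neither is guaranteed to behave identically on other Unix-like systems:

    # Show each process's nice value (NI) and kernel priority (PRI)
    ps -eo pid,ni,pri,stat,comm | head -n 5

    # Show the scheduling policy of your current shell ($$ expands to
    # its PID); on most systems this reports SCHED_OTHER, the default
    # time-sharing policy
    chrt -p $$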

In this first part of our series on Batch Computing, we will introduce you to the concept of a scheduler — what it is, why it exists, and how it works — using the Linux scheduler as our reference implementation and testbed. You will then learn how to interact with the Linux scheduler on your personal computer by working through a series of example exercises covering the most fundamental aspects of scheduling, including turning foreground processes into background ones and controlling their priority relative to the other processes running on your system.
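
As a preview of the kinds of exercises covered in Part I (the session's own exercises may differ in their details), here is a minimal shell sketch that moves a process between the foreground and background and adjusts its priority with nice and renice:

    # Launch a long-running command in the background with a trailing &
    sleep 600 &

    # Alternatively, start it in the foreground, suspend it with Ctrl+Z,
    # and then resume it in the background
    sleep 600
    # ... press Ctrl+Z, then:
    bg

    # Start a process at a lower priority; nice values range from -20
    # (highest priority) to 19 (lowest), and unprivileged users may
    # only raise them
    nice -n 10 sleep 600 &

    # Raise the nice value of an already-running process by PID;
    # $! expands to the PID of the most recent background job
    renice -n 15 -p $!

    # Bring the most recent background job back into the foreground
    fg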

To complete the exercises covered in Part I, you will need access to a computer with either:

  • a Linux operating system (OS);
  • a Unix-like OS such as macOS;
  • a Linux-compatible OS environment such as the Windows Subsystem for Linux; or
  • a virtual machine running a Linux OS through a hypervisor like VirtualBox.
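
If you are unsure which of these environments you have, a quick check from any POSIX shell is to print the kernel name:

    # Prints "Linux" on Linux and the Windows Subsystem for Linux,
    # and "Darwin" on macOS
    uname -s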

----

COMPLECS (COMPrehensive Learning for end-users to Effectively utilize CyberinfraStructure) is a new SDSC program in which training covers the non-programming skills needed to use supercomputers effectively. Topics include parallel computing concepts, Linux tools and bash scripting, security, batch computing, how to get help, data management, and interactive computing. Each session offers 1 hour of instruction followed by a 30-minute Q&A. COMPLECS is supported by NSF award 2320934.

Instructor

Marty Kandes

Senior Computational and Data Science Research Specialist, SDSC

Marty Kandes is a Senior Computational and Data Science Research Specialist at the San Diego Supercomputer Center (SDSC). As part of the High-Performance Computing (HPC) User Services Group within the Data-Enabled Scientific Computing Division, he provides technical user support and services to the national research community leveraging the Advanced Cyberinfrastructure (CI) and HPC resources designed, built, and operated by SDSC on behalf of the U.S. National Science Foundation (NSF). Marty is also a member of the National Artificial Intelligence (AI) Research Institute for Intelligent CI with Computational Learning in the Environment (ICICLE). His current research interests include problems in distributed AI inference over wireless networks, data privacy in natural language processing, and secure interactive computing. He also contributes to many of the education, outreach, and training initiatives at SDSC, including serving as a Co-PI for the COMPrehensive Learning for end-users to Effectively utilize CyberinfraStructure (COMPLECS) CyberTraining program and as a mentor for the Research Experience for High School Students (REHS) program. Marty received his Ph.D. in Computational Science from the Computational Science Research Center (CSRC) at San Diego State University (SDSU), where he studied quantum systems in rotating frames of reference through the use of numerical simulations. He also holds an M.S. in Physics from SDSU and dual B.S. degrees in Applied Mathematics and Physics from the University of Michigan, Ann Arbor.