The heterogeneous nature of the Modular Supercomputer Architecture (MSA) poses new challenges to job schedulers and resource management systems, which the DEEP-EST project sets out to address.

Resource management and job scheduling will play a pivotal role in the success of the overall MSA. Existing job schedulers guarantee efficient use of monolithic supercomputers. The MSA, however, requires capabilities to manage heterogeneous resources, to enable co-scheduling of resource sets across modules, and to handle dynamically varying resource profiles. At the same time, similar challenges arise on the resource management side, which has to be ready for heterogeneous platforms and provide sufficient scalability for future production systems.

Resource Management

Resource management in the DEEP-EST project includes monitoring and controlling node-local resources, reporting their status to the global job scheduler, and taking requests from it. A dedicated entity located on each node of the system creates the local processes of a distributed MPI session and explicitly assigns the computing and other resources that the local node offers to these processes. This entity is also responsible for supervising processes during their lifetime, accounting for resource usage, and ensuring a proper clean-up after termination. Part of the supervising activity is the forwarding of standard I/O channels and signals.
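
As a rough illustration of these node-local duties, the following sketch supervises a single process: it pins the process to a set of CPUs assigned by the scheduler, forwards termination signals, inherits the standard I/O channels, and reports simple accounting data after termination. This is a minimal, Linux-specific sketch with invented names, not the actual DEEP-EST implementation, which is the ParaStation management daemon described below.

```python
#!/usr/bin/env python3
"""Minimal sketch of a node-local process supervisor (hypothetical names,
Linux-only); the real DEEP-EST entity is the ParaStation management daemon."""

import os
import resource
import signal
import subprocess
import sys


def run_supervised(argv, cpu_ids):
    """Launch one local process of a job, pin it to the CPUs granted by the
    scheduler, forward signals, and report accounting data after exit."""

    def assign_resources():
        # Runs in the child just before exec: bind the process to the CPU
        # set it was granted (a real daemon would also handle GPUs, NAM
        # segments, and other resources).
        os.sched_setaffinity(0, cpu_ids)

    # stdout/stderr are inherited, so standard I/O is forwarded as-is.
    proc = subprocess.Popen(argv, preexec_fn=assign_resources)

    # Forward termination signals from the supervisor to the process.
    def forward(signum, _frame):
        proc.send_signal(signum)

    for sig in (signal.SIGINT, signal.SIGTERM):
        signal.signal(sig, forward)

    status = proc.wait()  # supervise until termination

    # Simple accounting after clean-up: CPU time used by the child.
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    print(f"exit={status} utime={usage.ru_utime:.2f}s stime={usage.ru_stime:.2f}s",
          file=sys.stderr)
    return status


if __name__ == "__main__":
    # Example: run 'hostname' pinned to CPU 0.
    sys.exit(run_supervised(["hostname"], {0}))
```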

This resource management entity will be embodied by the ParaStation management daemon, which will in turn interact with SLURM as the batch system and job scheduler via its psslurm plugin. To meet the needs of the MSA, the existing ParaStation process management infrastructure has to be extended to deal with the heterogeneity and diversity of the different system parts, as well as with the resulting process structure of application bundles running on them. Extensions for managing resources that are not represented by regular MPI processes (e.g., the NAM or HPDA segments) have to be developed. The local management daemons form a comprehensive network across the whole MSA system and will serve to consolidate information across modules, facilitating the orchestration of the whole system.

Job Scheduling in DEEP-EST

On an MSA system, applications will either use resources within a single module, run across different modules at the same time, or use modules successively in a workflow-like model. This requires scalable scheduling and co-allocation of resources for jobs within and across modules.

In DEEP-EST, the widely used open-source scheduler SLURM will be extended with features for efficient and scalable scheduling on MSA systems. The proposed SLURM scheduler will include a parallel scheduling scheme, in which job scheduling for each module is done independently of the other modules, and a communication mechanism between the parallel scheduling instances to enable co-allocation of resources. The task will explore two implementation strategies and select the most suitable one for realisation in the project (a sketch of the shared coordination idea follows the list):

  • Implement separate scheduler plugins for each module based on SLURM’s internal plugin mechanism, and enable them to run in parallel. A communication mechanism that connects the plugins and shares job and resource information among them will be developed to schedule jobs across modules. This approach fits into SLURM’s standard extension mechanism.
  • Modify the central scheduler to be multithreaded, enabling parallel scheduling of jobs within each module. The parallel scheduler threads will coordinate through shared memory to schedule jobs that require resources across modules.
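
Both strategies share the same core idea, illustrated by the following minimal sketch: one scheduling instance per module runs in parallel with the others, and a shared, lock-protected resource view allows a job that spans modules to be co-allocated atomically in all of them, or not at all. The module names, node counts, job format, and retry policy here are invented for illustration and do not reflect the actual SLURM implementation.

```python
"""Minimal sketch (invented names and numbers, not the actual SLURM code):
per-module schedulers running in parallel, with a shared, lock-protected
resource view used to co-allocate cross-module jobs atomically."""

import threading
from queue import Queue

free = {"cluster": 8, "booster": 16, "storage": 4}  # free nodes per module
lock = threading.Lock()  # guards 'free' during co-allocation


def try_coallocate(request):
    """Grab nodes in every requested module atomically, or none at all."""
    with lock:
        if all(free[m] >= n for m, n in request.items()):
            for m, n in request.items():
                free[m] -= n
            return True
        return False


def release(request):
    with lock:
        for m, n in request.items():
            free[m] += n


def module_scheduler(module, jobs):
    """One scheduling instance per module; a job is a dict mapping each
    module it spans to the number of nodes it needs there."""
    while (job := jobs.get()) is not None:
        if try_coallocate(job):
            print(f"[{module}] co-allocated {job}")
            release(job)  # stand-in for 'job has finished'
        else:
            jobs.put(job)  # defer; a real scheduler would queue/backfill


# One independent job queue per module; one job spans cluster + booster.
queues = {m: Queue() for m in free}
queues["cluster"].put({"cluster": 2, "booster": 4})  # cross-module job
queues["booster"].put({"booster": 8})                # module-local job

threads = [threading.Thread(target=module_scheduler, args=(m, q))
           for m, q in queues.items()]
for t in threads:
    t.start()
for q in queues.values():
    q.put(None)  # sentinel: no more jobs
for t in threads:
    t.join()
```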

 

This approach brings multiple advantages: different scheduling policies can be specified for each module, response times become faster, and scheduling scalability improves. The team will also investigate the need for and benefits of dynamic scheduling of workflows: since the types and quantities of resources a workflow needs often change during runtime, dynamic resource scheduling can improve job turnaround time and overall throughput. Furthermore, the project will explore how to use energy data and information on data location to improve the module-local scheduling of resources.
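
As a hedged illustration of that last point, the sketch below scores candidate nodes within one module by combining a power reading from monitoring with a bonus for nodes that already hold the job's input data; the node attributes, weights, and scoring formula are assumptions made for illustration, not the project's actual policy.

```python
"""Minimal sketch (invented attributes and weights): module-local node
selection that folds in energy data and data locality."""

from dataclasses import dataclass


@dataclass
class Node:
    name: str
    watts: float       # current power draw reported by monitoring
    holds_input: bool  # job's input data is already on this node


def node_score(node, w_energy=1.0, w_locality=50.0):
    """Lower is better: prefer cheap-to-run nodes that already hold the
    job's data, avoiding a staging transfer before the job starts."""
    return w_energy * node.watts - w_locality * node.holds_input


def pick_nodes(nodes, count):
    """Greedy module-local placement: the 'count' best-scoring nodes."""
    return sorted(nodes, key=node_score)[:count]


nodes = [Node("node01", 180.0, False),
         Node("node02", 140.0, True),
         Node("node03", 150.0, False)]
print([n.name for n in pick_nodes(nodes, 2)])  # ['node02', 'node03']
```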