Programming Environment
Driven by co-design, DEEP-SEA aims to shape the programming environment for the next generation of supercomputers, and specifically for the Modular Supercomputing Architecture (MSA). To achieve this goal, all levels of the programming environment are considered, from programming languages and APIs to runtime interaction with hardware resources. DEEP-SEA addresses multiple parallel programming interfaces, including the most commonly used ones (MPI, OpenMP), alternative approaches (such as PGAS-based libraries) and Domain Specific Languages (DSLs).
The node-level programming models (OpenMP and OmpSs) will mainly focus on the use and management of new memory technologies and on data placement in heterogeneous memory hierarchies. The system-level programming models (MPI, GASPI, GPI-Space and OmpSs@cluster) will address the malleability, composability and resiliency aspects that are critical to MSA. The shared-memory programming model OpenMP aims to provide a unified runtime that handles every resource on a node, running code on CPU cores (with either threads or tasks) and on GPU cores. OpenMP is therefore a good approach to handling most of the node complexity with a single family of annotations and APIs. DEEP-SEA will work on OpenMP by extending two runtimes, the MPC OpenMP implementation and OmpSs, both of which will be upgraded with heterogeneous memory support.
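Data placement in a heterogeneous memory hierarchy can be illustrated with a simple heuristic. The sketch below is not the OpenMP, MPC or OmpSs implementation; it assumes two memory tiers, a small fast one (for example HBM) and a large slow one (for example DDR), and hypothetical per-object access counts. It greedily places the objects with the highest access density (accesses per MB) in the fast tier until that tier is full. In OpenMP itself, the chosen tier would typically be requested through the memory-allocator API introduced in OpenMP 5.0 (for example the predefined `omp_high_bw_mem_space`).

```python
from dataclasses import dataclass


@dataclass
class DataObject:
    name: str
    size_mb: int   # memory footprint of the object
    accesses: int  # estimated access count (hypothetical, e.g. from profiling)


def place(objects, fast_capacity_mb):
    """Greedy placement: visit objects by decreasing access density
    (accesses per MB) and put each one in the fast tier if it still fits;
    otherwise it goes to the slow tier."""
    placement = {}
    free_mb = fast_capacity_mb
    for obj in sorted(objects, key=lambda o: o.accesses / o.size_mb, reverse=True):
        if obj.size_mb <= free_mb:
            placement[obj.name] = "fast"  # e.g. HBM
            free_mb -= obj.size_mb
        else:
            placement[obj.name] = "slow"  # e.g. DDR
    return placement


objects = [
    DataObject("matrix", size_mb=800, accesses=4000),
    DataObject("halo", size_mb=50, accesses=2000),
    DataObject("log", size_mb=300, accesses=30),
]
print(place(objects, fast_capacity_mb=1000))
# {'halo': 'fast', 'matrix': 'fast', 'log': 'slow'}
```

A greedy density-based rule is only a starting point: it ignores access patterns over time and migration costs, which a production runtime would have to take into account.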
At the system level, one of the core goals is to accommodate resources dynamically, as scientific applications and the supporting resource managers evolve towards malleability. The Process Management Interface (PMIx) is an open standard designed to act as the interface between runtime systems and resource managers, and it is already used by several MPI implementations (e.g., Open MPI). It therefore provides a solid basis for the extensions envisioned in DEEP-SEA: PMIx will be extended for malleability, interoperability and composability. The programming model implementations, namely ParaStation MPI and OmpSs@cluster, will then be updated to support these new extensions, both in their runtimes and in their user-facing interfaces.
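One practical consequence of malleability is that, when a running job grows or shrinks, its distributed data must be redistributed over the new set of processes. The sketch below is independent of PMIx and of any MPI API; it assumes a simple 1-D block distribution (the helper names `block_range` and `redistribution_plan` are hypothetical) and computes which index ranges each old rank must send to each new rank when the job is resized.

```python
def block_range(n, nprocs, rank):
    """Half-open range [lo, hi) owned by `rank` in a block distribution of
    n items over nprocs ranks; the first n % nprocs ranks get one extra item."""
    base, extra = divmod(n, nprocs)
    lo = rank * base + min(rank, extra)
    hi = lo + base + (1 if rank < extra else 0)
    return lo, hi


def redistribution_plan(n, old_procs, new_procs):
    """Transfers (src_rank, dst_rank, lo, hi) needed when a job is resized from
    old_procs to new_procs ranks. Data that stays on the same rank is skipped."""
    plan = []
    for src in range(old_procs):
        s_lo, s_hi = block_range(n, old_procs, src)
        for dst in range(new_procs):
            d_lo, d_hi = block_range(n, new_procs, dst)
            lo, hi = max(s_lo, d_lo), min(s_hi, d_hi)  # overlap of old and new blocks
            if lo < hi and src != dst:
                plan.append((src, dst, lo, hi))
    return plan


# Expanding a 12-element array from 2 to 3 ranks:
print(redistribution_plan(12, old_procs=2, new_procs=3))
# [(0, 1, 4, 6), (1, 2, 8, 12)]
```

The same function covers shrinking (new_procs < old_procs); in a real malleable application, the plan would drive point-to-point or one-sided transfers once the resource manager has confirmed the new process set.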
Three main open-source implementations of MPI are available, all of them process-based: MPICH, MVAPICH and Open MPI. Alternative, thread-based implementations exist, which promise better node-level scalability and interoperability than process-based ones. To cover a wide spectrum of environments, the DEEP-SEA project will extend and optimise MPI through two major process-based commercial implementations, ParaStation MPI from ParTec (a derivative of MPICH) and Atos Open MPI (a derivative of Open MPI), as well as through the thread-based MPI implementation MPC from CEA. Upgrades to these MPI implementations will add support for the new DEEP-SEA malleability interfaces, through new extensions to the MPI Sessions proposal targeted for MPI-4.0, and will better handle the architectural structure of MSA through topology-aware collectives, notified RMA, and job-specific tuning and optimisation.
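Topology-aware collectives exploit the fact that communication inside a node or module is cheaper than communication across the network. The sketch below simulates, in plain Python rather than with a real MPI library, a two-level sum-allreduce: ranks first reduce within their node, only one leader per node communicates across the network, and the result is then broadcast back inside each node. It counts inter-node messages against a naive linear (flat) reduce-and-broadcast rooted at rank 0; this message model is a simplifying assumption, and real MPI libraries use more elaborate algorithms. In an actual MPI code, the node-local groups would typically be obtained with `MPI_Comm_split_type` and `MPI_COMM_TYPE_SHARED`.

```python
def hierarchical_allreduce(values, node_of):
    """Simulate a two-level (topology-aware) sum-allreduce.
    values[r] is the contribution of rank r, node_of[r] the node hosting it.
    Returns the value every rank ends up with and the inter-node message count."""
    # Step 1: intra-node reduction, one partial sum per node (held by a leader).
    partial = {}
    for rank, v in enumerate(values):
        partial[node_of[rank]] = partial.get(node_of[rank], 0) + v
    # Step 2: leaders combine partial sums across the network:
    # (nodes - 1) messages to a root leader, then (nodes - 1) back.
    total = sum(partial.values())
    inter_node_msgs = 2 * (len(partial) - 1)
    # Step 3: each leader broadcasts the result inside its node (intra-node only).
    return [total] * len(values), inter_node_msgs


def flat_inter_node_msgs(node_of, root=0):
    """Inter-node messages of a naive linear reduce + broadcast rooted at `root`:
    every rank on another node sends once and receives once."""
    return 2 * sum(1 for node in node_of if node != node_of[root])


node_of = [0, 0, 0, 0, 1, 1, 1, 1]  # 8 ranks on 2 nodes
result, msgs = hierarchical_allreduce(list(range(8)), node_of)
print(result[0], msgs, flat_inter_node_msgs(node_of))  # 28 2 8
```

The benefit grows with the number of ranks per node, which is why the node and module structure of MSA is worth exposing to the collective algorithms.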
The DEEP-SEA project will also investigate emerging programming models. GASPI, with its implementation GPI-2 provided by FHG, is a PGAS-based model. Such models aim to merge the classical hybrid MPI+threads programming into a single abstraction. In DEEP-SEA, GASPI will be improved to provide better interoperability with other programming models. The common principle of task-based programming models is that application developers define elementary units of computation, known as tasks, and submit them to a runtime system scheduler, which then maps them onto the available computing resources so as to optimise a given cost function (usually the overall execution time, although other factors such as energy consumption or memory subscription may also be taken into account). Representatives of system-wide task-based programming models are OmpSs@cluster and GPI-Space. The two are complementary, as they use different task granularities, with GPI-Space targeting coarser-grained tasks.
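The task-based principle described above, where a scheduler maps submitted tasks onto resources to optimise a cost function, can be illustrated with a minimal scheduler. The sketch assumes independent tasks with known (hypothetical) costs and identical workers, and uses the classic Longest-Processing-Time-first (LPT) heuristic to reduce the makespan, i.e. the overall execution time. It is not the scheduler of OmpSs@cluster or GPI-Space, which also have to handle task dependencies, data locality and other cost functions.

```python
import heapq


def schedule(task_costs, n_workers):
    """LPT list scheduling: take tasks in decreasing cost order and give each
    one to the worker that becomes free earliest. Returns (assignment, makespan)."""
    # Min-heap of (time at which the worker becomes free, worker id).
    workers = [(0.0, w) for w in range(n_workers)]
    heapq.heapify(workers)
    assignment = {}
    for task, cost in sorted(task_costs.items(), key=lambda kv: kv[1], reverse=True):
        free_at, w = heapq.heappop(workers)
        assignment[task] = w
        heapq.heappush(workers, (free_at + cost, w))
    makespan = max(t for t, _ in workers)
    return assignment, makespan


assignment, makespan = schedule({"a": 5, "b": 4, "c": 3, "d": 3, "e": 1}, n_workers=2)
print(assignment, makespan)
# {'a': 0, 'b': 1, 'c': 1, 'd': 0, 'e': 1} 8.0
```

Swapping the cost function, for instance to energy instead of time, would change only how the scheduler ranks its choices, not how the application expresses its tasks; that separation is the point of the task-based approach.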
To lighten the programming burden on application developers, higher levels of abstraction such as DSLs provide an API close to the semantics of the scientific domain. They hide the details of parallel programming and generate specialised code for the target architecture. Portability is achieved by transferring the responsibility for writing performant code to the DSL compilation chain. In DEEP-SEA, CEA will augment NabLab with new passes that tackle memory performance, and ETHZ and KTH will enhance DaCe with performance and debugging information. Interoperability between the internal representations of NabLab and DaCe will also be developed.
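The idea of generating specialised code from a domain-level description can be shown in miniature. The hypothetical `compile_stencil` function below is neither NabLab nor DaCe; it takes a 1-D stencil written as a dictionary of offsets and coefficients and emits a Python function in which the stencil is unrolled, instead of interpreting the description at run time. Real DSL toolchains emit code for the target architecture and apply far more elaborate transformations.

```python
def compile_stencil(coeffs):
    """Tiny illustrative 'DSL compiler': turns a 1-D stencil {offset: coefficient}
    into source code for a specialised kernel, then loads it with exec()."""
    terms = " + ".join(
        f"{c!r} * u[i{off:+d}]" for off, c in sorted(coeffs.items())
    )
    reach = max(abs(off) for off in coeffs)  # boundary cells left untouched
    src = (
        "def kernel(u):\n"
        "    out = list(u)\n"
        f"    for i in range({reach}, len(u) - {reach}):\n"
        f"        out[i] = {terms}\n"
        "    return out\n"
    )
    namespace = {}
    exec(src, namespace)  # compile the generated, specialised code
    return namespace["kernel"], src


laplacian, src = compile_stencil({-1: 1.0, 0: -2.0, 1: 1.0})
print(src)
print(laplacian([0, 1, 4, 9, 16]))  # [0, 2.0, 2.0, 2.0, 16]
```

The user only states the numerical scheme; how the loop is written, and in a real DSL how it is parallelised and laid out in memory, is decided by the generator.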
OmpSs
The MSA provides an unprecedented level of flexibility, efficiency and performance by combining modules with different characteristics. Moreover, some modules can themselves be heterogeneous, combining different compute, memory and network devices on the same node. These two levels of heterogeneity, intra-node and inter-node, are hard to exploit with programming models that rely only on the traditional fork-join and/or Single Program Multiple Data (SPMD) execution models.