In a Birds of a Feather session at the ISC 2016 conference, three distinguished researchers will share their views on Programming Models for Exascale. The speakers are Professor Jesus Labarta from BSC, Dr Valeria Bartsch from Fraunhofer ITWM and Dr Mark Bull from EPCC. In this interview, the BoF organisers Estela Suarez and George Beckett get a sneak preview of the key topics for the session and the speakers’ viewpoints.

Q: In most Exascale panels or BoF sessions, hardware is in the limelight. Why do you want to talk about programming models, and why do you think now is the right time to do so?

Valeria: Imagine you have the coolest kitchen in the world, with state-of-the-art electrical appliances, but you do not have a cookery book. What good is this kitchen? We are here to provide the cookery book, in other words the programming models enabling you to handle the Exascale hardware.

Mark: That’s exactly the point. And as robust and efficient implementations of programming models take time to develop, we need to be working on them before the hardware appears, not after! Especially since it is likely that supercomputer systems are going to get even harder to program than they are now.

Jesus: Good point, Mark! I always remember a conversation with Don Grice from IBM when the TeraFLOPS barrier was broken. Back then he said there was no doubt that an ExaFLOPS machine would be feasible, but whether it would be usable (that is, programmable) was another question. The current uncertainty, with many different proposals and doubts about what the winning model will be, probably pushes developers into a wait-and-see position. But such a position cannot continue for long.

Q: Coming back to hardware, which is of course intrinsically linked to the programming models: given the clear trend towards heterogeneous systems, the big question is, how can you program such beasts and exploit the hardware in an optimal way?

Mark: A single programming model that addresses all levels of parallelism, from distributed memory down to vector lanes, is the Holy Grail. But we aren’t there yet, and given the time it takes for programming models to mature, we need something that will work in the short to medium term. So I think that means making multiple APIs work well together in the same application: we’ve already seen reasonable successes with MPI + OpenMP, for example, and I’d expect that trend to continue.
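
As a concrete illustration of the MPI + OpenMP combination Mark describes, here is a minimal hybrid sketch: MPI handles the distributed-memory level, OpenMP threads handle the shared-memory level within each rank. The loop body is a hypothetical stand-in for real work.

```c
/* Minimal MPI + OpenMP hybrid sketch.
 * Compile e.g. with: mpicc -fopenmp hybrid.c -o hybrid */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank, nranks;

    /* Ask for thread support: FUNNELED suffices if only the
     * master thread makes MPI calls. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    double local = 0.0;

    /* Shared-memory parallelism inside each rank. */
    #pragma omp parallel for reduction(+:local)
    for (int i = 0; i < 1000000; i++)
        local += 1.0 / (1.0 + i + rank);   /* stand-in for real work */

    double global = 0.0;
    /* Distributed-memory reduction across ranks. */
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("global sum = %f\n", global);

    MPI_Finalize();
    return 0;
}
```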

Jesus: We should stop worrying about how to use the hardware as optimally as possible and instead focus on specifying our algorithms and programs in a top-down, nested-parallel approach where every level contributes. The programming model should provide a clean and elegant mechanism to convey to the runtime the actual computations, dependences and data accesses of the different program components (tasks). I do believe that if we do so, we will find that we exploit the hardware in very efficient ways.
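
A small sketch of this style, using the standard OpenMP depend clause (which adopted ideas explored in task-based models such as OmpSs). The programmer only declares what each task reads and writes; the runtime derives the execution order and the available parallelism. The kernels here are hypothetical stand-ins.

```c
/* Task-based sketch: dependences are declared, not scheduled by hand. */
#include <stdio.h>

void produce(double *x) { *x = 42.0; }
void scale(double *x)   { *x *= 2.0; }
void consume(double a, double b) { printf("%f %f\n", a, b); }

int main(void)
{
    double a = 0.0, b = 0.0;

    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task depend(out: a)
        produce(&a);

        #pragma omp task depend(out: b)   /* independent of the first task */
        produce(&b);

        #pragma omp task depend(inout: a) /* must wait for produce(&a) */
        scale(&a);

        #pragma omp task depend(in: a, b) /* waits for both chains */
        consume(a, b);
    }
    return 0;
}
```

The runtime is free to run the two produce tasks concurrently and to reorder everything else, as long as the declared dependences hold; that is the "leave the control to the runtime" idea in miniature.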

Valeria: This is fairly typical of the mainstream discussion in HPC: the answers focus on codes that have a long history of using HPC. But let’s also look at, for example, the Big Data applications that have started to appear in the HPC world. Here we see a completely different picture, and it comes with a huge advantage: you can use the full force of HPC tools to help parallelise the software without being tied to already-existing communication patterns.

Q: What steps does the community need to take – e.g. prepare applications or enhance skills in certain areas – to realise the benefits of the research and development being undertaken towards Exascale programming models?

Mark: I think applications will need to make use of some of the newer features in existing programming models - for example the task and target constructs in OpenMP (and not just parallel loops), and single-sided rather than two-sided message passing.
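
For instance, the target construct lets a loop be offloaded to an attached device such as a GPU, with explicit map clauses describing data movement; if no device is available, execution falls back to the host. A minimal sketch, with an illustrative axpy-style kernel:

```c
/* Sketch of OpenMP device offload with the target construct. */
#include <stdio.h>

#define N 1000000

int main(void)
{
    static double x[N], y[N];

    for (int i = 0; i < N; i++) { x[i] = i; y[i] = 2.0 * i; }

    /* map clauses describe host-device data movement */
    #pragma omp target teams distribute parallel for map(to: x) map(tofrom: y)
    for (int i = 0; i < N; i++)
        y[i] += 3.0 * x[i];          /* simple axpy-style kernel */

    printf("y[1] = %f\n", y[1]);     /* expect 5.0 */
    return 0;
}
```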

Valeria: At Exascale, communication will be THE bottleneck. Due to the disruptive changes in hardware, even traditional HPC applications need to rethink their communication patterns. Hardware-driven RDMA [remote direct memory access] already allows for one-sided and asynchronous programming. From a software point of view, we have to prepare applications so that the computation and communication phases can overlap.
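
The overlap pattern Valeria describes can be sketched with standard nonblocking MPI (GASPI/GPI expresses the same idea with one-sided writes and notifications): post the halo exchange early, compute on data that needs no remote input, and block only when the boundary values are truly required. The stencil kernels below are hypothetical stand-ins.

```c
/* Overlapping computation and communication in a halo exchange. */
#include <mpi.h>
#include <stdio.h>

#define N 1024

/* Hypothetical stand-ins for real stencil kernels. */
static void compute_interior(double *u) { for (int i = 1; i < N; i++) u[i] *= 0.5; }
static void compute_boundary(double *u) { u[0] *= 0.5; }

int main(int argc, char **argv)
{
    int rank, nranks;
    double u[N], halo = 0.0;
    MPI_Request reqs[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    for (int i = 0; i < N; i++) u[i] = rank + i;

    int left  = (rank + nranks - 1) % nranks;
    int right = (rank + 1) % nranks;

    /* 1. Post the halo exchange early ... */
    MPI_Irecv(&halo, 1, MPI_DOUBLE, left, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(&u[N - 1], 1, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);

    /* 2. ... compute on the interior, which needs no remote data ... */
    compute_interior(u);

    /* 3. ... and block only when the boundary value is really needed. */
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
    u[0] = halo;
    compute_boundary(u);

    if (rank == 0) printf("u[0] = %f\n", u[0]);
    MPI_Finalize();
    return 0;
}
```

The point is step 2: useful work proceeds while the network moves the halo, so the transfer cost is hidden rather than paid serially.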

Mark: But we need to face it: These approaches require quite different ways of thinking about parallelism and substantial refactoring of applications, so training developers is going to be very important, as is developing tools to support them in understanding the performance of programs with more irregular and asynchronous behaviour.

Jesus: I fully agree: a (re-)education effort is badly needed. Today, programmers often have a latency-limited mentality (fork-join, explicit scheduling of computations to threads, and so on), and they believe that their mental models correctly correspond to the machine behaviour. With the complexity and variability of our systems, this is not the case today, and it will be even less so in the future. We need to make programmers understand this new situation and switch to a throughput-oriented mentality (task-based, with dependences and malleability).

Q: All three of you have been deeply involved in research on future programming models in the last years. Anything you want to highlight from your work?

Valeria: In the EXA2CT project we have been able to show that the overlap of communication and computation phases I’ve just mentioned is actually possible with the GASPI/GPI programming model. At an extreme-scaling workshop on SuperMUC, our team scaled a Reverse Time Migration (RTM) code, a seismic-imaging technique, by over three orders of magnitude to run on more than 65k cores. This is a great accomplishment.

Jesus: Our research effort on OmpSs aims at exploring programming model features (interface between the programmer and the system) that follow the previously mentioned philosophy. It enables the runtime to detect and exploit parallelism as well as to manage data and locality for the specific target architecture. Our research results on that look very promising.

Mark: Not from my personal research, but I’d like to mention the EU project INTERTWinE, which is specifically looking at solving some of the interoperability issues between different HPC programming APIs.

Q: What do you expect to be the programming model of an Exascale system? Do you think it will be the evolution of a current one or something completely new?

Valeria: The programming model we’ve developed at Fraunhofer ITWM, GASPI/GPI, will play an important role. It possesses all the important features for Exascale readiness: one-sidedness, asynchronous behaviour and zero-copy transfers. It enables an application to hide communication times completely. However, it will be difficult to fully port traditional HPC applications, so interoperability between programming models will become more and more important. This is an active field of research in the previously mentioned INTERTWinE project and in the EPiGRAM project.

Jesus: I remember long discussions, going back to the very first IESP meetings, on whether we need an evolutionary or a revolutionary approach. I do believe that MPI + OpenMP, with the appropriate features in OpenMP, will be the model used at Exascale. That does not seem very revolutionary. In my opinion, the real revolution will consist in the change of mentality mentioned earlier: the programmer should hand control over to the runtime, rely on it, and be confident that very high efficiency will be achieved.

Mark: I can see both approaches having a role – there may be some systems that are designed with a particular application or class of applications in mind, where it will be worthwhile to use a bespoke programming API to exploit them. But for general-purpose systems, application developers will still need the reliability and portability of existing standards, though they may exploit new features and new combinations of programming models to achieve the high scalability needed.