– This interview is part of a series of interviews done on the occasion of the ISC'16 BoF on 'Programming Models for Exascale - Slow Transition or Complete Disruption'. You can find the other interviews here. –
Q: In most Exascale panels or BoF sessions, hardware is in the limelight. Why do you want to talk about software, or more specifically programming models, and why do you think now is the perfect time?
Jesus: It has always been the perfect time but we have been blind to it.
The dazzling performance delivered for some kernels or mini-apps when new specific hardware appears is very often cheered without really considering the actual implications for source code structure and maintainability (readability, expandability, reuse, …).
I always remember a conversation with Don Grice (IBM) when the Teraflop barrier was broken. At that time, he was saying that there was no doubt that the Exaflop machine would be feasible, but whether it would be usable (programmable) was another thing.
The issue has been here for quite some time. What may be new is the perception that specific refactoring efforts for every new architecture, or even for every system dimensioning, are certainly not sustainable for application developers. Those who have made such efforts are perceiving the huge cost they imply. The shaky environment, with many different proposals and doubts about which model will win, probably pushes developers towards a wait-and-see position. But such a position cannot be held for long.
Q: Coming back to hardware – which is very obviously intrinsically linked to the programming models. Considering the clear trend towards heterogeneous systems, the big question is: how can you program such beasts and exploit the hardware as optimally as possible?
Jesus: We should forget about how to use the hardware as optimally as possible. We should focus on specifying our algorithms and programs with a top-down nested parallel approach where every level contributes. The programming model should provide clean and elegant mechanisms to convey to the runtime the actual computations, dependences and data accesses of the different program components (tasks).
I do believe that if we do so, we will get the side effect of exploiting the hardware in very efficient ways. Of course this will require the development of smart and dynamic runtimes and resource management systems, but the necessary research is being done and results are coming.
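To make this concrete, here is a minimal sketch of what such task annotations can look like, using standard OpenMP task dependences (the kernels and array names are hypothetical placeholders, not code from any particular application):

```c
#include <stdio.h>
#define N 1024

/* Placeholder kernels standing in for real program components. */
void produce(double *a)                    { for (int i = 0; i < N; i++) a[i] = i; }
void transform(const double *a, double *b) { for (int i = 0; i < N; i++) b[i] = 2.0 * a[i]; }
double consume(const double *b)            { double s = 0; for (int i = 0; i < N; i++) s += b[i]; return s; }

int main(void) {
    static double a[N], b[N];
    double sum = 0.0;

    #pragma omp parallel
    #pragma omp single
    {
        /* Each task declares only the data it reads and writes; the
           runtime derives the execution order (a dataflow graph) from
           these annotations instead of the programmer scheduling
           threads explicitly. */
        #pragma omp task depend(out: a)
        produce(a);

        #pragma omp task depend(in: a) depend(out: b)
        transform(a, b);

        #pragma omp task depend(in: b)
        sum = consume(b);
    } /* implicit barrier: all tasks have completed here */

    printf("sum = %f\n", sum);
    return 0;
}
```

Compiled with any OpenMP 4.0+ compiler (e.g. gcc -fopenmp), the three tasks execute in dataflow order without any explicit synchronization in the source.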
Heterogeneous platforms are just a special case where specific optimized implementations of the tasks are provided (if the accelerator has a different ISA). This requires some additional intelligence from the runtime in its scheduling policies, as well as taking care of the different address spaces and data movements (if the devices have their own).
In summary the way of addressing heterogeneity is by homogenising it!
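As an illustration of this homogenising view, here is a fragment (declarations only, not a complete program) in OmpSs-style syntax; the function names are hypothetical. The same task is given an alternative device-specific implementation, and choosing between them becomes a runtime decision:

```c
/* Generic (SMP) implementation of the task; the in/out clauses
   declare the data accessed, using OmpSs array-section syntax. */
#pragma omp target device(smp)
#pragma omp task in(a[0;n]) out(b[0;n])
void scale(const float *a, float *b, int n);

/* Alternative implementation of the *same* task for a CUDA device.
   implements(scale) tells the runtime the two are interchangeable;
   copy_deps asks it to move the declared inputs/outputs to and from
   the device's address space. */
#pragma omp target device(cuda) implements(scale) copy_deps
#pragma omp task in(a[0;n]) out(b[0;n])
void scale_cuda(const float *a, float *b, int n);
```

Call sites simply invoke scale(); whether a given instance runs on a CPU core or on the GPU is a scheduling decision of the runtime, not a source-code one.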
Q: What steps does the community need to take – e.g. prepare applications or enhance skills in certain areas – to realise the benefits of the research and development being undertaken towards Exascale programming models?
Jesus: We first need to understand the actual, detailed behaviour of our applications, and then apply the most appropriate features of the programming model and runtime to address the specific issue (e.g. load imbalance in one part of the code). Blindly applying the new features to a whole application is a huge effort that is certainly not cost effective. Applying refactoring techniques to irrelevant or inappropriate code sections may be a total waste of effort, if not counterproductive.
Performance analysis tools play a very important role here: they show the real nature of the inefficiencies in our programs and systems and enable reasoning about the actual fundamental causes of the problems. Tools should support the inception of very open/creative approaches to fixing the issues (which do not necessarily imply directly attacking the causes).
Regarding programmers, I consider that an important (re-)education effort is needed. Today programmers often have a latency-limited mentality (fork-join, explicit scheduling of computations to threads, etc.) and they believe that their mental models correctly correspond to the machine behaviour. With the complexity and variability in our systems this is not the case today, and it will be less and less so in the future. We need to make programmers understand this new situation and change to a throughput-oriented mentality (task based, with dependences and malleability). In this new mind-set I consider the programmer should use a methodology that could be described as “think global, specify local”. High-level potential overlaps of computations (and communications) should be considered by the programmer when thinking of the overall application structure. When it comes to the actual coding, annotations of the data accessed by a task should be specified thinking only about how the task uses the data. The sketch below illustrates the contrast.
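A hedged sketch of that contrast, for a hypothetical two-phase computation over independent blocks (block count, sizes and kernels are made up): the fork-join version inserts a global barrier between the phases even though each block's second phase depends only on its own first phase; the task version specifies per-block accesses locally and lets the runtime overlap the phases globally.

```c
#define NB 8     /* number of blocks (hypothetical) */
#define BS 1024  /* block size (hypothetical) */

double data[NB][BS];

/* Placeholder per-block kernels. */
void phase1(double *blk) { for (int j = 0; j < BS; j++) blk[j] = j; }
void phase2(double *blk) { for (int j = 0; j < BS; j++) blk[j] *= 2.0; }

/* Latency-limited mentality: fork-join with a barrier between the
   phases, so every block waits for the slowest one. */
void fork_join(void) {
    #pragma omp parallel for
    for (int i = 0; i < NB; i++) phase1(data[i]);
    /* implicit barrier here */
    #pragma omp parallel for
    for (int i = 0; i < NB; i++) phase2(data[i]);
}

/* Throughput-oriented mentality: each task declares only the block it
   touches ("specify local"); the runtime may run phase2 of early
   blocks while phase1 of later blocks is still executing
   ("think global"). */
void task_based(void) {
    #pragma omp parallel
    #pragma omp single
    for (int i = 0; i < NB; i++) {
        #pragma omp task depend(out: data[i])
        phase1(data[i]);
        #pragma omp task depend(in: data[i])
        phase2(data[i]);
    }
}

int main(void) { fork_join(); task_based(); return 0; }
```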
Q: All three of you have been deeply involved in research on future programming models in recent years. Anything you want to highlight from your work?
Jesus: Our research effort on OmpSs aims at exploring programming model features (the interface between the programmer and the system) that follow the previously mentioned philosophy and enable the runtime to detect and exploit parallelism as well as manage data and locality for the specific target architecture.
Strategically, we try to actively contribute our proposals and their demonstrated benefits to OpenMP. We also try to stay as close as possible to OpenMP in order to be able to leverage existing large OpenMP codes as a starting point on which to incrementally apply our proposed techniques.
We are strongly committed to providing a stable open source infrastructure (the Mercurium compiler for C, C++ and FORTRAN, and the NANOS runtime) that handles large production codes. By not radically diverging from OpenMP, we try to ensure that the efforts to refactor applications to use OmpSs are not lost. We also give our users the opportunity to experiment in our environment with features that, if proven useful to their codes, they might push for before the standardization committee.
Q: What do you expect to be the programming model of an Exascale system? Do you think it will be the evolution of a current one or something completely new?
Jesus: I remember long discussions since the very first IESP meetings on whether we need an evolutionary or revolutionary approach.
I do believe that MPI+OpenMP with the appropriate features in OpenMP will be the model used at exascale. This does not seem very revolutionary.
By the way, even if at the time vendors and users seemed receptive to very revolutionary approaches, I get the feeling that a rather “conservative” MPI+OpenMP is the main choice for many.
The real revolution, from my point of view, will consist in the change of mentality mentioned above. The programmer should leave control to the runtime, rely on it, and be confident that very high efficiency will be achieved. And this is a real revolution!
About the interviewee
Jesus Labarta has been a full professor of Computer Architecture at the Technical University of Catalonia (UPC) since 1990. Since 1981 he has been lecturing on computer architecture, operating systems, computer networks and performance evaluation. Since 2005 he has been responsible for the Computer Science Research Department within the Barcelona Supercomputing Center (BSC). He has been involved in research cooperation with many leading companies on HPC-related topics. His major directions of current work relate to performance analysis tools, programming models and resource management. He is involved in the development of the OmpSs programming model and its different implementations for SMP, GPU and cluster platforms.