At the beginning of the project, the NAM concept was extremely new and only limited practical information was available on what to do with it. On top of that, the NAM prototype is currently restricted to a memory capacity of 2 GB because of the hardware characteristics of the current Hybrid Memory Cube technology.

Hence, it was hard for application developers to imagine how they could potentially benefit from the NAM technology. Close collaboration was needed between the hardware experts and the application developers in order to determine what functionality should be aimed for as a first suitable NAM use case.

 

Co-Design discussions

Around halfway through the project, once the NAM design was largely finished, discussions between the expert groups were initiated. Following these discussions, the application developers came up with a list of functionalities in the NAM API that their applications would most likely benefit from. This wish list included, for instance, reductions and simple matrix operations. Eventually the teams agreed on a checkpoint/restart mechanism as the first use case for the NAM.

XOR Checkpointing with the NAM

This mechanism makes use of global reductions and fits well into the resiliency aspect of the DEEP-ER project. The decision was also based on the ability to reuse a large part of the existing software stack, which mitigated the risk of a delayed component in the project. In addition, implementing more complex functions in the FPGA hardware would have left less time for hardware development and verification. A fully verified hardware base, however, is mandatory for any novel hardware component such as the NAM.
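The idea behind XOR checkpointing as a global reduction can be illustrated with a small sketch. It is not the NAM API or the DEEP-ER implementation: the function names `compute_parity` and `recover_lost` are hypothetical, the checkpoints are plain in-memory byte strings of equal length, and the reduction runs in a single Python process rather than on the NAM hardware. It only shows the general principle that a parity block formed by XOR-ing all checkpoints is enough to rebuild any one lost checkpoint.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equally sized buffers byte by byte."""
    if len(a) != len(b):
        raise ValueError("buffers must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def compute_parity(checkpoints: list[bytes]) -> bytes:
    """Reduce all checkpoints to one parity block (a global XOR reduction)."""
    return reduce(xor_bytes, checkpoints)


def recover_lost(surviving: list[bytes], parity: bytes) -> bytes:
    """Rebuild the single missing checkpoint from the survivors and the parity."""
    return reduce(xor_bytes, surviving, parity)


if __name__ == "__main__":
    # Hypothetical checkpoint data of three nodes (assumed equal length).
    checkpoints = [b"node0-st", b"node1-st", b"node2-st"]
    parity = compute_parity(checkpoints)

    # Simulate the loss of node 1 and rebuild its checkpoint.
    rebuilt = recover_lost([checkpoints[0], checkpoints[2]], parity)
    print(rebuilt == checkpoints[1])
```

In this sketch the parity block has the size of one checkpoint regardless of the number of nodes, which is why such a scheme is attractive when the memory holding the parity is small. It tolerates the loss of exactly one checkpoint per parity group.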

To find out how XOR checkpointing works with the NAM, read this use case.