The DEEP-EST prototype has come together in multiple steps since 2019. It is the result of a strict and detailed co-design process with six application teams. It includes the following modules:


Hardware of DEEP-EST


  • Cluster Module (CM): Intel® Xeon® based HPC Cluster with high single-thread performance and a universal InfiniBand interconnect.
  • Data Analytics Module (DAM): Intel® Xeon® based Cluster with non-volatile, byte-addressable memory, one Intel Stratix 10 FPGA, and one NVIDIA Tesla V100 card per node. Nodes are interconnected by 40 Gb/s Ethernet and a 100 Gb/s EXTOLL fabric.
  • Extreme Scale Booster (ESB): NVIDIA Tesla-based nodes with a small Intel® Xeon® CPU and an EXTOLL 3D Torus interconnect; the objective is to run applications entirely from the local V100's HBM2 memory and to use GPUDirect technology to bypass the CPU for network communication.
    The CM has 50 nodes and the ESB is planned for 75 nodes; both use Megware's ColdCon liquid cooling technology. The DAM consists of 16 air-cooled nodes with a large memory buildout.
  • A storage and service module provides high-performance disk storage and login nodes. This module uses 40 Gb/s Ethernet and runs the BeeGFS parallel file system.
  • For high-speed parallel I/O, the All-Flash Storage Module (AFSM) provides a total of 1.8 PB storage space, also on a BeeGFS file system.

A network federation infrastructure ties all the modules together, supporting MPI and IP communication. It is implemented using fabric-to-fabric gateways, and for MPI utilizes optimized high-bandwidth RDMA communication.
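As a minimal sketch of that routing decision (illustrative only, not the actual gateway logic), the snippet below encodes which fabrics each module described above attaches to; endpoints sharing a fabric communicate natively, while traffic between disjoint fabrics must cross a fabric-to-fabric gateway. The "SSM" entry and the `path` helper are assumptions for illustration:

```python
# Fabrics per module, following the hardware description above.
# "SSM" (storage/service module) is an assumed label for this sketch.
FABRICS = {
    "CM":  {"InfiniBand"},
    "DAM": {"Ethernet", "EXTOLL"},
    "ESB": {"EXTOLL"},
    "SSM": {"Ethernet"},
}

def path(src, dst):
    """Endpoints that share a fabric talk natively (RDMA for MPI);
    otherwise traffic crosses a fabric-to-fabric gateway."""
    return "native" if FABRICS[src] & FABRICS[dst] else "gateway"
```

For example, DAM-to-ESB traffic can stay on the shared EXTOLL fabric, while CM-to-ESB traffic (InfiniBand to EXTOLL) goes through a gateway.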


The ESB's EXTOLL 3D Torus interconnect also supports Network Attached Memory (NAM) nodes, which provide persistent shared memory resources accessible at EXTOLL network speeds, as well as an experimental Global Communication Engine (GCE) that optimizes collective MPI communication on the ESB, with substantial projected improvements over conventional implementations.
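To illustrate what optimizing a collective buys (this is a generic sketch, not the GCE's actual mechanism), the snippet below simulates a recursive-doubling allreduce: each round, every rank exchanges its partial sum with a partner, so all P ranks hold the global sum after only log2(P) rounds instead of the ~P messages of a naive gather-and-broadcast:

```python
def recursive_doubling_allreduce(values):
    """Simulate a recursive-doubling allreduce over len(values) ranks
    (power of two). In round k, rank r exchanges with rank r XOR 2^k
    and both add the partner's partial sum; after log2(P) rounds every
    rank holds the global sum."""
    p = len(values)
    assert p > 0 and p & (p - 1) == 0, "power-of-two rank count in this sketch"
    partial = list(values)
    mask, rounds = 1, 0
    while mask < p:
        # All pairwise exchanges of one round happen concurrently.
        partial = [partial[r] + partial[r ^ mask] for r in range(p)]
        mask <<= 1
        rounds += 1
    return partial, rounds
```

With 4 ranks holding 1, 2, 3, 4, every rank ends up with 10 after just 2 rounds.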


Thus, the DEEP-EST prototype demonstrates the benefits of the Modular Supercomputing Architecture: parts of complex applications or workflows can run on the best-matching module, with the CM serving codes with limited parallelism that rely on per-thread performance, the ESB serving highly scalable codes, and the DAM serving machine learning and data analytics codes that require huge memory and high I/O performance and can benefit from GPGPU or FPGA accelerators.
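This mapping idea can be sketched as a toy dispatcher. The module traits and the `pick_module` helper below are illustrative assumptions drawn from the descriptions above, not an actual DEEP-EST scheduler API:

```python
# Illustrative module traits, paraphrasing the hardware description above.
MODULES = {
    "CM":  {"single_thread"},
    "ESB": {"scalable", "gpu"},
    "DAM": {"big_memory", "io", "gpu", "fpga"},
}

def pick_module(required):
    """Return the module whose traits cover the most of the
    workload's requirements (a deliberately simple heuristic)."""
    return max(MODULES, key=lambda m: len(MODULES[m] & required))
```

A highly scalable GPU code would land on the ESB, while a memory- and I/O-hungry analytics code would land on the DAM.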


One overriding objective in the DEEP-EST co-design process was to minimize energy use: the selection of leading-edge processors and accelerators, the careful matching of components to achieve the highest end-to-end performance, and the provision of real-time energy monitoring data are key elements here. One noteworthy result is that the total energy consumption of each CM and ESB node, including CPU, memory, accelerators, and NICs, can be measured in real time. This enables higher-level software layers to make well-informed decisions that minimize energy use.
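Given such per-node power readings, a software layer can estimate energy as the integral of power over time. The sketch below is a minimal illustration, assuming hypothetical (timestamp, watts) samples as input rather than the prototype's actual monitoring interface:

```python
def energy_joules(samples):
    """Estimate node energy in joules from (timestamp_s, power_W)
    samples using the trapezoidal rule. The sample source is
    hypothetical; the real monitoring interface is not shown here."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        # Area of one trapezoid: average power times elapsed time.
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total
```

For instance, a node drawing a steady 100 W for 2 s accounts for 200 J.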