High-frequency monitoring and operational data analytics in the DEEP projects are performed by the Data Center Data Base (DCDB) and its data analytics framework, Wintermute.

The Data Center Data Base (DCDB) is a modular, continuous and holistic monitoring and data analytics framework targeted at HPC environments.

DCDB consists of three main components as depicted in the figure:

  • Pusher: The Pusher is the main component for collecting data. It runs arbitrary data collection plugins and pushes the collected data to a Collect Agent via the MQTT protocol.
  • Collect Agent: The Collect Agent acts as a data broker, serving as the intermediary between one Storage Backend and one or more Pushers.
  • Storage Backend: All collected data is stored here. By default, a Cassandra database is used, but the framework is intended to support other data storage solutions as well.

DCDB Architecture
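
The data flow between the three components can be illustrated with a small, self-contained Python sketch. It is not DCDB code: the class names, method names and topic strings (`Pusher.register`, `CollectAgent.on_message`, `/rack0/node0/power`, and so on) are hypothetical, the MQTT transport is replaced by a direct method call, and a dictionary stands in for the Cassandra database. It only shows the roles described above, with several Pushers feeding one Collect Agent, which writes to one Storage Backend.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Reading = Tuple[int, float]  # (timestamp, value); the layout is an assumption


class StorageBackend:
    """In-memory stand-in for the storage backend (Cassandra by default in DCDB)."""

    def __init__(self) -> None:
        self.series: Dict[str, List[Reading]] = defaultdict(list)

    def insert(self, topic: str, reading: Reading) -> None:
        self.series[topic].append(reading)


class CollectAgent:
    """Data broker between any number of pushers and exactly one backend."""

    def __init__(self, backend: StorageBackend) -> None:
        self.backend = backend

    def on_message(self, topic: str, reading: Reading) -> None:
        # In a real deployment this would arrive as an MQTT publish;
        # here it is a plain method call to keep the sketch runnable.
        self.backend.insert(topic, reading)


class Pusher:
    """Runs data collection plugins and forwards their readings to an agent."""

    def __init__(self, prefix: str, agent: CollectAgent) -> None:
        self.prefix = prefix
        self.agent = agent
        self.plugins: Dict[str, Callable[[], float]] = {}

    def register(self, sensor: str, read: Callable[[], float]) -> None:
        # A "plugin" is reduced to a callable that returns one sensor value.
        self.plugins[sensor] = read

    def sample(self, timestamp: int) -> None:
        for sensor, read in self.plugins.items():
            topic = f"{self.prefix}/{sensor}"
            self.agent.on_message(topic, (timestamp, read()))


backend = StorageBackend()
agent = CollectAgent(backend)

# Two pushers (for example, one per compute node) share one collect agent.
node0 = Pusher("/rack0/node0", agent)
node1 = Pusher("/rack0/node1", agent)
node0.register("power", lambda: 212.5)
node1.register("power", lambda: 198.0)

for ts in (1_000, 2_000):
    node0.sample(ts)
    node1.sample(ts)

print(dict(backend.series))
```

Keeping the agent as the only component that talks to the backend mirrors the broker role described above: pushers need to know only where to publish, not how or where the data is stored.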

DCDB is intended for holistic monitoring of HPC systems and their supporting infrastructures, such as system hardware or software, applications, I/O, power provisioning and cooling. The plugin-based Pusher allows for easy integration of different APIs and protocols for data collection: it has been designed for low overhead and supports both in-band and out-of-band monitoring with minimal performance impact on applications running on the HPC system. The distributed Storage Backend, in turn, provides horizontal and vertical scalability to collect data at high frequency from large-scale supercomputers with thousands of nodes. Easy access to the collected data is provided via a Grafana-based visualization front-end, as well as via a user-space library and several command-line tools for performing sophisticated database queries. DCDB can also be integrated as a data source into existing monitoring solutions via its RESTful APIs, which provide access to the most recent data.

On top of its monitoring capabilities, DCDB includes Wintermute, a plugin-based framework that enables Operational Data Analytics (ODA) on HPC systems. Wintermute is deeply integrated within DCDB: it can ingest monitoring data as it is acquired, process it with state-of-the-art techniques via operators, and in turn enable analyses that improve a system’s efficiency and effectiveness. Due to its large variety of deployment options, Wintermute can be used to perform both in-band and out-of-band ODA, in an online or on-demand fashion, and can scale to large HPC installations with little overhead.
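
To make the idea of an operator more concrete, the following Python sketch shows a streaming moving average applied to sensor readings as they arrive. This is an illustration only, not the Wintermute plugin interface: the class `MovingAverageOperator`, its `process` method and the sample values are all hypothetical, and a moving average is just one simple stand-in for the techniques an operator might apply.

```python
from collections import deque
from typing import Deque


class MovingAverageOperator:
    """Hypothetical operator: averages the last `window` readings of a sensor."""

    def __init__(self, window: int) -> None:
        if window < 1:
            raise ValueError("window must be at least 1")
        # deque(maxlen=...) silently drops the oldest value once it is full.
        self.values: Deque[float] = deque(maxlen=window)

    def process(self, value: float) -> float:
        # Called once per incoming reading, i.e. in an online fashion.
        self.values.append(value)
        return sum(self.values) / len(self.values)


op = MovingAverageOperator(window=3)

# Made-up power readings in watts, fed to the operator one at a time.
outputs = [op.process(v) for v in [200.0, 210.0, 220.0, 260.0]]
print(outputs)  # [200.0, 205.0, 210.0, 230.0]
```

Feeding readings one by one corresponds to the online mode mentioned above. An on-demand variant could, under the same assumptions, run the same `process` method over a series retrieved from storage only when a result is requested.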