Basics
Tiering
HPC systems are usually classified into different tiers.
As indicated on the side, the idea is that computing power rises when moving up the pyramid, while the availability of such systems decreases.
The Top500 is a list of the most powerful HPC systems, or supercomputers, in the world; on it, Frontier reached the exaflop mark. In Europe, Lumi is the fastest system (global #3) with 309 petaflops.
https://en.wikipedia.org/wiki/FLOPS#/media/File:Supercomputer_Power_(FLOPS),_OWID.svg
Before we talk about how these systems achieve such astonishing performance, we discuss how that performance is measured.
What are flops and how are they measured
Flops, or floating point operations per second, is the usual measure of performance here, as floating point operations are the building blocks of most compute-intensive workloads. It is usually a much more reliable measure than instructions per second, since the number of instructions needed for the same computation differs between architectures.
Needless to say, the prefix exa corresponds to $10^{18}$.
For comparison, a current Intel or AMD CPU with 16 cores delivers about 220 Gflops, a recent Apple Watch about 3 Gflops, and the Intel Pentium II (released 1999) 0.4 Gflops.
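To get a feeling for these orders of magnitude, the figures quoted above can be put side by side. The following is only a small arithmetic sketch: the dictionary keys and the helper `ratio` are made up for illustration, and Frontier is entered at the round exaflop mark rather than at an exact benchmark result.

```python
# Performance figures quoted in the text, expressed in flops.
GIGA = 10**9
PETA = 10**15
EXA = 10**18

systems = {
    "frontier": 1 * EXA,      # assumption: the round exaflop mark, not a measured value
    "lumi": 309 * PETA,
    "cpu16": 220 * GIGA,      # 16-core Intel or AMD CPU
    "watch": 3 * GIGA,        # recent Apple Watch
}


def ratio(fast: str, slow: str) -> float:
    """Return how many `slow` devices match the raw throughput of one `fast` system."""
    return systems[fast] / systems[slow]


if __name__ == "__main__":
    print(f"Frontier vs. 16-core CPU: {ratio('frontier', 'cpu16'):,.0f}x")
    print(f"Lumi vs. 16-core CPU:     {ratio('lumi', 'cpu16'):,.0f}x")
    print(f"16-core CPU vs. watch:    {ratio('cpu16', 'watch'):,.1f}x")
```

Under these assumptions, one exaflop corresponds to the raw throughput of roughly 4.5 million such 16-core CPUs. This compares only the headline numbers and ignores memory, interconnect, and everything else that makes such a machine usable.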
We already peeked into BLAS and LAPACK before, and they come into play here again. To create the Top500 list, or more precisely, to get your supercomputer onto the list, you run HPL, "A Portable Implementation of the High-Performance Linpack Benchmark for Distributed-Memory Computers".
HPL is a software package that solves a (random) dense linear system in double precision (64 bits) arithmetic on distributed-memory computers. It can thus be regarded as a portable as well as freely available implementation of the High Performance Computing Linpack Benchmark.
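The principle behind such a measurement can be sketched on a single machine: solve a random dense system, time it, and divide an assumed operation count by the elapsed time. The code below is a toy illustration in pure Python, not HPL itself. The function names are hypothetical, the operation count 2/3 n^3 + 3/2 n^2 is the conventional figure for an LU-based solve that HPL uses to our knowledge, and the random entries in [-0.5, 0.5] are simply a convenient choice.

```python
import random
import time


def solve_dense(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (works on copies)."""
    n = len(b)
    a = [row[:] for row in a]
    b = b[:]
    for k in range(n):
        # Partial pivoting: bring the largest remaining entry of column k to the diagonal.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        pivot_row = a[k]
        for i in range(k + 1, n):
            row = a[i]
            factor = row[k] / pivot_row[k]
            for j in range(k + 1, n):
                row[j] -= factor * pivot_row[j]
            b[i] -= factor * b[k]
    # Back substitution on the upper triangular part.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


def flop_count(n):
    """Assumed operation count for solving one dense n x n system."""
    return 2.0 / 3.0 * n**3 + 1.5 * n**2


def benchmark(n, seed=0):
    """Return (flops achieved, max-norm residual) for one random n x n solve."""
    rng = random.Random(seed)
    a = [[rng.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-0.5, 0.5) for _ in range(n)]
    start = time.perf_counter()
    x = solve_dense(a, b)
    elapsed = time.perf_counter() - start
    residual = max(
        abs(sum(a[i][j] * x[j] for j in range(n)) - b[i]) for i in range(n)
    )
    return flop_count(n) / elapsed, residual


if __name__ == "__main__":
    rate, residual = benchmark(150)
    print(f"{rate / 1e9:.4f} Gflops, residual {residual:.2e}")
```

Interpreted Python will report a tiny fraction of what the hardware can deliver; the sketch is meant to show what is being counted and timed, not to produce a meaningful score. The residual check matters because a fast but wrong solve should not count as a result.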
Of course, many factors can influence this performance, and we will look into ways to maximize it when we discuss the architecture.
Before we do that, we also want to mention that the use of double precision (64 bit) is important here. As we have learned, the performance of GPUs depends heavily on the precision used, so this benchmark captures only one part of how the system performs.
Nevertheless, this benchmark and its predecessors have been around for a very long time, and no other measure has established itself as a better alternative, so the number remains a reliable and standardized way to compare performance.