Roofline-model-driven optimizations for elastic wave propagation on modern processors
Modern computing chips are rated with a maximum performance, defined as the maximum attainable floating-point operations per second (FLOPS). However, most software applications cannot effectively attain it. In their most naive versions, wave simulation applications very frequently reach only a small percentage of the peak performance of such computers. Hence, code optimization (i.e., increasing the fraction of the hardware's capability that is actually used) yields large returns in time-to-solution and energy usage. Nevertheless, code optimization is a non-trivial task. It can appear endless unless a thorough study of both the code's algorithms and the computer architecture is carried out.
Full-wave elastic propagation is expensive both computationally and in terms of development effort. Hence, maximizing the efficiency of propagation codes at minimum development cost is important. To that end, developers need a mechanism that evaluates both the current efficiency of an application and its maximum attainable efficiency. In this work we present our experience in enhancing an elastic wave propagation code by means of a roofline-directed optimization strategy. This strategy guides developers in finding potentially beneficial optimizations and in assessing when the maximum possible performance has been achieved.
The roofline model provides insight into an application's behavior by placing its performance in a graphical representation bounded by both the maximum (attainable) FLOPS and the memory bandwidth. To use the model, one needs measures of the application's operational intensity (floating-point operations performed per byte of memory traffic) and of its efficiency. The model identifies the upper limit of the application's performance at its current operational intensity. It thereby suggests a set of profitable optimizations and the order in which to implement them. This, in turn, helps reduce code development costs while attaining the best possible performance improvement for the target application.
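In the roofline model, attainable performance is the minimum of the peak compute rate and the product of operational intensity and memory bandwidth. The sketch below computes that bound and the fraction of it that a kernel actually reaches. The `Machine` ceilings and the kernel's FLOP count, byte count and runtime are hypothetical placeholders, not measurements from this work. In practice those counts would come from hardware counters or a profiler.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    """Hypothetical machine ceilings (placeholders, not measured values)."""
    peak_gflops: float    # peak floating-point throughput, GFLOP/s
    bandwidth_gbs: float  # peak memory bandwidth, GB/s

    def ridge_point(self) -> float:
        """Operational intensity (FLOP/byte) where the bandwidth and compute roofs meet."""
        return self.peak_gflops / self.bandwidth_gbs

    def attainable_gflops(self, oi: float) -> float:
        """Roofline bound: min(compute roof, operational intensity * bandwidth)."""
        return min(self.peak_gflops, oi * self.bandwidth_gbs)


@dataclass
class RooflinePoint:
    oi: float        # operational intensity, FLOP/byte
    achieved: float  # measured performance, GFLOP/s
    roof: float      # attainable performance at this intensity, GFLOP/s
    fraction: float  # achieved / roof; the remaining gap is 1 - fraction
    bound: str       # "memory-bound" or "compute-bound"


def analyze(m: Machine, flops: float, bytes_moved: float, seconds: float) -> RooflinePoint:
    """Place a kernel on the roofline from its FLOP count, memory traffic and runtime."""
    oi = flops / bytes_moved
    achieved = flops / seconds / 1e9
    roof = m.attainable_gflops(oi)
    bound = "memory-bound" if oi < m.ridge_point() else "compute-bound"
    return RooflinePoint(oi, achieved, roof, achieved / roof, bound)


if __name__ == "__main__":
    machine = Machine(peak_gflops=1000.0, bandwidth_gbs=100.0)
    # Hypothetical kernel: 2 GFLOP of work, 4 GB of memory traffic, 0.08 s runtime.
    p = analyze(machine, flops=2e9, bytes_moved=4e9, seconds=0.08)
    print(p.bound, p.oi, round(p.achieved, 1), p.roof, round(p.fraction, 2))
```

With these made-up numbers the kernel sits at 0.5 FLOP/byte, well below the 10 FLOP/byte ridge point. It is therefore memory-bound and reaches about half of its 50 GFLOP/s roof. Under the usual roofline reading, a point far below its roof still has inefficiencies that can be removed at the same intensity. A point already close to a sloped roof can only go faster by raising its operational intensity.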
In this work we apply the roofline model to an elastic wave propagation application. We describe the process we followed to optimize a production-ready code. By sequentially reducing the performance gaps identified in the roofline model, we almost doubled its performance in a few steps.