BOLT: A Lightning-Fast OpenMP Runtime

BOLT: BOLT is OpenMP over Lightweight Threads

OpenMP is a directive-based parallel programming model for shared-memory computers. Thanks to its simple, incremental approach to parallelization, OpenMP is widely used in many applications. While current OpenMP implementations based on OS-level threads (e.g., pthreads) perform well on computation-bound codes that can be evenly divided among threads, they face challenges from recent HPC trends. First, OpenMP applications must express more parallelism to fully utilize the growing number of CPU cores. Second, irregular or non-traditional applications increasingly use OpenMP task constructs to express fine-grained parallelism rather than traditional work-sharing constructs.
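As an illustration of the task-based style mentioned above, here is a minimal sketch in C that computes Fibonacci numbers with OpenMP task constructs. The function names `fib_task` and `fib` are hypothetical and chosen only for this example. The code uses standard OpenMP directives, is not specific to any particular runtime, and is a deliberately naive way to create many very small tasks.

```c
/* Recursive Fibonacci: every call with n >= 2 spawns two child tasks.
 * Illustrative only; fib_task and fib are hypothetical names. */
static long fib_task(int n) {
    if (n < 2) return n;
    long a, b;
    /* Each recursive call becomes a fine-grained task. */
    #pragma omp task shared(a) firstprivate(n)
    a = fib_task(n - 1);
    #pragma omp task shared(b) firstprivate(n)
    b = fib_task(n - 2);
    /* Wait for both child tasks before combining their results. */
    #pragma omp taskwait
    return a + b;
}

long fib(int n) {
    long result = 0;
    #pragma omp parallel
    {
        /* One thread creates the root of the task tree; the other
         * threads in the team execute tasks as they become available. */
        #pragma omp single
        result = fib_task(n);
    }
    return result;
}
```

Calling `fib(20)` returns 6765 regardless of the number of threads. Each task performs almost no work, so the total running time is dominated by task creation and scheduling. That per-task overhead is the cost that matters for fine-grained workloads of this kind.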

BOLT aims to be a high-performance OpenMP implementation specialized for fine-grained parallelism. Unlike other OpenMP implementations, BOLT uses lightweight threads as its underlying threading mechanism. It adopts Argobots, a new holistic, low-level threading and tasking runtime, to overcome the shortcomings of conventional OS-level threads. The BOLT implementation is based on the LLVM OpenMP runtime, so it can be used with the GNU C/C++ compilers, Clang/LLVM, and the Intel C/C++ compilers.

Best Paper Award at PACT ’19!

Our paper on the BOLT runtime system, titled “BOLT: Optimizing OpenMP Parallel Regions with User-Level Threads,” won a Best Paper Award at the 28th International Conference on Parallel Architectures and Compilation Techniques (PACT ’19), held in Seattle, Washington, in September 2019!

Efficient Nested Parallelism

The growing hardware parallelism in HPC compute nodes is pushing applications to divide work into finer-grained chunks to expose more parallelism. This is often achieved through nested parallelism, in the form of either nested parallel regions or explicit tasks. BOLT spawns OpenMP threads and tasks as Argobots work units and manages them through its efficient work-stealing scheduler. Because Argobots threads are lightweight, BOLT can minimize threading and tasking overheads and offer a significantly better trade-off between high concurrency and thread management overhead.
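The nested-parallel-region pattern described above can be sketched in C as follows. This is a minimal example under stated assumptions: the function name `nested_sum` and the matrix layout are hypothetical, and the code relies only on standard OpenMP directives and the standard `omp_set_max_active_levels` routine, not on any BOLT-specific API. The `_OPENMP` guard lets the same source also compile serially.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Sum the entries of a rows x cols matrix whose entry (i, j) is
 * defined as i * cols + j. nested_sum is a hypothetical name. */
long nested_sum(int rows, int cols) {
    long total = 0;
#ifdef _OPENMP
    /* Allow two levels of active parallel regions. */
    omp_set_max_active_levels(2);
#endif
    /* Outer parallel region: rows are distributed across threads. */
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < rows; i++) {
        long row_sum = 0;
        /* Inner parallel region: each outer thread forks its own team. */
        #pragma omp parallel for reduction(+:row_sum)
        for (int j = 0; j < cols; j++) {
            row_sum += (long)i * cols + j;
        }
        total += row_sum;
    }
    return total;
}
```

For example, `nested_sum(4, 8)` returns 496, the sum of the integers 0 through 31. Each outer iteration opens a new inner parallel region, so the number of threads the runtime has to create and manage grows with both loop levels. This is where thread management overhead becomes significant.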

High ABI Compatibility

BOLT was derived from LLVM OpenMP 7.0 to inherit its optimized and modern OpenMP support as well as its application binary interface (ABI) compatibility with other widely used OpenMP runtimes. This ABI compatibility with the GCC, LLVM, and Intel OpenMP runtimes enables BOLT to run OpenMP-parallelized applications compiled with the GNU, Intel, and Clang C/C++/Fortran compilers without modification or recompilation of user programs. BOLT can be used even with commercial and closed-source OpenMP-parallelized codes.


Hybrid programming that mixes OpenMP and MPI requires good interoperability between the two programming models, which are usually connected through a common threading model. This interoperability can be difficult or inefficient to achieve in current OpenMP implementations because of their heavyweight underlying threading model. BOLT interoperates with several MPI libraries, including MPICH and Open MPI, through the Argobots threading layer rather than OS-level threads.