Generally speaking, performance programming benefits from a priori knowledge about the application. The more we know about the problem we are solving, the more effectively we can modify application source to dramatically improve some aspect of performance. Due to tight system design constraints, this process is especially important at the exascale.
I will describe current work in my research group aimed at solving two performance programming problems. Our approach to each is to build a custom, domain-specific source-to-source translator that incorporates the knowledge of a performance programming expert.
The translators perform semantic-level optimizations, which are unavailable to a traditional compiler working with conventional language constructs.
The first translator, Bamboo, transforms annotated MPI source into a data-driven form that automatically overlaps communication with computation. Running on up to 96K processors of a Cray XE6, the translated code meets or exceeds the performance of hand-coded overlap variants.
The second translator, Mint, transforms annotated C++ stencil codes into highly optimized CUDA that achieves about 80% of the performance of carefully hand-coded CUDA.
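To make "stencil code" concrete, the following is a minimal sketch of the kind of loop nest such a translator takes as input: one Jacobi sweep of a 5-point stencil over a square grid. It is plain C++ and deliberately omits Mint's annotations; the function name jacobi_sweep, the row-major layout, and the choice to copy boundary cells unchanged are assumptions made for illustration, not details taken from Mint.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example (not Mint input syntax): one Jacobi sweep of a
// 5-point stencil on an n x n grid stored in row-major order.
// Each interior cell becomes the average of its four neighbors;
// boundary cells are copied through unchanged.
std::vector<double> jacobi_sweep(const std::vector<double>& in, std::size_t n) {
    assert(in.size() == n * n);      // grid must be square with side n
    std::vector<double> out(in);     // start from a copy so boundaries are preserved
    for (std::size_t i = 1; i + 1 < n; ++i) {
        for (std::size_t j = 1; j + 1 < n; ++j) {
            out[i * n + j] = 0.25 * (in[(i - 1) * n + j] +   // north
                                     in[(i + 1) * n + j] +   // south
                                     in[i * n + (j - 1)] +   // west
                                     in[i * n + (j + 1)]);   // east
        }
    }
    return out;
}
```

Every interior update reads only the input grid, so the iterations of the loop nest are independent of one another. That independence is what makes a loop of this shape a natural candidate for mapping onto GPU threads.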
Domain-specific translation is an effective means of managing development costs. Both translators enable the domain scientist to remain focused on the domain science while realizing performance usually attributed to expert coders.