To counter the loss of Dennard scaling, chip designers have focused on energy-efficient compute accelerators. It is also imperative to improve the energy efficiency of data delivery and to ensure that the memory interface is comprehensible to a wide variety of programmers.
In this talk, I will focus on our recent IEEE Micro Top Picks work on providing energy-efficient cache coherence for accelerators. I will discuss a new time-based coherence framework, called Temporal Coherence (TC), that exploits globally synchronized counters in single-chip systems to develop a streamlined coherence protocol. I will present two memory models for accelerators, TC-strong and TC-weak, and evaluate the performance benefits of weak memory models. Finally, I will discuss important on-chip communication optimizations enabled by TC that reduce data transfer latency and network bandwidth usage.