Hardware support for shared-memory concurrency is presumed to
involve an inherent trade-off between programmability and
performance. For instance, the most intuitive memory consistency model,
sequential consistency (SC), is presumed to be too expensive to
support; likewise, primitive synchronization instructions such as
memory fences and atomic read-modify-writes (RMWs) are costly in
current processors; finally, it is unclear whether cache coherence
protocols will scale with an increasing number of cores.
In this talk, I will argue that it is in fact possible to provide
hardware support that enhances programmability without sacrificing
performance. The key insight is semantics-directed design: hardware
design should be guided by precise formal specifications instead of
ad hoc informal ones. I will illustrate this idea in three ways.
First, I will show how SC can be enforced efficiently using conflict
ordering, a novel technique for enforcing memory ordering. Second, I
will show how RMWs can
be implemented efficiently in x86 architectures. Third, I will
introduce a scalable approach to cache coherence called
consistency-directed coherence. I will conclude by outlining the
challenges of verifying such consistency-directed (and conventional!)
protocols.
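
As a minimal sketch of what SC promises the programmer, consider the
classic store-buffering litmus test: one thread writes x and then reads
y, while the other writes y and then reads x. The Python below is
illustrative only and is not a model of conflict ordering or of any
hardware design; the THREADS encoding and the helpers interleavings
and sc_outcomes are hypothetical names chosen for this example. It
enumerates every interleaving that respects program order and collects
the register values each one produces.

```python
from itertools import combinations

# Store-buffering litmus test (hypothetical encoding for illustration).
# Thread 0: x = 1; r0 = y
# Thread 1: y = 1; r1 = x
THREADS = [
    [("store", "x", 1), ("load", "y", "r0")],
    [("store", "y", 1), ("load", "x", "r1")],
]


def interleavings(a, b):
    """Yield every merge of a and b that keeps each list's own order."""
    n = len(a) + len(b)
    for positions in combinations(range(n), len(a)):
        chosen = set(positions)
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in chosen else next(ib) for i in range(n)]


def sc_outcomes(threads):
    """Return the set of (r0, r1) results reachable under SC."""
    outcomes = set()
    for schedule in interleavings(*threads):
        memory = {"x": 0, "y": 0}  # all locations start at 0
        regs = {}
        for op, loc, arg in schedule:
            if op == "store":
                memory[loc] = arg        # arg is the value written
            else:
                regs[arg] = memory[loc]  # arg is the destination register
        outcomes.add((regs["r0"], regs["r1"]))
    return outcomes


print(sorted(sc_outcomes(THREADS)))  # prints [(0, 1), (1, 0), (1, 1)]
```

Under SC the outcome r0 = r1 = 0 never appears, because some store
must come first in any interleaving. Processors with store buffers,
such as those implementing x86-TSO, can produce that outcome unless a
fence or an RMW separates each store from the following load, which is
one reason the cost of those instructions matters.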