In this talk we will describe our group's work on efficient stacked DRAM cache designs. In the first part, we will look at multi-core CPU workloads and improve the DRAM cache design from the perspectives of performance and energy. To improve performance, we proposed a Bi-Modal DRAM Cache, which supports two different cache block sizes: it organizes data with high spatial locality as large blocks and the rest as small blocks, improving hit rate, off-chip bandwidth utilization, and cache capacity utilization. The design also includes a
novel Way-Locator that improves the hit latency of the tags-in-DRAM organization. To improve the energy efficiency of stacked DRAM caches, we proposed the Micro-Refresh DRAM Cache design, which eliminates more than 90% of the refresh overhead in stacked DRAM caches. Interestingly, eliminating the refresh of useful cache lines with long reuse distances can potentially improve the overall average memory latency, leading to marginal gains in performance. Incidentally, both the Bi-Modal DRAM Cache and the Micro-Refresh DRAM Cache were based on insights derived from an analytical performance model for DRAM caches, which we will not describe in this talk.
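The block-size decision behind a two-block-size cache can be illustrated with a small footprint-based predictor. This is a hypothetical sketch, not the Bi-Modal design presented in the talk: the 512-byte large block, 64-byte small block, 50% utilization threshold, and the names BlockSizePredictor, on_access, and on_evict are all assumptions. The predictor records which 64-byte sub-blocks of a region are touched while its data is resident and, on eviction, predicts a large block for that region next time only if enough sub-blocks were actually used.

```python
# Hypothetical parameters for illustration; not taken from the Bi-Modal design.
LARGE_BLOCK = 512                        # bytes per large block
SMALL_BLOCK = 64                         # bytes per small block
SUBBLOCKS = LARGE_BLOCK // SMALL_BLOCK   # 8 small blocks per large-block region
THRESHOLD = 0.5                          # fraction of sub-blocks that must be used


class BlockSizePredictor:
    """Chooses large or small allocation per region from its past spatial footprint."""

    def __init__(self) -> None:
        self.predict_large: dict[int, bool] = {}  # region -> predicted large?
        self.footprint: dict[int, int] = {}       # region -> bitmask of touched sub-blocks

    def on_access(self, addr: int) -> int:
        """Record the touch and return the block size to allocate if this access misses."""
        region, offset = divmod(addr, LARGE_BLOCK)
        sub = offset // SMALL_BLOCK
        self.footprint[region] = self.footprint.get(region, 0) | (1 << sub)
        return LARGE_BLOCK if self.predict_large.get(region, False) else SMALL_BLOCK

    def on_evict(self, region: int) -> None:
        """Learn from the footprint observed while the region's data was resident."""
        used = bin(self.footprint.pop(region, 0)).count("1")
        self.predict_large[region] = used / SUBBLOCKS >= THRESHOLD
```

A hardware predictor would use a small, bounded table rather than an unbounded dictionary; the dictionary only keeps the sketch short.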
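The idea of skipping unnecessary refreshes can be sketched as a per-row refresh filter. This is an assumed, simplified model rather than the Micro-Refresh mechanism itself: it treats any row activation as an implicit refresh, uses the common 64 ms DRAM refresh window, and relies on a hypothetical has_live_lines flag (how a real design decides which lines are worth keeping is not modeled). Rows accessed recently enough are skipped, rows worth keeping are refreshed, and the remaining rows are allowed to decay and are invalidated.

```python
from dataclasses import dataclass

RETENTION_MS = 64.0  # common DRAM refresh window, assumed for illustration


@dataclass
class Row:
    last_restore_ms: float = 0.0  # last time the row's charge was restored (access or refresh)
    has_live_lines: bool = True   # hypothetical flag: does the row hold lines worth keeping?
    valid: bool = True            # False once the row's contents have been dropped


def access(row: Row, now_ms: float) -> None:
    """Activating a row restores its charge, so an access acts as an implicit refresh."""
    row.last_restore_ms = now_ms


def refresh_tick(rows: list, now_ms: float, tick_ms: float) -> int:
    """Runs once every tick_ms; returns how many explicit refreshes were issued."""
    issued = 0
    for row in rows:
        if not row.valid:
            continue  # nothing left to preserve
        # Skip rows that stay within their retention window until the next tick.
        if now_ms + tick_ms - row.last_restore_ms < RETENTION_MS:
            continue
        if row.has_live_lines:
            row.last_restore_ms = now_ms  # explicit refresh
            issued += 1
        else:
            # Let the row decay and invalidate it. This assumes its lines are clean;
            # dirty lines would have to be written back to memory first.
            row.valid = False
    return issued
```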
For Integrated Heterogeneous System Architectures, which pack latency-oriented CPU cores together with throughput-oriented GPU cores, we propose HAShCache, a Heterogeneity-Aware Shared DRAM Cache. HAShCache addresses the disparate demands from CPU and GPU cores
for DRAM cache and memory accesses in three ways: by prioritizing CPU requests at the DRAM cache controller, by selectively bypassing the DRAM cache for CPU requests, and by controlling the occupancy of GPU lines in the DRAM cache.
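The three policies can be pictured together in a toy controller model. Everything below is an illustrative assumption rather than HAShCache's actual implementation: the strict CPU-first scheduling, the queue-depth trigger for bypassing, the fixed GPU occupancy quota, and all names (ToyController, should_bypass, may_allocate) are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    source: str  # "cpu" or "gpu"
    addr: int


class ToyController:
    """Hypothetical DRAM cache controller model; all thresholds are illustrative."""

    def __init__(self, capacity_lines: int, gpu_quota: float = 0.5,
                 bypass_queue_depth: int = 16) -> None:
        self.cpu_q: deque = deque()
        self.gpu_q: deque = deque()
        self.capacity = capacity_lines
        self.gpu_quota = gpu_quota              # max fraction of lines GPU fills may hold
        self.bypass_depth = bypass_queue_depth  # pending requests that count as "congested"
        self.gpu_lines = 0                      # lines currently filled by GPU requests

    def enqueue(self, req: Request) -> None:
        (self.cpu_q if req.source == "cpu" else self.gpu_q).append(req)

    def next_request(self) -> Optional[Request]:
        """Policy 1: serve pending CPU requests before any GPU request."""
        if self.cpu_q:
            return self.cpu_q.popleft()
        return self.gpu_q.popleft() if self.gpu_q else None

    def should_bypass(self, req: Request) -> bool:
        """Policy 2: send CPU requests straight to memory when the cache queue is congested."""
        pending = len(self.cpu_q) + len(self.gpu_q)
        return req.source == "cpu" and pending >= self.bypass_depth

    def may_allocate(self, req: Request) -> bool:
        """Policy 3: refuse GPU fills once GPU lines reach their occupancy quota."""
        if req.source == "gpu":
            return self.gpu_lines < self.gpu_quota * self.capacity
        return True

    def on_fill(self, req: Request) -> None:
        if req.source == "gpu":
            self.gpu_lines += 1

    def on_evict(self, was_gpu_line: bool) -> None:
        if was_gpu_line:
            self.gpu_lines -= 1
```

Strict priority can starve GPU requests under sustained CPU load, so a practical scheduler would likely bound how long GPU requests may wait.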