Current advances in computer science and other disciplines rely on the massive computational horsepower of data-parallel architectures such as GPUs. Programming data-parallel architectures is difficult because it requires efficiently managing data movement across the memory hierarchies of thousands of processing cores.
To date, data movement problems have been studied primarily in uni-core and multi-core programming systems. Shifting to a many-core programming paradigm therefore presents three new challenges: 1) scalability, 2) the software/hardware interface, and 3) the trade-off between performance and energy. First, existing data movement models for uni-core and multi-core processors do not scale well; this project therefore develops analytical models that scale while still yielding effective heuristics in practice. Second, the responsibilities of software and hardware must be redefined. Given the complexity of many-core architectures, data movement problems cannot be solved with software-only or hardware-only approaches. This project optimizes data movement with a cross-stack design principle that combines the strengths of software and hardware. Third, previous studies have focused on performance with little consideration of power and energy efficiency. This project targets both performance and energy: it models the energy cost of data movement and integrates this information into a power/energy model for the entire system. Overall, this project can help shape future software-hardware cache interfaces and lay the foundation for the design of next-generation cache systems.