CS 533 Parallel Computer Architectures

Spring 2018

Tentative Reading List. We may add or change some papers.

1) Cache Coherence and Scalability:

1a) P. Stenstrom. "A Survey of Cache Coherence Schemes for Multiprocessors". IEEE Computer, 1990. Also see Dubois, Annavaram, and Stenstrom's book or Culler and Singh's book.
1b) D. Lenoski et al. "The Directory-Based Cache Coherence Protocol for the DASH Multiprocessor". ISCA 1990.
1c) A. Gupta et al. "Reducing Memory and Traffic Requirements for Scalable Directory-Based Cache Coherence Schemes". ICPP 1990.
1d) A. Gupta & W. Weber "Cache Invalidation Patterns in Shared-Memory Multiprocessors". Transactions on Computers, July 1992.
1e) J. Torrellas, M. Lam & J. Hennessy "False Sharing and Spatial Locality in Multiprocessor Caches". Transactions on Computers, June 1994.

2) Memory Consistency Models:

2a) K. Gharachorloo et al. "Memory Consistency and Event Ordering in Scalable Shared-Memory Multiprocessors". ISCA 1990.
2b) K. Gharachorloo et al. "Performance Evaluation of Memory Consistency Models for Shared-Memory Multiprocessors". ASPLOS 1991.
2c) K. Gharachorloo et al. "Two Techniques to Enhance the Performance of Memory Consistency Models". ICPP 1991.
2d) S. Adve & K. Gharachorloo. "Shared Memory Consistency Models: A Tutorial". WRL Research Report 95/7, 1995.

3) Prefetching and Forwarding:

3a) T. Mowry et al. "Design and Evaluation of a Compiler Algorithm for Prefetching". ASPLOS 1992.
3b) Y. Solihin et al. "Using a User-Level Memory Thread for Correlation Prefetching". ISCA 2002.

4) Synchronization:

4a) J. Goodman et al. "Efficient Synchronization Primitives for Large Scale Cache-Coherent Multiprocessors". ASPLOS 1989.
4b) J. Mellor-Crummey and M. Scott. "Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors". ACM TOCS 1991.

5) Multithreading:

5a) D. Tullsen et al. "Simultaneous Multithreading: Maximizing On-Chip Parallelism". ISCA 1995.
5b) D. Tullsen et al. "Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor". ISCA 1996.

6) Multiple Processors on a Chip:

6a) K. Olukotun et al. "The Case for a Single-Chip Multiprocessor". ASPLOS 1996.
6b) G. Sohi et al. "Multiscalar Processors". ISCA 1995.
6c) V. Krishnan and J. Torrellas. "A Chip Multiprocessor Architecture with Speculative Multithreading". IEEE Transactions on Computers, 1999.

7) Speculative Parallelization and Execution:

7a) J. Steffan et al. "A Scalable Approach to Thread-Level Speculation". ISCA 2000.
7b) J. Martinez et al. "Speculative Synchronization: Applying Thread-Level Speculation to Explicitly Parallel Applications". ASPLOS 2002.

8) Processor and Memory Integration:

8a) D. Patterson et al. "A Case for Intelligent DRAM". IEEE Micro 1997.
8b) Y. Kang et al. "FlexRAM: Toward an Advanced Intelligent Memory System". ICCD 1999.

9) Reliability:

9a) J. Oplinger et al. "Enhancing Software Reliability with Speculative Threads". ASPLOS 2002.
9b) M. Prvulovic et al. "ReEnact: Using Thread-Level Speculation to Debug Data Races in Multithreaded Codes". ISCA 2003.
9c) S. Mukherjee et al. "Detailed Design and Evaluation of Redundant Multithreading Alternatives". ISCA 2002.
9d) M. Prvulovic et al. "ReVive: Cost-Effective Architectural Support for Rollback Recovery in Shared-Memory Multiprocessors". ISCA 2002.
9e) J. Nakano et al. "ReViveI/O: Efficient Handling of I/O in Highly-Available Rollback-Recovery Servers". (optional) HPCA 2006.

10) Interaction of Operating Systems with Architecture:

10a) J. Torrellas et al. "Characterizing the Caching and Synchronization Performance of a Multiprocessor Operating System". ASPLOS 1992.
10b) B. Verghese et al. "Operating System Support for Improving Data Locality on CC-NUMA Compute Servers". ASPLOS 1996.
10c) P. Trancoso et al. "The Memory Performance of DSS Commercial Workloads in Shared-Memory Multiprocessors". HPCA 1997.
10d) L. Barroso et al. "Memory System Characterization of Commercial Workloads". ISCA 1998.
10e) J. Torrellas, A. Tucker, and A. Gupta. "Evaluating the Performance of Cache-Affinity Scheduling in Shared-Memory Multiprocessors". Journal of Parallel and Distributed Computing (JPDC), February 1995.

11) Message Passing Architectures:

11a) D. Culler and J. Singh's book: Chapter 10.

11b) L. Ni and P. McKinley. "A Survey of Wormhole Routing Techniques in Direct Networks". IEEE Computer 1993.
11c) W. Dally. "Performance Analysis of k-ary n-cube Interconnection Networks". IEEE Transactions on Computers, 1990.
11d) S. Scott and G. Thorson. "The Cray T3E Network: Adaptive Routing in a High Performance 3D Torus". Hot Interconnects IV, 1996.

12) Dataflow Architectures:

12a) R. Iannucci. "Toward a Dataflow/Von Neumann Hybrid Architecture". ISCA 1988.
12b) A. Veen. "Dataflow Machine Architecture". ACM Computing Surveys, December 1986.

13) Systolic Architectures:

13) M. Annaratone, E. Arnould, T. Gross, H. T. Kung, M. Lam, O. Menzilcioglu, K. Sarocky, J. Webb. "The Warp Computer: Architecture, Implementation, and Performance". IEEE Transactions on Computers, 1987.

14) Data-Parallel Architectures:

14) J. Nickolls. "The Design of the MasPar MP-1: A Cost Effective Massively Parallel Computer". COMPCON 1991.

15) Cache-Only Memory Architectures:

15) Cache-Only Memory Architecture