1) Cache Coherence and Scalability:
1a) P. Stenstrom. "A
Survey of Cache Coherence Schemes for Multiprocessors". IEEE Computer,
1990. Also see Dubois, Annavaram, and Stenstrom's book or
Culler and Singh's book.
1b) D. Lenoski et al. "The
Directory-Based Cache Coherence Protocol for the DASH Multiprocessor".
ISCA 1990.
1c) A. Gupta et al. "Reducing
Memory and Traffic Requirements for Scalable Directory-Based Cache Coherence
Schemes". ICPP 1990.
1d) A. Gupta & W. Weber. "Cache
Invalidation Patterns in Shared-Memory Multiprocessors". IEEE Transactions on
Computers, July 1992.
1e) J. Torrellas, M. Lam & J. Hennessy.
"False Sharing and Spatial Locality in Multiprocessor Caches". IEEE Transactions
on Computers, June 1994.
|
2) Memory Consistency Models:
2a) K. Gharachorloo et al. "Memory
Consistency and Event Ordering in Scalable Shared-Memory Multiprocessors".
ISCA 1990.
2b) K. Gharachorloo et al. "Performance
Evaluation of Memory Consistency Models for Shared-Memory Multiprocessors".
ASPLOS 1991.
2c) K. Gharachorloo et al.
"Two Techniques to Enhance the Performance of Memory Consistency
Models". ICPP 1991.
2d) S. Adve & K. Gharachorloo.
"Shared Memory Consistency Models: A Tutorial". WRL Research
Report 95/7, 1995.
|
3) Prefetching and Forwarding:
3a) T. Mowry et al. "Design
and Evaluation of a Compiler Algorithm for Prefetching".
ASPLOS 1992.
3b) Y. Solihin et al.
"Using a User-Level Memory Thread for Correlation Prefetching".
ISCA 2002.
|
4) Synchronization:
4a) J. Goodman
et al. "Efficient Synchronization Primitives for Large Scale Cache-Coherent
Multiprocessors". ASPLOS 1989.
4b) J. Mellor-Crummey
and M. Scott. "Algorithms for Scalable Synchronization on Shared-Memory
Multiprocessors". ACM TOCS 1991.
|
5) Multithreading:
5a) D. Tullsen et al.
"Simultaneous Multithreading: Maximizing On-Chip Parallelism".
ISCA 1995.
5b) D. Tullsen et al.
"Exploiting
Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading
Processor". ISCA 1996.
|
6) Multiple Processors on a Chip:
6a) K. Olukotun et al.
"The Case for a Single-Chip Multiprocessor". ASPLOS 1996.
6b) G. Sohi et al.
"Multiscalar
Processors". ISCA 1995.
6c) V. Krishnan and J.
Torrellas.
"A Chip Multiprocessor Architecture with Speculative Multithreading".
IEEE Trans Comp 1999.
|
7) Speculative Parallelization and Execution:
7a) J. Steffan et al.
"A Scalable Approach to Thread-Level Speculation". ISCA 2000.
7b) J. Martinez et al.
"Speculative Synchronization: Applying Thread-Level Speculation to
Explicitly Parallel Applications". ASPLOS 2002.
|
8) Processor and Memory Integration:
8a) D. Patterson et al.
"A Case for Intelligent RAM". IEEE Micro 1997.
8b) Y. Kang et al.
"FlexRAM: Toward an Advanced Intelligent Memory System". ICCD 1999.
|
9) Reliability:
9a) J. Oplinger et al.
"Enhancing Software Reliability with Speculative Threads".
ASPLOS 2002.
9b) M. Prvulovic et al.
"ReEnact: Using Thread-Level Speculation to Debug Data Races in Multithreaded
Codes". ISCA 2003.
9c) S. Mukherjee et al.
"Detailed Design and Evaluation of Redundant Multithreading
Alternatives". ISCA 2002.
9d) M. Prvulovic et al.
"ReVive: Cost-Effective Architectural Support for Rollback Recovery
in Shared-Memory Multiprocessors". ISCA 2002.
9e) J. Nakano et al.
"ReViveI/O: Efficient Handling of I/O in Highly-Available Rollback-Recovery
Servers". (optional) HPCA 2006.
|
10) Interaction of Operating Systems with Architecture:
10a) J. Torrellas et al. "Characterizing
the Caching and Synchronization Performance of a Multiprocessor Operating
System". ASPLOS 1992.
10b) B. Verghese
et al. "Operating System Support for Improving Data Locality on CC-NUMA
Compute Servers". ASPLOS 1996.
10c) P. Trancoso et al. "The
Memory Performance of DSS Commercial Workloads in Shared-Memory Multiprocessors".
HPCA 1997.
10d) L. Barroso et
al. "Memory System Characterization of Commercial Workloads".
ISCA 1998.
10e) J. Torrellas, A. Tucker, and A. Gupta.
"Evaluating the Performance of Cache-Affinity Scheduling in Shared-Memory
Multiprocessors". Journal of Parallel and Distributed Computing (JPDC), February
1995.
|
11) Message Passing Architectures:
11a) D. Culler and J. Singh's book: Chapter 10.
11b) L. Ni and P. McKinley.
"A Survey of Wormhole Routing Techniques in Direct Networks". IEEE
Computer 1993.
11c) W. Dally. "Performance
Analysis of k-ary n-cube Interconnection Networks". IEEE Trans. on
Computers, 1990.
11d) S. Scott and G. Thorson. "The
Cray T3E Network: Adaptive Routing in a High Performance 3D Torus". Hot Interconnects IV, 1996.
|
12) Dataflow Architectures:
12a) R. Iannucci. "Toward
a Dataflow/Von Neumann Hybrid Architecture". ISCA 1988.
12b) A. Veen. "Dataflow Machine Architecture". ACM Computing Surveys, December 1986.
|
13) Systolic Architectures:
13) M. Annaratone, E. Arnould, T. Gross, H. T. Kung, M. Lam, O. Menzilcioglu, K.
Sarocky, J. Webb. "The Warp Computer: Architecture, Implementation,
and Performance". IEEE Transactions on Computers, 1987.
|
14) Data-Parallel Architectures:
14) J. Nickolls. "The
Design of the MasPar MP-1: A Cost Effective Massively Parallel Computer".
COMPCON 1991.
|
15) Cache-Only Memory Architectures:
15) Cache-Only Memory Architecture
|