Chapter 3 – Instruction-Level Parallelism and its Exploitation (Part 4)

ILP vs. Parallel Computers
Dynamic Scheduling (Section 3.4, 3.5)
Dynamic Branch Prediction (Section 3.3)
Hardware Speculation and Precise Interrupts (Section 3.6)
Multiple Issue (Section 3.7)
Static Techniques (Section 3.2, Appendix H)
Limitations of ILP (Section 3.10)
Multithreading (Section 3.12)
Putting it Together (Miniprojects)

Limits of ILP

How much can ILP buy us?
Limits studies make optimistic assumptions to find the limit for ILP
But may miss impact of compiler, future advances
A highly optimistic study [Wall'93]
  - Infinite number of physical registers (no register WAW, WAR)
  - Infinite number of in-flight instructions
  - Perfect branch prediction
  - Perfect memory address alias analysis
  - Single-cycle functional units (FUs)
  - Single cycle memory (perfect caches)
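As a rough illustration of what such a limit study measures, the sketch below computes the ILP of a short instruction trace when only true (RAW) dependences constrain issue and every instruction takes one cycle. The trace format, a list of `(dest, sources)` tuples, and the function name `ideal_ilp` are assumptions made for this example; they are not taken from [Wall'93].

```python
def ideal_ilp(trace):
    """Return instructions per cycle under idealized assumptions.

    trace: list of (dest, sources) in program order, where dest is a
    register name (or None) and sources is a list of register names.
    Assumes unlimited renaming, unlimited issue width, perfect branch
    prediction, and single-cycle latency, so only RAW dependences matter.
    """
    ready = {}       # register -> cycle at which its newest value is available
    last_cycle = 0
    for dest, sources in trace:
        # Issue as soon as every source value has been produced.
        issue = max((ready.get(s, 0) for s in sources), default=0)
        done = issue + 1          # single-cycle functional unit
        if dest is not None:
            ready[dest] = done    # later readers see this newest value
        last_cycle = max(last_cycle, done)
    return len(trace) / last_cycle if last_cycle else 0.0


# Hypothetical six-instruction trace; the second write to r1 does not
# wait for earlier readers of r1 because WAR/WAW hazards are renamed away.
example = [
    ("r1", []), ("r2", []), ("r3", ["r1", "r2"]),
    ("r4", ["r3"]), ("r1", []), ("r5", ["r1"]),
]
print(ideal_ilp(example))  # 2.0 (six instructions in three cycles)
```

A real limit study works on full program traces and also models memory dependences; this sketch tracks registers only.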

Limits of ILP (contd.)

(This and the next four figures are from an older edition of the book)

<table>
<thead>
<tr>
<th>Benchmark</th>
<th>gcc</th>
<th>espresso</th>
<th>fpppp</th>
<th>doduc</th>
<th>tomcatv</th>
</tr>
</thead>
<tbody>
<tr>
<td>Instruction issues per cycle</td>
<td>55</td>
<td>63</td>
<td>18</td>
<td>119</td>
<td>150</td>
</tr>
</tbody>
</table>

Limits of ILP – Impact of Optimistic Assumptions

Limiting instruction window size
Finding dependences among n instructions requires on the order of n^2 comparisons
A window of 2000 instructions implies about 4 million comparisons!
The following results use a 2K-instruction window and a 64-issue limit
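The comparison count can be checked with a few lines of arithmetic. The sketch below assumes each instruction has two register source operands, each compared against the destination of every earlier instruction in the window; the two-operand figure and the function name are assumptions for illustration, since the slide only gives the n^2 growth rate.

```python
def dependence_comparisons(n, sources_per_instr=2):
    """Comparisons needed to check n instructions against one another.

    Instruction i (0-based) compares each of its source operands with the
    destinations of the i instructions ahead of it in the window.
    """
    earlier_pairs = n * (n - 1) // 2      # 0 + 1 + ... + (n - 1)
    return sources_per_instr * earlier_pairs


print(dependence_comparisons(2000))  # 3998000, i.e. roughly 4 million
```

Under these assumptions the count is n^2 - n, which is why the window size cannot simply be scaled up in hardware.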

© 2003 Elsevier Science (USA). All rights reserved.
**Limits of ILP – Impact of Optimistic Assumptions**

Realistic branch prediction
- No charge for mispredictions
- The following results use a tournament predictor

![Graph showing instruction issues per cycle for different branch prediction schemes.](image)

**Limits of ILP – Impact of Optimistic Assumptions**

Finite registers
- The following results use 256 integer and 256 FP registers for renaming

![Graph showing instruction issues per cycle for different number of registers available for renaming.](image)
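To see why the number of rename registers matters, the sketch below renames a short trace using a finite pool of physical registers. Every destination gets a fresh physical register, which removes WAW and WAR hazards, and renaming stops when the pool is empty. The trace format, the `p0, p1, ...` register names, and the absence of register reclamation at commit are simplifying assumptions for this example.

```python
def rename(trace, num_physical):
    """Rename (dest, sources) tuples using a finite physical register pool.

    Returns the renamed prefix of the trace. Renaming stops when no free
    physical register remains; a real machine would instead stall until
    an older instruction commits and frees one.
    """
    mapping = {}                                  # architectural -> physical
    free = [f"p{i}" for i in range(num_physical)]
    renamed = []
    for dest, sources in trace:
        # Read the mapping before updating it, so "r1 = r1 + 1" reads the old r1.
        new_sources = [mapping.get(s, s) for s in sources]
        if not free:
            break
        phys = free.pop(0)
        mapping[dest] = phys
        renamed.append((phys, new_sources))
    return renamed


# Hypothetical trace: r1 is written twice (WAW) and read in between (WAR).
trace = [("r1", ["r2"]), ("r3", ["r1"]), ("r1", ["r4"]), ("r5", ["r1"])]
print(rename(trace, 4))
# [('p0', ['r2']), ('p1', ['p0']), ('p2', ['r4']), ('p3', ['p2'])]
print(len(rename(trace, 2)))  # 2: the pool runs out after two instructions
```

With four registers the two writes to `r1` land in `p0` and `p2`, so the second write no longer has to wait for the earlier read. With only two registers the window of renamed instructions is cut short, which is the effect the figure quantifies.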

**Limits of ILP – Impact of Optimistic Assumptions**

Imperfect memory alias analysis

![Graph showing instruction issues per cycle for different alias analysis techniques.](image)

**But Limits Studies may be Pessimistic!**

For the most optimistic study
- WAR and WAW hazards through memory
- Unnecessary dependences (e.g., loop iteration count)
- Overcoming data flow limit – value prediction

For more realistic studies
- Address value prediction and speculation
- Speculating on multiple paths

* This figure has been taken from Computer Architecture, A Quantitative Approach, 3rd Edition Copyright 2003 by Elsevier Inc. All rights reserved. It has been used with permission by Elsevier Inc.
Multithreading: Instruction + Thread Level Parallelism

Often superscalar instruction slots are wasted
Why not use them for other threads?

Multithreading
  - Coarse-grained
  - Fine-grained
  - Simultaneous multithreading (SMT) or hyperthreading

(Vs. multiprocessing)
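The difference between the three multithreading styles can be illustrated with a toy issue-slot model. The sketch below takes, for each thread and cycle, the number of instructions that thread could issue, and counts how many slots each policy fills. The input numbers are invented, unissued instructions do not carry over to later cycles, and the only switch cost modeled for coarse-grained multithreading is the wasted stall cycle; all of these are assumptions of the sketch, not measurements.

```python
def issued_slots(ready, width, policy):
    """Count issue slots filled over all cycles.

    ready[t][c]: instructions thread t could issue in cycle c (hypothetical).
    width: issue slots available per cycle.
    policy: "single", "coarse", "fine", or "smt".
    """
    threads, cycles = len(ready), len(ready[0])
    total, current = 0, 0
    for c in range(cycles):
        if policy == "single":      # only thread 0 ever runs
            total += min(ready[0][c], width)
        elif policy == "fine":      # rotate to the next thread every cycle
            total += min(ready[c % threads][c], width)
        elif policy == "coarse":    # switch only when the running thread stalls
            if ready[current][c] == 0:
                current = (current + 1) % threads   # the stall cycle is lost
            else:
                total += min(ready[current][c], width)
        elif policy == "smt":       # fill slots from all threads in one cycle
            total += min(sum(r[c] for r in ready), width)
        else:
            raise ValueError(f"unknown policy: {policy}")
    return total


# Two hypothetical threads on a 4-wide machine over four cycles.
ready = [[2, 0, 3, 1],
         [1, 2, 0, 4]]
for policy in ("single", "coarse", "fine", "smt"):
    print(policy, issued_slots(ready, 4, policy))
```

In this model only SMT can combine instructions from several threads in the same cycle, so it fills at least as many slots as the fine-grained policy for any input.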

Impact of SMT: 1 vs. 4 threads for TPC-C

SMT Speedup & Energy Efficiency: 1 vs. 4 threads