

CS415 Compilers

Instruction Scheduling (part 2)

These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

- Recitations and office hours start this week (today!)
- Office hours will be posted soon
- First homework will be posted by Friday
- First project will be instruction scheduling

# RUTGERS Local Instruction Scheduling

## Readings: EaC 12.1-12.3, Appendix A (ILOC)

#### Definition

A basic block is a maximal-length segment of straight-line (i.e., branch-free) code. Control can enter only at the first instruction of the basic block and exit only after the last instruction.

Local: within single basic block

Global: across basic blocks, up to an entire procedure/function

# Instruction Scheduling (Engineer's View)

#### The Problem

Given a code fragment (basic block) for some target machine and the latencies for each individual operation, reorder the operations to minimize execution time

### The Concept



#### The Task

- Produce correct code
- Minimize wasted (idle) cycles
- Scheduler operates efficiently

The Optimization Goal: Generate fast code

# Data Dependences (statement/instruction level)

Dependences  $\Rightarrow$  defined on memory locations / registers

Statement/instruction b depends on statement/instruction a if there exists:

- true or flow dependence
   a writes a location/register that b later reads (RAW conflict)
- anti dependence
   a reads a location/register that b later writes (WAR conflict)
- output dependence
   a writes a location/register that b later writes (WAW conflict)

Dependences define ORDER CONSTRAINTS that need to be respected in order to generate correct code.

| true | anti | output |
|------|------|--------|
| a =  | = a  | a =    |
| = a  | a =  | a =    |
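
The three cases can be checked mechanically from the locations each statement reads and writes. The following is a minimal sketch, assuming each statement is summarized by a set of written locations and a set of read locations; the function name `classify` and its parameters are illustrative and not taken from the slides.

```python
def classify(first_defs, first_uses, second_defs, second_uses):
    """Return the kinds of dependence of the second statement on the first.

    The first statement is assumed to execute before the second one.
    Each argument is a set of location names (registers or memory).
    """
    kinds = set()
    if first_defs & second_uses:
        kinds.add("true")    # write then read  (RAW)
    if first_uses & second_defs:
        kinds.add("anti")    # read then write  (WAR)
    if first_defs & second_defs:
        kinds.add("output")  # write then write (WAW)
    return kinds


# The three columns of the table, using a single location "x".
print(classify({"x"}, set(), set(), {"x"}))   # x = ... ; ... = x
print(classify(set(), {"x"}, {"x"}, set()))   # ... = x ; x = ...
print(classify({"x"}, set(), {"x"}, set()))   # x = ... ; x = ...
```

A pair of statements can carry more than one kind of dependence at once, which is why the sketch returns a set rather than a single label.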

cs415, spring 22 Lecture 3

# Instruction Scheduling (The Abstract View)

To capture properties of the code, build a precedence/dependence graph G

- Nodes  $n \in G$  are operations with type(n) and delay(n)
- An edge  $e = (n_1, n_2) \in G$  if  $n_2$  depends on  $n_1$

```
a:  loadAI   r0,@w  ⇒ r1
b:  add      r1,r1  ⇒ r1
c:  loadAI   r0,@x  ⇒ r2
d:  mult     r1,r2  ⇒ r1
e:  loadAI   r0,@y  ⇒ r3
f:  mult     r1,r3  ⇒ r1
g:  loadAI   r0,@z  ⇒ r2
h:  mult     r1,r2  ⇒ r1
i:  storeAI  r1     ⇒ r0,@w
```

The Code

**The Precedence Graph** (figure not reproduced; its legend distinguishes true and anti dependence edges)

All other dependences (output & anti) are covered, i.e., they are satisfied through the dependences shown
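
One way to derive such a graph from a basic block is to walk the operations in order while remembering, for every location, its most recent writer and the readers since that write. The sketch below does this for the nine-operation block; `BLOCK`, `build_edges`, and the encoding of each operation as (label, locations read, locations written) are assumptions made for illustration. It also assumes that distinct symbols such as `@w` and `@x` name distinct memory locations.

```python
from collections import defaultdict

# (label, locations read, locations written) for the example block.
BLOCK = [
    ("a", {"r0", "@w"}, {"r1"}),
    ("b", {"r1"}, {"r1"}),
    ("c", {"r0", "@x"}, {"r2"}),
    ("d", {"r1", "r2"}, {"r1"}),
    ("e", {"r0", "@y"}, {"r3"}),
    ("f", {"r1", "r3"}, {"r1"}),
    ("g", {"r0", "@z"}, {"r2"}),
    ("h", {"r1", "r2"}, {"r1"}),
    ("i", {"r1", "r0"}, {"@w"}),
]


def build_edges(block):
    """Return a set of (from, to, kind) dependence edges for one block."""
    last_def = {}                 # location -> label of most recent writer
    readers = defaultdict(list)   # location -> readers since that write
    edges = set()
    for label, uses, defs in block:
        for loc in uses:
            if loc in last_def:
                edges.add((last_def[loc], label, "true"))
            readers[loc].append(label)
        for loc in defs:
            for reader in readers[loc]:
                if reader != label:          # an operation never depends on itself
                    edges.add((reader, label, "anti"))
            if loc in last_def:
                edges.add((last_def[loc], label, "output"))
            last_def[loc] = label            # this operation is now the latest writer
            readers[loc] = []
    return edges


EDGES = build_edges(BLOCK)
for kind in ("true", "anti", "output"):
    print(kind, sorted((u, v) for u, v, k in EDGES if k == kind))
```

For this block the sketch finds eight true edges (a→b, b→d, c→d, d→f, e→f, f→h, g→h, h→i), two anti edges (d→g and a→i), and output edges along the chain of writes to `r1` and from c to g.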

| Operation | Cycles (latency/delay) |
|-----------|------------------------|
| load      | 3                      |
| loadI     | 1                      |
| loadAI    | 3                      |
| store     | 3                      |
| storeAI   | 3                      |
| add       | 1                      |
| mult      | 2                      |
| fadd      | 1                      |
| fmult     | 2                      |
| shift     | 1                      |
| output    | 1                      |
| outputAI  | 1                      |
# Instruction Scheduling (The Abstract View)

To capture properties of the code, build a precedence graph G


- Nodes  $n \in G$  are operations with delay(n)
- An edge  $e = (n_1, n_2) \in G$  if  $n_2$  depends on  $n_1$

```
S(n):
  0   a:  loadAI   r0,@w  ⇒ r1
  3   b:  add      r1,r1  ⇒ r1
  4   c:  loadAI   r0,@x  ⇒ r2
  7   d:  mult     r1,r2  ⇒ r1
  8   e:  loadAI   r0,@y  ⇒ r3
 11   f:  mult     r1,r3  ⇒ r1
 12   g:  loadAI   r0,@z  ⇒ r2
 15   h:  mult     r1,r2  ⇒ r1
 17   i:  storeAI  r1     ⇒ r0,@w
```

The Code

⇒ 20 cycles




The Precedence/Dependence Graph

All other dependences (output & anti) are covered, i.e., they are satisfied through the dependences shown
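
The start cycles S(n) for the original order follow from the latency table if operations issue in program order, at most one per cycle, and each one waits until its operands are available. The sketch below recomputes them under those assumptions, with cycles numbered from 0; the names `BLOCK` and `in_order_schedule` are hypothetical and only the true dependences are modeled.

```python
DELAY = {"loadAI": 3, "add": 1, "mult": 2, "storeAI": 3}

# (label, opcode, labels of the operations whose results it needs)
BLOCK = [
    ("a", "loadAI", []),
    ("b", "add", ["a"]),
    ("c", "loadAI", []),
    ("d", "mult", ["b", "c"]),
    ("e", "loadAI", []),
    ("f", "mult", ["d", "e"]),
    ("g", "loadAI", []),
    ("h", "mult", ["f", "g"]),
    ("i", "storeAI", ["h"]),
]


def in_order_schedule(block, delay):
    """Return (start cycle per label, cycle at which the last result is done)."""
    start, done = {}, {}
    next_free = 0                      # earliest cycle with a free issue slot
    for label, opcode, preds in block:
        # Wait for the issue slot and for every operand to be available.
        s = max([next_free] + [done[p] for p in preds])
        start[label] = s
        done[label] = s + delay[opcode]
        next_free = s + 1              # single issue: one operation per cycle
    return start, max(done.values())


START, TOTAL = in_order_schedule(BLOCK, DELAY)
print(START)
print(TOTAL, "cycles")
```

Most of the 20 cycles are idle: each `mult` sits behind a `loadAI` that was issued only one cycle before it was needed.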

# Instruction Scheduling

### The big picture

- 1. Build a dependence graph, P
- 2. Compute a *priority function* over the nodes in P
- 3. Use list scheduling to construct a schedule, one cycle at a time (at most one instruction can be issued per cycle)
  - a. Use a set of operations that are ready
  - b. At each cycle
    - I. Choose a ready operation (priority-based) and schedule it
    - II. Increment cycle
    - III. Update the ready set

### Local list scheduling

- The dominant algorithm for many years
- A greedy, heuristic, local technique

| Operation | Cycles |
|-----------|--------|
| load      | 3      |
| loadI     | 1      |
| loadAI    | 3      |
| store     | 3      |
| storeAI   | 3      |
| add       | 1      |
| mult      | 2      |
| fadd      | 1      |
| fmult     | 2      |
| shift     | 1      |
| output    | 1      |
| outputAI  | 1      |

### Build a simple local scheduler (basic block)

- non-blocking loads & stores
- out-of-order execution
- different latencies for load/store vs. arithmetic and other operations
- different heuristics
- forward / backward scheduling

# Scheduling Example

### 1. Build the dependence graph

```
S(n):
  0   a:  loadAI   r0,@w  ⇒ r1
  3   b:  add      r1,r1  ⇒ r1
  4   c:  loadAI   r0,@x  ⇒ r2
  7   d:  mult     r1,r2  ⇒ r1
  8   e:  loadAI   r0,@y  ⇒ r3
 11   f:  mult     r1,r3  ⇒ r1
 12   g:  loadAI   r0,@z  ⇒ r2
 15   h:  mult     r1,r2  ⇒ r1
 17   i:  storeAI  r1     ⇒ r0,@w
```

The Code



The Dependence Graph

 $\Rightarrow$  20 cycles


## Scheduling Example

- 1. Build the dependence graph
- 2. Determine priorities: longest latency-weighted path

| Label | Operation | Operands | Result   |
|-------|-----------|----------|----------|
| a:    | loadAI    | r0,@w    | ⇒ r1     |
| b:    | add       | r1,r1    | ⇒ r1     |
| c:    | loadAI    | r0,@x    | ⇒ r2     |
| d:    | mult      | r1,r2    | ⇒ r1     |
| e:    | loadAI    | r0,@y    | ⇒ r3     |
| f:    | mult      | r1,r3    | ⇒ r1     |
| g:    | loadAI    | r0,@z    | ⇒ r2     |
| h:    | mult      | r1,r2    | ⇒ r1     |
| i:    | storeAI   | r1       | ⇒ r0,@w  |



**The Code** 

**The Dependence Graph** 
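
The longest latency-weighted path from an operation to the end of the block can be computed recursively: the priority of a node is its own delay plus the largest priority among its successors. The sketch below assumes a graph with only the true dependences of the example block; `OPCODE`, `SUCCS`, and `priority` are illustrative names.

```python
from functools import lru_cache

DELAY = {"loadAI": 3, "add": 1, "mult": 2, "storeAI": 3}
OPCODE = {"a": "loadAI", "b": "add", "c": "loadAI", "d": "mult", "e": "loadAI",
          "f": "mult", "g": "loadAI", "h": "mult", "i": "storeAI"}
# Assumed graph: true-dependence successors only.
SUCCS = {"a": ("b",), "b": ("d",), "c": ("d",), "d": ("f",), "e": ("f",),
         "f": ("h",), "g": ("h",), "h": ("i",), "i": ()}


@lru_cache(maxsize=None)
def priority(n):
    """Delay of n plus the longest latency-weighted path through its successors."""
    return DELAY[OPCODE[n]] + max((priority(s) for s in SUCCS[n]), default=0)


PRIORITIES = {n: priority(n) for n in SUCCS}
print(PRIORITIES)
```

Under these assumptions operation a gets priority 13, the length of the critical path a, b, d, f, h, i, so no schedule for this graph can finish in fewer than 13 cycles.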

# List Scheduling Example

#### **The Code**


### CYCLE = 0

READY - SET

ACTIVE - SET

#### **The Generated Code**





The Dependence Graph (longest latency-weighted)

Finishing instruction scheduling

Register Allocation EaC 13.1 - 13.3 (Top-down and Bottom-Up Allocation)