

# CS415 Compilers: ILOC, Code Shape, and Instruction Scheduling

These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

## RUTGERS Announcements

- Recitations and office hours start this week
- Office hours will be posted soon
- Please go to https://www.cs.rutgers.edu/courses/415/classes/spring\_2022\_kremer/ to download lecture slides
- Lecture videos for the first three lectures are or will be available on Canvas: https://rutgers.instructure.com/courses/160913
- Please go to Piazza for questions:
   https://rutgers.instructure.com/courses/160913/external\_tools/1590
- Reminder: Get an ilab account

### Review - Traditional Compiler



- Use an intermediate representation (IR)
- Front end maps legal source code into IR
- Back end maps IR into target machine code

Typically, the front end runs in O(n) or O(n log n) time, while many back-end problems are NP-complete

### Backend - Register Allocation

Part of the compiler's back end



- Critical properties
  - Produce correct code that uses k (or fewer) registers
  - Minimize added loads and stores
  - Minimize space used to hold spilled values
  - Operate efficiently:
    O(n), O(n log<sub>2</sub>n), maybe O(n<sup>2</sup>), but not O(2<sup>n</sup>)

### Backend - Instruction Scheduling

Part of the compiler's back end



## Local Instruction Scheduling

Readings: EaC 12.1-12.3, Appendix A (ILOC)

#### Definition

A basic block is a maximal-length segment of straight-line (i.e., branch-free) code. Control can enter only at the first instruction of the basic block and exit only after the last instruction.

Local: within a single basic block

Global: across basic blocks, over an entire procedure/function
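The basic block definition can be sketched as a small partitioning routine. This is a minimal sketch, not the course's implementation: the tuple format `(label, opcode, operands...)`, the set `BRANCH_OPCODES`, and the sample program are assumptions made for illustration, and every label is conservatively treated as a possible branch target.

```python
# Hypothetical instruction format: (label_or_None, opcode, operand, ...).
# Assumed set of ILOC control-flow opcodes for this sketch.
BRANCH_OPCODES = {"jump", "jumpI", "cbr"}

def split_basic_blocks(instrs):
    """Partition a linear instruction list into basic blocks.

    A new block starts at the first instruction, at every labeled
    instruction (a possible branch target), and right after every branch.
    """
    blocks, current = [], []
    for label, opcode, *operands in instrs:
        if label is not None and current:
            blocks.append(current)      # a label starts a new block
            current = []
        current.append((label, opcode, *operands))
        if opcode in BRANCH_OPCODES:
            blocks.append(current)      # a branch ends the current block
            current = []
    if current:
        blocks.append(current)
    return blocks

program = [
    (None, "loadI", "2", "r1"),
    (None, "loadAI", "r0", "@y", "r2"),
    (None, "cbr", "r1", "L1", "L2"),
    ("L1", "add", "r1", "r2", "r3"),
    (None, "jumpI", "L3"),
    ("L2", "sub", "r1", "r2", "r3"),
    ("L3", "storeAI", "r3", "r0", "@x"),
]

for block in split_basic_blocks(program):
    print([opcode for _, opcode, *_ in block])
```

For this sample program the routine yields four blocks with 3, 2, 1, and 1 instructions. A local scheduler would reorder instructions only inside one of these blocks.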

## Instruction Scheduling

#### Motivation

- Instruction latency (pipelining): instructions take several cycles to complete, but a new instruction can be issued every cycle
- Instruction-level parallelism (VLIW, superscalar): execute multiple instructions per cycle

#### **Issues**

- Reorder instructions to reduce execution time
- Static schedule: insert NOPs to preserve correctness
- Dynamic schedule: the hardware stalls the pipeline to preserve correctness
- Preserve correctness, improve performance
- Interactions with other optimizations (register allocation!)
- Note: After register allocation, the code contains real (physical) registers rather than virtual registers, so a register may be redefined
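To make the effect of reordering concrete, here is a minimal sketch that counts cycles for two orderings of the same block. The machine model (one operation issued per cycle, in order, stalling until source registers are ready), the latencies in `LATENCY`, and the two instruction lists are assumptions chosen for illustration; they are not taken from the slides. Note that `r2` is redefined in both orderings, which is the situation the note on real registers refers to.

```python
# Assumed operation latencies in cycles (illustrative values).
LATENCY = {"loadAI": 3, "storeAI": 3, "add": 1, "mult": 2}

def schedule_length(block):
    """Return the cycle in which the last operation of `block` completes.

    Model: one operation issues per cycle, in program order; an operation
    stalls until all of its source registers have been produced.
    """
    ready = {}        # register -> first cycle in which its value can be read
    next_issue = 1    # earliest cycle in which the next operation may issue
    finish = 0
    for opcode, sources, dest in block:
        issue = max([next_issue] + [ready.get(r, 1) for r in sources])
        done = issue + LATENCY[opcode] - 1
        if dest is not None:
            ready[dest] = done + 1
        finish = max(finish, done)
        next_issue = issue + 1
    return finish

# Each entry: (opcode, source registers, destination register or None).
original = [
    ("loadAI", ["r0"], "r1"),
    ("add", ["r1", "r1"], "r1"),
    ("loadAI", ["r0"], "r2"),
    ("mult", ["r1", "r2"], "r1"),
    ("loadAI", ["r0"], "r2"),
    ("mult", ["r1", "r2"], "r1"),
    ("loadAI", ["r0"], "r2"),
    ("mult", ["r1", "r2"], "r1"),
    ("storeAI", ["r1", "r0"], None),
]

# Same computation with the loads moved early; uses one extra register (r3).
reordered = [
    ("loadAI", ["r0"], "r1"),
    ("loadAI", ["r0"], "r2"),
    ("loadAI", ["r0"], "r3"),
    ("add", ["r1", "r1"], "r1"),
    ("mult", ["r1", "r2"], "r1"),
    ("loadAI", ["r0"], "r2"),      # r2 is redefined after its last use
    ("mult", ["r1", "r3"], "r1"),
    ("mult", ["r1", "r2"], "r1"),
    ("storeAI", ["r1", "r0"], None),
]

print(schedule_length(original), schedule_length(reordered))
```

Under these assumptions the original order finishes in cycle 20 and the reordered block in cycle 13. The reordered version needs an additional register, which hints at the interaction with register allocation.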

## Memory Model / Code Shape

#### Source code


Assume A, B, and C are 4-byte integer values:

- address(A) = 1024 + offset(A) = 1028
- address(B) = 1024 + offset(B) = 1032
- address(C) = 1024 + offset(C) = 1036

Data addresses are byte addresses.

This convention is used in activation records or stack frames. We use it here for consistency.

More general:

```
address(X) = base_address + offset(X)
```
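The address formula can be checked with a few lines of Python. This is only a sketch: the offsets 4, 8, and 12 are the values implied by the addresses 1028, 1032, and 1036, and the helper `address_code` assumes that register `r0` already holds the base address, which the text here does not state.

```python
BASE_ADDRESS = 1024                   # base address used in the example
OFFSET = {"A": 4, "B": 8, "C": 12}    # offsets implied by 1028, 1032, 1036

def address(name, base=BASE_ADDRESS):
    """address(X) = base_address + offset(X)"""
    return base + OFFSET[name]

def address_code(name, target):
    """ILOC text that leaves address(name) in register `target`.

    Assumes r0 already holds the base address (an assumption of this sketch).
    """
    return f"addI r0, {OFFSET[name]} => {target}"

print(address("A"), address("B"), address("C"))   # 1028 1032 1036
print(address_code("A", "r2"))                    # addI r0, 4 => r2
```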

## ILOC (Intermediate Language for Optimizing Compilers)

Instruction scheduling on basic blocks in "ILOC"

- Pseudo-code for a simple, abstracted RISC machine
  - generated by the instruction selection process
- Simple, compact data structures
- Here: we only use a small subset of ILOC

#### Naïve Representation:

| opcode | operand 1 | operand 2 | result |
|--------|-----------|-----------|--------|
| loadI  | 2         |           | r1     |
| loadAI | r0        | @y        | r2     |
| add    | r1        | r2        | r3     |
| loadAI | r0        | @x        | r4     |
| sub    | r4        | r3        | r5     |

#### **Quadruples:**

- table of k x 4 small integers
- simple record structure
- easy to reorder
- all names are explicit
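One way to picture a "table of k x 4 small integers" is to number the opcodes and intern every operand name in a symbol table. The following is a minimal sketch under that assumption; the names `OPCODES`, `encode`, and `decode`, and the use of `-1` for an unused field, are hypothetical choices made for this example.

```python
OPCODES = ["loadI", "loadAI", "add", "sub"]   # opcode number = list index
EMPTY = -1                                    # marks an unused operand field

def encode(block):
    """Turn (opcode, op1, op2, result) tuples into rows of four integers."""
    names = []                                # symbol table: id -> name

    def intern(name):
        if name is None:
            return EMPTY
        if name not in names:
            names.append(name)
        return names.index(name)

    quads = [[OPCODES.index(op), intern(a), intern(b), intern(c)]
             for op, a, b, c in block]
    return quads, names

def decode(quads, names):
    """Inverse of encode: recover the readable tuples."""
    def name_of(i):
        return None if i == EMPTY else names[i]
    return [(OPCODES[op], name_of(a), name_of(b), name_of(c))
            for op, a, b, c in quads]

block = [
    ("loadI", "2", None, "r1"),
    ("loadAI", "r0", "@y", "r2"),
    ("add", "r1", "r2", "r3"),
    ("loadAI", "r0", "@x", "r4"),
    ("sub", "r4", "r3", "r5"),
]

quads, names = encode(block)
for row in quads:
    print(row)
```

Because each instruction is one fixed-size row, reordering two independent instructions (for example the `add` and the second `loadAI`) amounts to swapping two rows of the table.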

ILOC is described in Appendix A of EaC.

ILOC simulator "sim" is available on ilab: ~uli/cs415/ILOC\_Simulator/sim

## Memory Model / Code Shape

ILOC: EaC Appendix A

#### Source code

```
A = 5;
B = 6;
C = A + B;
```

#### ILOC code

```
loadI 5 => r1
// compute address of A in r2
store r1 => r2       // content(A) = r1
loadI 6 => r3
// compute address of B in r4
store r3 => r4       // content(B) = r3
add r1, r3 => r5
// compute address of C in r6
store r5 => r6       // content(C) = r1 + r3
```

Is this code correct?

## Memory Model / Code Shape

ILOC: EaC Appendix A

#### Source code

```
foo (var A, B)
  A = 5;
  B = 6;
  C = A + B;
end foo;

X = 1
call foo(X,X);
print C;
```

#### ILOC code

```
loadI 5 => r1
// compute address of A in r2
store r1 => r2       // content(A) = r1
loadI 6 => r3
// compute address of B in r4
store r3 => r4       // content(B) = r3
add r1, r3 => r5
// compute address of C in r6
store r5 => r6       // content(C) = r1 + r3
```

Is this code correct?

Incorrect for call-by-reference!

Aliasing: Two variables or source-code names may refer to the same memory location.

### Examples:

- formal call-by-reference parameters a and b
- pointers a->f and b->f
- array elements: a(i, j) and a(k, l)

Challenge: When is it safe to keep a variable's value in a register across STORE instructions, i.e., while other STORE instructions are executed?
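The aliasing problem can be reproduced with a small simulation of memory. This is a minimal sketch, assuming call-by-reference is modeled by passing addresses; the dictionary `mem`, the helper `run`, and the concrete addresses are hypothetical and chosen only for illustration.

```python
def foo_memory(mem, addr_a, addr_b, addr_c):
    """Every use of A and B reloads the value from memory."""
    mem[addr_a] = 5                            # A = 5
    mem[addr_b] = 6                            # B = 6
    mem[addr_c] = mem[addr_a] + mem[addr_b]    # C = A + B

def foo_registers(mem, addr_a, addr_b, addr_c):
    """A and B are kept in r1 and r3 across the STORE instructions."""
    r1 = 5
    mem[addr_a] = r1          # store r1 => r2
    r3 = 6
    mem[addr_b] = r3          # store r3 => r4
    r5 = r1 + r3              # add r1, r3 => r5
    mem[addr_c] = r5          # store r5 => r6

def run(foo, addr_a, addr_b, addr_c=1036):
    """Call foo on a fresh memory and return the final value of C."""
    mem = {}
    foo(mem, addr_a, addr_b, addr_c)
    return mem[addr_c]

# Distinct arguments: both versions agree.
print(run(foo_memory, 1028, 1032), run(foo_registers, 1028, 1032))   # 11 11
# call foo(X, X): A and B name the same location, and the versions disagree.
print(run(foo_memory, 1028, 1028), run(foo_registers, 1028, 1028))   # 12 11
```

With `foo(X, X)` the store to B overwrites A, so reloading from memory gives C = 6 + 6 = 12, while the register version still uses the stale value 5 and computes 11.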

## Memory Model / Code Shape

### register-register model

- Values that may safely reside in registers are each assigned a unique virtual register (alias analysis)
- Register allocation/assignment maps virtual registers to a limited set of physical registers
- A register allocation/assignment pass is needed to make the code "work"

### memory-memory model

- All values reside in memory and are kept in registers only as briefly as possible (load operands from memory, perform computation, store result into memory)
- Register allocation/assignment has to try to identify cases where values can be safely kept in registers
- Safety verification is hard at the low levels of program abstraction
- Even without register allocation/assignment, code will "work"
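The difference in code shape between the two models can be sketched for the statement `C = A + B`. This is an illustrative sketch only: the offsets in `OFFSET`, the virtual register mapping in `VREG`, and the assumption that `r0` holds the base address are hypothetical, and the register-register version presumes that alias analysis has already shown A, B, and C to be safe to keep in registers.

```python
OFFSET = {"A": 4, "B": 8, "C": 12}          # assumed offsets from the base in r0
VREG = {"A": "r1", "B": "r2", "C": "r3"}    # assumed virtual register per variable

def memory_memory(dst, lhs, rhs):
    """Load both operands, compute, and store the result right away."""
    return [
        f"loadAI r0, {OFFSET[lhs]} => r1",
        f"loadAI r0, {OFFSET[rhs]} => r2",
        "add r1, r2 => r3",
        f"storeAI r3 => r0, {OFFSET[dst]}",
    ]

def register_register(dst, lhs, rhs):
    """Operate directly on the virtual registers that hold the variables."""
    return [f"add {VREG[lhs]}, {VREG[rhs]} => {VREG[dst]}"]

print("\n".join(memory_memory("C", "A", "B")))
print("\n".join(register_register("C", "A", "B")))
```

The memory-memory version runs correctly with only three registers but pays for two loads and a store per statement. The register-register version is shorter but relies on a later allocation pass to map virtual registers onto the physical ones.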

### Next topic

More instruction scheduling EaC 12.1 - 12.3