

# RISC-V Processor Architecture

Prof. Dr. Rolf Drechsler

Dr. Muhammad Hassan

M.Sc. Jan Zielasko

M.Sc. Milan Funck

Abstraction layers (figure): Problem, Algorithm, Program, Instruction Set Architecture, Microarchitecture, Logic, Digital Circuits, Analog Circuits, Devices, Physics



#### Discussed the last time ...

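| 31:25                  | 24:20                 | 19:15                | 14:12                | 11:7                  | 6:0    |        |
|------------------------|-----------------------|----------------------|----------------------|-----------------------|--------|--------|
| funct7                 | rs2                   | rs1                  | funct3               | rd                    | opcode | R-Type |
| 7 bits                 | 5 bits                | 5 bits               | 3 bits               | 5 bits                | 7 bits |        |
| Imm<sub>11:5</sub>     | Imm<sub>4:0</sub>     | rs1                  | funct3               | rd                    | opcode | I-Type |
| Imm<sub>11:5</sub>     | rs2                   | rs1                  | funct3               | Imm<sub>4:0</sub>     | opcode | S-Type |
| Imm<sub>12, 10:5</sub> | rs2                   | rs1                  | funct3               | Imm<sub>4:1, 11</sub> | opcode | B-Type |
| Imm<sub>31:25</sub>    | Imm<sub>24:20</sub>   | Imm<sub>19:15</sub>  | Imm<sub>14:12</sub>  | rd                    | opcode | U-Type |
| Imm<sub>20, 10:5</sub> | Imm<sub>4:1, 11</sub> | Imm<sub>19:15</sub>  | Imm<sub>14:12</sub>  | rd                    | opcode | J-Type |

In the I-Type, Imm<sub>11:0</sub> is one contiguous field in bits 31:20. In the U-Type, Imm<sub>31:12</sub> occupies bits 31:12, and in the J-Type, Imm<sub>20, 10:1, 11, 19:12</sub> occupies bits 31:12.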
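As a rough illustration of the R-Type layout above, the following Python sketch slices a 32-bit instruction word into its fields. The helper names `bits` and `decode_r_type` are hypothetical and not part of the lecture; the example word is the encoding of `add x18,x18,x10` worked out from the field table.

```python
def bits(word: int, hi: int, lo: int) -> int:
    """Return word[hi:lo] (both ends inclusive) as an unsigned integer."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_r_type(inst: int) -> dict:
    """Split a 32-bit R-Type instruction into its six fields."""
    return {
        "funct7": bits(inst, 31, 25),
        "rs2": bits(inst, 24, 20),
        "rs1": bits(inst, 19, 15),
        "funct3": bits(inst, 14, 12),
        "rd": bits(inst, 11, 7),
        "opcode": bits(inst, 6, 0),
    }


# add x18, x18, x10: funct7=0000000, rs2=10, rs1=18, funct3=000, rd=18, opcode=0110011
fields = decode_r_type(0x00A90933)
print(fields)
```

The same `bits` helper can slice the other formats; only the field boundaries change.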



#### Today's Agenda

- An Overview of RISC-V Implementation
  - Datapath
  - Controller
- Building blocks
  - Program Counter (PC)
  - Arithmetic Logic Unit (ALU)
  - Instruction Memory
  - Data Memory
  - Registers
- Implementing ADD/SUB
- Adding ADDI Instruction
  - Immediate generator



#### iPhone Launch





### How Do We Build a Single Cycle Processor?

- Processor (CPU)
  - The active part of the computer that does all the work (data manipulation and decision making)
- Datapath ("the brawn")
  - portion of the processor that contains hardware necessary to perform operations required by the processor.
- Control ("the brain")
  - portion of the processor (also in hardware) that tells the datapath what needs to be done.



I/O-Memory Interfaces



# Goal – Design HW to Execute All RV32I Instructions







#### Combinational Logic Blocks









#### One-Instruction-Per-Cycle RISC-V Machine

- CPU is composed of two types of subcircuits
  - Combinational logic blocks
  - State elements
- On every tick of the clock, the computer executes one instruction
  - Current outputs of the state elements drive the inputs to combinational logic
  - ...whose outputs settle at the inputs to the state elements before the next rising clock edge.
- At the rising edge of clock
  - All the state elements are updated with the combinational logic outputs...
  - and execution moves to the next clock cycle.
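The clocking discipline described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the lecture's implementation: the state consists of a hypothetical `pc` and a `cycles` counter, and `combinational_logic` is a pure function that computes the next values from the current ones.

```python
def combinational_logic(state: dict) -> dict:
    """Pure function of the current state: computes next values, stores nothing."""
    return {
        "pc": (state["pc"] + 4) & 0xFFFFFFFF,
        "cycles": state["cycles"] + 1,
    }


def tick(state: dict) -> dict:
    """One clock cycle: outputs settle, then all state elements update together."""
    next_values = combinational_logic(state)  # settles before the rising edge
    return dict(next_values)                  # rising edge: every element updates at once


state = {"pc": 0x100, "cycles": 0}
for _ in range(3):
    state = tick(state)
print(hex(state["pc"]), state["cycles"])
```

Returning a fresh dictionary, rather than updating fields one by one, mirrors the fact that no state element sees another element's new value within the same cycle.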





#### State Elements Required by RV32I ISA



During CPU execution, each RV32I instruction reads and/or updates these state elements.



#### Program Counter

• The **Program Counter** is a 32-bit register



- Input:
  - N-bit data input bus
  - Write Enable "Control" bit (1: asserted/high, 0: deasserted/low)
- Output:
  - N-bit data output bus
- Behavior:
  - If Write Enable is 1 on rising clock edge, set Data Out=Data In.
  - At all other times, Data Out will not change; it will output its current value.
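The register behavior listed above might be modeled as follows. The class name `Register` and the method `rising_edge` are hypothetical names chosen for this sketch; the width defaults to 32 bits to match the Program Counter.

```python
class Register:
    """N-bit register with write enable, updated only on a rising clock edge."""

    def __init__(self, width: int = 32, reset: int = 0):
        self.mask = (1 << width) - 1
        self.data_out = reset & self.mask

    def rising_edge(self, data_in: int, write_enable: int) -> None:
        # Data Out follows Data In only when Write Enable is asserted.
        if write_enable:
            self.data_out = data_in & self.mask
        # Otherwise Data Out keeps its current value.


pc = Register()
pc.rising_edge(0x100, write_enable=1)  # captured
pc.rising_edge(0x200, write_enable=0)  # ignored
print(hex(pc.data_out))
```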





#### Register File

- Input
  - One 32-bit input data bus, dataW.
  - Three 5-bit select busses, rs1, rs2, and rsW.
  - **RegWEn** control bit.
- Output
  - Two 32-bit output data busses, data1 and data2
- Registers are accessed via their 5-bit register numbers:
  - rs1 selects register to put on data1 bus out.
  - rs2 selects register to put on data2 bus out.
  - rsW selects register to be written via dataW when RegWEn=1.
- Clock behavior: Write operation occurs on rising clock edge.
  - Clock input only a factor on write!
  - All read operations behave like a combinational block:
    - If rs1, rs2 valid, then data1, data2 valid after access time.
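A minimal Python sketch of the register file interface described above. The class and method names are hypothetical; the port names `rs1`, `rs2`, `rsW`, `dataW`, and `RegWEn` follow the slide. The sketch also assumes the RISC-V convention that register x0 always reads as zero, which the slide does not state explicitly.

```python
class RegFile:
    """32 x 32-bit register file: two combinational read ports, one clocked write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, rs1: int, rs2: int) -> tuple:
        """Combinational read: data1 and data2 follow rs1 and rs2 without a clock."""
        return self.regs[rs1], self.regs[rs2]

    def rising_edge(self, rsW: int, dataW: int, RegWEn: int) -> None:
        """Clocked write: takes effect only when RegWEn is 1."""
        if RegWEn and rsW != 0:  # assumption: x0 is hardwired to zero
            self.regs[rsW] = dataW & 0xFFFFFFFF


rf = RegFile()
rf.rising_edge(rsW=18, dataW=5, RegWEn=1)
rf.rising_edge(rsW=10, dataW=7, RegWEn=1)
rf.rising_edge(rsW=10, dataW=99, RegWEn=0)  # write disabled, x10 unchanged
print(rf.read(18, 10))
```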

#### 32 Registers in RISC-V





#### Memory

- 32-bit byte-addressed memory space.
- Memory access with 32-bit words.
- Memory words are accessed as follows:
  - Read: Address addr selects word to put on dataR bus.
  - Write: Set MemRW=1. Address addr selects word to be written with dataW bus.



If MemRW=0, MEM behaves like a combinational block.

- Like RegFile, clock input is only a factor on write.
  - If MemRW=1, write occurs on rising clock edge.
  - If MemRW=0 and addr valid, then dataR valid after access time.
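The read and write behavior above can be sketched in Python as follows. This is an illustrative model only: the class name `Memory` is hypothetical, storage is a sparse dictionary, unwritten words read as zero, and addresses are assumed to be word-aligned (the two low address bits are ignored).

```python
class Memory:
    """Byte-addressed memory accessed in 32-bit words."""

    def __init__(self):
        self.words = {}  # word-aligned byte address -> 32-bit word

    def read(self, addr: int) -> int:
        """Combinational read: dataR follows addr, no clock involved."""
        return self.words.get(addr & ~0x3, 0)  # assumption: unwritten words read as 0

    def rising_edge(self, addr: int, dataW: int, MemRW: int) -> None:
        """Clocked write: takes effect only when MemRW is 1."""
        if MemRW == 1:
            self.words[addr & ~0x3] = dataW & 0xFFFFFFFF


dmem = Memory()
dmem.rising_edge(addr=0x2000, dataW=0xDEADBEEF, MemRW=1)
dmem.rising_edge(addr=0x2004, dataW=0x12345678, MemRW=0)  # MemRW=0: no write
print(hex(dmem.read(0x2000)), hex(dmem.read(0x2004)))
```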



Figure: IMEM (input addr, output inst) and DMEM (inputs addr and MemRW, output dataR).

#### Two Memories – IMEM and DMEM

- Current abstraction: Memory holds both instructions and data in one contiguous 32-bit memory space.
- In our processor, we'll use two "separate" memories:
  - IMEM: A read-only memory for fetching instructions.
  - DMEM: A memory for loading (read) and storing (write) data words.
  - Under the hood, these are placeholders for caches. (more later)
- Because IMEM is read-only, it always behaves like a combinational block:
  - If addr valid, then instr valid after access time.



#### Design the Datapath in Phases

- Task: Execute an instruction.
  - All necessary operations, starting with fetching the instruction.
- Problem: A single monolithic block would be bulky and inefficient.
- Solution: Break up the process into stages, then connect the stages to create the whole datapath.
  - Smaller stages are easier to design!
  - **Modularity**: Easy to optimize one stage without touching the others.





#### Stages of Instruction Execution

5 basic stages:

1. Instruction Fetch (IF)
2. Instruction Decode (ID) + Read Registers
3. Execute (EX): Arithmetic Logic Unit (ALU)
4. Memory Access (MEM)
5. Write Back to Register (WB)

Figure labels: Program Counter (PC), Register file (REG[]), Arithmetic Logic Unit (ALU), Processor-Memory Interface, Control Path.



We will implement a single-cycle processor

All stages of one RV32I instruction execute within the same clock cycle.



#### Not All Instructions Need 5 Stages

- The control logic selects needed datapath lines based on the instruction
  - MUX selector, ALU op selector, write enable, etc.





#### Implementing the add Instruction

Example add-only program:

- 0x100: add x18,x18,x10
- 0x104: add x18,x18,x18

- Suppose we had a single instruction in our RISC-V ISA:
  - add: add rd, rs1, rs2

| 31:25   | 24:20 | 19:15 | 14:12  | 11:7 | 6:0     |        |
|---------|-------|-------|--------|------|---------|--------|
| funct7  | rs2   | rs1   | funct3 | rd   | opcode  | R-Type |
| 0000000 | rs2   | rs1   | 000    | rd   | 0110011 | add    |

The add instruction makes two changes to the processor state:

- RegFile: Reg[rd] = Reg[rs1] + Reg[rs2]
- PC: PC = PC + 4



#### Datapath for add

1. Increment PC to the next instruction: PC = PC + 4.
2. Split the instruction to index into RegFile.
3. Feed the read register values into the ALU.
4. Write the ALU output to the destination register: R[rd] = R[rs1] + R[rs2].
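To make the add datapath concrete, here is a small Python sketch that walks one add instruction through fetch, register read, ALU, write back, and PC update. The function name `step_add`, the dictionary used as instruction memory, and the placement of two add instructions at 0x100 and 0x104 are assumptions made for illustration; the instruction words were encoded by hand from the R-Type field table. Skipping writes to x0 is likewise an assumption based on the RISC-V convention.

```python
MASK = 0xFFFFFFFF


def step_add(pc: int, regs: list, imem: dict) -> int:
    """Execute one add instruction and return the next PC."""
    inst = imem[pc]                      # fetch from instruction memory
    rd = (inst >> 7) & 0x1F              # split the instruction into fields
    rs1 = (inst >> 15) & 0x1F
    rs2 = (inst >> 20) & 0x1F
    alu_out = (regs[rs1] + regs[rs2]) & MASK  # ALU adds, wrapping at 32 bits
    if rd != 0:                          # assumption: x0 stays zero
        regs[rd] = alu_out               # write back to the destination register
    return (pc + 4) & MASK               # PC = PC + 4


imem = {
    0x100: 0x00A90933,  # add x18, x18, x10
    0x104: 0x01290933,  # add x18, x18, x18
}
regs = [0] * 32
regs[18], regs[10] = 5, 7

pc = 0x100
pc = step_add(pc, regs, imem)  # x18 = 5 + 7
pc = step_add(pc, regs, imem)  # x18 = 12 + 12
print(hex(pc), regs[18])
```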





#### Create Full Datapath Step-by-Step





#### Implementing the sub Instruction

- Now we support two instructions in our RISC-V ISA
  - add/sub

| 31:25   | 24:20 | 19:15 | 14:12  | 11:7 | 6:0     |     |
|---------|-------|-------|--------|------|---------|-----|
| funct7  | rs2   | rs1   | funct3 | rd   | opcode  |     |
| 0000000 | rs2   | rs1   | 000    | rd   | 0110011 | add |
| 0100000 | rs2   | rs1   | 000    | rd   | 0110011 | sub |

sub rd, rs1, rs2

- Instruction bit inst[30] selects between add/sub
- Details left to control logic

sub is the same as add, except that the ALU subtracts the operands instead of adding them:

- RegFile: Reg[rd] = Reg[rs1] - Reg[rs2]
- PC: PC = PC + 4



#### Datapath for sub



1. Increment PC to the next instruction: PC = PC + 4.
2. Split the instruction to index into RegFile.
3. Feed the read register values into the ALU.
4. Write the ALU output to the destination register: R[rd] = R[rs1] - R[rs2].





#### Supporting All R-Type Instructions

| funct7  | rs2 | rs1 | funct3 | rd | opcode  |      |
|---------|-----|-----|--------|----|---------|------|
| 0000000 | rs2 | rs1 | 000    | rd | 0110011 | add  |
| 0100000 | rs2 | rs1 | 000    | rd | 0110011 | sub  |
| 0000000 | rs2 | rs1 | 001    | rd | 0110011 | sll  |
| 0000000 | rs2 | rs1 | 010    | rd | 0110011 | slt  |
| 0000000 | rs2 | rs1 | 011    | rd | 0110011 | sltu |
| 0000000 | rs2 | rs1 | 100    | rd | 0110011 | xor  |
| 0000000 | rs2 | rs1 | 101    | rd | 0110011 | srl  |
| 0100000 | rs2 | rs1 | 101    | rd | 0110011 | sra  |
| 0000000 | rs2 | rs1 | 110    | rd | 0110011 | or   |
| 0000000 | rs2 | rs1 | 111    | rd | 0110011 | and  |

The Control Logic decodes the funct3 and funct7 instruction fields and selects the appropriate ALU function by setting the control line ALU<sub>Sel</sub>.
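The decoding step can be sketched as a lookup from (funct3, funct7) to an ALU operation, followed by a small ALU model. The names `alu_sel`, `alu`, and `to_signed`, and the use of strings as ALUSel values, are hypothetical choices for this sketch; the table entries come from the R-Type encodings above. The shift operations use only the low 5 bits of the second operand, as RV32I specifies.

```python
MASK = 0xFFFFFFFF

# (funct3, funct7) -> ALU operation, taken from the R-Type table.
ALU_SEL_TABLE = {
    (0b000, 0b0000000): "add", (0b000, 0b0100000): "sub",
    (0b001, 0b0000000): "sll", (0b010, 0b0000000): "slt",
    (0b011, 0b0000000): "sltu", (0b100, 0b0000000): "xor",
    (0b101, 0b0000000): "srl", (0b101, 0b0100000): "sra",
    (0b110, 0b0000000): "or", (0b111, 0b0000000): "and",
}


def to_signed(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement signed value."""
    x &= MASK
    return x - (1 << 32) if x & 0x80000000 else x


def alu_sel(funct3: int, funct7: int) -> str:
    """Control logic: pick the ALU operation for an R-Type instruction."""
    return ALU_SEL_TABLE[(funct3, funct7)]


def alu(op: str, a: int, b: int) -> int:
    """Combinational ALU on 32-bit patterns."""
    shamt = b & 0x1F
    results = {
        "add": a + b, "sub": a - b,
        "sll": a << shamt, "srl": (a & MASK) >> shamt,
        "sra": to_signed(a) >> shamt,
        "slt": int(to_signed(a) < to_signed(b)),
        "sltu": int((a & MASK) < (b & MASK)),
        "xor": a ^ b, "or": a | b, "and": a & b,
    }
    return results[op] & MASK


print(alu_sel(0b101, 0b0100000), hex(alu("sra", 0x80000000, 4)))
```

Computing every result and selecting one loosely mirrors hardware, where all ALU function units are active and ALUSel drives the output mux.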



#### Implementing the addi Instruction

- Let's add a new instruction
  - addi

| 31:20              | 19:15 | 14:12  | 11:7 | 6:0     |        |
|--------------------|-------|--------|------|---------|--------|
| Imm<sub>11:0</sub> | rs1   | funct3 | rd   | opcode  | I-Type |
| Imm<sub>11:0</sub> | rs1   | 000    | rd   | 0010011 | addi   |

addi rd, rs1, imm

**addi** updates the same two state elements as before, but now we need to build the **immediate imm**:

- RegFile: Reg[rd] = Reg[rs1] + imm
- PC: PC = PC + 4
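Building the immediate for an I-Type instruction amounts to taking bits 31:20 of the instruction and sign-extending them to 32 bits. A minimal Python sketch follows; the function name `imm_gen_i` is hypothetical, and the example word is a hand-encoded `addi x15, x1, -50`.

```python
def imm_gen_i(inst: int) -> int:
    """Return the I-Type immediate as a sign-extended 32-bit pattern."""
    imm12 = (inst >> 20) & 0xFFF      # Imm[11:0] lives in inst[31:20]
    if imm12 & 0x800:                 # Imm[11] is the sign bit
        return imm12 | 0xFFFFF000     # copy the sign bit into bits 31:12
    return imm12


# addi x15, x1, -50: Imm=0xFCE, rs1=1, funct3=000, rd=15, opcode=0010011
inst = 0xFCE08793
print(hex(imm_gen_i(inst)))
```

The result is a 32-bit pattern, so it can be fed to a 32-bit adder unchanged; adding it to 100 modulo 2^32 gives 50.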



#### Datapath for addi





#### New Mux to Select Immediate for ALU

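1. Increment PC to the next instruction: PC = PC + 4.
2. Split the instruction to index into RegFile.
3. Feed the read register value R[rs1] and the immediate imm into the ALU; the new mux selects the immediate instead of R[rs2] for ALU input B.
4. Write the ALU output to the destination register: R[rd] = R[rs1] + imm.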
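The following Python sketch combines the pieces for add, sub, and addi, with a mux on ALU input B. It is a simplified model under stated assumptions: the names `step`, `b_sel`, and `subtract` are hypothetical stand-ins for the control lines, funct3 is assumed to be 000, and writes to x0 are skipped per the RISC-V convention.

```python
MASK = 0xFFFFFFFF


def sign_extend_12(value: int) -> int:
    """Sign-extend a 12-bit field to a 32-bit pattern."""
    return (value | 0xFFFFF000) if value & 0x800 else value


def step(inst: int, regs: list, pc: int) -> int:
    """Execute one add, sub, or addi instruction; return the next PC."""
    opcode = inst & 0x7F
    rd = (inst >> 7) & 0x1F
    rs1 = (inst >> 15) & 0x1F
    rs2 = (inst >> 20) & 0x1F
    imm = sign_extend_12((inst >> 20) & 0xFFF)  # immediate generator

    # Control logic (hypothetical encoding of the control lines).
    if opcode == 0b0010011:      # addi: ALU input B is the immediate
        b_sel, subtract = 1, False
    elif opcode == 0b0110011:    # add/sub: inst[30] selects subtraction
        b_sel, subtract = 0, bool((inst >> 30) & 1)
    else:
        raise ValueError("unsupported opcode in this sketch")

    a = regs[rs1]
    b = imm if b_sel else regs[rs2]              # the new mux
    alu_out = ((a - b) if subtract else (a + b)) & MASK
    if rd != 0:                                  # assumption: x0 stays zero
        regs[rd] = alu_out
    return (pc + 4) & MASK


regs = [0] * 32
regs[1] = 100
pc = step(0xFCE08793, regs, 0x100)  # addi x15, x1, -50
print(hex(pc), regs[15])
```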





#### New Block to Generate 32-bit Immediate





#### Summary

- All data lines carry information
- Control logic determines what is useful/needed vs. what is ignored
  - e.g., ALUSel: chooses ALU operation; Bsel: chooses register/immediate for ALU input B.




