Computer Architecture
Angshuman Paul
Assistant Professor
Department of Computer Science & Engineering
Levels of Abstraction
ElectronsProgramAlgorithmProblem
System Software
Instruction Set Architecture
Microarchitecture
Logic
DeviceApplication ProgramAlgorithmProblem
We have studiedLevels of Abstraction
ElectronsProgramAlgorithmProblem
System Software
Instruction Set Architecture
Microarchitecture
Logic
DeviceApplication ProgramAlgorithmProblem
Our next focusImage Source: https://www.industrialpcpro.com/imb -g41a -industrial -micro -atx-motherboardThe MotherboardSubsystems of a Digital Computer 
(A Simplified View) 
Central 
Processing UnitArithmetic & 
Logic Unit
Control UnitMemory
Input DevicesOutput DevicesSubsystems of a Digital Computer 
(A Simplified View) 
Central 
Processing UnitArithmetic & 
Logic Unit
Control UnitMemory
Input DevicesOutput DevicesSame memory for 
program and dataVon Neuman Architecture
Harvard Architecture
Separate memory for 
program and data
•The brain of the computer 
system
•Major component:
•Arithmetic Logic Unit (ALU)
•Control Unit (CU)
•Registers (Local Storage)
•General Purpose
•Special PurposeThe Processor
Programs & Instructions
•The processor executes programs
•A program is composed of a set of instructions
•Instruction set
•A set containing all the instructions that a processor understandsRepresentation of Instructions
•Machine language
•Pattern of binary symbols (0 and 1)
•Suitable for representing execution at the gate level
•Difficult to interpret by designer
•Understood by the digital hardware
•Assembly language
•Mnemonics for representing instructions
•Useful for designer
•Not interpretable by the hardwareComponents of an Instruction (An Example)
•C code: 
•a = b + c
•MIPS code: 
•add $s0, $s1, $s2
•MIPS: Microprocessor without Interlocked Pipelined StagesComponents of an Instruction (An Example)
add $s0, $s1, $s2
Opcodeadd $s0, $s1, $s2Components of an Instruction (An Example)
OperandsComponents of an Instruction
•Operation code (opcode)
•Perform the operation : ADD, SUB, MPY, etc.
•Source operand
•Onthe content of: register, memory location
•Destination operand
•Put the result hereMIPS Registers: General Purpose Registers
•32 bit registers
•32 registers
•MIPS is load store architecture
•Only load and store can access memory directly
•Usage of general -purpose registers
•Holding temporary variables in expression evaluation
•Passing parametersThe Number of Registers
•More registers
•Possible to allocate more variables in registers (fast accesses)
•Reduction in number of memory access
•Longer specifiers for registers (more bits for encoding)
•Increase in register access time (physical registers)
•More registers to save during context switchThe Memory
•Large capacity
•Slower access
•Addressing mode
•Decides how to access data
•Direct from memory
•From registersMemory Organization
•Viewed as a large single dimensional 
array
•Byte addressable
•An address is an index into the arraybits
0
1
2
3
4
5
6
7Instructions that Access Memory
•C Code:  A[10] = A[10]+b
•MIPS code: 
•lw$t0, 40($s2)     [s2 hold the starting address of the array A]
•add $t0, $s1, $t0
•sw$t0, 40($s2)Instruction Format: Major Considerations
•Byte -Ordering
•Instruction Length
•Number of operands
•0, 1, 2, 3
•Specification of operand location
•Register, immediate, indirect, relative, etc.
•Possible size of operands
•Byte, half -word (2 byte), word (4 byte), double (8 byte), etc.Byte Ordering
•Big endian
•Little endian
Byte Ordering
•Big endian: IBM z/Architecture, Freescale ColdFire
•Little endian: The Intel x86 and AMD64 / x86 -64
•Bi-endian: ARM AArch64, C -Sky, Power ISA, and RISC -V
•Can be switched using softwareInstruction Format: Major Considerations
•Byte -Ordering
•Instruction Length
•Number of operands
•0, 1, 2, 3
•Specification of operand location
•Register, immediate, indirect, relative, etc.
•Possible size of operands
•Byte, half -word (2 byte), word (4 byte), double (8 byte), etc.Instruction Length
•Fixed: MARIE (16 bit)
•Less complexity
•Possible wastage of space
•Variable length: x86
•Common instruction: smaller size
•Saves space
•More complex to decodeInstruction Format: Major Considerations
•Byte -Ordering
•Instruction Length
•Number of operands
•0, 1, 2, 3
•Specification of operand location
•Register, immediate, indirect, relative, etc.
•Possible size of operands
•Byte, half -word (2 byte), word (4 byte), double (8 byte), etc.Number of Operands: Zero Address Instructions
•Zero address instructions: in stack based instruction sets (1960s and 1970s)
•Stack: a portion of memory
•Implicit operands on stack
•Push a
r
d•Push a
a
r
dNumber of Operands: Zero Address Instructions•Push a
•Push b
a
r
dNumber of Operands: Zero Address Instructionsb
a
r
d•Push a
•Push bNumber of Operands: Zero Address Instructions•Push a
•Push b
•add 
b
a
r
dNumber of Operands: Zero Address Instructions•Push a
•Push b
•add 
b
a
r
dNumber of Operands: Zero Address Instructions•Push a
•Push b
•add
•add does not use the address field
•Zero -address instruction a+b
r
dNumber of Operands: Zero Address InstructionsNumber of Operands: One Address Instructions
•One address instructions: in accumulator based instruction sets (1960s)
•Accumulator: a general purpose register in CPU
•Load X
….
b
….X
….
q
….YNumber of Operands: One Address Instructions
•One address instructions: use an implied accumulator for operation on data
•Accumulator: a general purpose register in CPU
•Load X
•acc=Mem[X]….
b
….X
….
q
….YNumber of Operands: One Address Instructions
•One address instructions: use an implied accumulator for operation on data
•Accumulator: a general purpose register in CPU
•Load X
•acc=Mem[X]
•Add Y….
b
….X
….
q
….YNumber of Operands: One Address Instructions
•One address instructions: use an implied accumulator for operation on data
•Accumulator: a general purpose register in CPU
•Load X
•acc=Mem[X]
•Add Y
•acc= acc+ Mem [Y]….
b
….X
….
q
….YNumber of Operands: Two Address Instructions
•Two address instructions: May use 
•Either two registers (register register based)
•Add R1, R2    (R1= R1+R2)
•Or a register and a memory location (register memory based)
•Add R1, X   (R1= R1+Mem[X])
•Or two memory locations (memory memory based)
•Add X, Y  (Mem[X]=Mem [X]+ Mem [Y])•Three address instructions:
•Example: Add R1, R2, R3 (R1= R2+R3)
•May involve registers and memory locationsNumber of Operands: Three Address Instructions•y=(a+b)/(c+d) 
•Assume that a, b, c, and d are stored in memory locations A, B, C, and D
•Three address:
•Add R1, A, B (R1=Mem[A]+Mem[B])
•Add R2, C, D (R2=Mem[C]+Mem[D])
•DivR3, R1, R2 (R3=R1/R2)Number of Operands: An Example•y=(a+b)/(c+d) 
•Assume that a, b, c, and d are stored in memory locations A, B, C, and D
•Two address:
•Load R1, A (R1=Mem[A])
•Add R1, B (R1=R1+Mem[B])
•Load R2, C (R2=Mem[C])
•Add R2, D (R2=R2+Mem[D])
•DivR1, R2 (R1=R1/R2)Number of Operands: An Example•y=(a+b)/(c+d) 
•Assume that a, b, c, and d are stored in memory locations A, B, C, and D
•One address:
•Load C (Acc= Mem[C])
•Add D (Acc= Acc+Mem [D])
•Stor R (Mem[R]=Acc)
•Load A (Acc= Mem[A])
•Add B (Acc= Acc+Mem [B])
•DivR (Acc=Acc/Mem[R])Number of Operands: An Example•Fewer address
•Less complex but low flexibility
•May require more memory access: increase in execution time
•More instructions per program: more number of instruction fetches
•More addresses
•More involvement of registers: faster operations
•Longer instructions for specifying more operands
•Less instructions per programNumber of Operands: Relative AnalysisInstruction Format: Major Considerations
•Byte -Ordering
•Instruction Length
•Number of operands
•0, 1, 2, 3
•Specification of operand location
•Register, immediate, indirect, relative, etc.
•Possible size of operands
•Byte, half -word (2 byte), word (4 byte), double (8 byte), etc.Operand Location: Addressing Modes
•Determines how operands will be accessed 
•Addressing modes are related to specific operands but not the instruction as a whole 
•Possible addressing modes
•Implied
•Immediate
•Direct
•Indirect
•Register
•Register Indirect
•Relative
•Displacement (Indexed) 
•Stack
•Different instruction sets use different sets of addressing modesImplied Addressing
•Operand location is implicitly specified in the opcode itself
•Does not contain any explicit operand
•Example: INCA
•Increment the content of accumulator
•Zero address stack based instructions also use implied addressingImmediate Addressing
•Operand is part of instruction
•addi $t1, $t2, 5
•add 5
•No memory/register reference for the corresponding operand
•Faster execution
•Explicit specification of the corresponding operand: less flexibility Direct (Absolute) Addressing
•Address of operand is specified in the instruction
•add r1, 2066H
•Memory access required
•Limited address space (maximum 2𝑟, where ) r is 
the number of address field in the instruction….
b
….2066H
….
….
….Address A OpcodeInstruction
Memory
OperandDirect AddressingIndirect Addressing
•Memory location 
pointed to by 
address field of 
the instruction 
contains the 
address of the 
operandAddress A Opcode
Memory
OperandAddress of operand•Two memory access required for one operand
•Slower
•If the content of memory is of n bit
•Size of addressable memory 2𝑛
•Usually 
•No. of bit in a memory location > No. of address bits in an instruction
•Access to large address spaceIndirect Addressing•Operand is held in a register mentioned in the instruction
•Add R1, R2
•Limited number of registers
•May be used for storing limited amount of data a given point of time
•Small address field needed 
•Shorter instructions
•Faster fetch
•Involves registers
•Fastest memoryRegister AddressingRegister Address R OpcodeInstruction
Register Bank
OperandRegister Addressing•The memory address of the operand is held in a register mentioned in 
the instruction
•Add R1, (R5)
•PC=R1+Mem[R5]
•Indirect addressing with one memory access
•Large address spaceRegister Indirect AddressingRegister Indirect Addressing
Register Address R OpcodeInstruction
Memory
OperandAddress of Operand Register Bank•The content of the register is automatically incremented or decremented after 
instruction execution
•Add R1, (R2)+
•R1 = R1 +M[R2]
•R2 = R2 + d 
•Useful for implementing stacksAuto Indexing•The program counter holds a base address
•The instruction specifies an immediate offset
•Effective address of operand= Base address + offsetRelative Addressing (PC Relative)
Opcode Offset
Base AddressInstruction
PCOperand•Offset limited, accessible address is also limitedRelative Addressing
Opcode Offset
Base AddressInstruction
CPU Register (e.g. PC)Operand•A CPU register is used as index register
•An explicit offset is specified in the instruction
•Load R1, 1050 (R2)     (R1=Mem[1050+R2])
•May be used for accessing the elements of an arrayIndexed Addressing
Opcode Index register Offset
IndexInstruction
CPU Register R2
Operand•Operand is (implicitly) on top of stack
•Push, PopStack AddressingInstruction Format
Opcode Operand 1 Operand 2•A simple instruction format
•The instruction format may depend on the type of instructionTypes of Instructions
•Data movement instructions
•Register
•Memory
•Stack
•I/O instructions
•Data processing instructions
•Arithmetic
•Logical
•Control
•Systems Control
•Transfer of controlData Movement Instructions
•Most frequently used
•Possible data movements
•Memory -> Register
•Register -> Register
•Register -> Memory
•Restrict the instructions that can move data to and from memory
•Load -store architecture
•Reduction of memory traffic 
•Example Load, Store, Move, Exchange, Clear, Set etc.Data Movement Instructions
•May have different instructions depending on the 
•Size of operand
•32 bit word
•Half -word etc.
•Location of operands
•Memory -> Register
•Register -> Register
•Example IBM S/390
•May have different instruction for I/OTypes of Instructions
•Data movement instructions
•Register
•Memory
•Stack
•I/O
•Data processing instructions
•Arithmetic
•Logical
•Control
•Systems Control
•Transfer of controlData Processing Instructions: Arithmetic
•For operations on
•Integer
•Floating point numbers
•May have different arithmetic instructions depending on data size
•May use a separate set of register for floating point numbersData Processing Instructions: Logical
•Bitwise operations: AND, OR, NOT, XOR, TEST, CMP , SET
•Bit manipulation instructions 
•Logical left/ right shift
•arithmetic left/ right shiftArithmetic ShiftsArithmetic Shift: Right Shift
1 0 1 1
1 1 0 1Signed Negative Number
S
S0 0 1 1
0 0 0 1Signed Positive Number
S
SArithmetic Shift: Left Shift
1 0 1 1
0 1 1 0Signed Negative Number
S
S0 0 0 1 1
0 1 1 0Signed Positive Number
S
S0Arithmetic Shift: Left Shift
1 0 1 1
0 1 1 0Signed Negative Number
S
S0 0 1 1
0 1 1 0Signed Positive Number
S
SArithmetic Shift: Left Shift
1 0 1 1
0 1 1 0Signed Negative Number
S
S0 0 1 1
0 1 1 0Signed Positive Number
S
S
11 0 1 1
0 1 1 0Left Shift
0 1 0 1 1
0 1 0 1Right Shift
0Logical ShiftTypes of Instructions
•Data movement instructions
•Register
•Memory
•Stack
•I/O
•Data processing instructions
•Arithmetic
•Logical
•Control
•Systems Control
•Transfer of controlControl Instructions: Transfer of Control
•PC holds the address of the next instruction to be fetched
•Transfer of control: do not execute the next PC value
•Transfer control to another part of the instruction spaceControl Instructions: Transfer of Control
•Branch:
•Conditional: tests a condition and based on the outcome, jumps to a given location
•Unconditional: jumps to a location without testing for any condition
•BRZ X (branch to X if result is zero) 
•BRP X (positive) 
•BRN X (negative) 
•BRE X, R1, R2 (equal)
•Skip
•CALL, RETControl Instructions: Transfer of Control
Memory Location Instruction
2600 H ADD A, R1, R2
2604 H BRP 2616 H
2608H …
2612 H …
2616 H SUB A, R3, R2
2620 H BR 2600 H
2624 H …Control Instructions: Transfer of Control
Memory Location Instruction
2600 H ADD A, R1, R2
2604 H BRP 2616 H
2608H …
2612 H …
2616 H SUB A, R3, R2
2620 H BR 2600 H
2624 H …Conditional 
BranchUnconditional 
BranchControl Instructions: Transfer of Control
Source: https://www.freetimelearning.com/c -language/c -language -unconditional -control -statements.phpControl Instructions: Transfer of Control
Source: https://www.freetimelearning.com/c -language/c -language -unconditional -control -statements.phpMemory Location Instruction
2600 H ADD A, R1, R2
2604 H BRP 2616 H
2608H …
2612 H …
2616 H SUB A, R3, R2
2620 H BR 2600 H
2624 H …Conditional 
BranchUnconditional 
BranchControl Instructions: Transfer of Control
•Branch:
•Conditional: tests a condition and based on the outcome, jumps to a given location
•Unconditional: jumps to a location without testing for any condition
•BRZ X (branch to X if result is zero) 
•BRP X (positive) 
•BRN X (negative) 
•BRE X, R1, R2 (equal)
•Skip: branching with implied address
•Typically skips one instruction
•CALL, RETControl Instructions: Transfer of Control
•Procedure calls
•Special branching instructions
•Requires to store return address
•Storage of return address: different for different machines
•Memory
•Register 
•StackControl Instructions: Systems Control
•Halting instructions: 
•NOP: no operation
•No change in processor state
•Advances the program counter
•Why do we need this?
•Halt: 
•Brings processor to an orderly halt
•May be restarted by an interrupt, external action etc.
•Sleep mode
•Halt in OS: invoked by privileged system softwareControl Instructions: Systems Control
•Interrupt instructions: 
•Normal execution gets suspended
•May be invoked by I/O or an instruction to get serviced
•Asynchronous interrupt
•Generated by hardware devices other than processor
•May be generated at any time
•Synchronous interrupt 
•Produced by control unit during instruction execution
•Handles exception (such as division by zero)Instruction Set: Use of Assembly Language
•CPU: executes instructions
•Primitive operations
•We represent instructions using assembly language
•Readable by human
•Different CPUs implement different sets of instructions  
•Instruction set
•Instruction Set Architecture: specific to CPUs
•Intel 80x86, 
•IBM/Motorola PowerPC (old Macintosh), 
•MIPS, etc...Instruction Set Architectures
•Early trend in the design of instruction set architecture 
•Add more and more instructions to the instruction set
•Separate instructions for as many operations as possible
•Compound instructions:  long execution time
•VAX Poly
•Complex Instruction Set Computers (CISC)
•RISC –Reduced Instruction Set Computers
•Keep the instruction set small and simple
•Makes it easier to build fast hardware
•Let software do complicated operations by composing simpler ones.RISC
•Instructions are easy to decode
•Reduced CPI
•Many general purpose registers
•Increase in the number of instructions per program
•Too many assembly language instructions per program
•Difficult to ensure correctness
•Most instruction sets (e.g., MIPS, ARM)CISC
•Instructions are difficult to decode
•Increased CPI
•Less number of general purpose registers
•Less number of instructions per program
•Assembly language programming is easier
•Intel x86RISC vs CISC: An Example
•Multiplying two numbers stored in memory locations X and Y
CISC RISC
MULT X,Y LOAD R1, X
LOAD R2, Y
MUL A, R1, R2MIPS Architecture
•MIPS –a RISC architectures
•Why MIPS instead of Intel x86?
•Simple  
•Used in embedded apps
•RISC Registers
•A fast storage location for data inside the processor
•limited number, expensive
•Arithmetic, logic operations can only be performed on the content of registers 
•Faster execution of programsGeneral Purpose Registers in MIPS 32
•32 general purpose registers in MIPS
•Each register is of 32 bit (32 bit -> word)
•Registers are numbered from 0 to 31
•Each register can be referred to by number or name
•Number references
•$0, $1, $2, … $30, $31
•MIPS has some special purpose registers as well
•hi, lo, etc.Register Number Conventional Name Usage
$0 $zero Hard -wired to 0
$1 $at Reserved for the assembler
$2 -$3 $v0, $v1 Return values from functions
$4 -$7 $a0 -$a3Arguments to functions -not preserved by 
subprograms
$8 -$15 $t0 -$t7 Temporary data, not preserved by subprograms
$16 -$23 $s0 -$s7 Saved registers, preserved by subprograms
$24 -$25 $t8 -$t9More temporary registers, not preserved by 
subprograms
$26 -$27 $k0 -$k1 Reserved for kernel. Do not use.
$28 $gp Global Area Pointer (base of global data segment)
$29 $sp Stack Pointer
$30 $fp Frame Pointer
$31 $ra Return AddressMIPS General Purpose RegistersMIPS General Purpose Registers
Register Number Conventional Name Usage
$0 $zero Hard -wired to 0
$1 $at Reserved for the assembler
$2 -$3 $v0, $v1 Return values from functions
$4 -$7 $a0 -$a3Arguments to functions -not preserved by 
subprograms
$8 -$15 $t0 -$t7 Temporary data, not preserved by subprograms
$16 -$23 $s0 -$s7 Saved registers, preserved by subprograms
$24 -$25 $t8 -$t9More temporary registers, not preserved by 
subprograms
$26 -$27 $k0 -$k1 Reserved for kernel. Do not use.
$28 $gp Global Area Pointer (base of global data segment)
$29 $sp Stack Pointer
$30 $fp Frame Pointer
$31 $ra Return AddressMIPS General Purpose Registers
Register Number Conventional Name Usage
$0 $zero Hard -wired to 0
$1 $at Reserved for the assembler
$2 -$3 $v0, $v1 Return values from functions
$4 -$7 $a0 -$a3Arguments to functions -not preserved by 
subprograms
$8 -$15 $t0 -$t7 Temporary data, not preserved by subprograms
$16 -$23 $s0 -$s7 Saved registers, preserved by subprograms
$24 -$25 $t8 -$t9More temporary registers, not preserved by 
subprograms
$26 -$27 $k0 -$k1 Reserved for kernel. Do not use.
$28 $gp Global Area Pointer (base of global data segment)
$29 $sp Stack Pointer
$30 $fp Frame Pointer
$31 $ra Return AddressMIPS: Other Registers
•HI/ LO: to store results of multiply instructions
•During integer division
•Remainder in HI
•Quotient in LO
•PC: program counter
•Points to the next instruction to be executed
•The two low order bits are always zero. Why?op1 op2 HI LOMIPS: Types of Instructions
•Three types of instructions
•R-type: when all data values used by an instruction is located in registers
opcode rs rt rdshift 
(shamt )funct
6 bits 5 bits 5 bits 5 bits 5 bits 6 bitsR Format: An Example
•add $t0, $s1, $s2
000000 10001 10010 01000 00000 100000
ALU s1 s2 t0 shamt addopcode rs rt rd shift ( shamt ) funct
6 bits 5 bits 5 bits 5 bits 5 bits 6 bitsR Format: An Example
•sub $t0, $s1, $s2
000000 10001 10010 01000 00000 100010
ALU s1 s2 t0 shamt subopcode rs rt rd shift ( shamt ) funct
6 bits 5 bits 5 bits 5 bits 5 bits 6 bitsMIPS: Types of Instructions
•Three types of instructions
•I-type: for instructions that operate on an immediate value and a register value
opcode rs rt IMM
6 bits 5 bits 5 bits 16 bitsMIPS: Types of Instructions
•addi $s1, $s2, 5
001000 10010 10001 0000000000000101
addi s2 s1 5opcode rs rt IMM
6 bits 5 bits 5 bits 16 bitsMIPS: Types of Instructions
•Three types of instructions
•J-type: for executing jump
opcode Pseudo -address
6 bits 26 bitsMIPS: Types of Instructions
•j 2600 H
000010 00000000000010011000000000
j 2600 Hopcode Pseudo -address
6 bits 26 bitsTypes of Instructions in MIPS
•Data movement instructions
•Register
•Memory
•Stack
•I/O
•Data processing instructions
•Arithmetic
•Logical
•Control
•Systems Control
•Transfer of controlData Movement Instructions in MIPS
Instruction Example Meaning Comments
load word lw $s1,100($s2) $s1=Memory[$s2+100] Copy from memory to register
store word sw $s1,100($s2) Memory[$s2+100]=$s1 Copy from register to memory
load upper 
immediatelui$s1,55 $s1=55x2^16 Load constant into upper 16 bits. 
Lower 16 bits are set to zero.
load address la $s1,label $s1=Address of label Pseudo -instruction (provided by assembler, not 
processor!)
Loads computed address of label (not its contents) 
into register
load immediate li $s1,100 $s1=100 Pseudo -instruction (provided by assembler, not 
processor!)
Loads immediate value into register
move from hi mfhi $s2 $s2=hi Copy from special register hi to general register
move from lo mflo $s2 $s2=lo Copy from special register lo to general register
move move $s1,$s2 $s1=$s2 Pseudo -instruction (provided by assembler, not 
processor) Copy from register to register.Data Movement Instructions in MIPS
Data Movement Instructions
•May have different instructions depending on the
•Size of operand
•32 bit word
•Half -word etc.
•Location of operands
•Memory -> Register
•Register -> Register
•Example IBM S/390
•May have different instruction for I/OData Processing Instructions in MIPS: 
Arithmetic
Instruction Example Meaning Comments
add add $t1,$t2,$t3 $t1=$t2+$t3
subtract sub $t1,$t2,$t3 $t1=$t2 -$t3
add immediate addi $t1,$t2,100 $t1=$t2+100 Addition with a constant number
add unsigned addu $t1,$t2,$t3 $t1=$t2+$t3 Values are treated as unsigned integers,
not two's complement integers
subtract unsigned subu $t1,$t2,$t3 $t1=$t2 -$t3 Values are treated as unsigned integers,
not two's complement integers
add immediate unsigned addiu $t1,$t2,100 $t1=$t2+100 Values are treated as unsigned integers,
not two's complement integers
Multiply (without overflow) mul $t1,$t2,$t3 $t1=$t2*$t3 Result is only 32 bits
Multiply mult $t2,$t3 $hi, $low=$t2*$t3 Upper 32 bits stored in special register hi
Lower 32 bits stored in special register lo
Divide div $t2,$t3 $hi, $low=$t2/$t3 Remainder stored in special register hi
Quotient stored in special register lo
Unsigned Divide divu $t2,$t3 $hi, $low=$t2/$t3 $t2 and $t3 store unsigned values.
Remainder stored in special register hi
Quotient stored in special register loData Processing Instructions in MIPS: Logical
Instruction Example Meaning Comments
and and $t1,$t2,$t3 $t1=$t2&$t3 Bitwise AND
or or $t1,$t2,$t3 $t1=$t2|$t3 Bitwise OR
and immediate andi $t1,$t2,100 $t1=$t2&100 Bitwise AND with immediate value
or immediate ori$t1,$t2,100 $t1=$t2|100 Bitwise OR with immediate value
shift left logical sll $t1,$t2,10 $t1=$t2<<10 Shift left by constant number of 
bits
shift right logical srl $t1,$t2,10 $t1=$t2>>10 Shift right by constant number of 
bitsR Format: An Example
•add $t0, $s1, $s2
000000 10001 10010 01000 00000 100000
ALU s1 s2 t0 shamt addopcode rs rt rd shift ( shamt ) funct
6 bits 5 bits 5 bits 5 bits 5 bits 6 bitsR Format: An Example
•add $t0, $s1, $s2
•sll$s1, $s2, 5000000 10001 10010 01000 00000 100000
ALU s1 s2 t0 shamt addopcode rs rt rd shift ( shamt ) funct
6 bits 5 bits 5 bits 5 bits 5 bits 6 bits
000000 00000 10010 10001 00101 000000
Special 0 s2 s1 shamt sllTransfer of Control Instructions in MIPS
Instruction Example Meaning Comments
branch on equal beq $s1,$s2,100 if($s1==$s2) go to PC+4+100 Test if registers are 
equal
branch on not equal bne $s1,$s2,100 if($s1!=$s2) go to PC+4+100 Test if registers are not 
equal
branch on greater than bgt $s1,$s2,100 if($s1>$s2) go to PC+4+100 Pseduo -instruction
branch on greater than 
or equalbge $s1,$s2,100 if($s1>=$s2) go to PC+4+100 Pseduo -instruction
branch on less than blt$s1,$s2,100 if($s1<$s2) go to PC+4+100 Pseduo -instruction
branch on less than or 
equalble $s1,$s2,100 if($s1<=$s2) go to PC+4+100 Pseduo -instructionTransfer of Control Instructions in MIPS
Instruction Example Meaning Comments
jump j 1000 go to address 1000 Jump to target address
jump register jr $s1 go to address stored in $s1 For switch, procedure 
return
jump and link jal1000 $ra=PC+4; go to address 1000 Use when making 
procedure call.
This saves the return 
address in $raComparison Instructions in MIPS
Instruction Example Meaning Comments
set on less than slt $s1,$s2,$s3 if($s2<$s3)$s1=1;
else $s1=0Test if less than.
If true, set $s1 to 1. 
Otherwise, set $s1 to 0.
set on less than 
immediateslti $s1,$s2,100 if($s2<100)$s1=1;
else $s1=0Test if less than.
If true, set $s1 to 1. 
Otherwise, set $s1 to 0Memory Alignment
•Memory is byte addressable
Bits
2600
2601
2602
2603
2604
2605
2606
2607
2608
2609
260A
260B
260C
260D
260E
260FByte 0
Byte 1
Byte 2
Byte 3
Byte 4
…
…
…•A MIPS word: 4 bytes
Bits
2600
2601
2602
2603
2604
2605
2606
2607
2608
2609
260A
260B
260C
260D
260E
260FWord 1
Word 2Memory Alignment
Byte 0
Byte 1
Byte 2
Byte 3
Byte 4
…
…
…•A word occupies 4 contiguous bytes
Bits
2600
2603
2604
2607
2608
260B
260C
260FWord 1
Word 2
Word 3
Word 4Memory Alignment
Byte 0
Byte 1
Byte 2
Byte 3
Byte 4
…
…
…Byte 0
2600
…
…
…
…
…Byte 1 Byte 2 Byte 3
Word 1
2604 Word 2
2608 Word 3Memory Alignment
•A word occupies 4 contiguous bytes2600
…
…
…
…
…Word 1
2604 Word 2
2608 Word 3Memory Alignment
•Several instructions require memory alignment, 
•In case of those instructions, data can be loaded/ stored in memory starting only 
from/ to specific memory locations
Byte 0 Byte 1 Byte 2 Byte 3Memory Alignment
2600
…
…
…
…
…Word 1
2604 Word 2
2608 Word 3•A word must start at an address divisible by 4
Byte 0 Byte 1 Byte 2 Byte 3Memory Alignment
2600
…
…
…
…
…Word 1
2604 Word 2
2608 Word 3•Valid word address: 2600H, 2604H, 260CH, etc.
•Invalid word address: 2601H, 2605H, etc.
2600H
2604H
2608H
260CH
…
…
…
…2601H
2605H
2609H
260DH
…
…
…
…2602H
2606H
260AH
260EH
…
…
…
…2603H
2607H
260BH
260FH
…
…
…
…Byte 0 Byte 1 Byte 2 Byte 3Memory Alignment: Example
2600
…
…
…
…
…Word 1
2604 Word 2
2608 Word 3•lw, swrequire target addresses to be memory aligned 
2600H
2604H
2608H
260CH
…
…
…
…2601H
2605H
2609H
260DH
…
…
…
…2602H
2606H
260AH
260EH
…
…
…
…2603H
2607H
260BH
260FH
…
…
…
…Byte 0 Byte 1 Byte 2 Byte 3Memory Alignment: Example
2600
…
…
…
…
…Word 1
2604 Word 2
2608 Word 3•lw, swrequire target addresses to be memory aligned 
2600H
2604H
2608H
260CH
…
…
…
…2601H
2605H
2609H
260DH
…
…
…
…2602H
2606H
260AH
260EH
…
…
…
…2603H
2607H
260BH
260FH
…
…
…
…Byte 0 Byte 1 Byte 2 Byte 32601H
2605H
2609H
260DH
…
…
…
…Memory Alignment: Example
2600
…
…
…
…
…Word 1
2604 Word 2
2608 Word 3•lw, swrequire target addresses to be memory aligned 
2600H
2604H
2608H
260CH
…
…
…
…2602H
2606H
260AH
260EH
…
…
…
…2603H
2607H
260BH
260FH
…
…
…
…Byte 0 Byte 1 Byte 2 Byte 32600
2603
2604
2607
2608
260B
260C
260FWord 1
Word 2
Word 3
Word 4Memory Alignment: Example
•lw, swrequire target addresses to be memory aligned 
Byte 0
Byte 1
Byte 2
Byte 3
Byte 4
…
…
…2600
2603
2604
2607
2608
260B
260C
260FWord 1
Word 2
Word 3
Word 4Memory Alignment: Example
•lw, swrequire target addresses to be memory aligned 
Byte 0
Byte 1
Byte 2
Byte 3
Byte 4
…
…
…2600
2603
2604
2607
2608
260B
260C
260FWord 1
Word 2
Word 3
Word 4Memory Alignment: Example
•lw, swrequire target addresses to be memory aligned 
Byte 0
Byte 1
Byte 2
Byte 3
Byte 4
…
…
…Memory Alignment: Example
2600
…
…
…
…
…Word 1
2604 Word 2
2608 Word 3•lh, lhu, shrequire target addresses to be memory aligned 
2600H
2604H
2608H
260CH
…
…
…
…2601H
2605H
2609H
260DH
…
…
…
…2602H
2606H
260AH
260EH
…
…
…
…2603H
2607H
260BH
260FH
…
…
…
…Byte 0 Byte 1 Byte 2 Byte 32601H
2605H
2609H
260DH
…
…
…
…Memory Alignment: Example
2600
…
…
…
…
…Word 1
2604 Word 2
2608 Word 3•lh, lhu, shrequire target addresses to be memory aligned 
2600H
2604H
2608H
260CH
…
…
…
…2602H
2606H
260AH
260EH
…
…
…
…2603H
2607H
260BH
260FH
…
…
…
…Byte 0 Byte 1 Byte 2 Byte 32600H
2604H
2608H
260CH
…
…
…
…2601H
2605H
2609H
260DH
…
…
…
…Memory Alignment: Example
2600
…
…
…
…
…Word 1
2604 Word 2
2608 Word 3•lh, lhu, shrequire target addresses to be memory aligned 
2602H
2606H
260AH
260EH
…
…
…
…2603H
2607H
260BH
260FH
…
…
…
…Byte 0 Byte 1 Byte 2 Byte 32602H
2606H
260AH
260EH
…
…
…
…2600H
2604H
2608H
260CH
…
…
…
…2601H
2605H
2609H
260DH
…
…
…
…Memory Alignment: Example
2600
…
…
…
…
…Word 1
2604 Word 2
2608 Word 3•lh, lhu, shrequire target addresses to be memory aligned 
2603H
2607H
260BH
260FH
…
…
…
…Byte 0 Byte 1 Byte 2 Byte 32600
2603
2604
2607
2608
260B
260C
260FWord 1
Word 2
Word 3
Word 4Memory Alignment: Example
•lh, lhu, shrequire target addresses to be memory aligned 
Byte 0
Byte 1
Byte 2
Byte 3
Byte 4
…
…
…2600
2603
2604
2607
2608
260B
260C
260FWord 1
Word 2
Word 3
Word 4Memory Alignment: Example
•lh, lhu, shrequire target addresses to be memory aligned 
Byte 0
Byte 1
Byte 2
Byte 3
Byte 4
…
…
…2600
2603
2604
2607
2608
260B
260C
260FWord 1
Word 2
Word 3
Word 4Memory Alignment: Example
•lh, lhu, shrequire target addresses to be memory aligned 
Byte 0
Byte 1
Byte 2
Byte 3
Byte 4
…
…
…Memory Alignment: Example
2600
…
…
…
…
…Word 1
2604 Word 2
2608 Word 3•lb, lbu, sb
2600H
2604H
2608H
260CH
…
…
…
…2601H
2605H
2609H
260DH
…
…
…
…2602H
2606H
260AH
260EH
…
…
…
…2603H
2607H
260BH
260FH
…
…
…
…Byte 0 Byte 1 Byte 2 Byte 3Memory Alignment: Example
2600
…
…
…
…
…Word 1
2604 Word 2
2608 Word 3•lb, lbu, sb
2600H
2604H
2608H
260CH
…
…
…
…2601H
2605H
2609H
260DH
…
…
…
…2602H
2606H
260AH
260EH
…
…
…
…2603H
2607H
260BH
260FH
…
…
…
…Byte 0 Byte 1 Byte 2 Byte 32600
2603
2604
2607
2608
260B
260C
260FWord 1
Word 2
Word 3
Word 4Memory Alignment Not Required: Example
•lwl, lwr, swl, swr can access bytes starting from any location
Most Significant
Least SignificantWord 1Byte 0
Byte 1
Byte 2
Byte 3
Byte 4
…
…
…An Example C Program: Condition
•C program segment
if (i== j) 
f=g+h; 
else 
f=g-h;
… #next set of C codes
•Consider the following register allocations:
f: $s0
g: $s1
h: $s2
i: $s3
j: $s4Next set of C codesi== j?
f=g+h f=g-h(false) 
i != j(true) 
i == j•MIPS code:
beq $s3,$s4, Label1            # branch i==j
sub $s0,$s1,$s2                   # f=g -h(false)
j   Label2                                # goto Label2
Label1: add $s0,$s1,$s2   # f= g+h (true)
Label2: … #next set of instructionsC program segment
if (i== j) 
f=g+h; 
else 
f=g-h;
… #next set of C codes
Next set of C codesi== j?
f=g+hf=g-h(false) 
i != j(true) 
i == j
Label1
Label2An Example C Program: Condition•C Code segment
sum=0;
i=0;
do { sum=sum + A[ i];
i=i+1;
} while ( i<n)
•Consider the following register allocations:
sum: $s0
base of A: $s1
i: $s3
n: $s4An Example C Program: Loopsum=0;
i=0;
do { sum=sum + A[ i];
i=i+1;
} while ( i<n)
•Consider the following register 
allocations:
sum: $s0
base of A: $s1
i: $s3
n: $s4sum=0;
i=0;
Label1: sum=sum+ A[ i];
i=i+1;
if(i<n) goto Label1;An Example C Program: Loopsum=0;
i=0;
Label1: sum=sum+ A[ i];
i=i+1;
if(i<n) goto Label1;
sum: $s0
base of A: $s1
i: $s3
n: $s4
Label1: add $s0, $s0, $s1move $s0, $zero
move $s3, $zero
Label1: sll$t1, $s3, 2
add $t3, $s1, $t1
lw$t2, 0($t3)
add $s0, $s0, $t2
add $s3, $s3, 1
blt$s3, $s4, Label1An Example C Program: Loopsum=0;
i=0;
Label1: sum=sum+ A[ i];
i=i+1;
if(i<n) goto Label1;
sum: $s0
base of A: $s1
i: $s3
n: $s4
Label1: add $s0, $s0, $s1move $s0, $zero
move $s3, $zero
Label1: sll$t1, $s3, 2
add $t3, $s1, $t1
lw$t2, 0($t3)
add $s0, $s0, $t2
add $s3, $s3, 1
blt$s3, $s4, Label1An Example C Program: Loopmove $s0, $zero
move $s3, $zero
Label1: sll$t1, $s3, 2
add $t3, $s1, $t1
lw$t2, 0($t3)
add $s0, $s0, $t2
add $s3, $s3, 1
blt$s3, $s4, Label1blt$s3, $s4, Label1
Is equivalent to
slt$at, $s3, $s4
bne $at, $zero, Label1blt: A Pseudo Instructionblt$s3, $s4, Label1
Is equivalent to
slt$at, $s3, $s4
bne $at, $zero, Label1blt: A Pseudo Instruction
Register 
NumberConventional 
Name Usage
$0 $zero Hard -wired to 0
$1 $at Reserved for the assembler
$2 -$3 $v0, $v1 Return values from functions
$4 -$7 $a0 -$a3Arguments to functions -not 
preserved by subprograms
$8 -$15 $t0 -$t7Temporary data, not preserved by 
subprograms
$16 -$23 $s0 -$s7Saved registers, preserved by 
subprograms
$24 -$25 $t8 -$t9More temporary registers, not 
preserved by subprograms
$26 -$27 $k0 -$k1 Reserved for kernel. Do not use.
$28 $gpGlobal Area Pointer (base of global 
data segment)
$29 $sp Stack Pointer
$30 $fp Frame Pointer
$31 $ra Return Addressblt$s3, $s4, Label1
Is equivalent to
slt$at, $s3, $s4
bne $at, $zero, Label1blt: A Pseudo Instruction
Instruction Example Meaning Comments
set on less 
thanslt $s1,$s2,$s3 if($s2<$s3)$s1=1;
else $s1=0Test if less 
than.
If true, set 
$s1 to 1. 
Otherwise, 
set $s1 to 0.
set on less 
than 
immediateslti $s1,$s2,100 if($s2<100)$s1=1;
else $s1=0Test if less 
than.
If true, set 
$s1 to 1. 
Otherwise, 
set $s1 to 0Addressing Modes in MIPS 32
•Register addressing
•add $t0, $s1, $s2
•Immediate addressing 
•addi $t0, $s1, 0xAA32
•Base addressing
•lw$s2, 0x4AB2($t3)
•jr$t1•PC relative addressing
•beq $s1, $t1, 0x2800
•j labelSPIM Simulator
•Uses several pseudo instructions
•Qt SPIM: The newest version
•Copyright
Spim is copyrighted by James Larus and distributed under a BSD license.
Copyright (c) 1990 -2018, James R. Larus. All rights reserved.System Calls
MIPS Assembly Code Layout
Assembler DirectivesAn Example Code in SPIM: Add Two Numbers
.text # text section
.globl main # call main by SPIM
main: la $s0, dataval # load address ‘value’ into $t0
lw$t1, 0($s0) # load word 0(value) into $t1
lw$t2, 4($s0) # load word 4(value) into $t2
add $t3, $t1, $t2 # add two numbers into $t3
sw$t3, 8($s0) # store word $t3 into 8($t0)
.data # data section
dataval : .word 15, 27, 0 # data for additionAn Example Code in SPIM: Add Two Numbers
.text # text section
.globl main # call main by SPIM
main: la $s0, dataval # load address ‘value’ into $t0
lw$t1, 0($s0) # load word 0(value) into $t1
lw$t2, 4($s0) # load word 4(value) into $t2
add $t3, $t1, $t2 # add two numbers into $t3
sw$t3, 8($s0) # store word $t3 into 8($t0)
.data # data section
dataval : .word 15, 27, 0 # data for addition
2800H 15
2804H 27
2808H 0An Example Code in SPIM: Add Two Numbers
.text # text section
.globl main # call main by SPIM
main: la $s0, dataval # load address ‘value’ into $t0
lw$t1, 0($s0) # load word 0(value) into $t1
lw$t2, 4($s0) # load word 4(value) into $t2
add $t3, $t1, $t2 # add two numbers into $t3
sw$t3, 8($s0) # store word $t3 into 8($t0)
.data # data section
dataval : .word 15, 27, 0 # data for addition
2800H 15
2804H 27
2808H 42System Calls
Another Example Code in SPIM: I/O
Function Call
MainFunction A
Function BFunction Call
MainFunction A
Function B
Caller Callee Callee Function Call
•Functions: blocks of instructions, invoked to perform a certain action
•A function has a starting address in memory
•Let the address be f_addr
•To call the function 
•Store the return address
•PC < -f_addr
•Execute the functionUse of jaland jr
Instruction Example Meaning Comments
jump j 1000 go to address 1000 Jump to target address
jump register jr $s1 go to address stored in $s1 For switch, procedure 
return
jump and link jal1000 $ra=PC+4; go to address 1000 Use when making 
procedure call.
This saves the return 
address in $raUse of jaland jr: An Example
li $a0,10
li $a1,21
jalmath      # $ra< -PC+4
addi $t0, $v0, 4
li $v0,10
syscall
math: add $v0,$a1,$a3
jr$raThe Components of a Computer
Central 
Processing UnitArithmetic Logic 
Unit
Control UnitMemory
Input DevicesOutput DevicesThe CPU
•Processor: Responsible for operations on data
•Datapath: Segment of the CPU that contains hardware responsible for storing 
and processing of data
•Path through which the data flows
•Control Section: Issues different control signals that controls datapathStages of Datapath: Instruction Execution
•Executing an instruction
•Performing all necessary operations in a single atomic block would be too complex and 
inefficient
•add $s0, $t1, $t2
•Break the process of instruction execution into stages
•Connect the stages to create the whole datapath
•smaller stages are easier to design
•easy to optimize (change) one stage without touching the othersFive Stages of Instruction Execution
•Stage 1: Instruction Fetch
•Stage 2: Instruction Decode
•Stage 3: Arithmetic Logic Operations
•Stage 4: Memory Access
•Stage 5: Register WriteStage 1: Instruction Fetch
•Instructions: stored in main memory
•Memory: byte addressable
•MIPS instructions: 32 bit, 4 bytesInstruction 1
Instruction 22000 H
2001 H
2003 H2002 H
2004 H
2005 H
2006 H
2007 HStage 1: Instruction Fetch
•The content of PC points to the location of 
the next instruction to be fetched (and 
executed
•The instruction pointed by PC is fetched
•PC is incremented by 4 (since instructions 
are of 4 byte)
•PC< -PC+4
•PC now points to the next instructionInstruction 1
Instruction 22000 H
2001 H
2003 H2002 H
2004 H
2005 H
2006 H
2007 HStage 1: Instruction Fetch
•The content of PC points to the location of 
the next instruction to be fetched (and 
executed
•The instruction pointed by PC is fetched
•PC is incremented by 4 (since instructions 
are of 4 byte)
•PC< -PC+4
•PC now points to the next instruction
Stage 2: Instruction Decode
•Gather the meaning of each filed in the instruction
•First read the opcode
•Type of instruction
•Length of fieldsStage 2: Instruction Decode
•Gather the meaning of each filed in the instruction
•First read the opcode
•Type of instruction 
•Length of fieldsopcode rs rt IMM
6 bits 5 bits 5 bits 16 bits
opcode rs rt rdshift 
(shamt ) funct
6 bits 5 bits 5 bits 5 bits 5 bits 6 bitsI-Type
R-TypeStage 2: Instruction Decode
•Gather the meaning of each filed in the instruction
•First read the opcode
•Type of instruction 
•Length of fields
•Then operandsopcode rs rt IMM
6 bits 5 bits 5 bits 16 bits
opcode rs rt rdshift 
(shamt ) funct
6 bits 5 bits 5 bits 5 bits 5 bits 6 bitsI-Type
R-TypeStage 3: Arithmetic Logic Operations
•ALU is involved
•All arithmetic and logic operations involving operands
•What about loads and stores?
•lw $s0, 20($t0)
•The memory address to be accessed -> the value in $t0 + 20
•This addition is performed in this stageStage 4: Memory Access
•Used only by instructions that access memory
•In MIPS
•lw, sw, etc. Stage 5: Register Write
•Write the results in a register
•Especially for arithmetic and logic instructions
•add $t1, $s1, $s2
•What about store, branch, jump instructions?
•Don’t write anything into a register at the end
•These instructions remain idle during this fifth stage or skip it all togetherInstruction Execution instruction
memory
+4rtrsrd
registersALU
Data
memory
imm
1. Instruction
Fetch2. Decode/
Register
Read3. Execute/ 
Perform 
Arithmetic 
Logic 
Operations 4. Memory5. Register
WritePC•add $r3,$r1,$r2 # r3 = r1+r2
•Stage 1: fetch this instruction, increment PC
•Stage 2: decode to determine it is an add, 
then read registers $r1 and $r2
•Stage 3: add the two values retrieved in Stage 2
•Stage 4: idle (nothing to write to memory)
•Stage 5: write result of Stage 3 into register $r3Analyzing the Datapath: addinstruction
memory
+4
registersALU
Data
memory
imm213add r3, r1, r2reg[1]+
reg[2]
reg[2]reg[1]Example: add InstructionPCALUinstruction
memory
+4
registers
Data
memory
imm31xlwr3, 17(r1)reg[1]
+17
17reg[1]
MEM[r1+17]Example: lwInstructionPCDatapath and Control
•Datapath based on data transfers required to execute instructions
•Controller causes the right transfers to take place
PC
instruction
memory
+4rtrsrd
registers
Data
memory
immALU
Controlleropcode, functRequired Hardware
•Program Counter (PC)
•Holds the address of the next instruction to be fetched
•General Purpose Registers (GPR): 
•Used in Stages 2 (Read) and 5 (Write)
•MIPS has 32 GPRs
•Memory: used in Stages 1 (Fetch) and 4 (R/W)
•Use of cache
•ALU: used in stage 3
•Misc. registers: inserted between stages to hold data 
and control signals•Stage 1: Instruction Fetch
•Stage 2: Instruction Decode
•Stage 3: Arithmetic Logic Operations
•Stage 4: Memory Access
•Stage 5: Register WriteCPU Clocking
•For each instruction, how do we control the flow of information though the datapath ?
•Single Cycle CPU: All stages of an instruction completed within one long clock cycle
•Clock cycle: sufficiently long 
•Allow each instruction to complete all stages without interruption within one cycle
1. Instruction
Fetch2. Decode/
Register
Read3. Execute 4. Memory5. Reg.
Write
Clock Time PeriodCPU Clocking
•Multiple -cycle CPU: only one stage of instruction per clock cycle
•Length of clock cycle: equal to the time taken for the slowest stage
•Advantages over single cycle execution: 
•Unused stages in a particular instruction can be skipped 
•Instructions can be pipelined (overlapped)
1. Instruction
Fetch2. Decode/
Register
Read3. Execute 4. Memory 5. Register
WriteCPU Clocking
•Multiple -cycle CPU: only one stage of instruction per clock cycle
•Length of clock cycle: equal to the time taken for the slowest stage
•Advantages over single cycle execution: 
•Unused stages in a particular instruction can be skipped 
•Instructions can be pipelined (overlapped)
1. Instruction
Fetch2. Decode/
Register
Read3. Execute 4. Memory 5. Register
Write•CPU Execution Time = I ×CPI×C  = I×CPI×(1/f)Relative Cycle Time
•Assume 4ns for instruction and data memory, 3ns for ALU and adders, and 1ns for register 
reads or writes
•Negligible time for other operations such as mux delay, sign extension, shifting, etc.
http://www.cs.fsu.edu/~zwang/files/cda3101/Fall2017/Lecture5_cda3101.pdfAnother Look at Single Cycle CPU
•Implementation: easy
•The clock cycle: determined by the longest possible path
http://www.cs.fsu.edu/~zwang/files/cda3101/Fall2017/Lecture5_cda3101.pdfAnother Look at Single Cycle CPU
•Some functional units: idle for some time
•Some functional unit: duplication required (adder)
•Since it can not be shared during a clock cycle
http://www.cs.fsu.edu/~zwang/files/cda3101/Fall2017/Lecture5_cda3101.pdfPipeliningWhy Pipelining
A sequential 
laundryWhy Pipelining
A pipelined 
laundry
Five Stages of Instruction Execution
•Stage 1: Instruction Fetch (IF)
•Stage 2: Instruction Decode (ID)
•Stage 3: Arithmetic Logic Operations (EXE)
•Stage 4: Memory Access (MEM)
•Stage 5: Register Write/ Write Back (WB)https://www.cs.fsu.edu/~zwang/files/cda3101/Fall2017/Lecture7_cda3101.pdf
https://www.cs.fsu.edu/~zwang/files/cda3101/Fall2017/Lecture7_cda3101.pdfPipelining Stages: An Examplehttps://www.cs.fsu.edu/~zwang/files/cda3101/Fall2017/Lecture7_cda3101.pdfPipelining Stages: An Example
pipeline registers :tostore data
between cycles .Pipelining: A Numerical ExamplePipelined vs Non -pipelined Implementation
Non -pipelined
PipelinedPipelined vs Non -pipelined Implementation
Non -pipelined
Pipelined•Pipelining doesn’t speedup single task, it helps throughput of entire workload
•Pipeline rate limited by slowest pipeline stage
•Multiple tasks operate simultaneously using different resources
•Potential speedup = Number pipe stages
•Unbalanced lengths of pipe stages reduces speedup
•Time to “fill” pipeline and time to “drain” it reduces speedupPipelining: Some Facts•Balanced stages: each stage of same duration
•Non -pipelined executionSpeedup in Pipelining
S1 S2 S3 S4 S5t
tt
tt
S1 S2
S3 S4 S5 S1 S2
S3 S4 S5 S1 S2TT
T
TTime between instructions =T =5t
Instruction 1
Instruction 2
Instruction 3•Pipelined executionSpeedup in Pipelining
S1 S2 S3 S4 S5t
tt
tt
S1 S2
S3 S4 S5 S1 S2
S3 S4 S5 S1 S2T
Instruction 1
Instruction 2
Instruction 3
Time between instructions = t•TBI: Time between instructions
𝑇𝐵𝐼𝑝𝑖𝑝𝑒𝑙𝑖𝑛𝑒𝑑 =𝑡=5𝑡
5=𝑇
5=𝑇𝐵𝐼𝑛𝑜𝑛−𝑝𝑖𝑝𝑒𝑙𝑖𝑛𝑒𝑑
𝑁𝑢𝑚𝑏𝑒𝑟 𝑜𝑓𝑠𝑡𝑎𝑔𝑒𝑠Speedup in Pipelining•TBI: Time between instructions
𝑇𝐵𝐼𝑝𝑖𝑝𝑒𝑙𝑖𝑛𝑒𝑑 =𝑡=5𝑡
5=𝑇
5=𝑇𝐵𝐼𝑛𝑜𝑛−𝑝𝑖𝑝𝑒𝑙𝑖𝑛𝑒𝑑
𝑁𝑢𝑚𝑏𝑒𝑟 𝑜𝑓𝑠𝑡𝑎𝑔𝑒𝑠
𝑇𝐵𝐼𝑛𝑜𝑛−𝑝𝑖𝑝𝑒𝑙𝑖𝑛𝑒𝑑
𝑇𝐵𝐼𝑝𝑖𝑝𝑒𝑙𝑖𝑛𝑒𝑑=𝑁𝑢𝑚𝑏𝑒𝑟 𝑜𝑓𝑠𝑡𝑎𝑔𝑒𝑠Speedup in Pipelining•TBI: Time between instructions
𝑇𝐵𝐼𝑝𝑖𝑝𝑒𝑙𝑖𝑛𝑒𝑑 =𝑡=5𝑡
5=𝑇
5=𝑇𝐵𝐼𝑛𝑜𝑛−𝑝𝑖𝑝𝑒𝑙𝑖𝑛𝑒𝑑
𝑁𝑢𝑚𝑏𝑒𝑟 𝑜𝑓𝑠𝑡𝑎𝑔𝑒𝑠
𝑇𝐵𝐼𝑛𝑜𝑛−𝑝𝑖𝑝𝑒𝑙𝑖𝑛𝑒𝑑
𝑇𝐵𝐼𝑝𝑖𝑝𝑒𝑙𝑖𝑛𝑒𝑑=𝑁𝑢𝑚𝑏𝑒𝑟 𝑜𝑓𝑠𝑡𝑎𝑔𝑒𝑠Speedup in Pipelining
Potential 
Speedup•10 such instructionsSpeedup in Pipelining
S1 S2 S3 S4 S5t
tt
tt
S1 S2TT•Balanced stages: each stage of same duration
•Non -pipelined executionSpeedup in Pipelining
S1 S2 S3 S4 S5t
tt
tt
S1 S2
S3 S4 S5 S1 S2
S3 S4 S5 S1 S2TT
T
TTime between instructions =T =5t
Instruction 1
Instruction 2
Instruction 3•10 such instructions
•Time required for non -pipelined execution
•10×5t= 50tSpeedup in Pipelining
S1 S2 S3 S4 S5t
tt
tt
S1 S2TT•Pipelined executionSpeedup in Pipelining
S1 S2 S3 S4 S5t
tt
tt
S1 S2
S3 S4 S5 S1 S2
S3 S4 S5 S1 S2T
Instruction 1
Instruction 2
Instruction 3
Time between instructions = t•Pipelined executionSpeedup in Pipelining
S1 S2 S3 S4 S5t
tt
tt
S1 S2
S3 S4 S5 S1 S2
S3 S4 S5 S1 S2T
Instruction 1
Instruction 2
Instruction 3
Time between instructions = t•𝑠number of stages in pipeline
•Each stage requires time 𝑡
•Total time required to execute 𝑛
instructions in the pipeline 
•𝑠𝑡+𝑛−1𝑡=𝑠+𝑛−1𝑡•10 such instructions
•Time required for non -pipelined execution
•10×5t= 50t
•Time required for pipelined execution
•5t+(10 -1)t=14tSpeedup in Pipelining
S1 S2 S3 S4 S5t
tt
tt
S1 S2TT•10 such instructions
•Time required for non -pipelined execution
• 10×5t= 50t
•Time required for pipelined execution
• 5t+(10 -1)t=14t
•Speed up =  50𝑡
14𝑡≈3.5Speedup in Pipelining
S1 S2 S3 S4 S5t
tt
tt
S1 S2TT•10 such instructions
•Time required for non -pipelined execution
•10×5t= 50t
•Time required for pipelined execution
•5t+(10 -1)t=14t
•Speed up =  50𝑡
14𝑡≈3.5•100 such instructions
•Time required for non -pipelined execution
•100×5t= 500t
•Time required for pipelined execution
•5t+(100 -1)t=104t
•Speed up =  500𝑡
104𝑡≈4.8Speedup in Pipelining
S1 S2 S3 S4 S5t
tt
tt
S1 S2TT•If the stages are balanced, many instructions to be executed
•𝑆𝑝𝑒𝑒𝑑𝑢𝑝 ≈𝑇𝐵𝐼𝑛𝑜𝑛−𝑝𝑖𝑝𝑒𝑙𝑖𝑛𝑒𝑑
𝑇𝐵𝐼𝑝𝑖𝑝𝑒𝑙𝑖𝑛𝑒𝑑=𝑁𝑢𝑚𝑏𝑒𝑟 𝑜𝑓𝑠𝑡𝑎𝑔𝑒𝑠Speedup in Pipelining
S1 S2 S3 S4 S5t
tt
tt
S1 S2TT•MIPS was designed with pipelining in mind
•All MIPS instructions are the same length –32 bits
•This makes the IF phase universal and simple to implement
•Only three instruction formats, and source register fields are always in the same place
•This means we can read the register file in the ID phase before we even know what the instruction is
•The only memory operations occur in load and store instructions
•We can dedicate the ALU to compute addresses for these instructions
•No need to worry about multiple data memory accesses per instructionPipelining in MIPSPotential Challenges in Pipelining
•Hazards: Situation where next instruction cannot execute in the following cycle
•Structural hazards: Attempting to use the same hardware to do two different things at the same time
•Data hazards: Instruction depends on result of prior instruction still in the pipeline
•Control hazards: The flow of instruction addresses is not known at the time when the next instruction 
must be loaded•How to define the performance of a computer?
•Speed
•Execution time: Time for the completion of a task
•Important in most situations including personal computers
•Throughput/ bandwidth: Total number of work done in a given time
•Important in places like datacenterPerformance of a Computer Performance of a Processor
•Why do we need to focus on the performance of a processor
•Wait for I/O
•Performance metric: CPU execution time
•Time taken by the CPU for computing
•Includes OS routines executed for the program
•Does not include the wait time for I/O•Performance =1
𝐶𝑃𝑈𝐸𝑥𝑒𝑐𝑢𝑡𝑖𝑜𝑛 𝑇𝑖𝑚𝑒
•For a given program in two computers (actually CPUs of) A and B
𝑃𝑒𝑟𝑓𝑜𝑟𝑚𝑎𝑛𝑐𝑒 𝐴
𝑃𝑒𝑟𝑓𝑜𝑟𝑚𝑎𝑛𝑐𝑒 𝐵=𝐶𝑃𝑈𝐸𝑥𝑒𝑐𝑢𝑡𝑖𝑜𝑛 𝑇𝑖𝑚𝑒𝐵
𝐶𝑃𝑈𝐸𝑥𝑒𝑐𝑢𝑡𝑖𝑜𝑛 𝑇𝑖𝑚𝑒𝐴=𝑛=𝑠𝑝𝑒𝑒𝑑𝑢𝑝Performance in Terms of CPU Execution TimeSteps Involved in Processing of an 
InstructionFetch an 
Instruction
Decode the 
Instruction
Fetch the 
Operands
Execute the 
Specified 
OperationStart
EndFetch an 
Instruction
Decode the 
Instruction
Fetch the 
Operands
Execute the 
Specified 
OperationStart
End•Each step of instruction cycle may 
contain a sequence of micro -
operationsFetch an 
Instruction
Decode the 
Instruction
Fetch the 
Operands
Execute the 
Specified 
OperationStart
End•Each step of instruction cycle may 
contain a sequence of micro -
operations
•Each micro -operation: triggered 
by the computer clock and 
executed in one clock cycle
Clock cycle
cycle 1 cycle 2 cycle 3•A machine instruction
•Several micro -operations
•Several clock cycles
•Average Clock Cycles per Instruction (CPI)
•CPU Execution Time = Total Number of Clock Cycles ×CPU Clock Cycle Time
= Total Number of Clock Cycles ×(1/Clock Rate)
= (Number of Instructions ×CPI) ×(1 / Clock Rate)CPU Execution Time and the Computer Clock•CPU Execution Time = Total Number of Clock Cycles ×CPU Clock Cycle Time
= Total Number of Clock Cycles ×(1/Clock Rate)
= (Number of Instructions * CPI) ×(1 / Clock Rate)
= Number of Instructions ×CPI ×CPU Clock Cycle Time
= I×CPI×C
= I×CPI×(1/f)CPU Performance Equation•A machine instruction
•Several micro -operations
•Several clock cycles
•Average Clock Cycles per Instruction (CPI)
•CPU Execution Time = Total Number of Clock Cycles ×CPU Clock Cycle Time
= Total Number of Clock Cycles ×(1/Clock Rate)
= (Number of Instructions ×CPI) ×(1 / Clock Rate)Factors Affecting CPU Execution Time•A machine instruction
•Several micro -operations
•Several clock cycles
•Average Clock Cycles Per Instruction (CPI)
•CPU Execution Time = Total Number of Clock Cycles ×CPU Clock Cycle Time
= Total Number of Clock Cycles ×(1/Clock Rate)
= (Number of Instructions ×CPI)×(1 / Clock Rate)Factors Affecting CPU Execution Time•A machine instruction
•Several micro -operations
•Several clock cycles
•Average Clock Cycles per Instruction (CPI)
•CPU Execution Time = Total Number of Clock Cycles ×CPU Clock Cycle Time
= Total Number of Clock Cycles ×(1/Clock Rate)
= (Number of Instructions ×CPI)×(1 / Clock Rate)Factors Affecting CPU Execution TimeCalculation of CPI
Instruction Type Frequency of 
OccurrenceClock Cycle
ALU Instructions 55% 5
Load Instructions 25% 6
Store Instructions 10% 4
Branch Instructions 10 % 2Calculation of CPI
Instruction Type Frequency of 
OccurrenceClock Cycle
ALU Instructions 55% 5
Load Instructions 25% 6
Store Instructions 10% 4
Branch Instructions 10 % 2
CPI = (55/100) ×5+ (25/100) ×6+ (10/100) ×4+ (10/100) ×2
= 4.85Calculation of CPI
Instruction Type Frequency of 
OccurrenceClock Cycle
ALU Instructions 55% 5 (CPIALU)
Load Instructions 25% 6 (CPILoad)
Store Instructions 10% 4 (CPIStore)
Branch Instructions 10 % 2 (CPIBrach )
CPI = (55/100) ×5+ (25/100) ×6+ (10/100) ×4+ (10/100) ×2
= 4.85Calculation of CPI
Instruction Type Frequency of 
OccurrenceClock Cycle
ALU Instructions 55% (FALU) 5 (CPIALU)
Load Instructions 25% ( FLoad) 6 (CPILoad)
Store Instructions 10% ( FStore) 4 (CPIStore)
Branch Instructions 10 % ( FBrach ) 2 (CPIBrach )
CPI = (55/100) ×5+ (25/100) ×6+ (10/100) ×4+ (10/100) ×2
= 4.85Calculation of CPI
Instruction Type Frequency of 
OccurrenceClock Cycle
ALU Instructions 55% (FALU) 5 (CPIALU)
Load Instructions 25% ( FLoad) 6 (CPILoad)
Store Instructions 10% ( FStore) 4 (CPIStore)
Branch Instructions 10 % ( FBrach ) 2 (CPIBrach )
CPI = (55/100) ×5+ (25/100) ×6+ (10/100) ×4+ (10/100) ×2
= (FALU* CPIALU)+ (FLoad* CPILoad)+ (FStore * CPIStore )+ (FBranch * CPIBranch )
= 4.85Calculation of CPI
Instruction Type Frequency of 
OccurrenceClock Cycle
ALU Instructions 55% (FALU) 5 (CPIALU)
Load Instructions 25% ( FLoad) 6 (CPILoad)
Store Instructions 10% ( FStore) 4 (CPIStore)
Branch Instructions 10 % ( FBrach ) 2 (CPIBrach )
CPI = (55/100) ×5+ (25/100) ×6+ (10/100) ×4+ (10/100) ×2
= σ𝑖=1𝑛𝐹𝑖×𝐶𝑃𝐼𝑖
= 4.85•A program with 5,00,0000 instructions
•CPU clock frequency 2 GHz
•CPI = 2.4
What is the CPU execution time?Calculation of CPU Execution Time•A program with 5,00,0000 instructions; i.e. 𝐼=5×106
•CPU clock frequency 2 GHz; i.e𝑓=2×109
•CPI = 2.4Calculation of CPU Execution Time
CPU Execution Time = 5×106×𝟐.𝟒×𝟏
𝟐×𝟏𝟎𝟗•A program with 5,00,0000 instructions; i.e. 𝐼=5×106
•CPU clock frequency 2 GHz; i.e𝑓=2×109
•CPI = 2.4Calculation of CPU Execution Time
CPU Execution Time = 5×106×𝟐.𝟒×𝟏
𝟐×𝟏𝟎𝟗=𝟔×𝟏𝟎−𝟑𝒔•A program with 5,00,0000 instructions; i.e. 𝐼=5×106
•CPU clock frequency 2 GHz; i.e𝑓=2×109
•CPI = 2.4Calculation of CPU Execution Time
CPU Execution Time = 5×106×𝟐.𝟒×𝟏
𝟐×𝟏𝟎𝟗=𝟔𝒎𝒔•𝐼𝑃𝑆=𝑁𝑢𝑚𝑏𝑒𝑟 𝑜𝑓𝐼𝑛𝑠𝑡𝑟𝑢𝑐𝑡𝑖𝑜𝑛𝑠
𝐶𝑃𝑈𝐸𝑥𝑒𝑐𝑢𝑡𝑖𝑜𝑛 𝑡𝑖𝑚𝑒
•𝑀𝐼𝑃𝑆=𝑁𝑢𝑚𝑏𝑒𝑟 𝑜𝑓𝐼𝑛𝑠𝑡𝑟𝑢𝑐𝑡𝑖𝑜𝑛𝑠
𝐶𝑃𝑈𝐸𝑥𝑒𝑐𝑢𝑡𝑖𝑜𝑛 𝑡𝑖𝑚𝑒×106(million instructions per second)Instructions Per Second•𝑀𝐼𝑃𝑆=𝑁𝑢𝑚𝑏𝑒𝑟 𝑜𝑓𝐼𝑛𝑠𝑡𝑟𝑢𝑐𝑡𝑖𝑜𝑛𝑠
𝐶𝑃𝑈𝐸𝑥𝑒𝑐𝑢𝑡𝑖𝑜𝑛 𝑡𝑖𝑚𝑒×106(million instructions per second)
•CPU Execution Time =(Number of Instructions ×CPI) ×(1 / Clock Rate)
•𝑀𝐼𝑃𝑆=𝐶𝑙𝑜𝑐𝑘𝑅𝑎𝑡𝑒
𝐶𝑃𝐼×106Instructions Per Second•A computer with 200 MHz clock rate 
•Details of instruction execution
Find MIPS. Instructions Per Second: Example
•Million floating -point operations per second
•Measures efficacy is executing floating point operations
•Very important for scientific and real time applications (such as gaming)MFLOPS•A program used for evaluating computer performances
•Why do we need benchmarks?
•Actual Target Workload :  Full applications that run on the target machine
•Real Full Program -based Benchmarks : A specific mix or suite of programs that are typical of 
targeted applications or workload.
•Small “Kernel” Benchmarks :  Key computationally -intensive pieces extracted from real programs. 
Examples: Matrix factorization, FFT , tree search, etc. Best used to test specific aspects of the machine.
•Microbenchmarks : Small, specially written programs to isolate a specific aspect  of performance 
characteristics: integer, floating point,  local memory, input/output, etc.Benchmarks•A program used for evaluating computer performances
•Why do we need benchmarks?
•SPEC (System Performance Evaluation Cooperative)
•Funded by a number of computer vendors
•Created a benchmark set in 1989
•Updated in 1992, 1995, 1999, 2006Benchmarks