Computer Architecture Angshuman Paul Assistant Professor Department of Computer Science & Engineering Levels of Abstraction ElectronsProgramAlgorithmProblem System Software Instruction Set Architecture Microarchitecture Logic DeviceApplication ProgramAlgorithmProblem We have studiedLevels of Abstraction ElectronsProgramAlgorithmProblem System Software Instruction Set Architecture Microarchitecture Logic DeviceApplication ProgramAlgorithmProblem Our next focusImage Source: https://www.industrialpcpro.com/imb -g41a -industrial -micro -atx-motherboardThe MotherboardSubsystems of a Digital Computer (A Simplified View) Central Processing UnitArithmetic & Logic Unit Control UnitMemory Input DevicesOutput DevicesSubsystems of a Digital Computer (A Simplified View) Central Processing UnitArithmetic & Logic Unit Control UnitMemory Input DevicesOutput DevicesSame memory for program and dataVon Neuman Architecture Harvard Architecture Separate memory for program and data •The brain of the computer system •Major component: •Arithmetic Logic Unit (ALU) •Control Unit (CU) •Registers (Local Storage) •General Purpose •Special PurposeThe Processor Programs & Instructions •The processor executes programs •A program is composed of a set of instructions •Instruction set •A set containing all the instructions that a processor understandsRepresentation of Instructions •Machine language •Pattern of binary symbols (0 and 1) •Suitable for representing execution at the gate level •Difficult to interpret by designer •Understood by the digital hardware •Assembly language •Mnemonics for representing instructions •Useful for designer •Not interpretable by the hardwareComponents of an Instruction (An Example) •C code: •a = b + c •MIPS code: •add $s0, $s1, $s2 •MIPS: Microprocessor without Interlocked Pipelined StagesComponents of an Instruction (An Example) add $s0, $s1, $s2 Opcodeadd $s0, $s1, $s2Components of an Instruction (An Example) OperandsComponents of an Instruction •Operation code (opcode) •Perform the operation : ADD, SUB, MPY, etc. •Source operand •Onthe content of: register, memory location •Destination operand •Put the result hereMIPS Registers: General Purpose Registers •32 bit registers •32 registers •MIPS is load store architecture •Only load and store can access memory directly •Usage of general -purpose registers •Holding temporary variables in expression evaluation •Passing parametersThe Number of Registers •More registers •Possible to allocate more variables in registers (fast accesses) •Reduction in number of memory access •Longer specifiers for registers (more bits for encoding) •Increase in register access time (physical registers) •More registers to save during context switchThe Memory •Large capacity •Slower access •Addressing mode •Decides how to access data •Direct from memory •From registersMemory Organization •Viewed as a large single dimensional array •Byte addressable •An address is an index into the arraybits 0 1 2 3 4 5 6 7Instructions that Access Memory •C Code: A[10] = A[10]+b •MIPS code: •lw$t0, 40($s2) [s2 hold the starting address of the array A] •add $t0, $s1, $t0 •sw$t0, 40($s2)Instruction Format: Major Considerations •Byte -Ordering •Instruction Length •Number of operands •0, 1, 2, 3 •Specification of operand location •Register, immediate, indirect, relative, etc. •Possible size of operands •Byte, half -word (2 byte), word (4 byte), double (8 byte), etc.Byte Ordering •Big endian •Little endian Byte Ordering •Big endian: IBM z/Architecture, Freescale ColdFire •Little endian: The Intel x86 and AMD64 / x86 -64 •Bi-endian: ARM AArch64, C -Sky, Power ISA, and RISC -V •Can be switched using softwareInstruction Format: Major Considerations •Byte -Ordering •Instruction Length •Number of operands •0, 1, 2, 3 •Specification of operand location •Register, immediate, indirect, relative, etc. •Possible size of operands •Byte, half -word (2 byte), word (4 byte), double (8 byte), etc.Instruction Length •Fixed: MARIE (16 bit) •Less complexity •Possible wastage of space •Variable length: x86 •Common instruction: smaller size •Saves space •More complex to decodeInstruction Format: Major Considerations •Byte -Ordering •Instruction Length •Number of operands •0, 1, 2, 3 •Specification of operand location •Register, immediate, indirect, relative, etc. •Possible size of operands •Byte, half -word (2 byte), word (4 byte), double (8 byte), etc.Number of Operands: Zero Address Instructions •Zero address instructions: in stack based instruction sets (1960s and 1970s) •Stack: a portion of memory •Implicit operands on stack •Push a r d•Push a a r dNumber of Operands: Zero Address Instructions•Push a •Push b a r dNumber of Operands: Zero Address Instructionsb a r d•Push a •Push bNumber of Operands: Zero Address Instructions•Push a •Push b •add b a r dNumber of Operands: Zero Address Instructions•Push a •Push b •add b a r dNumber of Operands: Zero Address Instructions•Push a •Push b •add •add does not use the address field •Zero -address instruction a+b r dNumber of Operands: Zero Address InstructionsNumber of Operands: One Address Instructions •One address instructions: in accumulator based instruction sets (1960s) •Accumulator: a general purpose register in CPU •Load X …. b ….X …. q ….YNumber of Operands: One Address Instructions •One address instructions: use an implied accumulator for operation on data •Accumulator: a general purpose register in CPU •Load X •acc=Mem[X]…. b ….X …. q ….YNumber of Operands: One Address Instructions •One address instructions: use an implied accumulator for operation on data •Accumulator: a general purpose register in CPU •Load X •acc=Mem[X] •Add Y…. b ….X …. q ….YNumber of Operands: One Address Instructions •One address instructions: use an implied accumulator for operation on data •Accumulator: a general purpose register in CPU •Load X •acc=Mem[X] •Add Y •acc= acc+ Mem [Y]…. b ….X …. q ….YNumber of Operands: Two Address Instructions •Two address instructions: May use •Either two registers (register register based) •Add R1, R2 (R1= R1+R2) •Or a register and a memory location (register memory based) •Add R1, X (R1= R1+Mem[X]) •Or two memory locations (memory memory based) •Add X, Y (Mem[X]=Mem [X]+ Mem [Y])•Three address instructions: •Example: Add R1, R2, R3 (R1= R2+R3) •May involve registers and memory locationsNumber of Operands: Three Address Instructions•y=(a+b)/(c+d) •Assume that a, b, c, and d are stored in memory locations A, B, C, and D •Three address: •Add R1, A, B (R1=Mem[A]+Mem[B]) •Add R2, C, D (R2=Mem[C]+Mem[D]) •DivR3, R1, R2 (R3=R1/R2)Number of Operands: An Example•y=(a+b)/(c+d) •Assume that a, b, c, and d are stored in memory locations A, B, C, and D •Two address: •Load R1, A (R1=Mem[A]) •Add R1, B (R1=R1+Mem[B]) •Load R2, C (R2=Mem[C]) •Add R2, D (R2=R2+Mem[D]) •DivR1, R2 (R1=R1/R2)Number of Operands: An Example•y=(a+b)/(c+d) •Assume that a, b, c, and d are stored in memory locations A, B, C, and D •One address: •Load C (Acc= Mem[C]) •Add D (Acc= Acc+Mem [D]) •Stor R (Mem[R]=Acc) •Load A (Acc= Mem[A]) •Add B (Acc= Acc+Mem [B]) •DivR (Acc=Acc/Mem[R])Number of Operands: An Example•Fewer address •Less complex but low flexibility •May require more memory access: increase in execution time •More instructions per program: more number of instruction fetches •More addresses •More involvement of registers: faster operations •Longer instructions for specifying more operands •Less instructions per programNumber of Operands: Relative AnalysisInstruction Format: Major Considerations •Byte -Ordering •Instruction Length •Number of operands •0, 1, 2, 3 •Specification of operand location •Register, immediate, indirect, relative, etc. •Possible size of operands •Byte, half -word (2 byte), word (4 byte), double (8 byte), etc.Operand Location: Addressing Modes •Determines how operands will be accessed •Addressing modes are related to specific operands but not the instruction as a whole •Possible addressing modes •Implied •Immediate •Direct •Indirect •Register •Register Indirect •Relative •Displacement (Indexed) •Stack •Different instruction sets use different sets of addressing modesImplied Addressing •Operand location is implicitly specified in the opcode itself •Does not contain any explicit operand •Example: INCA •Increment the content of accumulator •Zero address stack based instructions also use implied addressingImmediate Addressing •Operand is part of instruction •addi $t1, $t2, 5 •add 5 •No memory/register reference for the corresponding operand •Faster execution •Explicit specification of the corresponding operand: less flexibility Direct (Absolute) Addressing •Address of operand is specified in the instruction •add r1, 2066H •Memory access required •Limited address space (maximum 2𝑟, where ) r is the number of address field in the instruction…. b ….2066H …. …. ….Address A OpcodeInstruction Memory OperandDirect AddressingIndirect Addressing •Memory location pointed to by address field of the instruction contains the address of the operandAddress A Opcode Memory OperandAddress of operand•Two memory access required for one operand •Slower •If the content of memory is of n bit •Size of addressable memory 2𝑛 •Usually •No. of bit in a memory location > No. of address bits in an instruction •Access to large address spaceIndirect Addressing•Operand is held in a register mentioned in the instruction •Add R1, R2 •Limited number of registers •May be used for storing limited amount of data a given point of time •Small address field needed •Shorter instructions •Faster fetch •Involves registers •Fastest memoryRegister AddressingRegister Address R OpcodeInstruction Register Bank OperandRegister Addressing•The memory address of the operand is held in a register mentioned in the instruction •Add R1, (R5) •PC=R1+Mem[R5] •Indirect addressing with one memory access •Large address spaceRegister Indirect AddressingRegister Indirect Addressing Register Address R OpcodeInstruction Memory OperandAddress of Operand Register Bank•The content of the register is automatically incremented or decremented after instruction execution •Add R1, (R2)+ •R1 = R1 +M[R2] •R2 = R2 + d •Useful for implementing stacksAuto Indexing•The program counter holds a base address •The instruction specifies an immediate offset •Effective address of operand= Base address + offsetRelative Addressing (PC Relative) Opcode Offset Base AddressInstruction PCOperand•Offset limited, accessible address is also limitedRelative Addressing Opcode Offset Base AddressInstruction CPU Register (e.g. PC)Operand•A CPU register is used as index register •An explicit offset is specified in the instruction •Load R1, 1050 (R2) (R1=Mem[1050+R2]) •May be used for accessing the elements of an arrayIndexed Addressing Opcode Index register Offset IndexInstruction CPU Register R2 Operand•Operand is (implicitly) on top of stack •Push, PopStack AddressingInstruction Format Opcode Operand 1 Operand 2•A simple instruction format •The instruction format may depend on the type of instructionTypes of Instructions •Data movement instructions •Register •Memory •Stack •I/O instructions •Data processing instructions •Arithmetic •Logical •Control •Systems Control •Transfer of controlData Movement Instructions •Most frequently used •Possible data movements •Memory -> Register •Register -> Register •Register -> Memory •Restrict the instructions that can move data to and from memory •Load -store architecture •Reduction of memory traffic •Example Load, Store, Move, Exchange, Clear, Set etc.Data Movement Instructions •May have different instructions depending on the •Size of operand •32 bit word •Half -word etc. •Location of operands •Memory -> Register •Register -> Register •Example IBM S/390 •May have different instruction for I/OTypes of Instructions •Data movement instructions •Register •Memory •Stack •I/O •Data processing instructions •Arithmetic •Logical •Control •Systems Control •Transfer of controlData Processing Instructions: Arithmetic •For operations on •Integer •Floating point numbers •May have different arithmetic instructions depending on data size •May use a separate set of register for floating point numbersData Processing Instructions: Logical •Bitwise operations: AND, OR, NOT, XOR, TEST, CMP , SET •Bit manipulation instructions •Logical left/ right shift •arithmetic left/ right shiftArithmetic ShiftsArithmetic Shift: Right Shift 1 0 1 1 1 1 0 1Signed Negative Number S S0 0 1 1 0 0 0 1Signed Positive Number S SArithmetic Shift: Left Shift 1 0 1 1 0 1 1 0Signed Negative Number S S0 0 0 1 1 0 1 1 0Signed Positive Number S S0Arithmetic Shift: Left Shift 1 0 1 1 0 1 1 0Signed Negative Number S S0 0 1 1 0 1 1 0Signed Positive Number S SArithmetic Shift: Left Shift 1 0 1 1 0 1 1 0Signed Negative Number S S0 0 1 1 0 1 1 0Signed Positive Number S S 11 0 1 1 0 1 1 0Left Shift 0 1 0 1 1 0 1 0 1Right Shift 0Logical ShiftTypes of Instructions •Data movement instructions •Register •Memory •Stack •I/O •Data processing instructions •Arithmetic •Logical •Control •Systems Control •Transfer of controlControl Instructions: Transfer of Control •PC holds the address of the next instruction to be fetched •Transfer of control: do not execute the next PC value •Transfer control to another part of the instruction spaceControl Instructions: Transfer of Control •Branch: •Conditional: tests a condition and based on the outcome, jumps to a given location •Unconditional: jumps to a location without testing for any condition •BRZ X (branch to X if result is zero) •BRP X (positive) •BRN X (negative) •BRE X, R1, R2 (equal) •Skip •CALL, RETControl Instructions: Transfer of Control Memory Location Instruction 2600 H ADD A, R1, R2 2604 H BRP 2616 H 2608H … 2612 H … 2616 H SUB A, R3, R2 2620 H BR 2600 H 2624 H …Control Instructions: Transfer of Control Memory Location Instruction 2600 H ADD A, R1, R2 2604 H BRP 2616 H 2608H … 2612 H … 2616 H SUB A, R3, R2 2620 H BR 2600 H 2624 H …Conditional BranchUnconditional BranchControl Instructions: Transfer of Control Source: https://www.freetimelearning.com/c -language/c -language -unconditional -control -statements.phpControl Instructions: Transfer of Control Source: https://www.freetimelearning.com/c -language/c -language -unconditional -control -statements.phpMemory Location Instruction 2600 H ADD A, R1, R2 2604 H BRP 2616 H 2608H … 2612 H … 2616 H SUB A, R3, R2 2620 H BR 2600 H 2624 H …Conditional BranchUnconditional BranchControl Instructions: Transfer of Control •Branch: •Conditional: tests a condition and based on the outcome, jumps to a given location •Unconditional: jumps to a location without testing for any condition •BRZ X (branch to X if result is zero) •BRP X (positive) •BRN X (negative) •BRE X, R1, R2 (equal) •Skip: branching with implied address •Typically skips one instruction •CALL, RETControl Instructions: Transfer of Control •Procedure calls •Special branching instructions •Requires to store return address •Storage of return address: different for different machines •Memory •Register •StackControl Instructions: Systems Control •Halting instructions: •NOP: no operation •No change in processor state •Advances the program counter •Why do we need this? •Halt: •Brings processor to an orderly halt •May be restarted by an interrupt, external action etc. •Sleep mode •Halt in OS: invoked by privileged system softwareControl Instructions: Systems Control •Interrupt instructions: •Normal execution gets suspended •May be invoked by I/O or an instruction to get serviced •Asynchronous interrupt •Generated by hardware devices other than processor •May be generated at any time •Synchronous interrupt •Produced by control unit during instruction execution •Handles exception (such as division by zero)Instruction Set: Use of Assembly Language •CPU: executes instructions •Primitive operations •We represent instructions using assembly language •Readable by human •Different CPUs implement different sets of instructions •Instruction set •Instruction Set Architecture: specific to CPUs •Intel 80x86, •IBM/Motorola PowerPC (old Macintosh), •MIPS, etc...Instruction Set Architectures •Early trend in the design of instruction set architecture •Add more and more instructions to the instruction set •Separate instructions for as many operations as possible •Compound instructions: long execution time •VAX Poly •Complex Instruction Set Computers (CISC) •RISC –Reduced Instruction Set Computers •Keep the instruction set small and simple •Makes it easier to build fast hardware •Let software do complicated operations by composing simpler ones.RISC •Instructions are easy to decode •Reduced CPI •Many general purpose registers •Increase in the number of instructions per program •Too many assembly language instructions per program •Difficult to ensure correctness •Most instruction sets (e.g., MIPS, ARM)CISC •Instructions are difficult to decode •Increased CPI •Less number of general purpose registers •Less number of instructions per program •Assembly language programming is easier •Intel x86RISC vs CISC: An Example •Multiplying two numbers stored in memory locations X and Y CISC RISC MULT X,Y LOAD R1, X LOAD R2, Y MUL A, R1, R2MIPS Architecture •MIPS –a RISC architectures •Why MIPS instead of Intel x86? •Simple •Used in embedded apps •RISC Registers •A fast storage location for data inside the processor •limited number, expensive •Arithmetic, logic operations can only be performed on the content of registers •Faster execution of programsGeneral Purpose Registers in MIPS 32 •32 general purpose registers in MIPS •Each register is of 32 bit (32 bit -> word) •Registers are numbered from 0 to 31 •Each register can be referred to by number or name •Number references •$0, $1, $2, … $30, $31 •MIPS has some special purpose registers as well •hi, lo, etc.Register Number Conventional Name Usage $0 $zero Hard -wired to 0 $1 $at Reserved for the assembler $2 -$3 $v0, $v1 Return values from functions $4 -$7 $a0 -$a3Arguments to functions -not preserved by subprograms $8 -$15 $t0 -$t7 Temporary data, not preserved by subprograms $16 -$23 $s0 -$s7 Saved registers, preserved by subprograms $24 -$25 $t8 -$t9More temporary registers, not preserved by subprograms $26 -$27 $k0 -$k1 Reserved for kernel. Do not use. $28 $gp Global Area Pointer (base of global data segment) $29 $sp Stack Pointer $30 $fp Frame Pointer $31 $ra Return AddressMIPS General Purpose RegistersMIPS General Purpose Registers Register Number Conventional Name Usage $0 $zero Hard -wired to 0 $1 $at Reserved for the assembler $2 -$3 $v0, $v1 Return values from functions $4 -$7 $a0 -$a3Arguments to functions -not preserved by subprograms $8 -$15 $t0 -$t7 Temporary data, not preserved by subprograms $16 -$23 $s0 -$s7 Saved registers, preserved by subprograms $24 -$25 $t8 -$t9More temporary registers, not preserved by subprograms $26 -$27 $k0 -$k1 Reserved for kernel. Do not use. $28 $gp Global Area Pointer (base of global data segment) $29 $sp Stack Pointer $30 $fp Frame Pointer $31 $ra Return AddressMIPS General Purpose Registers Register Number Conventional Name Usage $0 $zero Hard -wired to 0 $1 $at Reserved for the assembler $2 -$3 $v0, $v1 Return values from functions $4 -$7 $a0 -$a3Arguments to functions -not preserved by subprograms $8 -$15 $t0 -$t7 Temporary data, not preserved by subprograms $16 -$23 $s0 -$s7 Saved registers, preserved by subprograms $24 -$25 $t8 -$t9More temporary registers, not preserved by subprograms $26 -$27 $k0 -$k1 Reserved for kernel. Do not use. $28 $gp Global Area Pointer (base of global data segment) $29 $sp Stack Pointer $30 $fp Frame Pointer $31 $ra Return AddressMIPS: Other Registers •HI/ LO: to store results of multiply instructions •During integer division •Remainder in HI •Quotient in LO •PC: program counter •Points to the next instruction to be executed •The two low order bits are always zero. Why?op1 op2 HI LOMIPS: Types of Instructions •Three types of instructions •R-type: when all data values used by an instruction is located in registers opcode rs rt rdshift (shamt )funct 6 bits 5 bits 5 bits 5 bits 5 bits 6 bitsR Format: An Example •add $t0, $s1, $s2 000000 10001 10010 01000 00000 100000 ALU s1 s2 t0 shamt addopcode rs rt rd shift ( shamt ) funct 6 bits 5 bits 5 bits 5 bits 5 bits 6 bitsR Format: An Example •sub $t0, $s1, $s2 000000 10001 10010 01000 00000 100010 ALU s1 s2 t0 shamt subopcode rs rt rd shift ( shamt ) funct 6 bits 5 bits 5 bits 5 bits 5 bits 6 bitsMIPS: Types of Instructions •Three types of instructions •I-type: for instructions that operate on an immediate value and a register value opcode rs rt IMM 6 bits 5 bits 5 bits 16 bitsMIPS: Types of Instructions •addi $s1, $s2, 5 001000 10010 10001 0000000000000101 addi s2 s1 5opcode rs rt IMM 6 bits 5 bits 5 bits 16 bitsMIPS: Types of Instructions •Three types of instructions •J-type: for executing jump opcode Pseudo -address 6 bits 26 bitsMIPS: Types of Instructions •j 2600 H 000010 00000000000010011000000000 j 2600 Hopcode Pseudo -address 6 bits 26 bitsTypes of Instructions in MIPS •Data movement instructions •Register •Memory •Stack •I/O •Data processing instructions •Arithmetic •Logical •Control •Systems Control •Transfer of controlData Movement Instructions in MIPS Instruction Example Meaning Comments load word lw $s1,100($s2) $s1=Memory[$s2+100] Copy from memory to register store word sw $s1,100($s2) Memory[$s2+100]=$s1 Copy from register to memory load upper immediatelui$s1,55 $s1=55x2^16 Load constant into upper 16 bits. Lower 16 bits are set to zero. load address la $s1,label $s1=Address of label Pseudo -instruction (provided by assembler, not processor!) Loads computed address of label (not its contents) into register load immediate li $s1,100 $s1=100 Pseudo -instruction (provided by assembler, not processor!) Loads immediate value into register move from hi mfhi $s2 $s2=hi Copy from special register hi to general register move from lo mflo $s2 $s2=lo Copy from special register lo to general register move move $s1,$s2 $s1=$s2 Pseudo -instruction (provided by assembler, not processor) Copy from register to register.Data Movement Instructions in MIPS Data Movement Instructions •May have different instructions depending on the •Size of operand •32 bit word •Half -word etc. •Location of operands •Memory -> Register •Register -> Register •Example IBM S/390 •May have different instruction for I/OData Processing Instructions in MIPS: Arithmetic Instruction Example Meaning Comments add add $t1,$t2,$t3 $t1=$t2+$t3 subtract sub $t1,$t2,$t3 $t1=$t2 -$t3 add immediate addi $t1,$t2,100 $t1=$t2+100 Addition with a constant number add unsigned addu $t1,$t2,$t3 $t1=$t2+$t3 Values are treated as unsigned integers, not two's complement integers subtract unsigned subu $t1,$t2,$t3 $t1=$t2 -$t3 Values are treated as unsigned integers, not two's complement integers add immediate unsigned addiu $t1,$t2,100 $t1=$t2+100 Values are treated as unsigned integers, not two's complement integers Multiply (without overflow) mul $t1,$t2,$t3 $t1=$t2*$t3 Result is only 32 bits Multiply mult $t2,$t3 $hi, $low=$t2*$t3 Upper 32 bits stored in special register hi Lower 32 bits stored in special register lo Divide div $t2,$t3 $hi, $low=$t2/$t3 Remainder stored in special register hi Quotient stored in special register lo Unsigned Divide divu $t2,$t3 $hi, $low=$t2/$t3 $t2 and $t3 store unsigned values. Remainder stored in special register hi Quotient stored in special register loData Processing Instructions in MIPS: Logical Instruction Example Meaning Comments and and $t1,$t2,$t3 $t1=$t2&$t3 Bitwise AND or or $t1,$t2,$t3 $t1=$t2|$t3 Bitwise OR and immediate andi $t1,$t2,100 $t1=$t2&100 Bitwise AND with immediate value or immediate ori$t1,$t2,100 $t1=$t2|100 Bitwise OR with immediate value shift left logical sll $t1,$t2,10 $t1=$t2<<10 Shift left by constant number of bits shift right logical srl $t1,$t2,10 $t1=$t2>>10 Shift right by constant number of bitsR Format: An Example •add $t0, $s1, $s2 000000 10001 10010 01000 00000 100000 ALU s1 s2 t0 shamt addopcode rs rt rd shift ( shamt ) funct 6 bits 5 bits 5 bits 5 bits 5 bits 6 bitsR Format: An Example •add $t0, $s1, $s2 •sll$s1, $s2, 5000000 10001 10010 01000 00000 100000 ALU s1 s2 t0 shamt addopcode rs rt rd shift ( shamt ) funct 6 bits 5 bits 5 bits 5 bits 5 bits 6 bits 000000 00000 10010 10001 00101 000000 Special 0 s2 s1 shamt sllTransfer of Control Instructions in MIPS Instruction Example Meaning Comments branch on equal beq $s1,$s2,100 if($s1==$s2) go to PC+4+100 Test if registers are equal branch on not equal bne $s1,$s2,100 if($s1!=$s2) go to PC+4+100 Test if registers are not equal branch on greater than bgt $s1,$s2,100 if($s1>$s2) go to PC+4+100 Pseduo -instruction branch on greater than or equalbge $s1,$s2,100 if($s1>=$s2) go to PC+4+100 Pseduo -instruction branch on less than blt$s1,$s2,100 if($s1<$s2) go to PC+4+100 Pseduo -instruction branch on less than or equalble $s1,$s2,100 if($s1<=$s2) go to PC+4+100 Pseduo -instructionTransfer of Control Instructions in MIPS Instruction Example Meaning Comments jump j 1000 go to address 1000 Jump to target address jump register jr $s1 go to address stored in $s1 For switch, procedure return jump and link jal1000 $ra=PC+4; go to address 1000 Use when making procedure call. This saves the return address in $raComparison Instructions in MIPS Instruction Example Meaning Comments set on less than slt $s1,$s2,$s3 if($s2<$s3)$s1=1; else $s1=0Test if less than. If true, set $s1 to 1. Otherwise, set $s1 to 0. set on less than immediateslti $s1,$s2,100 if($s2<100)$s1=1; else $s1=0Test if less than. If true, set $s1 to 1. Otherwise, set $s1 to 0Memory Alignment •Memory is byte addressable Bits 2600 2601 2602 2603 2604 2605 2606 2607 2608 2609 260A 260B 260C 260D 260E 260FByte 0 Byte 1 Byte 2 Byte 3 Byte 4 … … …•A MIPS word: 4 bytes Bits 2600 2601 2602 2603 2604 2605 2606 2607 2608 2609 260A 260B 260C 260D 260E 260FWord 1 Word 2Memory Alignment Byte 0 Byte 1 Byte 2 Byte 3 Byte 4 … … …•A word occupies 4 contiguous bytes Bits 2600 2603 2604 2607 2608 260B 260C 260FWord 1 Word 2 Word 3 Word 4Memory Alignment Byte 0 Byte 1 Byte 2 Byte 3 Byte 4 … … …Byte 0 2600 … … … … …Byte 1 Byte 2 Byte 3 Word 1 2604 Word 2 2608 Word 3Memory Alignment •A word occupies 4 contiguous bytes2600 … … … … …Word 1 2604 Word 2 2608 Word 3Memory Alignment •Several instructions require memory alignment, •In case of those instructions, data can be loaded/ stored in memory starting only from/ to specific memory locations Byte 0 Byte 1 Byte 2 Byte 3Memory Alignment 2600 … … … … …Word 1 2604 Word 2 2608 Word 3•A word must start at an address divisible by 4 Byte 0 Byte 1 Byte 2 Byte 3Memory Alignment 2600 … … … … …Word 1 2604 Word 2 2608 Word 3•Valid word address: 2600H, 2604H, 260CH, etc. •Invalid word address: 2601H, 2605H, etc. 2600H 2604H 2608H 260CH … … … …2601H 2605H 2609H 260DH … … … …2602H 2606H 260AH 260EH … … … …2603H 2607H 260BH 260FH … … … …Byte 0 Byte 1 Byte 2 Byte 3Memory Alignment: Example 2600 … … … … …Word 1 2604 Word 2 2608 Word 3•lw, swrequire target addresses to be memory aligned 2600H 2604H 2608H 260CH … … … …2601H 2605H 2609H 260DH … … … …2602H 2606H 260AH 260EH … … … …2603H 2607H 260BH 260FH … … … …Byte 0 Byte 1 Byte 2 Byte 3Memory Alignment: Example 2600 … … … … …Word 1 2604 Word 2 2608 Word 3•lw, swrequire target addresses to be memory aligned 2600H 2604H 2608H 260CH … … … …2601H 2605H 2609H 260DH … … … …2602H 2606H 260AH 260EH … … … …2603H 2607H 260BH 260FH … … … …Byte 0 Byte 1 Byte 2 Byte 32601H 2605H 2609H 260DH … … … …Memory Alignment: Example 2600 … … … … …Word 1 2604 Word 2 2608 Word 3•lw, swrequire target addresses to be memory aligned 2600H 2604H 2608H 260CH … … … …2602H 2606H 260AH 260EH … … … …2603H 2607H 260BH 260FH … … … …Byte 0 Byte 1 Byte 2 Byte 32600 2603 2604 2607 2608 260B 260C 260FWord 1 Word 2 Word 3 Word 4Memory Alignment: Example •lw, swrequire target addresses to be memory aligned Byte 0 Byte 1 Byte 2 Byte 3 Byte 4 … … …2600 2603 2604 2607 2608 260B 260C 260FWord 1 Word 2 Word 3 Word 4Memory Alignment: Example •lw, swrequire target addresses to be memory aligned Byte 0 Byte 1 Byte 2 Byte 3 Byte 4 … … …2600 2603 2604 2607 2608 260B 260C 260FWord 1 Word 2 Word 3 Word 4Memory Alignment: Example •lw, swrequire target addresses to be memory aligned Byte 0 Byte 1 Byte 2 Byte 3 Byte 4 … … …Memory Alignment: Example 2600 … … … … …Word 1 2604 Word 2 2608 Word 3•lh, lhu, shrequire target addresses to be memory aligned 2600H 2604H 2608H 260CH … … … …2601H 2605H 2609H 260DH … … … …2602H 2606H 260AH 260EH … … … …2603H 2607H 260BH 260FH … … … …Byte 0 Byte 1 Byte 2 Byte 32601H 2605H 2609H 260DH … … … …Memory Alignment: Example 2600 … … … … …Word 1 2604 Word 2 2608 Word 3•lh, lhu, shrequire target addresses to be memory aligned 2600H 2604H 2608H 260CH … … … …2602H 2606H 260AH 260EH … … … …2603H 2607H 260BH 260FH … … … …Byte 0 Byte 1 Byte 2 Byte 32600H 2604H 2608H 260CH … … … …2601H 2605H 2609H 260DH … … … …Memory Alignment: Example 2600 … … … … …Word 1 2604 Word 2 2608 Word 3•lh, lhu, shrequire target addresses to be memory aligned 2602H 2606H 260AH 260EH … … … …2603H 2607H 260BH 260FH … … … …Byte 0 Byte 1 Byte 2 Byte 32602H 2606H 260AH 260EH … … … …2600H 2604H 2608H 260CH … … … …2601H 2605H 2609H 260DH … … … …Memory Alignment: Example 2600 … … … … …Word 1 2604 Word 2 2608 Word 3•lh, lhu, shrequire target addresses to be memory aligned 2603H 2607H 260BH 260FH … … … …Byte 0 Byte 1 Byte 2 Byte 32600 2603 2604 2607 2608 260B 260C 260FWord 1 Word 2 Word 3 Word 4Memory Alignment: Example •lh, lhu, shrequire target addresses to be memory aligned Byte 0 Byte 1 Byte 2 Byte 3 Byte 4 … … …2600 2603 2604 2607 2608 260B 260C 260FWord 1 Word 2 Word 3 Word 4Memory Alignment: Example •lh, lhu, shrequire target addresses to be memory aligned Byte 0 Byte 1 Byte 2 Byte 3 Byte 4 … … …2600 2603 2604 2607 2608 260B 260C 260FWord 1 Word 2 Word 3 Word 4Memory Alignment: Example •lh, lhu, shrequire target addresses to be memory aligned Byte 0 Byte 1 Byte 2 Byte 3 Byte 4 … … …Memory Alignment: Example 2600 … … … … …Word 1 2604 Word 2 2608 Word 3•lb, lbu, sb 2600H 2604H 2608H 260CH … … … …2601H 2605H 2609H 260DH … … … …2602H 2606H 260AH 260EH … … … …2603H 2607H 260BH 260FH … … … …Byte 0 Byte 1 Byte 2 Byte 3Memory Alignment: Example 2600 … … … … …Word 1 2604 Word 2 2608 Word 3•lb, lbu, sb 2600H 2604H 2608H 260CH … … … …2601H 2605H 2609H 260DH … … … …2602H 2606H 260AH 260EH … … … …2603H 2607H 260BH 260FH … … … …Byte 0 Byte 1 Byte 2 Byte 32600 2603 2604 2607 2608 260B 260C 260FWord 1 Word 2 Word 3 Word 4Memory Alignment Not Required: Example •lwl, lwr, swl, swr can access bytes starting from any location Most Significant Least SignificantWord 1Byte 0 Byte 1 Byte 2 Byte 3 Byte 4 … … …An Example C Program: Condition •C program segment if (i== j) f=g+h; else f=g-h; … #next set of C codes •Consider the following register allocations: f: $s0 g: $s1 h: $s2 i: $s3 j: $s4Next set of C codesi== j? f=g+h f=g-h(false) i != j(true) i == j•MIPS code: beq $s3,$s4, Label1 # branch i==j sub $s0,$s1,$s2 # f=g -h(false) j Label2 # goto Label2 Label1: add $s0,$s1,$s2 # f= g+h (true) Label2: … #next set of instructionsC program segment if (i== j) f=g+h; else f=g-h; … #next set of C codes Next set of C codesi== j? f=g+hf=g-h(false) i != j(true) i == j Label1 Label2An Example C Program: Condition•C Code segment sum=0; i=0; do { sum=sum + A[ i]; i=i+1; } while ( i the value in $t0 + 20 •This addition is performed in this stageStage 4: Memory Access •Used only by instructions that access memory •In MIPS •lw, sw, etc. Stage 5: Register Write •Write the results in a register •Especially for arithmetic and logic instructions •add $t1, $s1, $s2 •What about store, branch, jump instructions? •Don’t write anything into a register at the end •These instructions remain idle during this fifth stage or skip it all togetherInstruction Execution instruction memory +4rtrsrd registersALU Data memory imm 1. Instruction Fetch2. Decode/ Register Read3. Execute/ Perform Arithmetic Logic Operations 4. Memory5. Register WritePC•add $r3,$r1,$r2 # r3 = r1+r2 •Stage 1: fetch this instruction, increment PC •Stage 2: decode to determine it is an add, then read registers $r1 and $r2 •Stage 3: add the two values retrieved in Stage 2 •Stage 4: idle (nothing to write to memory) •Stage 5: write result of Stage 3 into register $r3Analyzing the Datapath: addinstruction memory +4 registersALU Data memory imm213add r3, r1, r2reg[1]+ reg[2] reg[2]reg[1]Example: add InstructionPCALUinstruction memory +4 registers Data memory imm31xlwr3, 17(r1)reg[1] +17 17reg[1] MEM[r1+17]Example: lwInstructionPCDatapath and Control •Datapath based on data transfers required to execute instructions •Controller causes the right transfers to take place PC instruction memory +4rtrsrd registers Data memory immALU Controlleropcode, functRequired Hardware •Program Counter (PC) •Holds the address of the next instruction to be fetched •General Purpose Registers (GPR): •Used in Stages 2 (Read) and 5 (Write) •MIPS has 32 GPRs •Memory: used in Stages 1 (Fetch) and 4 (R/W) •Use of cache •ALU: used in stage 3 •Misc. registers: inserted between stages to hold data and control signals•Stage 1: Instruction Fetch •Stage 2: Instruction Decode •Stage 3: Arithmetic Logic Operations •Stage 4: Memory Access •Stage 5: Register WriteCPU Clocking •For each instruction, how do we control the flow of information though the datapath ? •Single Cycle CPU: All stages of an instruction completed within one long clock cycle •Clock cycle: sufficiently long •Allow each instruction to complete all stages without interruption within one cycle 1. Instruction Fetch2. Decode/ Register Read3. Execute 4. Memory5. Reg. Write Clock Time PeriodCPU Clocking •Multiple -cycle CPU: only one stage of instruction per clock cycle •Length of clock cycle: equal to the time taken for the slowest stage •Advantages over single cycle execution: •Unused stages in a particular instruction can be skipped •Instructions can be pipelined (overlapped) 1. Instruction Fetch2. Decode/ Register Read3. Execute 4. Memory 5. Register WriteCPU Clocking •Multiple -cycle CPU: only one stage of instruction per clock cycle •Length of clock cycle: equal to the time taken for the slowest stage •Advantages over single cycle execution: •Unused stages in a particular instruction can be skipped •Instructions can be pipelined (overlapped) 1. Instruction Fetch2. Decode/ Register Read3. Execute 4. Memory 5. Register Write•CPU Execution Time = I ×CPI×C = I×CPI×(1/f)Relative Cycle Time •Assume 4ns for instruction and data memory, 3ns for ALU and adders, and 1ns for register reads or writes •Negligible time for other operations such as mux delay, sign extension, shifting, etc. http://www.cs.fsu.edu/~zwang/files/cda3101/Fall2017/Lecture5_cda3101.pdfAnother Look at Single Cycle CPU •Implementation: easy •The clock cycle: determined by the longest possible path http://www.cs.fsu.edu/~zwang/files/cda3101/Fall2017/Lecture5_cda3101.pdfAnother Look at Single Cycle CPU •Some functional units: idle for some time •Some functional unit: duplication required (adder) •Since it can not be shared during a clock cycle http://www.cs.fsu.edu/~zwang/files/cda3101/Fall2017/Lecture5_cda3101.pdfPipeliningWhy Pipelining A sequential laundryWhy Pipelining A pipelined laundry Five Stages of Instruction Execution •Stage 1: Instruction Fetch (IF) •Stage 2: Instruction Decode (ID) •Stage 3: Arithmetic Logic Operations (EXE) •Stage 4: Memory Access (MEM) •Stage 5: Register Write/ Write Back (WB)https://www.cs.fsu.edu/~zwang/files/cda3101/Fall2017/Lecture7_cda3101.pdf https://www.cs.fsu.edu/~zwang/files/cda3101/Fall2017/Lecture7_cda3101.pdfPipelining Stages: An Examplehttps://www.cs.fsu.edu/~zwang/files/cda3101/Fall2017/Lecture7_cda3101.pdfPipelining Stages: An Example pipeline registers :tostore data between cycles .Pipelining: A Numerical ExamplePipelined vs Non -pipelined Implementation Non -pipelined PipelinedPipelined vs Non -pipelined Implementation Non -pipelined Pipelined•Pipelining doesn’t speedup single task, it helps throughput of entire workload •Pipeline rate limited by slowest pipeline stage •Multiple tasks operate simultaneously using different resources •Potential speedup = Number pipe stages •Unbalanced lengths of pipe stages reduces speedup •Time to “fill” pipeline and time to “drain” it reduces speedupPipelining: Some Facts•Balanced stages: each stage of same duration •Non -pipelined executionSpeedup in Pipelining S1 S2 S3 S4 S5t tt tt S1 S2 S3 S4 S5 S1 S2 S3 S4 S5 S1 S2TT T TTime between instructions =T =5t Instruction 1 Instruction 2 Instruction 3•Pipelined executionSpeedup in Pipelining S1 S2 S3 S4 S5t tt tt S1 S2 S3 S4 S5 S1 S2 S3 S4 S5 S1 S2T Instruction 1 Instruction 2 Instruction 3 Time between instructions = t•TBI: Time between instructions 𝑇𝐵𝐼𝑝𝑖𝑝𝑒𝑙𝑖𝑛𝑒𝑑 =𝑡=5𝑡 5=𝑇 5=𝑇𝐵𝐼𝑛𝑜𝑛−𝑝𝑖𝑝𝑒𝑙𝑖𝑛𝑒𝑑 𝑁𝑢𝑚𝑏𝑒𝑟 𝑜𝑓𝑠𝑡𝑎𝑔𝑒𝑠Speedup in Pipelining•TBI: Time between instructions 𝑇𝐵𝐼𝑝𝑖𝑝𝑒𝑙𝑖𝑛𝑒𝑑 =𝑡=5𝑡 5=𝑇 5=𝑇𝐵𝐼𝑛𝑜𝑛−𝑝𝑖𝑝𝑒𝑙𝑖𝑛𝑒𝑑 𝑁𝑢𝑚𝑏𝑒𝑟 𝑜𝑓𝑠𝑡𝑎𝑔𝑒𝑠 𝑇𝐵𝐼𝑛𝑜𝑛−𝑝𝑖𝑝𝑒𝑙𝑖𝑛𝑒𝑑 𝑇𝐵𝐼𝑝𝑖𝑝𝑒𝑙𝑖𝑛𝑒𝑑=𝑁𝑢𝑚𝑏𝑒𝑟 𝑜𝑓𝑠𝑡𝑎𝑔𝑒𝑠Speedup in Pipelining•TBI: Time between instructions 𝑇𝐵𝐼𝑝𝑖𝑝𝑒𝑙𝑖𝑛𝑒𝑑 =𝑡=5𝑡 5=𝑇 5=𝑇𝐵𝐼𝑛𝑜𝑛−𝑝𝑖𝑝𝑒𝑙𝑖𝑛𝑒𝑑 𝑁𝑢𝑚𝑏𝑒𝑟 𝑜𝑓𝑠𝑡𝑎𝑔𝑒𝑠 𝑇𝐵𝐼𝑛𝑜𝑛−𝑝𝑖𝑝𝑒𝑙𝑖𝑛𝑒𝑑 𝑇𝐵𝐼𝑝𝑖𝑝𝑒𝑙𝑖𝑛𝑒𝑑=𝑁𝑢𝑚𝑏𝑒𝑟 𝑜𝑓𝑠𝑡𝑎𝑔𝑒𝑠Speedup in Pipelining Potential Speedup•10 such instructionsSpeedup in Pipelining S1 S2 S3 S4 S5t tt tt S1 S2TT•Balanced stages: each stage of same duration •Non -pipelined executionSpeedup in Pipelining S1 S2 S3 S4 S5t tt tt S1 S2 S3 S4 S5 S1 S2 S3 S4 S5 S1 S2TT T TTime between instructions =T =5t Instruction 1 Instruction 2 Instruction 3•10 such instructions •Time required for non -pipelined execution •10×5t= 50tSpeedup in Pipelining S1 S2 S3 S4 S5t tt tt S1 S2TT•Pipelined executionSpeedup in Pipelining S1 S2 S3 S4 S5t tt tt S1 S2 S3 S4 S5 S1 S2 S3 S4 S5 S1 S2T Instruction 1 Instruction 2 Instruction 3 Time between instructions = t•Pipelined executionSpeedup in Pipelining S1 S2 S3 S4 S5t tt tt S1 S2 S3 S4 S5 S1 S2 S3 S4 S5 S1 S2T Instruction 1 Instruction 2 Instruction 3 Time between instructions = t•𝑠number of stages in pipeline •Each stage requires time 𝑡 •Total time required to execute 𝑛 instructions in the pipeline •𝑠𝑡+𝑛−1𝑡=𝑠+𝑛−1𝑡•10 such instructions •Time required for non -pipelined execution •10×5t= 50t •Time required for pipelined execution •5t+(10 -1)t=14tSpeedup in Pipelining S1 S2 S3 S4 S5t tt tt S1 S2TT•10 such instructions •Time required for non -pipelined execution • 10×5t= 50t •Time required for pipelined execution • 5t+(10 -1)t=14t •Speed up = 50𝑡 14𝑡≈3.5Speedup in Pipelining S1 S2 S3 S4 S5t tt tt S1 S2TT•10 such instructions •Time required for non -pipelined execution •10×5t= 50t •Time required for pipelined execution •5t+(10 -1)t=14t •Speed up = 50𝑡 14𝑡≈3.5•100 such instructions •Time required for non -pipelined execution •100×5t= 500t •Time required for pipelined execution •5t+(100 -1)t=104t •Speed up = 500𝑡 104𝑡≈4.8Speedup in Pipelining S1 S2 S3 S4 S5t tt tt S1 S2TT•If the stages are balanced, many instructions to be executed •𝑆𝑝𝑒𝑒𝑑𝑢𝑝 ≈𝑇𝐵𝐼𝑛𝑜𝑛−𝑝𝑖𝑝𝑒𝑙𝑖𝑛𝑒𝑑 𝑇𝐵𝐼𝑝𝑖𝑝𝑒𝑙𝑖𝑛𝑒𝑑=𝑁𝑢𝑚𝑏𝑒𝑟 𝑜𝑓𝑠𝑡𝑎𝑔𝑒𝑠Speedup in Pipelining S1 S2 S3 S4 S5t tt tt S1 S2TT•MIPS was designed with pipelining in mind •All MIPS instructions are the same length –32 bits •This makes the IF phase universal and simple to implement •Only three instruction formats, and source register fields are always in the same place •This means we can read the register file in the ID phase before we even know what the instruction is •The only memory operations occur in load and store instructions •We can dedicate the ALU to compute addresses for these instructions •No need to worry about multiple data memory accesses per instructionPipelining in MIPSPotential Challenges in Pipelining •Hazards: Situation where next instruction cannot execute in the following cycle •Structural hazards: Attempting to use the same hardware to do two different things at the same time •Data hazards: Instruction depends on result of prior instruction still in the pipeline •Control hazards: The flow of instruction addresses is not known at the time when the next instruction must be loaded•How to define the performance of a computer? •Speed •Execution time: Time for the completion of a task •Important in most situations including personal computers •Throughput/ bandwidth: Total number of work done in a given time •Important in places like datacenterPerformance of a Computer Performance of a Processor •Why do we need to focus on the performance of a processor •Wait for I/O •Performance metric: CPU execution time •Time taken by the CPU for computing •Includes OS routines executed for the program •Does not include the wait time for I/O•Performance =1 𝐶𝑃𝑈𝐸𝑥𝑒𝑐𝑢𝑡𝑖𝑜𝑛 𝑇𝑖𝑚𝑒 •For a given program in two computers (actually CPUs of) A and B 𝑃𝑒𝑟𝑓𝑜𝑟𝑚𝑎𝑛𝑐𝑒 𝐴 𝑃𝑒𝑟𝑓𝑜𝑟𝑚𝑎𝑛𝑐𝑒 𝐵=𝐶𝑃𝑈𝐸𝑥𝑒𝑐𝑢𝑡𝑖𝑜𝑛 𝑇𝑖𝑚𝑒𝐵 𝐶𝑃𝑈𝐸𝑥𝑒𝑐𝑢𝑡𝑖𝑜𝑛 𝑇𝑖𝑚𝑒𝐴=𝑛=𝑠𝑝𝑒𝑒𝑑𝑢𝑝Performance in Terms of CPU Execution TimeSteps Involved in Processing of an InstructionFetch an Instruction Decode the Instruction Fetch the Operands Execute the Specified OperationStart EndFetch an Instruction Decode the Instruction Fetch the Operands Execute the Specified OperationStart End•Each step of instruction cycle may contain a sequence of micro - operationsFetch an Instruction Decode the Instruction Fetch the Operands Execute the Specified OperationStart End•Each step of instruction cycle may contain a sequence of micro - operations •Each micro -operation: triggered by the computer clock and executed in one clock cycle Clock cycle cycle 1 cycle 2 cycle 3•A machine instruction •Several micro -operations •Several clock cycles •Average Clock Cycles per Instruction (CPI) •CPU Execution Time = Total Number of Clock Cycles ×CPU Clock Cycle Time = Total Number of Clock Cycles ×(1/Clock Rate) = (Number of Instructions ×CPI) ×(1 / Clock Rate)CPU Execution Time and the Computer Clock•CPU Execution Time = Total Number of Clock Cycles ×CPU Clock Cycle Time = Total Number of Clock Cycles ×(1/Clock Rate) = (Number of Instructions * CPI) ×(1 / Clock Rate) = Number of Instructions ×CPI ×CPU Clock Cycle Time = I×CPI×C = I×CPI×(1/f)CPU Performance Equation•A machine instruction •Several micro -operations •Several clock cycles •Average Clock Cycles per Instruction (CPI) •CPU Execution Time = Total Number of Clock Cycles ×CPU Clock Cycle Time = Total Number of Clock Cycles ×(1/Clock Rate) = (Number of Instructions ×CPI) ×(1 / Clock Rate)Factors Affecting CPU Execution Time•A machine instruction •Several micro -operations •Several clock cycles •Average Clock Cycles Per Instruction (CPI) •CPU Execution Time = Total Number of Clock Cycles ×CPU Clock Cycle Time = Total Number of Clock Cycles ×(1/Clock Rate) = (Number of Instructions ×CPI)×(1 / Clock Rate)Factors Affecting CPU Execution Time•A machine instruction •Several micro -operations •Several clock cycles •Average Clock Cycles per Instruction (CPI) •CPU Execution Time = Total Number of Clock Cycles ×CPU Clock Cycle Time = Total Number of Clock Cycles ×(1/Clock Rate) = (Number of Instructions ×CPI)×(1 / Clock Rate)Factors Affecting CPU Execution TimeCalculation of CPI Instruction Type Frequency of OccurrenceClock Cycle ALU Instructions 55% 5 Load Instructions 25% 6 Store Instructions 10% 4 Branch Instructions 10 % 2Calculation of CPI Instruction Type Frequency of OccurrenceClock Cycle ALU Instructions 55% 5 Load Instructions 25% 6 Store Instructions 10% 4 Branch Instructions 10 % 2 CPI = (55/100) ×5+ (25/100) ×6+ (10/100) ×4+ (10/100) ×2 = 4.85Calculation of CPI Instruction Type Frequency of OccurrenceClock Cycle ALU Instructions 55% 5 (CPIALU) Load Instructions 25% 6 (CPILoad) Store Instructions 10% 4 (CPIStore) Branch Instructions 10 % 2 (CPIBrach ) CPI = (55/100) ×5+ (25/100) ×6+ (10/100) ×4+ (10/100) ×2 = 4.85Calculation of CPI Instruction Type Frequency of OccurrenceClock Cycle ALU Instructions 55% (FALU) 5 (CPIALU) Load Instructions 25% ( FLoad) 6 (CPILoad) Store Instructions 10% ( FStore) 4 (CPIStore) Branch Instructions 10 % ( FBrach ) 2 (CPIBrach ) CPI = (55/100) ×5+ (25/100) ×6+ (10/100) ×4+ (10/100) ×2 = 4.85Calculation of CPI Instruction Type Frequency of OccurrenceClock Cycle ALU Instructions 55% (FALU) 5 (CPIALU) Load Instructions 25% ( FLoad) 6 (CPILoad) Store Instructions 10% ( FStore) 4 (CPIStore) Branch Instructions 10 % ( FBrach ) 2 (CPIBrach ) CPI = (55/100) ×5+ (25/100) ×6+ (10/100) ×4+ (10/100) ×2 = (FALU* CPIALU)+ (FLoad* CPILoad)+ (FStore * CPIStore )+ (FBranch * CPIBranch ) = 4.85Calculation of CPI Instruction Type Frequency of OccurrenceClock Cycle ALU Instructions 55% (FALU) 5 (CPIALU) Load Instructions 25% ( FLoad) 6 (CPILoad) Store Instructions 10% ( FStore) 4 (CPIStore) Branch Instructions 10 % ( FBrach ) 2 (CPIBrach ) CPI = (55/100) ×5+ (25/100) ×6+ (10/100) ×4+ (10/100) ×2 = σ𝑖=1𝑛𝐹𝑖×𝐶𝑃𝐼𝑖 = 4.85•A program with 5,00,0000 instructions •CPU clock frequency 2 GHz •CPI = 2.4 What is the CPU execution time?Calculation of CPU Execution Time•A program with 5,00,0000 instructions; i.e. 𝐼=5×106 •CPU clock frequency 2 GHz; i.e𝑓=2×109 •CPI = 2.4Calculation of CPU Execution Time CPU Execution Time = 5×106×𝟐.𝟒×𝟏 𝟐×𝟏𝟎𝟗•A program with 5,00,0000 instructions; i.e. 𝐼=5×106 •CPU clock frequency 2 GHz; i.e𝑓=2×109 •CPI = 2.4Calculation of CPU Execution Time CPU Execution Time = 5×106×𝟐.𝟒×𝟏 𝟐×𝟏𝟎𝟗=𝟔×𝟏𝟎−𝟑𝒔•A program with 5,00,0000 instructions; i.e. 𝐼=5×106 •CPU clock frequency 2 GHz; i.e𝑓=2×109 •CPI = 2.4Calculation of CPU Execution Time CPU Execution Time = 5×106×𝟐.𝟒×𝟏 𝟐×𝟏𝟎𝟗=𝟔𝒎𝒔•𝐼𝑃𝑆=𝑁𝑢𝑚𝑏𝑒𝑟 𝑜𝑓𝐼𝑛𝑠𝑡𝑟𝑢𝑐𝑡𝑖𝑜𝑛𝑠 𝐶𝑃𝑈𝐸𝑥𝑒𝑐𝑢𝑡𝑖𝑜𝑛 𝑡𝑖𝑚𝑒 •𝑀𝐼𝑃𝑆=𝑁𝑢𝑚𝑏𝑒𝑟 𝑜𝑓𝐼𝑛𝑠𝑡𝑟𝑢𝑐𝑡𝑖𝑜𝑛𝑠 𝐶𝑃𝑈𝐸𝑥𝑒𝑐𝑢𝑡𝑖𝑜𝑛 𝑡𝑖𝑚𝑒×106(million instructions per second)Instructions Per Second•𝑀𝐼𝑃𝑆=𝑁𝑢𝑚𝑏𝑒𝑟 𝑜𝑓𝐼𝑛𝑠𝑡𝑟𝑢𝑐𝑡𝑖𝑜𝑛𝑠 𝐶𝑃𝑈𝐸𝑥𝑒𝑐𝑢𝑡𝑖𝑜𝑛 𝑡𝑖𝑚𝑒×106(million instructions per second) •CPU Execution Time =(Number of Instructions ×CPI) ×(1 / Clock Rate) •𝑀𝐼𝑃𝑆=𝐶𝑙𝑜𝑐𝑘𝑅𝑎𝑡𝑒 𝐶𝑃𝐼×106Instructions Per Second•A computer with 200 MHz clock rate •Details of instruction execution Find MIPS. Instructions Per Second: Example •Million floating -point operations per second •Measures efficacy is executing floating point operations •Very important for scientific and real time applications (such as gaming)MFLOPS•A program used for evaluating computer performances •Why do we need benchmarks? •Actual Target Workload : Full applications that run on the target machine •Real Full Program -based Benchmarks : A specific mix or suite of programs that are typical of targeted applications or workload. •Small “Kernel” Benchmarks : Key computationally -intensive pieces extracted from real programs. Examples: Matrix factorization, FFT , tree search, etc. Best used to test specific aspects of the machine. •Microbenchmarks : Small, specially written programs to isolate a specific aspect of performance characteristics: integer, floating point, local memory, input/output, etc.Benchmarks•A program used for evaluating computer performances •Why do we need benchmarks? •SPEC (System Performance Evaluation Cooperative) •Funded by a number of computer vendors •Created a benchmark set in 1989 •Updated in 1992, 1995, 1999, 2006Benchmarks