Idea Transcript
03/10/16
Computer Architecture Academic Skills (SNE),part 1 Taco Walstra (slides based on material by Patterson & Hennessy (Berkeley/ Stanford), Mary Jane Irwin (Pennstate), Toto van Inge (UvA)
Syllabus 2 lectures with labs: Part 1 (today):
What is computer architecture? Performance of computer architectures. A very short introduction to the Instruction Set Architecture of MIPS processors A short introduction to a single cycle MIPS processor architecture Lab: performance and wiring an assembly program for a singlecycle MIPS using SIM-PL Patt 2:
Pipelined processors, hazards Caches, ….. Lab: pipelined mips and hazards, using SIM-PL Innopolis 2016
2
1
03/10/16
What is Computer Architecture? A building architect role: Materials
Plans
Buildings
Steel, stone
Houses, offices, universities
Wood, glass
Sportfacilities
Goals Function Cost, efficiency Ease of construction, aesthetics, safety Innopolis 2016
3
What is Computer Architecture? Technology
Plans
Computers
Logic gates
Desktops, servers
Flash memory consoles
mobile phones, game
Packaging
Goals
embedded systems
SRAM/DRAM
Function
supercomputers
Circuit technology
Performance Reliability Energy efficiency Cost /Manufacturability
Innopolis 2016
4
2
03/10/16
Computer Architecture embedding
Innopolis 2016
5
Chip revolution
Innopolis 2016
6
3
03/10/16
Moore's Law
In 1965 Intel's Gordon Moore predicted that the number of transistors that can be integrated on a single chip would double about every two years.
Innopolis 2016
7
A few trends
4
03/10/16
What is an ISA?
ISA (instruction set architecture)
A well-defined hardware/software interface The ‘contract’ between software and hardware. Functional definition of operations, modes, and storage locations supported by the hardware. Precise description of how to invoke, and access them No guarantees regarding (1) how operations are implemented No guarantees regarding (2) Which operations are fast and which are slow. No guarantees regarding (3) Which operations take more power and which take less Innopolis 2016
9
Which airplane has the best performance?
Boeing 777
Boeing 777
Boeing 747
Boeing 747
BAC/Sud Concorde
BAC/Sud Concorde
Douglas DC-8-50
Douglas DC8-50 0
100
200
300
400
0
500
Boeing 777
Boeing 777
Boeing 747
Boeing 747
BAC/Sud Concorde
BAC/Sud Concorde
Douglas DC-8-50
Douglas DC8-50 500
1000
Cruising Speed (mph)
4000
6000
8000 10000
Cruising Range (miles)
Passenger Capacity
0
2000
§1.6 Performance
Defining Performance
1500
Innopolis 2016
0
100000 200000 300000 400000 Passengers x mph
10
5
03/10/16
Throughput versus Response Time
Response time (execution time) – the time between the start and the completion of a task
(important to individual users)
Throughput (bandwidth) – the total amount of work done in a given time
(important to data center managers)
How are response time and throughput affected by:
Replacing the processor with a different (faster) version?
Adding more processors?
Will need a performance metrics as well as a set of applications to benchmark embedded and desktop computers, which are more focused on response time, versus servers, which are more focused Innopolis 2016 on throughput.
11
CPU clocking
The processor hardware is governed by a constant-rate clock (crystal oscillator pulse).
Clock period: duration of a clock cycle (e.g. 250 ps)
Clock frequency (rate): cycles per second (e.g. 4.0 GHz)
Innopolis 2016
12
6
03/10/16
Measuring Execution Time
Elapsed time
Total response time, including all aspects (Processing, I/O, OS overhead, idle time) Determines system performance
CPU time
Time spent processing a given job (Discounts I/O time, other jobs’ shares)
Comprises user CPU time and system CPU time
Different programs are affected differently by CPU and system performance
Innopolis 2016
13
CPU Time
Performance improved by
Reducing number of clock cycles
Increasing clock rate
Hardware designer must often trade off clock rate against cycle count Innopolis 2016
14
7
03/10/16
Instruction Count and CPI
Instruction Count for a program
Determined by the program, ISA and compiler
Average cycles per instruction
Determined by CPU Architecture If different instructions have different CPI
Innopolis 2016
15
CPI Example
Computer A: Cycle Time = 250ps, CPI = 2.0
Computer B: Cycle Time = 500ps, CPI = 1.2
Same ISA
Which is faster, and by how much?
Innopolis 2016
16
8
03/10/16
Pitfall: MIPS as a Performance Metric
MIPS: Millions Instructions Per Second (not Microprocessor without Interlocked Pipeline Stages)
Doesn't take into account the difference in ISA Doesn't take into account differences in complexity between instructions.
CPI varies between programs on a given CPU Innopolis 2016
17
Instruction Set The repertoire of instructions of a computer Different computers have different instruction set (But with many aspects in common) Early computers had very simple instruction sets
example: pdp-8 (1965) 6 different operations (AND, ADD, increment, store to memory, jump to subroutine, jump, I/O)
MIPS I (1986) R2000 (time frame: Intel 80386, DEC VAX mini, Motorola 68000. 15 MHz, external cache memory and small pipeline (later) MIPS32. Introduced in 1999. (latest release 2012). MIPS64 also introduced in 1999. Instruction set licensed to many different companies by MIPS Technologies.
Innopolis 2016
18
9
03/10/16
MIPS processor. Short historical overview MIPS Computer Systems founded in 1984 by Stanford University (Hennessy!) 1986: First R2000 processor (32 bits, no floating point, 15 MHz); 1988: R3000; (MIPS I ISA, 33 MHz), Sony playstation, many workstations and servers used it. Architecture + chips 1991: R4000 (64 bit), MIPS II ISA, Floating point unit rename to MIPS Technologies (SGI) 1994: R8000, 1996: R10000 (Nintendo), 1997: R12000 After 1998: licensing of MIPS architecture and microprocessor core designs (no manufacturing). 2013: most patents sold to Imagination Technologies Group.
Innopolis 2016
19
MIPS Instruction set Used as the example throughout the book Stanford MIPS commercialized by MIPS Technologies (www.mips.com) Large share of embedded core market
Applications in consumer electronics,network/storage equipment, cameras, printers, …
Typical of many modern ISAs
See MIPS Reference Data tear-out card, appendices in book.
Innopolis 2016
20
10
03/10/16
Two Key Principles of Machine Design Instructions are represented in binary and, as such, are indistinguishable from data Programs are stored in alterable memory (that can be read or written to) just like data
Stored Program concept (Von Neumann)
Programs can be shipped as files of binary numbers – binary compatibility
Computers can inherit ready-made software provided they are compatible with an existing ISA – leads industry to align around a small number of ISAs
Innopolis 2016
21
Before john von neumann......
Innopolis 2016
22
11
03/10/16
MIPS Arithmetic Instructions Add and subtract, three operands
add a, b, c
# a gets b + c
All arithmetic operations have this form (ADD, SUB, AND .....) Design principle 1: Simplicity favours regularity Regularity makes implementation simpler Simplicity enables higher performance at lower cost
Innopolis 2016
23
Register operands Arithmetic instructions use register operands MIPS has a 32 × 32-bit register file
Use for frequently accessed data
Numbered 0 to 31
Assembler names
$t0, $t1, ..., $t9 for temporary values $s0, $s1, ..., $s7 for saved variables
Side note: SIM-PL labs will use $1, $2 etc. names!) Make the common case fast (Arithmetic register operations (Amdahl)
Innopolis 2016
24
12
03/10/16
Arithmetic example C code:
f = (g + h) - (i + j);
Compiled MIPS code (assembly instructions):
add t0, g, h add t1, i, j sub f, t0, t1
#temp t0 = g + h #temp t1 = i + j #f = t0 - t1 Innopolis 2016
25
MIPS Instruction Formats (R instruction sub field: “shift amount” /function code)
Innopolis 2016
26
13
03/10/16
R Instructions example 2
SUB $t0, $s1, $s2 How encoded in hex? ($t0 = reg. no 8, $s0 = reg. No 16, $1= reg No 17 etc. Instruction form: sub $Rd, $Rs, $Rt)
Innopolis 2016
27
Sub $t0, $s1, $s2 example R instruction form: Sub $rd, $rs, $rt Opcode = 0000000 Rs reg = 0x11 (5 bits) Rt reg = 0x12 (hex) (5 bits) Rd = 8 (5 bits) Func = 0x22 Complete code: 000000 | 10001 | 10010 | 01000 | 000 0010 0010
Innopolis 2016
28
14
03/10/16
MIPS I-format (1) Immediate instructions (examples: addi, lui, ori, andi) example: addi $5, $3, 4
# $5 := $3 + 4
How is this instruction stored? Note: MIPS has NO “subi” instruction! Question: how to subtract a number from a register?
Opcode
Reg s
Reg t
Literal
001000
00011
Innopolis 2016
000101
0...0100
29
Exercise Which assembly instruction is represented by the following MIPS hexcode:
0x34422345
Innopolis 2016
30
15
03/10/16
MIPS Memory Access Instructions MIPS has 2 basic data transfer instructions for accessing memory –
lw $5, 4($3)
# load word from memory
How is this instruction stored?
Opcode
Reg s
001000
00011
Reg t
Literal
000101
0...0100
Innopolis 2016
31
MIPS Memory Access Instructions
MIPS has two basic data transfer instructions for accessing memory – lw $0, 4($3)
# load word from memory
– sw $2, 8($3)
# store word to memory
The data is loaded from (lw) or stored into (sw) a register in the register file. The register file has 32 locations => 5 bit address! The memory address - a 32 bit address - is formed by adding the contents of the base address register to the offset value.
Opcode
Reg s
Reg t
Literal
100011
00011
000000
0...0100
101011
00011
000010 Innopolis 2016
0...1000
32
16
03/10/16
Memory Access example 1
Innopolis 2016
33
How about large constants? Use the LUI instruction (Load Upper Immediate) with ORI (OR Immediate)
– lui $t0, 0x1234 – ori $t0, $t0, 0x5678 – Result: $t0 containts: 0x1234 5678
Innopolis 2016
34
17
03/10/16
MIPS-32 ISA Instruction categories: – Computational – Load / Store – Jump and Branch – Floating Point (coprocessor) – Memory management – Special
Instruction formats: all 32 bits wide!!
Innopolis 2016
35
Summary: MIPS (RISC) Design Principles Simplicity favors regularity –
fixed size instructions
–
small number of instruction formats (3)
– opcode alsways the first 6 bits
Smaller is faster – limited instruction set – limited number of registers in register file – limited number of addressing modes
Make the common case fast – arithmetic operands from the register file (load-store machine) – allow instructions to contain immediate operands
Good design demands good compromises – three instruction formats
Innopolis 2016
36
18
03/10/16
Byte Addresses Since 8-bit bytes are so useful, most architectures address individual bytes in memory – Alignment restriction - the memory address of a word must be on natural word boundaries (a multiple of 4 in MIPS-32) – Big-Endian: leftmost byte is word address IBM 360/370, Motorola 68k, MIPS, Sparc, HP PA – Little Endian: rightmost byte is word address Intel 80x86, DEC Vax, DEC Alpha (Windows NT), MIPS (!) – MIPS can be configured to use either big endian or little endian byteorder!
Innopolis 2016
37
MIPS R-Format instructions
sa (often named "shamt") means "Shift amount": the amount by which the source operand is rotated or shifted. func for instructions that share the same opcode, the func parameter contains the control codes to differentiate the instructions. Opcode 000000 accesses the ALU and the func selects the function to be executed by the ALU
Innopolis 2016
38
19
03/10/16
Shift operations Instruction Format R Shifts all the bits in a word left or right – sll $t2, $s0, 8 # $t2 = $s0 > 8 bits
These shifts are called logical because they fill with zeroes The 5 bit SHAMT field is enough to shift a 32 bit value 31 bit positions!
Innopolis 2016
39
Logical Operations There are a number of bit-wise logical operations in the MIPS ISA – and $t0, $t1, $t2 # $t0 = $t1 & $t2 – or $t0, $t1, $t2 # $t0 = $t1 | $t2 – nor $t0, $t1, $t2 # $t0 = not($t1 | $t2)
They all use the R-format (and, or, nor) or I-format (andi, ori) – andi $t0, $t1, 0xFF00 # $t0 = $t1 & 0xFF00
Innopolis 2016
40
20
03/10/16
Logical operations MIPS has no “NOT” (logical inversion) instruction. We can solve this by using 2 logical instructions: Load -1 in one register and use a XOR instruction: Example $1 = 5 (binary 00000101). How to invert this into 11111010? Load in $2 -1 (binary 11111111) Perform Exclusive OR of $1 and $2 (result in $9): XOR $9, $1,$2 Truth table of XOR function:
A B
XOR
0 0
0
0 1
1
1 0
1
1 1
0 Innopolis 2016
41
Logical operations example
21
03/10/16
Control Flow Instructions MIPS conditional branch instructions:
– bne $s0, $s1, Lbl #go to Lbl if $s0 not equal $s1 – beq $s0, $s1, Lbl #go to Lbl if $s0=$s1 example:
– if (i==j) h = i + j; – bne $s0, $s1, Lbl1 add $s3, $s0, $s1 Lbl1:...... Branch instructions use the I-format (How is the branch destination address specified?) Innopolis 2016
43
Compiling If statements mm
Innopolis 2016
44
22
03/10/16
Specifying Branch Destinations Use a register (like in lw and sw) added to the 16-bit offset – which register? Instruction Address Register (the PC) • its use is automatically implied by instruction • PC gets updated **(PC+4) during the fetch cycle** so that it holds the address of the next instruction.
– limits the branch distance to -2 exp(15) to +2 exp(15) -1 (word) instructions from the (instruction after the) branch instruction, but most branches are local anyway.
Innopolis 2016
45
Other Control Flow Instructions MIPS also has an unconditional branch instruction or jump instruction:
– j label # got to label J-Format Instruction
Innopolis 2016
46
23
03/10/16
Branching Far Away What if the branch destination is further away than can be captured in 16 bits? The compiler comes to the rescue – it inserts an unconditional jump to the branch target and inverts the condition.
– beq $s0, $s1, L1 – beq $s0, $s1, L1 j L1 L2: ...
becomes:
Innopolis 2016
47
Compiling Loop Statements I=0;
While (I < 10 ) { I++; } $1 = I =0; $2 is constant 10 Loop: BEQ $1, $2, end #loop body ADDI $1, $1, 1 J loop End: Innopolis 2016
48
24
03/10/16
Procedure call Instructions Procedure call: jump and link (JAL)
– jal ProcedureLabel – Address of following instruction put in $ra – Jumps to target address
Procedure return: jump register
– jr $ra – Copies $ra to program counter – Can also be used for computed jumps (e.g. for case/switch statements)
Innopolis 2016
49
Summary
We have looked at how hexadecimal numbers can be easily converted into binary representations. (create groups of 4 bits for each hexadecimal digit!)
MIPS ISA has 3 instruction formats: Register, Immediate and Jump
ALL instruction formats have 6 bits to determine the instruction type
If you know the instructiion type, you know the format of the other bits.
MIPS has a byte orientation: big endian or little endian
MIPS has a byte addressing: this affects the offset field when addressing memory locations of a base register! Difference between little and big endian
We have looked at signed and unsigned binary representations of numbers
We have looked at logical operations
We have looked at a few examples how to write assembly programs for common high level structures (more exercises with the labs!) Innopolis 2016
50
25
03/10/16
MIPS Organization So Far .
Innopolis 2016
51
26