Computer Architecture Paul Mellies Lecture 1 : General introduction to the course
We are all magnificent meaning machines
We produce meaning all day long
How far can we extend our sense of the world ?
And what human society will emerge next ?
Computer Architecture is one of the greatest adventures of Humanity in the second part of the 20th Century
It is also a beautiful story which is very important to study in order to understand what will come next
Quizz ... Which american company invented the first microprocessor ? A. B. C. D. E.
Control Data Corporation ( CDC ) International Business Machines ( IBM ) Hewlett Packard ( HP ) Intel Corporation ( Intel ) Digital Equipment Company ( DEC )
Quizz ... What was the name of this first microprocessor by Intel ? A. B. C. D. E.
Intel Pentium Intel 4004 Intel Itanium Intel Xeon Intel Core i7
The Intel 4004 microprocessor
put on market in November 1971 number of transistors : 2300 size : 10 μ-meters clock speed : 740 kHz
See : Stanley Mazor, the History of the Microcomputer -- Invention and Evolution, Proceedings of the IEEE, 1995
Looking for the true foundations of our digital age « Ask someone who works in online gaming or social networking how an integrated circuit works — a component upon which that person’s job, employer, and industry depends for survival — and you are likely to get a blank stare and perhaps something mumbled about “silicon” and “semiconductor.” This isn’t surprising: after all, the worlds of social networks and code writing are six or seven levels removed from the largely chemical business of making computer chips. It would be like asking someone preparing a Big Mac at a McDonald’s in Prague about cattle feed in Tulsa. » « This is also true in the media, even in the trade press. The reporter who writes incisively on, say, Apple and its next generation iPhone may have little knowledge about the chips inside that device, other than perhaps the name of the manufacturer of its central processor and perhaps its memory chips. That’s why you read very little these days, other than financial news, about the semiconductor industry. The men and women who once covered the business have mostly retired, and the new generation of technology and business reporters are much more comfortable writing about Twitter or Facebook. » Michael S. Malone « The Intel Trinity: How Robert Noyce, Gordon Moore, and Andy Grove Built the World's Most Important Company. »
Do you recognize this gentleman ?
Do you recognize this gentleman ?
Steve Jobs ( 1955 - 2011 )
But do you recognize these two gentlemen ?
But do you recognize these two gentlemen ?
Dennis M. Ritchie ( 1941 - 2011 ) Ken Thompson ( 1943 ) Inventors of UNIX and of the C programming language
And this elegant gentleman ?
And this elegant gentleman ?
Robert Noyce ( 1927 - 1990 )
A panoramic presentation of the course
Your high-level program
-
Your electronic device
Your high-level program
What this course is all about !
-
Your electronic device
Your high-level program
compilation into a low-level program typically assembly code or machine code
Computer Architecture microprocessor, memory and I/O designed to perform the low-level code in an optimized amount of time, reliability and energy
-
Your electronic device
Your high-level program intermediate languages designed to take advantage of the underlying architecture
Computer Architecture tradeoff to find between the various constraints of time, energy and reliability based on the applications one has in mind
-
Your electronic device
Software
Your application in mind
C or Python as a computational model Prerequisite to this course: Introduction to Computer Science
Algorithm Programming Language Operating System Architecture (ISA) Microarchitecture Digital Logic Circuit
Hardware -
Electrons
-
Digital Logic as a computational model
Software
Your application in mind
C or Python as a computational model Prerequisite to this course: Introduction to Computer Science
Algorithm Programming Language Operating System Architecture (ISA) Microarchitecture Digital Logic Circuit
Hardware -
Electrons
-
Digital Logic as a computational model
If you study this course hard enough, you will get a much clearer and extensive understanding of the digital age we live in.
Algorithm Programming Language Operating System Architecture (ISA) Microarchitecture Digital Logic Circuit
Desiderius Erasmus Roterodamus ( 1467 - 1536 )
The digital abstraction
0
1
0
0
1
0
1
1
0
0
1
0
1
1
1
1
0
1
0
0
1
0
0
Every light bulb is either ON or OFF
0
The digital abstraction
1
0
0
1
0
1
1
1
1
0
1
0
0
1
0
0
0
1
0
0
1
0
1
0
Every light bulb is either ON or OFF
Can you count in binary numbers ?
0
0
0
0
= 0
Can you count in binary numbers ?
0
0
0
1
= 1
Can you count in binary numbers ?
0
0
1
0
= 2
Can you count in binary numbers ?
0
0
1
1
= 3
Can you count in binary numbers ?
0
1
0
0
= 4
Can you count in binary numbers ?
0
1
0
1
= 5
Can you count in binary numbers ?
0
1
1
0
= 6
Can you count in binary numbers ?
0
1
1
1
= 7
Can you count in binary numbers ?
1
0
0
0
= 8
Can you count in binary numbers ?
1
0
0
1
= 9
Can you count in binary numbers ?
1
0
1
0
= 10
Can you count in binary numbers ?
1
0
1
1
= 11
Can you count in binary numbers ?
1
1
0
0
= 12
Can you count in binary numbers ?
1
1
0
1
= 13
Can you count in binary numbers ?
1
1
1
0
= 14
Can you count in binary numbers ?
1
1
1
1
= 15
There are 10 kinds of people
The people who understand binary numbers and the people who don’t...
Key idea : transistors are switches ! gate = 0
gate = 1
drain
drain
drain
NMOS
PMOS
OFF
gate
ON
source
source
source
source
source
source
gate
ON drain
OFF drain
drain
NOT gate Algorithm
VDD
Programming Language Operating System Architecture (ISA) Microarchitecture
A
Y
Digital Logic Circuit GND
NAND gate VDD
Algorithm Programming Language Operating System Architecture (ISA) Microarchitecture Digital Logic Circuit
Y A B
GND
NOR gate VDD VDD
Algorithm Programming Language Operating System Architecture (ISA) Microarchitecture Digital Logic Circuit
A B
Y
A
Y
B
GND GND
Illustration : the inverter
value = 1
PMOS transistor Y
A
NMOS transistor value = 0
Illustration : the inverter
value = 1
ON output value = 1
input value = 0
OFF value = 0
Illustration : the inverter
value = 1
OFF output value = 0
input value = 1
ON value = 0
NAND gate
value = 1
value = 1
Y A B
value = 0
value = 0
NAND gate
value = 1
value = 1
ON
ON Y=1
A=0
B=0
OFF OFF
value = 0
value = 0
NAND gate
value = 1
value = 1
ON
OFF Y=1
A=1
B=0
ON OFF
value = 0
value = 0
NAND gate
value = 1
value = 1
OFF
ON Y=1
A=0
B=1
OFF ON
value = 0
value = 0
NAND gate
value = 1
value = 1
OFF
OFF Y=0
A=1
B=1
ON ON
value = 0
value = 0
NOT gate Algorithm Programming Language
A
Y
Operating System Architecture (ISA) Microarchitecture
A
Y
Digital Logic
0
1
Circuit
1
0
NAND gate Algorithm Programming Language
A B
Y
Operating System Architecture (ISA) Microarchitecture
A
B
Y
Digital Logic
0
0
1
Circuit
0
1
1
1
0
1
1
1
0
NOR gate Algorithm Programming Language
A B
Y
Operating System Architecture (ISA) Microarchitecture
A
B
Y
Digital Logic
0
0
1
Circuit
0
1
0
1
0
0
1
1
0
AND gate Algorithm Programming Language
A B
Y
Operating System Architecture (ISA) Microarchitecture
A
B
Y
Digital Logic
0
0
0
Circuit
0
1
0
1
0
0
1
1
1
OR gate Algorithm Programming Language
A B
Y
Operating System Architecture (ISA) Microarchitecture
A
B
Y
Digital Logic
0
0
0
Circuit
0
1
1
1
0
1
1
1
1
XOR gate Algorithm Programming Language
A B
Y
Operating System Architecture (ISA) Microarchitecture
A
B
Y
Digital Logic
0
0
0
Circuit
0
1
1
1
0
1
1
1
0
Multiplexor ( mux ) A
AND
Algorithm Programming Language
OR NOT
Operating System Architecture (ISA) Microarchitecture Digital Logic Circuit
AND
B S
S
Y
0
A
1
B
Y
Multiplexor ( mux ) A
AND
Algorithm Programming Language
OR
Operating System Architecture (ISA) Microarchitecture Digital Logic Circuit
AND
B S
S
Y
0
A
1
B
Y
Multiplexor ( mux )
Algorithm Programming Language
A
0
B
1
Y
Operating System Architecture (ISA)
S
Microarchitecture Digital Logic Circuit
S
Y
0
A
1
B
SR-latch ( static memory ) R
NOR
Q
NOR
Q
Algorithm Programming Language Operating System Architecture (ISA)
S
Microarchitecture Digital Logic
S
R
Q
Q
Circuit
0
0
mem
mem
0
1
0
1
1
0
1
0
1
1
0
0
SR-latch ( static memory )
Algorithm Programming Language Operating System
R
Q
S
Q
Architecture (ISA) Microarchitecture Digital Logic
S
R
Q
Q
Circuit
0
0
mem
mem
0
1
0
1
1
0
1
0
1
1
0
0
DRAM ( dynamic memory )
Algorithm
Select ( word line )
Programming Language Operating System Architecture (ISA)
Access transistor
Microarchitecture Digital Logic
Capacitor
Circuit GND
D ( bit line )
The origins of the word « architecture »
Algorithm Programming Language Operating System Architecture (ISA) Microarchitecture Digital Logic
The term architecture is used here to describe the attributes of a system as seen by the programmer, i.e. the conceptual structure and functional behavior, as distinct from the organization of the data flow and controls, the logical design, and the physical implementation. Amdahl, Blaauw, Brooks Architecture of the IBM System / 360 IBM Journal of Research and Development April 1964
Circuit
organization of the data flow = microarchitecture logical design = digital logic physical implementation = circuit
Keyboard
Algorithm
Control Unit
Programming Language Operating System Architecture (ISA)
Main Memory
Input Devices
ALU Registers
Microarchitecture
Display
Secondary Memory
Printer Storage
Digital Logic Circuit
Mouse
Central Processing Unit ( CPU )
Output Devices Bus
Schematics of a Princeton architecture also known as « Von Neumann architecture »
Keyboard
Algorithm
Control Unit
Programming Language Operating System Architecture (ISA) Microarchitecture Digital Logic
Mouse Data Memory
Input Devices
ALU Code Memory
Registers
( typically ROM )
Display
Secondary Memory
Printer Storage Code Bus
Central Processing Unit ( CPU )
Output Devices Data Bus
Circuit
Schematics of an Harvard architecture where the « data memory » is separated from the « code memory »
A traditional distinction: North Bridge vs. South Bridge Other Input/Output functionalities
High-speed requirements : Memory, Graphics
CPU
DRAM
Cache
Monitor
Graphics
PCIe x16
North Bridge Graphics Controller
DRAM Controller
Mouse
Ethernet Controller USB Controller
Keyboard Printer
USB Controller USB Controller
PCIe x1
Doc edit prog GCC Game prog
PCIe Controller
South Bridge Internet
OS
Hard Disk
SATA Controller USB Controller
OS Research papers
PCIe Controller
PCIe x2
Family Photos
PCIe x4
Instruction Set Architecture
Algorithm Programming Language Operating System Architecture (ISA) Microarchitecture Digital Logic Circuit
add add immediate add unsigned add immediate unsigned subtract subtract unsigned AND AND immediate OR OR immediate NOR shift left logical shift right logical load upper immediate load word store word load halfword unsigned store halfword load byte unsigned store byte load linked (atomic update) store conditional (atomic update) branch on equal branch on not equal jump jump and link jump register set less than set less than immediate set less than unsigned set less than immediate unsigned
add addi addu addiu sub subu AND ANDi OR ORi NOR sll srl lui lw sw lhu sh lbu sb ll sc beq bne j jal jr slt slti sltu sltiu
MIPS Core Instructions ( Patterson & Hennessy : Fig. 3.26 )
A complete description of the Instruction Set Architecture
MIPS Reference Data Card (“Green Card”) 1. Pull along perforation to separate card 2. Fold bottom side (columns 3 and 4) together
MIPS Green Sheet
M I P S Reference Data
CORE INSTRUCTION SET FORNAME, MNEMONIC MAT OPERATION (in Verilog) add Add R R[rd] = R[rs] + R[rt] Add Immediate
ARITHMETIC CORE INSTRUCTION SET
1
OPCODE / FUNCT (Hex) (1) 0 / 20hex 8hex
I
R[rt] = R[rs] + SignExtImm
(1,2)
Add Imm. Unsigned addiu
I
R[rt] = R[rs] + SignExtImm
(2)
Add Unsigned
addu
R R[rd] = R[rs] + R[rt]
0 / 21hex
And
and
R R[rd] = R[rs] & R[rt]
0 / 24hex
And Immediate
andi
I
Branch On Equal
beq
I
Branch On Not Equal bne
I
Jump
j
J
R[rt] = R[rs] & ZeroExtImm if(R[rs]==R[rt]) PC=PC+4+BranchAddr if(R[rs]!=R[rt]) PC=PC+4+BranchAddr PC=JumpAddr
Jump And Link
jal
J
R[31]=PC+8;PC=JumpAddr
Jump Register
jr
addi
ll
R PC=R[rs] R[rt]={24’b0,M[R[rs] I +SignExtImm](7:0)} R[rt]={16’b0,M[R[rs] I +SignExtImm](15:0)} I R[rt] = M[R[rs]+SignExtImm]
Load Upper Imm.
lui
I
R[rt] = {imm, 16’b0}
Load Word
lw
I
R[rt] = M[R[rs]+SignExtImm]
Nor
nor
R R[rd] = ~ (R[rs] | R[rt])
Or
or
R R[rd] = R[rs] | R[rt]
Or Immediate
ori
I
Set Less Than
slt
R R[rd] = (R[rs] < R[rt]) ? 1 : 0
Load Byte Unsigned lbu Load Halfword Unsigned Load Linked
lhu
Set Less Than Imm. slti Set Less Than Imm. sltiu Unsigned Set Less Than Unsig. sltu
9hex
(3)
chex
(4)
4hex
(4) (5) (5)
5hex 2hex 3hex 0 / 08hex
(2) (2) (2,7)
24hex 25hex 30hex fhex
(2)
23hex 0 / 27hex 0 / 25hex
R[rt] = R[rs] | ZeroExtImm
(3)
dhex 0 / 2ahex
R[rt] = (R[rs] < SignExtImm)? 1 : 0 (2) ahex R[rt] = (R[rs] < SignExtImm) bhex ?1:0 (2,6) R R[rd] = (R[rs] < R[rt]) ? 1 : 0 (6) 0 / 2bhex 0 / 00hex R R[rd] = R[rt] > shamt M[R[rs]+SignExtImm](7:0) = I R[rt](7:0) M[R[rs]+SignExtImm] = R[rt]; I R[rt] = (atomic) ? 1 : 0 M[R[rs]+SignExtImm](15:0) = I R[rt](15:0) I M[R[rs]+SignExtImm] = R[rt]
FR
Subtract
sub
R R[rd] = R[rs] - R[rt]
Subtract Unsigned
subu
28hex
(2,7)
38hex
(2) (2)
FI
I
rs 26 25
opcode 31
J
rs 26 25
opcode 31
rt 21 20
29hex 2bhex
(1) 0 / 22hex 0 / 23hex
rd 16 15
shamt 11 10
rt 21 20
funct 6 5
0
immediate 16 15
0
address 26 25
fmt 26 25
opcode 31
R R[rd] = R[rs] - R[rt] (1) May cause overflow exception (2) SignExtImm = { 16{immediate[15]}, immediate } (3) ZeroExtImm = { 16{1b’0}, immediate } (4) BranchAddr = { 14{immediate[15]}, immediate, 2’b0 } (5) JumpAddr = { PC+4[31:28], address, 2’b0 } (6) Operands considered unsigned numbers (vs. 2’s comp.) (7) Atomic test&set pair; R[rt] = 1 if pair atomic, 0 if not atomic
opcode 31
11/11/--/0 11/10/--/y 11/11/--/y 11/10/--/3 11/11/--/3 11/10/--/2 11/11/--/2 11/10/--/1 11/11/--/1 31/--/--/-35/--/--/-0 /--/--/10 0 /--/--/12 10 /0/--/0 0/--/--/18 0/--/--/19 0/--/--/3 39/--/--/-3d/--/--/--
ft 21 20
fmt 26 25
fs 16 15
ft 21 20
fd 11 10
funct 6 5
16 15
0
REGISTER NAME, NUMBER, USE, CALL CONVENTION PRESERVED ACROSS NAME NUMBER USE A CALL? $zero 0 The Constant Value 0 N.A. $at 1 Assembler Temporary No Values for Function Results $v0-$v1 2-3 No and Expression Evaluation $a0-$a3 4-7 Arguments No $t0-$t7 8-15 Temporaries No $s0-$s7 16-23 Saved Temporaries Yes $t8-$t9 24-25 Temporaries No $k0-$k1 26-27 Reserved for OS Kernel No $gp 28 Global Pointer Yes $sp 29 Stack Pointer Yes $fp 30 Frame Pointer Yes $ra 31 Return Address No
Copyright 2009 by Elsevier, Inc., All rights reserved. From Patterson and Hennessy, Computer Organization and Design, 4th ed.
0
immediate
PSEUDOINSTRUCTION SET NAME MNEMONIC OPERATION blt if(R[rs]R[rt]) PC = Label Branch Greater Than ble if(R[rs]=R[rt]) PC = Label Branch Greater Than or Equal li R[rd] = immediate Load Immediate move R[rd] = R[rs] Move
BASIC INSTRUCTION FORMATS R
opcode 31
0 / 02hex (2)
OPCODE / FMT /FT / FUNCT (Hex) 11/8/1/-11/8/0/-0/--/--/1a 0/--/--/1b 11/10/--/0
FLOATING-POINT INSTRUCTION FORMATS
I
Shift Left Logical
2
FORNAME, MNEMONIC MAT OPERATION Branch On FP True bc1t FI if(FPcond)PC=PC+4+BranchAddr (4) Branch On FP False bc1f FI if(!FPcond)PC=PC+4+BranchAddr(4) div Divide R Lo=R[rs]/R[rt]; Hi=R[rs]%R[rt] divu Divide Unsigned R Lo=R[rs]/R[rt]; Hi=R[rs]%R[rt] (6) add.s FR F[fd ]= F[fs] + F[ft] FP Add Single FP Add {F[fd],F[fd+1]} = {F[fs],F[fs+1]} + add.d FR Double {F[ft],F[ft+1]} FP Compare Single c.x.s* FR FPcond = (F[fs] op F[ft]) ? 1 : 0 FP Compare FPcond = ({F[fs],F[fs+1]} op c.x.d* FR Double {F[ft],F[ft+1]}) ? 1 : 0 * (x is eq, lt, or le) (op is ==, > shamt swc1 Store FP Single I M[R[rs]+SignExtImm] = F[rt] (2) Store FP M[R[rs]+SignExtImm] = F[rt]; (2) sdc1 I Double M[R[rs]+SignExtImm+4] = F[rt+1]
0
A complete description of the Instruction Set Architecture
4 IEEE 754 Symbols Exponent Fraction Object ±0 0 0 ± Denorm 0 ≠0 1 to MAX - 1 anything ± Fl. Pt. Num. ±∞ MAX 0 MAX ≠0 NaN S.P. MAX = 255, D.P. MAX = 2047
IEEE 754 FLOATING-POINT STANDARD (-1)S × (1 + Fraction) × 2(Exponent - Bias) where Single Precision Bias = 127, Double Precision Bias = 1023. IEEE Single Precision and Double Precision Formats: S 31
Exponent 23 22
S 63
Fraction
30
0
Exponent 62
Fraction 52 51
0
MEMORY ALLOCATION $sp
7fff fffchex
$gp
1000 8000hex 1000 0000hex
pc
STACK FRAME ... Argument 6 Argument 5 $fp Saved Registers
Stack
Dynamic Data Static Data
Stack Grows
Local Variables
$sp
Text
0040 0000hex
Higher Memory Addresses
Lower Memory Addresses
Reserved
0hex DATA ALIGNMENT
Double Word Word Word Halfword Halfword Halfword Halfword Byte Byte Byte Byte Byte Byte Byte Byte 0
1
2
3
4
5
6
7
Value of three least significant bits of byte address (Big Endian)
EXCEPTION CONTROL REGISTERS: CAUSE AND STATUS B Interrupt Exception D Mask Code 31
15
8
Pending Interrupt 15
8
6
2
U M
E I L E
4
1
0
BD = Branch Delay, UM = User Mode, EL = Exception Level, IE =Interrupt Enable EXCEPTION CODES Number Name Cause of Exception Number Name Cause of Exception 0 Int Interrupt (hardware) 9 Bp Breakpoint Exception Address Error Exception Reserved Instruction 4 AdEL 10 RI (load or instruction fetch) Exception Address Error Exception Coprocessor 5 AdES 11 CpU (store) Unimplemented Bus Error on Arithmetic Overflow 6 IBE 12 Ov Instruction Fetch Exception Bus Error on 7 DBE 13 Tr Trap Load or Store 8 Sys Syscall Exception 15 FPE Floating Point Exception SIZE PREFIXES (10x for Disk, Communication; 2x for Memory) PREPREPREPRESIZE FIX SIZE FIX SIZE FIX SIZE FIX 10-3 milli- 10-15 femto103, 210 Kilo- 1015, 250 Peta10-6 micro- 10-18 atto106, 220 Mega- 1018, 260 Exa109, 230 Giga- 1021, 270 Zetta- 10-9 nano- 10-21 zepto1012, 240 Tera- 1024, 280 Yotta- 10-12 pico- 10-24 yoctoThe symbol for each prefix is just its first letter, except µ is used for micro.
Copyright 2009 by Elsevier, Inc., All rights reserved. From Patterson and Hennessy, Computer Organization and Design, 4th ed.
MIPS Reference Data Card (“Green Card”) 1. Pull along perforation to separate card 2. Fold bottom side (columns 3 and 4) together
MIPS Green Sheet
3 OPCODES, BASE CONVERSION, ASCII SYMBOLS MIPS (1) MIPS (2) MIPS Hexa- ASCII Hexa- ASCII DeciDeciopcode funct funct Binary deci- Chardeci- Charmal mal (31:26) (5:0) (5:0) mal acter mal acter sll 00 0000 0 0 NUL 64 40 @ add.f (1) sub.f 00 0001 1 1 SOH 65 41 A j srl 00 0010 2 2 STX 66 42 B mul.f jal sra 00 0011 3 3 ETX 67 43 C div.f beq sllv 00 0100 4 4 EOT 68 44 D sqrt.f bne 00 0101 5 5 ENQ 69 45 E abs.f blez srlv 00 0110 6 6 ACK 70 46 F mov.f bgtz srav 00 0111 7 7 BEL 71 47 G neg.f addi jr 00 1000 8 8 BS 72 48 H addiu jalr 00 1001 9 9 HT 73 49 I slti movz 00 1010 10 a LF 74 4a J sltiu movn 00 1011 11 b VT 75 4b K andi syscall round.w.f 00 1100 12 c FF 76 4c L ori break 13 d CR 77 4d M trunc.w.f 00 1101 xori 14 e SO 78 4e N ceil.w.f 00 1110 lui sync 15 f SI 79 4f O floor.w.f 00 1111 mfhi 01 0000 16 10 DLE 80 50 P mthi (2) 01 0001 17 11 DC1 81 51 Q mflo 01 0010 18 12 DC2 82 52 R movz.f mtlo 01 0011 19 13 DC3 83 53 S movn.f 01 0100 20 14 DC4 84 54 T 01 0101 21 15 NAK 85 55 U 01 0110 22 16 SYN 86 56 V 01 0111 23 17 ETB 87 57 W mult 01 1000 24 18 CAN 88 58 X multu 01 1001 25 19 EM 89 59 Y div 01 1010 26 1a SUB 90 5a Z divu 01 1011 27 1b ESC 91 5b [ 01 1100 28 1c FS 92 5c \ 01 1101 29 1d GS 93 5d ] 01 1110 30 1e RS 94 5e ^ 01 1111 31 1f US 95 5f _ lb add 10 0000 32 20 Space 96 60 ‘ cvt.s.f lh addu 10 0001 33 21 ! 97 61 a cvt.d.f lwl sub 10 0010 34 22 " 98 62 b lw subu 10 0011 35 23 # 99 63 c lbu and 10 0100 36 24 $ 100 64 d cvt.w.f lhu or 10 0101 37 25 % 101 65 e lwr xor 10 0110 38 26 & 102 66 f nor 10 0111 39 27 ’ 103 67 g sb 10 1000 40 28 ( 104 68 h sh 10 1001 41 29 ) 105 69 i swl slt 10 1010 42 2a * 106 6a j sw sltu 10 1011 43 2b + 107 6b k 10 1100 44 2c , 108 6c l 10 1101 45 2d 109 6d m swr 10 1110 46 2e . 110 6e n cache 10 1111 47 2f / 111 6f o ll tge 11 0000 48 30 0 112 70 p c.f.f lwc1 tgeu 11 0001 49 31 1 113 71 q c.un.f lwc2 tlt 11 0010 50 32 2 114 72 r c.eq.f pref tltu 11 0011 51 33 3 115 73 s c.ueq.f teq 11 0100 52 34 4 116 74 t c.olt.f ldc1 11 0101 53 35 5 117 75 u c.ult.f ldc2 tne 11 0110 54 36 6 118 76 v c.ole.f c.ule.f 11 0111 55 37 7 119 77 w sc 11 1000 56 38 8 120 78 x c.sf.f swc1 57 39 9 121 79 y c.ngle.f 11 1001 swc2 11 1010 58 3a : 122 7a z c.seq.f c.ngl.f 11 1011 59 3b ; 123 7b { c.lt.f 11 1100 60 3c < 124 7c | sdc1 11 1101 61 3d = 125 7d } c.nge.f sdc2 11 1110 62 3e > 126 7e ~ c.le.f c.ngt.f 11 1111 63 3f ? 127 7f DEL (1) opcode(31:26) == 0 (2) opcode(31:26) == 17ten (11hex); if fmt(25:21)==16ten (10hex) f = s (single); if fmt(25:21)==17ten (11hex) f = d (double)
A classification of architectures CISC PDP-11 ( DEC 1970 ) VAX ( DEC 1977 ) 8086 ( Intel 1978 ) and later x86 architectures
VLIW Trace ( Multiflow 1984 ) Cydra-5 ( Cydrone 1984 ) Itanium ( Intel 2001 )
RISC MIPS ( John L. Hennessy in Stanford 1981 ) RISC ( David Patterson in Berkeley 1980 - 1984 ) PRISM ( DEC 1985 ) Alpha ( DEC 1992) ARM ( Acorn 1985 ) POWER ( IBM 1990 )
SIMD ILLIAC IV ( Univ. Illinois 1971 ) GeForce 256 ( Nvidia 1999 ) GPUs more generally
CISC RISC VLIW SIMD
Complex Instruction Set Computing Reduced Set Computing Very Long Instruction Word Single Instruction Multiple Data
The design principles of the MIPS architecture P1. Simplicity favors regularity : Every instruction of the MIPS architecture is of length 4 bytes = 32 bits.
P2.
Smaller is faster :
Only 32 registers in the MIPS architecture.
P3.
Good design demands good compromises :
The instructions are separated into three different formats R, I and J.
The Register ( or R ) format
opcode
6 bits
first second register source register source operand operand 5 bits
5 bits
register destination operand
shift amount
function code
5 bits
5 bits
6 bits
The design principles of the MIPS architecture P1. Simplicity favors regularity : Every instruction of the MIPS architecture is of length 4 bytes = 32 bits.
P2.
Smaller is faster :
Only 32 registers in the MIPS architecture.
P3.
Good design demands good compromises :
The instructions are separated into three different formats R, I and J.
The Immediate ( or I ) format
opcode
6 bits
first second register source register source operand operand 5 bits
5 bits
address or immediate constant
16 bits
The design principles of the MIPS architecture P1. Simplicity favors regularity : Every instruction of the MIPS architecture is of length 4 bytes = 32 bits.
P2.
Smaller is faster :
Only 32 registers in the MIPS architecture.
P3.
Good design demands good compromises :
The instructions are separated into three different formats R, I and J.
The Jump ( or J ) format
opcode
target address
6 bits
26 bits
The complete LC-3 Instruction Set Architecture
See the detailed description of the ISA by Patt and Patel, 1984.
Inflation of the x86 instruction set over time 2.20
Concluding Remarks
1000 Number of Instructions
900 800 700 600 500 400 300 200 100 19 7 19 8 8 19 0 8 19 2 8 19 4 8 19 6 8 19 8 9 19 0 9 19 2 9 19 4 9 19 6 9 20 8 0 20 0 0 20 2 0 20 4 0 20 6 0 20 8 1 20 0 12
0
Year FIGURE 2.43 Growth of x86 instruction set over time. While there is clear technical value to some of these extensions, this rapid change also increases the difficulty for other companies to try to build compatible processors.
The price to pay ( among other things ) for backward compatibility...
Illustration : Intel Haswell i7 core ( 2013 )
All Haswell models designed to support MMX, SSE, SSE2, SSSE3, SSE4.1, SSE4.2, SSE4.2, F16C BMI1+BMI2, EIST, Intel 64, XD bit, Intel VT-x and Smart Cache die size ≈ 177 mm2 clock rate ≈ 3 GHz 22 nm FinFET technology
number of transistors per die ≈ 1 400 000 000
If you want to know more about that issue ...
Please have a look at the Intel manual here : http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-1-manual.pdf
Performance ( vs. VAX-11/780 )
A famous « law » formulated by Gordon Moore in 1965
Moore’s paper may be also seen as part of a technological and ideological fight for integrated circuits.
Illustration : MOS Technology 6502 ( 1974 )
4.3 mm
3.9 mm
die size ≈ 16,6 mm2 clock rate ≈ 1 MHz to 2 MHz 8 μm NMOS technology
number of transistors per die ≈ 3510 in enhancement mode + 1018 in depletion mode
Digital Archeology : excavating the 6502 The low cost 6502 processor was designed in 1974 and launched the « Home Computers » era. The visual 6502 project started in 1999 and aims at « reverse engineering » the processor. Read the article :
Nikhil Swaminathan. Digging into Technology's Past. Volume 64 Number 4, July/August 2011 http://archive.archaeology.org/1107/features/mos_technology_6502_computer_chip_cpu.html
Visit the site :
http://www.visual6502.org/JSSim Make the JSSim simulator work and ... A. count the number of fetches in the first 100 clock cycles B. guess the function of the Sync pin Bonus : find the position of the instruction decode table and pinpoint the ROL and ROR commands in it FOR NEXT RECITATION A photography of the MOS Technology 6502 design team with a design schematic. Bill Mensch who drew the processor on a drafting board is standing second from the right. 1974.
Have you ever imagined constructing a city layer by layer ?
Picture taken from the project Urban Layers http://io.morphocode.com/urban-layers
Thus very likely to contain bugs and mistakes ... 1974 : MOS 6502 The ROR command of the first 6502 processor did not work. So, the command was simply removed from the manual ! 1994 P5 Pentium A few missing entries in the lookup table for its divide operation algorithm led to an error in the floating point unit, producing inaccurate results in approximately 1 in 9 billion floating point divides/ 2011 : Intel Sandy Bridge The metal line of the SATA port of the Cougar chipset was too small for its purpose and would gradually thin and degrade over time. 2014 : Intel Haswell The Transactional Synchronization Extension ( TSX ) or « transactional memory » was bugged in the Haswell processor. The TSX feature was thus disabled via a microcode update. Hot research topic today : design formal methods to prove that a given microarchitecture is correct
Intel’s own words on design defects The Pentium Pro processor and Pentium II processor may contain design defects or errors known as errata that may cause the product to deviate from published specifications. Many times, the effects of the errata can be avoided by implementing hardware or software work-arounds, which are documented in the Pentum Pro Processor Specification Update and Pentium II Processor Specification Update, both of which can be found at www.intel.com. Pentium Pro and Pentium II processors include a feature called "reprogrammable microcode", which allows certain types of errata to be worked around via microcode updates. The microcode updates reside in the system BIOS and are loaded into the processor by the system BIOS during Power-On Self Test, or POST.
Pentium(R) Pro Processor and Pentium(R) II Processor Processor Update Utility Help File v3.4 January 1998 Copyright 1995-8, Intel Corporation.
High-level language program (in C)
Algorithm
swap(int v[], int k) {int temp; temp = v[k]; v[k] = v[k+1]; v[k+1] = temp; }
Compiler
Programming Language Operating System Architecture (ISA)
Assembly language program (for MIPS)
Microarchitecture
swap:
multi add lw lw sw sw jr
$2, $5,4 $2, $4,$2 $15, 0($2) $16, 4($2) $16, 0($2) $15, 4($2) $31
Digital Logic Circuit
Assembler
Binary machine language program (for MIPS)
00000000101000100000000100011000 00000000100000100001000000100001 10001101111000100000000000000000 10001110000100100000000000000100 10101110000100100000000000000000 10101101111000100000000000000100 00000011111000000000000000001000
FIGURE 1.4 C program compiled into assembly language and then assembled into machine language. Although the translation from high-level language to binary machine lan
A hierarchy of programming languages
Note the specific position of the programming language C at the software / hardware interface.
Compilation from C to MIPS Original C code : c
=
(a+b) - (i+j)
Compiled MIPS code : add t0, a, b add t1, i, j sub c, t0, t1
# # #
temporary register t0 = a+b temporary register t1 = i+j c = t0 - t1
Registers in MIPS There are exactly 32 registers in the MIPS architecture. Each register is able to store exactly one « word ». Here, by « word » one means a sequence of 32 bits. Each MIPS register thus contains exactly 32 bits. By convention, each register has its own status or function :
For more information, see Figure 2.14 in Patterson & Hennessy
Compilation from C to MIPS Original C code : c
=
(a+b) - (i+j)
Compiled MIPS code : add $t0, $s1, $s2 add $t1, $s3, $s4 sub $s0, $t0, $t1
# # #
temp register $t0 = $s1+$s2 temp register $t1 = $s2+$s3 $s0 = $t0 - $t1
Here, we make the important assumption that • the registers $s1 and $s2 contain the values of a and b • the registers $s2 and $s3 contain the values of i and j
Another example swap (int v[], int k) {int temp; temp = v[k]; v = v[k+1]; v[k+1] = temp; }
Original C code :
Compiled MIPS code : multi $2, $5,4 add
$2, $4,$2
lw lw sw sw jr
$15, $16, $16, $15, $31
0($2) 4($2) 0($2) 4($2)
# # # # # # # # #
multiplies by 4 the value of register 5 and writes the result in register 2 adds the value of register 4 to the value of register 2 and writes the result in register 2 loads the value of Mem[$2] in register 15 loads the value of Mem[$2+4] in register 16 stores the value of register 16 in Mem[$2] stores the value of register 15 in Mem[$2+4] jumps to the address loaded in register 31
Here, we suppose that : • register 4 contains the address in memory of the array v[] of integers • register 5 contains the value of the integer k.
A troublesome fact Almost every C compiler is bugged ... To convince yourself, please have a look at : Xuejun Yang , Yang Chen , Eric Eide , John Regehr Finding and Understanding Bugs in C Compilers 2011 ACM SIGPLAN Conference on Programming Language Design and Implementation ( PLDI )
Compilers and operating systems you can trust Certified Compilers CompCert : a formally checked C compiler Certified Operating Systems seL4 project : a formally checked OS microkernel
For the most curious among you :
Xavier Leroy Proof assistants in computer science research https://www.youtube.com/watch?v=PE4v6rVpX2g
Introducing myself ...
Paul Mellies
[email protected] I am a CNRS researcher in Paris working on the interface between logic and programming languages with an interest in the translation ( or compilation ) of high level programs into low level programs like assembly code or machine code. One main purpose of my research is to develop the mathematical framework to provide a rigorous meaning ( or semantics ) to low level code, typically written in machine code or implemented in hardware.
HW 0 : introduce yourself ! Due before February 19th (Tuesday) midnight.
In the next five days : Install Unix on your machine ! Learn your first shell commands using the references which you will find on the webpage of the course: http://computer-architecture.org Read the paper by Moore. Read the paper by Patterson and Ditzel.
HW 1 : explore your computer ! Due before February 22nd (Friday) midnight.
Thank you !
Any question ?