Computer Architecture Paul Mellies

Loading...
Computer Architecture Paul Mellies Lecture 1 : General introduction to the course

We are all magnificent meaning machines

We produce meaning all day long

How far can we extend our sense of the world ?

And what human society will emerge next ?

Computer Architecture is one of the greatest adventures of Humanity in the second part of the 20th Century

It is also a beautiful story which is very important to study in order to understand what will come next

Quizz ... Which american company invented the first microprocessor ? A. B. C. D. E.

Control Data Corporation ( CDC ) International Business Machines ( IBM ) Hewlett Packard ( HP ) Intel Corporation ( Intel ) Digital Equipment Company ( DEC )

Quizz ... What was the name of this first microprocessor by Intel ? A. B. C. D. E.

Intel Pentium Intel 4004 Intel Itanium Intel Xeon Intel Core i7

The Intel 4004 microprocessor

put on market in November 1971 number of transistors : 2300 size : 10 μ-meters clock speed : 740 kHz

See : Stanley Mazor, the History of the Microcomputer -- Invention and Evolution, Proceedings of the IEEE, 1995

Looking for the true foundations of our digital age « Ask someone who works in online gaming or social networking how an integrated circuit works — a component upon which that person’s job, employer, and industry depends for survival — and you are likely to get a blank stare and perhaps something mumbled about “silicon” and “semiconductor.” This isn’t surprising: after all, the worlds of social networks and code writing are six or seven levels removed from the largely chemical business of making computer chips. It would be like asking someone preparing a Big Mac at a McDonald’s in Prague about cattle feed in Tulsa. » « This is also true in the media, even in the trade press. The reporter who writes incisively on, say, Apple and its next generation iPhone may have little knowledge about the chips inside that device, other than perhaps the name of the manufacturer of its central processor and perhaps its memory chips. That’s why you read very little these days, other than financial news, about the semiconductor industry. The men and women who once covered the business have mostly retired, and the new generation of technology and business reporters are much more comfortable writing about Twitter or Facebook. » Michael S. Malone « The Intel Trinity: How Robert Noyce, Gordon Moore, and Andy Grove Built the World's Most Important Company. »

Do you recognize this gentleman ?

Do you recognize this gentleman ?

Steve Jobs ( 1955 - 2011 )

But do you recognize these two gentlemen ?

But do you recognize these two gentlemen ?

Dennis M. Ritchie ( 1941 - 2011 ) Ken Thompson ( 1943 ) Inventors of UNIX and of the C programming language

And this elegant gentleman ?

And this elegant gentleman ?

Robert Noyce ( 1927 - 1990 )

A panoramic presentation of the course

Your high-level program

-

Your electronic device

Your high-level program

What this course is all about !

-

Your electronic device

Your high-level program

compilation into a low-level program typically assembly code or machine code

Computer Architecture microprocessor, memory and I/O designed to perform the low-level code in an optimized amount of time, reliability and energy

-

Your electronic device

Your high-level program intermediate languages designed to take advantage of the underlying architecture

Computer Architecture tradeoff to find between the various constraints of time, energy and reliability based on the applications one has in mind

-

Your electronic device

Software

Your application in mind

C or Python as a computational model Prerequisite to this course: Introduction to Computer Science

Algorithm Programming Language Operating System Architecture (ISA) Microarchitecture Digital Logic Circuit

Hardware -

Electrons

-

Digital Logic as a computational model

Software

Your application in mind

C or Python as a computational model Prerequisite to this course: Introduction to Computer Science

Algorithm Programming Language Operating System Architecture (ISA) Microarchitecture Digital Logic Circuit

Hardware -

Electrons

-

Digital Logic as a computational model

If you study this course hard enough, you will get a much clearer and extensive understanding of the digital age we live in.

Algorithm Programming Language Operating System Architecture (ISA) Microarchitecture Digital Logic Circuit

Desiderius Erasmus Roterodamus ( 1467 - 1536 )

The digital abstraction

0

1

0

0

1

0

1

1

0

0

1

0

1

1

1

1

0

1

0

0

1

0

0

Every light bulb is either ON or OFF

0

The digital abstraction

1

0

0

1

0

1

1

1

1

0

1

0

0

1

0

0

0

1

0

0

1

0

1

0

Every light bulb is either ON or OFF

Can you count in binary numbers ?

0

0

0

0

= 0

Can you count in binary numbers ?

0

0

0

1

= 1

Can you count in binary numbers ?

0

0

1

0

= 2

Can you count in binary numbers ?

0

0

1

1

= 3

Can you count in binary numbers ?

0

1

0

0

= 4

Can you count in binary numbers ?

0

1

0

1

= 5

Can you count in binary numbers ?

0

1

1

0

= 6

Can you count in binary numbers ?

0

1

1

1

= 7

Can you count in binary numbers ?

1

0

0

0

= 8

Can you count in binary numbers ?

1

0

0

1

= 9

Can you count in binary numbers ?

1

0

1

0

= 10

Can you count in binary numbers ?

1

0

1

1

= 11

Can you count in binary numbers ?

1

1

0

0

= 12

Can you count in binary numbers ?

1

1

0

1

= 13

Can you count in binary numbers ?

1

1

1

0

= 14

Can you count in binary numbers ?

1

1

1

1

= 15

There are 10 kinds of people

The people who understand binary numbers and the people who don’t...

Key idea : transistors are switches ! gate = 0

gate = 1

drain

drain

drain

NMOS

PMOS

OFF

gate

ON

source

source

source

source

source

source

gate

ON drain

OFF drain

drain

NOT gate Algorithm

VDD

Programming Language Operating System Architecture (ISA) Microarchitecture

A

Y

Digital Logic Circuit GND

NAND gate VDD

Algorithm Programming Language Operating System Architecture (ISA) Microarchitecture Digital Logic Circuit

Y A B

GND

NOR gate VDD VDD

Algorithm Programming Language Operating System Architecture (ISA) Microarchitecture Digital Logic Circuit

A B

Y

A

Y

B

GND GND

Illustration : the inverter

value = 1

PMOS transistor Y

A

NMOS transistor value = 0

Illustration : the inverter

value = 1

ON output value = 1

input value = 0

OFF value = 0

Illustration : the inverter

value = 1

OFF output value = 0

input value = 1

ON value = 0

NAND gate

value = 1

value = 1

Y A B

value = 0

value = 0

NAND gate

value = 1

value = 1

ON

ON Y=1

A=0

B=0

OFF OFF

value = 0

value = 0

NAND gate

value = 1

value = 1

ON

OFF Y=1

A=1

B=0

ON OFF

value = 0

value = 0

NAND gate

value = 1

value = 1

OFF

ON Y=1

A=0

B=1

OFF ON

value = 0

value = 0

NAND gate

value = 1

value = 1

OFF

OFF Y=0

A=1

B=1

ON ON

value = 0

value = 0

NOT gate Algorithm Programming Language

A

Y

Operating System Architecture (ISA) Microarchitecture

A

Y

Digital Logic

0

1

Circuit

1

0

NAND gate Algorithm Programming Language

A B

Y

Operating System Architecture (ISA) Microarchitecture

A

B

Y

Digital Logic

0

0

1

Circuit

0

1

1

1

0

1

1

1

0

NOR gate Algorithm Programming Language

A B

Y

Operating System Architecture (ISA) Microarchitecture

A

B

Y

Digital Logic

0

0

1

Circuit

0

1

0

1

0

0

1

1

0

AND gate Algorithm Programming Language

A B

Y

Operating System Architecture (ISA) Microarchitecture

A

B

Y

Digital Logic

0

0

0

Circuit

0

1

0

1

0

0

1

1

1

OR gate Algorithm Programming Language

A B

Y

Operating System Architecture (ISA) Microarchitecture

A

B

Y

Digital Logic

0

0

0

Circuit

0

1

1

1

0

1

1

1

1

XOR gate Algorithm Programming Language

A B

Y

Operating System Architecture (ISA) Microarchitecture

A

B

Y

Digital Logic

0

0

0

Circuit

0

1

1

1

0

1

1

1

0

Multiplexor ( mux ) A

AND

Algorithm Programming Language

OR NOT

Operating System Architecture (ISA) Microarchitecture Digital Logic Circuit

AND

B S

S

Y

0

A

1

B

Y

Multiplexor ( mux ) A

AND

Algorithm Programming Language

OR

Operating System Architecture (ISA) Microarchitecture Digital Logic Circuit

AND

B S

S

Y

0

A

1

B

Y

Multiplexor ( mux )

Algorithm Programming Language

A

0

B

1

Y

Operating System Architecture (ISA)

S

Microarchitecture Digital Logic Circuit

S

Y

0

A

1

B

SR-latch ( static memory ) R

NOR

Q

NOR

Q

Algorithm Programming Language Operating System Architecture (ISA)

S

Microarchitecture Digital Logic

S

R

Q

Q

Circuit

0

0

mem

mem

0

1

0

1

1

0

1

0

1

1

0

0

SR-latch ( static memory )

Algorithm Programming Language Operating System

R

Q

S

Q

Architecture (ISA) Microarchitecture Digital Logic

S

R

Q

Q

Circuit

0

0

mem

mem

0

1

0

1

1

0

1

0

1

1

0

0

DRAM ( dynamic memory )

Algorithm

Select ( word line )

Programming Language Operating System Architecture (ISA)

Access transistor

Microarchitecture Digital Logic

Capacitor

Circuit GND

D ( bit line )

The origins of the word « architecture »

Algorithm Programming Language Operating System Architecture (ISA) Microarchitecture Digital Logic

The term architecture is used here to describe the attributes of a system as seen by the programmer, i.e. the conceptual structure and functional behavior, as distinct from the organization of the data flow and controls, the logical design, and the physical implementation. Amdahl, Blaauw, Brooks Architecture of the IBM System / 360 IBM Journal of Research and Development April 1964

Circuit

organization of the data flow = microarchitecture logical design = digital logic physical implementation = circuit

Keyboard

Algorithm

Control Unit

Programming Language Operating System Architecture (ISA)

Main Memory

Input Devices

ALU Registers

Microarchitecture

Display

Secondary Memory

Printer Storage

Digital Logic Circuit

Mouse

Central Processing Unit ( CPU )

Output Devices Bus

Schematics of a Princeton architecture also known as « Von Neumann architecture »

Keyboard

Algorithm

Control Unit

Programming Language Operating System Architecture (ISA) Microarchitecture Digital Logic

Mouse Data Memory

Input Devices

ALU Code Memory

Registers

( typically ROM )

Display

Secondary Memory

Printer Storage Code Bus

Central Processing Unit ( CPU )

Output Devices Data Bus

Circuit

Schematics of an Harvard architecture where the « data memory » is separated from the « code memory »

A traditional distinction: North Bridge vs. South Bridge Other Input/Output functionalities

High-speed requirements : Memory, Graphics

CPU

DRAM

Cache

Monitor

Graphics

PCIe x16

North Bridge Graphics Controller

DRAM Controller

Mouse

Ethernet Controller USB Controller

Keyboard Printer

USB Controller USB Controller

PCIe x1

Doc edit prog GCC Game prog

PCIe Controller

South Bridge Internet

OS

Hard Disk

SATA Controller USB Controller

OS Research papers

PCIe Controller

PCIe x2

Family Photos

PCIe x4

Instruction Set Architecture

Algorithm Programming Language Operating System Architecture (ISA) Microarchitecture Digital Logic Circuit

add add immediate add unsigned add immediate unsigned subtract subtract unsigned AND AND immediate OR OR immediate NOR shift left logical shift right logical load upper immediate load word store word load halfword unsigned store halfword load byte unsigned store byte load linked (atomic update) store conditional (atomic update) branch on equal branch on not equal jump jump and link jump register set less than set less than immediate set less than unsigned set less than immediate unsigned

add addi addu addiu sub subu AND ANDi OR ORi NOR sll srl lui lw sw lhu sh lbu sb ll sc beq bne j jal jr slt slti sltu sltiu

MIPS Core Instructions ( Patterson & Hennessy : Fig. 3.26 )

A complete description of the Instruction Set Architecture

MIPS Reference Data Card (“Green Card”) 1. Pull along perforation to separate card 2. Fold bottom side (columns 3 and 4) together

MIPS Green Sheet

M I P S Reference Data

CORE INSTRUCTION SET FORNAME, MNEMONIC MAT OPERATION (in Verilog) add Add R R[rd] = R[rs] + R[rt] Add Immediate

ARITHMETIC CORE INSTRUCTION SET

1

OPCODE / FUNCT (Hex) (1) 0 / 20hex 8hex

I

R[rt] = R[rs] + SignExtImm

(1,2)

Add Imm. Unsigned addiu

I

R[rt] = R[rs] + SignExtImm

(2)

Add Unsigned

addu

R R[rd] = R[rs] + R[rt]

0 / 21hex

And

and

R R[rd] = R[rs] & R[rt]

0 / 24hex

And Immediate

andi

I

Branch On Equal

beq

I

Branch On Not Equal bne

I

Jump

j

J

R[rt] = R[rs] & ZeroExtImm if(R[rs]==R[rt]) PC=PC+4+BranchAddr if(R[rs]!=R[rt]) PC=PC+4+BranchAddr PC=JumpAddr

Jump And Link

jal

J

R[31]=PC+8;PC=JumpAddr

Jump Register

jr

addi

ll

R PC=R[rs] R[rt]={24’b0,M[R[rs] I +SignExtImm](7:0)} R[rt]={16’b0,M[R[rs] I +SignExtImm](15:0)} I R[rt] = M[R[rs]+SignExtImm]

Load Upper Imm.

lui

I

R[rt] = {imm, 16’b0}

Load Word

lw

I

R[rt] = M[R[rs]+SignExtImm]

Nor

nor

R R[rd] = ~ (R[rs] | R[rt])

Or

or

R R[rd] = R[rs] | R[rt]

Or Immediate

ori

I

Set Less Than

slt

R R[rd] = (R[rs] < R[rt]) ? 1 : 0

Load Byte Unsigned lbu Load Halfword Unsigned Load Linked

lhu

Set Less Than Imm. slti Set Less Than Imm. sltiu Unsigned Set Less Than Unsig. sltu

9hex

(3)

chex

(4)

4hex

(4) (5) (5)

5hex 2hex 3hex 0 / 08hex

(2) (2) (2,7)

24hex 25hex 30hex fhex

(2)

23hex 0 / 27hex 0 / 25hex

R[rt] = R[rs] | ZeroExtImm

(3)

dhex 0 / 2ahex

R[rt] = (R[rs] < SignExtImm)? 1 : 0 (2) ahex R[rt] = (R[rs] < SignExtImm) bhex ?1:0 (2,6) R R[rd] = (R[rs] < R[rt]) ? 1 : 0 (6) 0 / 2bhex 0 / 00hex R R[rd] = R[rt] << shamt I

sll

Shift Right Logical

srl

Store Byte

sb

Store Conditional

sc

Store Halfword

sh

Store Word

sw

R R[rd] = R[rt] >> shamt M[R[rs]+SignExtImm](7:0) = I R[rt](7:0) M[R[rs]+SignExtImm] = R[rt]; I R[rt] = (atomic) ? 1 : 0 M[R[rs]+SignExtImm](15:0) = I R[rt](15:0) I M[R[rs]+SignExtImm] = R[rt]

FR

Subtract

sub

R R[rd] = R[rs] - R[rt]

Subtract Unsigned

subu

28hex

(2,7)

38hex

(2) (2)

FI

I

rs 26 25

opcode 31

J

rs 26 25

opcode 31

rt 21 20

29hex 2bhex

(1) 0 / 22hex 0 / 23hex

rd 16 15

shamt 11 10

rt 21 20

funct 6 5

0

immediate 16 15

0

address 26 25

fmt 26 25

opcode 31

R R[rd] = R[rs] - R[rt] (1) May cause overflow exception (2) SignExtImm = { 16{immediate[15]}, immediate } (3) ZeroExtImm = { 16{1b’0}, immediate } (4) BranchAddr = { 14{immediate[15]}, immediate, 2’b0 } (5) JumpAddr = { PC+4[31:28], address, 2’b0 } (6) Operands considered unsigned numbers (vs. 2’s comp.) (7) Atomic test&set pair; R[rt] = 1 if pair atomic, 0 if not atomic

opcode 31

11/11/--/0 11/10/--/y 11/11/--/y 11/10/--/3 11/11/--/3 11/10/--/2 11/11/--/2 11/10/--/1 11/11/--/1 31/--/--/-35/--/--/-0 /--/--/10 0 /--/--/12 10 /0/--/0 0/--/--/18 0/--/--/19 0/--/--/3 39/--/--/-3d/--/--/--

ft 21 20

fmt 26 25

fs 16 15

ft 21 20

fd 11 10

funct 6 5

16 15

0

REGISTER NAME, NUMBER, USE, CALL CONVENTION PRESERVED ACROSS NAME NUMBER USE A CALL? $zero 0 The Constant Value 0 N.A. $at 1 Assembler Temporary No Values for Function Results $v0-$v1 2-3 No and Expression Evaluation $a0-$a3 4-7 Arguments No $t0-$t7 8-15 Temporaries No $s0-$s7 16-23 Saved Temporaries Yes $t8-$t9 24-25 Temporaries No $k0-$k1 26-27 Reserved for OS Kernel No $gp 28 Global Pointer Yes $sp 29 Stack Pointer Yes $fp 30 Frame Pointer Yes $ra 31 Return Address No

Copyright 2009 by Elsevier, Inc., All rights reserved. From Patterson and Hennessy, Computer Organization and Design, 4th ed.

0

immediate

PSEUDOINSTRUCTION SET NAME MNEMONIC OPERATION blt if(R[rs]R[rt]) PC = Label Branch Greater Than ble if(R[rs]<=R[rt]) PC = Label Branch Less Than or Equal bge if(R[rs]>=R[rt]) PC = Label Branch Greater Than or Equal li R[rd] = immediate Load Immediate move R[rd] = R[rs] Move

BASIC INSTRUCTION FORMATS R

opcode 31

0 / 02hex (2)

OPCODE / FMT /FT / FUNCT (Hex) 11/8/1/-11/8/0/-0/--/--/1a 0/--/--/1b 11/10/--/0

FLOATING-POINT INSTRUCTION FORMATS

I

Shift Left Logical

2

FORNAME, MNEMONIC MAT OPERATION Branch On FP True bc1t FI if(FPcond)PC=PC+4+BranchAddr (4) Branch On FP False bc1f FI if(!FPcond)PC=PC+4+BranchAddr(4) div Divide R Lo=R[rs]/R[rt]; Hi=R[rs]%R[rt] divu Divide Unsigned R Lo=R[rs]/R[rt]; Hi=R[rs]%R[rt] (6) add.s FR F[fd ]= F[fs] + F[ft] FP Add Single FP Add {F[fd],F[fd+1]} = {F[fs],F[fs+1]} + add.d FR Double {F[ft],F[ft+1]} FP Compare Single c.x.s* FR FPcond = (F[fs] op F[ft]) ? 1 : 0 FP Compare FPcond = ({F[fs],F[fs+1]} op c.x.d* FR Double {F[ft],F[ft+1]}) ? 1 : 0 * (x is eq, lt, or le) (op is ==, <, or <=) ( y is 32, 3c, or 3e) FP Divide Single div.s FR F[fd] = F[fs] / F[ft] FP Divide {F[fd],F[fd+1]} = {F[fs],F[fs+1]} / div.d FR Double {F[ft],F[ft+1]} FP Multiply Single mul.s FR F[fd] = F[fs] * F[ft] FP Multiply {F[fd],F[fd+1]} = {F[fs],F[fs+1]} * mul.d FR Double {F[ft],F[ft+1]} FP Subtract Single sub.s FR F[fd]=F[fs] - F[ft] FP Subtract {F[fd],F[fd+1]} = {F[fs],F[fs+1]} sub.d FR Double {F[ft],F[ft+1]} lwc1 I F[rt]=M[R[rs]+SignExtImm] Load FP Single (2) Load FP F[rt]=M[R[rs]+SignExtImm]; (2) ldc1 I Double F[rt+1]=M[R[rs]+SignExtImm+4] mfhi R R[rd] = Hi Move From Hi mflo Move From Lo R R[rd] = Lo Move From Control mfc0 R R[rd] = CR[rs] mult Multiply R {Hi,Lo} = R[rs] * R[rt] Multiply Unsigned multu R {Hi,Lo} = R[rs] * R[rt] (6) sra Shift Right Arith. R R[rd] = R[rt] >>> shamt swc1 Store FP Single I M[R[rs]+SignExtImm] = F[rt] (2) Store FP M[R[rs]+SignExtImm] = F[rt]; (2) sdc1 I Double M[R[rs]+SignExtImm+4] = F[rt+1]

0

A complete description of the Instruction Set Architecture

4 IEEE 754 Symbols Exponent Fraction Object ±0 0 0 ± Denorm 0 ≠0 1 to MAX - 1 anything ± Fl. Pt. Num. ±∞ MAX 0 MAX ≠0 NaN S.P. MAX = 255, D.P. MAX = 2047

IEEE 754 FLOATING-POINT STANDARD (-1)S × (1 + Fraction) × 2(Exponent - Bias) where Single Precision Bias = 127, Double Precision Bias = 1023. IEEE Single Precision and Double Precision Formats: S 31

Exponent 23 22

S 63

Fraction

30

0

Exponent 62

Fraction 52 51

0

MEMORY ALLOCATION $sp

7fff fffchex

$gp

1000 8000hex 1000 0000hex

pc

STACK FRAME ... Argument 6 Argument 5 $fp Saved Registers

Stack

Dynamic Data Static Data

Stack Grows

Local Variables

$sp

Text

0040 0000hex

Higher Memory Addresses

Lower Memory Addresses

Reserved

0hex DATA ALIGNMENT

Double Word Word Word Halfword Halfword Halfword Halfword Byte Byte Byte Byte Byte Byte Byte Byte 0

1

2

3

4

5

6

7

Value of three least significant bits of byte address (Big Endian)

EXCEPTION CONTROL REGISTERS: CAUSE AND STATUS B Interrupt Exception D Mask Code 31

15

8

Pending Interrupt 15

8

6

2

U M

E I L E

4

1

0

BD = Branch Delay, UM = User Mode, EL = Exception Level, IE =Interrupt Enable EXCEPTION CODES Number Name Cause of Exception Number Name Cause of Exception 0 Int Interrupt (hardware) 9 Bp Breakpoint Exception Address Error Exception Reserved Instruction 4 AdEL 10 RI (load or instruction fetch) Exception Address Error Exception Coprocessor 5 AdES 11 CpU (store) Unimplemented Bus Error on Arithmetic Overflow 6 IBE 12 Ov Instruction Fetch Exception Bus Error on 7 DBE 13 Tr Trap Load or Store 8 Sys Syscall Exception 15 FPE Floating Point Exception SIZE PREFIXES (10x for Disk, Communication; 2x for Memory) PREPREPREPRESIZE FIX SIZE FIX SIZE FIX SIZE FIX 10-3 milli- 10-15 femto103, 210 Kilo- 1015, 250 Peta10-6 micro- 10-18 atto106, 220 Mega- 1018, 260 Exa109, 230 Giga- 1021, 270 Zetta- 10-9 nano- 10-21 zepto1012, 240 Tera- 1024, 280 Yotta- 10-12 pico- 10-24 yoctoThe symbol for each prefix is just its first letter, except µ is used for micro.

Copyright 2009 by Elsevier, Inc., All rights reserved. From Patterson and Hennessy, Computer Organization and Design, 4th ed.

MIPS Reference Data Card (“Green Card”) 1. Pull along perforation to separate card 2. Fold bottom side (columns 3 and 4) together

MIPS Green Sheet

3 OPCODES, BASE CONVERSION, ASCII SYMBOLS MIPS (1) MIPS (2) MIPS Hexa- ASCII Hexa- ASCII DeciDeciopcode funct funct Binary deci- Chardeci- Charmal mal (31:26) (5:0) (5:0) mal acter mal acter sll 00 0000 0 0 NUL 64 40 @ add.f (1) sub.f 00 0001 1 1 SOH 65 41 A j srl 00 0010 2 2 STX 66 42 B mul.f jal sra 00 0011 3 3 ETX 67 43 C div.f beq sllv 00 0100 4 4 EOT 68 44 D sqrt.f bne 00 0101 5 5 ENQ 69 45 E abs.f blez srlv 00 0110 6 6 ACK 70 46 F mov.f bgtz srav 00 0111 7 7 BEL 71 47 G neg.f addi jr 00 1000 8 8 BS 72 48 H addiu jalr 00 1001 9 9 HT 73 49 I slti movz 00 1010 10 a LF 74 4a J sltiu movn 00 1011 11 b VT 75 4b K andi syscall round.w.f 00 1100 12 c FF 76 4c L ori break 13 d CR 77 4d M trunc.w.f 00 1101 xori 14 e SO 78 4e N ceil.w.f 00 1110 lui sync 15 f SI 79 4f O floor.w.f 00 1111 mfhi 01 0000 16 10 DLE 80 50 P mthi (2) 01 0001 17 11 DC1 81 51 Q mflo 01 0010 18 12 DC2 82 52 R movz.f mtlo 01 0011 19 13 DC3 83 53 S movn.f 01 0100 20 14 DC4 84 54 T 01 0101 21 15 NAK 85 55 U 01 0110 22 16 SYN 86 56 V 01 0111 23 17 ETB 87 57 W mult 01 1000 24 18 CAN 88 58 X multu 01 1001 25 19 EM 89 59 Y div 01 1010 26 1a SUB 90 5a Z divu 01 1011 27 1b ESC 91 5b [ 01 1100 28 1c FS 92 5c \ 01 1101 29 1d GS 93 5d ] 01 1110 30 1e RS 94 5e ^ 01 1111 31 1f US 95 5f _ lb add 10 0000 32 20 Space 96 60 ‘ cvt.s.f lh addu 10 0001 33 21 ! 97 61 a cvt.d.f lwl sub 10 0010 34 22 " 98 62 b lw subu 10 0011 35 23 # 99 63 c lbu and 10 0100 36 24 $ 100 64 d cvt.w.f lhu or 10 0101 37 25 % 101 65 e lwr xor 10 0110 38 26 & 102 66 f nor 10 0111 39 27 ’ 103 67 g sb 10 1000 40 28 ( 104 68 h sh 10 1001 41 29 ) 105 69 i swl slt 10 1010 42 2a * 106 6a j sw sltu 10 1011 43 2b + 107 6b k 10 1100 44 2c , 108 6c l 10 1101 45 2d 109 6d m swr 10 1110 46 2e . 110 6e n cache 10 1111 47 2f / 111 6f o ll tge 11 0000 48 30 0 112 70 p c.f.f lwc1 tgeu 11 0001 49 31 1 113 71 q c.un.f lwc2 tlt 11 0010 50 32 2 114 72 r c.eq.f pref tltu 11 0011 51 33 3 115 73 s c.ueq.f teq 11 0100 52 34 4 116 74 t c.olt.f ldc1 11 0101 53 35 5 117 75 u c.ult.f ldc2 tne 11 0110 54 36 6 118 76 v c.ole.f c.ule.f 11 0111 55 37 7 119 77 w sc 11 1000 56 38 8 120 78 x c.sf.f swc1 57 39 9 121 79 y c.ngle.f 11 1001 swc2 11 1010 58 3a : 122 7a z c.seq.f c.ngl.f 11 1011 59 3b ; 123 7b { c.lt.f 11 1100 60 3c < 124 7c | sdc1 11 1101 61 3d = 125 7d } c.nge.f sdc2 11 1110 62 3e > 126 7e ~ c.le.f c.ngt.f 11 1111 63 3f ? 127 7f DEL (1) opcode(31:26) == 0 (2) opcode(31:26) == 17ten (11hex); if fmt(25:21)==16ten (10hex) f = s (single); if fmt(25:21)==17ten (11hex) f = d (double)

A classification of architectures CISC PDP-11 ( DEC 1970 ) VAX ( DEC 1977 ) 8086 ( Intel 1978 ) and later x86 architectures

VLIW Trace ( Multiflow 1984 ) Cydra-5 ( Cydrone 1984 ) Itanium ( Intel 2001 )

RISC MIPS ( John L. Hennessy in Stanford 1981 ) RISC ( David Patterson in Berkeley 1980 - 1984 ) PRISM ( DEC 1985 ) Alpha ( DEC 1992) ARM ( Acorn 1985 ) POWER ( IBM 1990 )

SIMD ILLIAC IV ( Univ. Illinois 1971 ) GeForce 256 ( Nvidia 1999 ) GPUs more generally

CISC RISC VLIW SIMD

Complex Instruction Set Computing Reduced Set Computing Very Long Instruction Word Single Instruction Multiple Data

The design principles of the MIPS architecture P1. Simplicity favors regularity : Every instruction of the MIPS architecture is of length 4 bytes = 32 bits.

P2.

Smaller is faster :

Only 32 registers in the MIPS architecture.

P3.

Good design demands good compromises :

The instructions are separated into three different formats R, I and J.

The Register ( or R ) format

opcode

6 bits

first second register source register source operand operand 5 bits

5 bits

register destination operand

shift amount

function code

5 bits

5 bits

6 bits

The design principles of the MIPS architecture P1. Simplicity favors regularity : Every instruction of the MIPS architecture is of length 4 bytes = 32 bits.

P2.

Smaller is faster :

Only 32 registers in the MIPS architecture.

P3.

Good design demands good compromises :

The instructions are separated into three different formats R, I and J.

The Immediate ( or I ) format

opcode

6 bits

first second register source register source operand operand 5 bits

5 bits

address or immediate constant

16 bits

The design principles of the MIPS architecture P1. Simplicity favors regularity : Every instruction of the MIPS architecture is of length 4 bytes = 32 bits.

P2.

Smaller is faster :

Only 32 registers in the MIPS architecture.

P3.

Good design demands good compromises :

The instructions are separated into three different formats R, I and J.

The Jump ( or J ) format

opcode

target address

6 bits

26 bits

The complete LC-3 Instruction Set Architecture

See the detailed description of the ISA by Patt and Patel, 1984.

Inflation of the x86 instruction set over time 2.20

Concluding Remarks

1000 Number of Instructions

900 800 700 600 500 400 300 200 100 19 7 19 8 8 19 0 8 19 2 8 19 4 8 19 6 8 19 8 9 19 0 9 19 2 9 19 4 9 19 6 9 20 8 0 20 0 0 20 2 0 20 4 0 20 6 0 20 8 1 20 0 12

0

Year FIGURE 2.43 Growth of x86 instruction set over time. While there is clear technical value to some of these extensions, this rapid change also increases the difficulty for other companies to try to build compatible processors.

The price to pay ( among other things ) for backward compatibility...

Illustration : Intel Haswell i7 core ( 2013 )

All Haswell models designed to support MMX, SSE, SSE2, SSSE3, SSE4.1, SSE4.2, SSE4.2, F16C BMI1+BMI2, EIST, Intel 64, XD bit, Intel VT-x and Smart Cache die size ≈ 177 mm2 clock rate ≈ 3 GHz 22 nm FinFET technology

number of transistors per die ≈ 1 400 000 000

If you want to know more about that issue ...

Please have a look at the Intel manual here : http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-1-manual.pdf

Performance ( vs. VAX-11/780 )

A famous « law » formulated by Gordon Moore in 1965

Moore’s paper may be also seen as part of a technological and ideological fight for integrated circuits.

Illustration : MOS Technology 6502 ( 1974 )

4.3 mm

3.9 mm

die size ≈ 16,6 mm2 clock rate ≈ 1 MHz to 2 MHz 8 μm NMOS technology

number of transistors per die ≈ 3510 in enhancement mode + 1018 in depletion mode

Digital Archeology : excavating the 6502 The low cost 6502 processor was designed in 1974 and launched the « Home Computers » era. The visual 6502 project started in 1999 and aims at « reverse engineering » the processor. Read the article :

Nikhil Swaminathan. Digging into Technology's Past. Volume 64 Number 4, July/August 2011 http://archive.archaeology.org/1107/features/mos_technology_6502_computer_chip_cpu.html

Visit the site :

http://www.visual6502.org/JSSim Make the JSSim simulator work and ... A. count the number of fetches in the first 100 clock cycles B. guess the function of the Sync pin Bonus : find the position of the instruction decode table and pinpoint the ROL and ROR commands in it FOR NEXT RECITATION A photography of the MOS Technology 6502 design team with a design schematic. Bill Mensch who drew the processor on a drafting board is standing second from the right. 1974.

Have you ever imagined constructing a city layer by layer ?

Picture taken from the project Urban Layers http://io.morphocode.com/urban-layers

Thus very likely to contain bugs and mistakes ... 1974 : MOS 6502 The ROR command of the first 6502 processor did not work. So, the command was simply removed from the manual ! 1994 P5 Pentium A few missing entries in the lookup table for its divide operation algorithm led to an error in the floating point unit, producing inaccurate results in approximately 1 in 9 billion floating point divides/ 2011 : Intel Sandy Bridge The metal line of the SATA port of the Cougar chipset was too small for its purpose and would gradually thin and degrade over time. 2014 : Intel Haswell The Transactional Synchronization Extension ( TSX ) or « transactional memory » was bugged in the Haswell processor. The TSX feature was thus disabled via a microcode update. Hot research topic today : design formal methods to prove that a given microarchitecture is correct

Intel’s own words on design defects The Pentium Pro processor and Pentium II processor may contain design defects or errors known as errata that may cause the product to deviate from published specifications. Many times, the effects of the errata can be avoided by implementing hardware or software work-arounds, which are documented in the Pentum Pro Processor Specification Update and Pentium II Processor Specification Update, both of which can be found at www.intel.com. Pentium Pro and Pentium II processors include a feature called "reprogrammable microcode", which allows certain types of errata to be worked around via microcode updates. The microcode updates reside in the system BIOS and are loaded into the processor by the system BIOS during Power-On Self Test, or POST.

Pentium(R) Pro Processor and Pentium(R) II Processor Processor Update Utility Help File v3.4 January 1998 Copyright 1995-8, Intel Corporation.

High-level language program (in C)

Algorithm

swap(int v[], int k) {int temp; temp = v[k]; v[k] = v[k+1]; v[k+1] = temp; }

Compiler

Programming Language Operating System Architecture (ISA)

Assembly language program (for MIPS)

Microarchitecture

swap:

multi add lw lw sw sw jr

$2, $5,4 $2, $4,$2 $15, 0($2) $16, 4($2) $16, 0($2) $15, 4($2) $31

Digital Logic Circuit

Assembler

Binary machine language program (for MIPS)

00000000101000100000000100011000 00000000100000100001000000100001 10001101111000100000000000000000 10001110000100100000000000000100 10101110000100100000000000000000 10101101111000100000000000000100 00000011111000000000000000001000

FIGURE 1.4 C program compiled into assembly language and then assembled into machine language. Although the translation from high-level language to binary machine lan

A hierarchy of programming languages

Note the specific position of the programming language C at the software / hardware interface.

Compilation from C to MIPS Original C code : c

=

(a+b) - (i+j)

Compiled MIPS code : add t0, a, b add t1, i, j sub c, t0, t1

# # #

temporary register t0 = a+b temporary register t1 = i+j c = t0 - t1

Registers in MIPS There are exactly 32 registers in the MIPS architecture. Each register is able to store exactly one « word ». Here, by « word » one means a sequence of 32 bits. Each MIPS register thus contains exactly 32 bits. By convention, each register has its own status or function :

For more information, see Figure 2.14 in Patterson & Hennessy

Compilation from C to MIPS Original C code : c

=

(a+b) - (i+j)

Compiled MIPS code : add $t0, $s1, $s2 add $t1, $s3, $s4 sub $s0, $t0, $t1

# # #

temp register $t0 = $s1+$s2 temp register $t1 = $s2+$s3 $s0 = $t0 - $t1

Here, we make the important assumption that • the registers $s1 and $s2 contain the values of a and b • the registers $s2 and $s3 contain the values of i and j

Another example swap (int v[], int k) {int temp; temp = v[k]; v = v[k+1]; v[k+1] = temp; }

Original C code :

Compiled MIPS code : multi $2, $5,4 add

$2, $4,$2

lw lw sw sw jr

$15, $16, $16, $15, $31

0($2) 4($2) 0($2) 4($2)

# # # # # # # # #

multiplies by 4 the value of register 5 and writes the result in register 2 adds the value of register 4 to the value of register 2 and writes the result in register 2 loads the value of Mem[$2] in register 15 loads the value of Mem[$2+4] in register 16 stores the value of register 16 in Mem[$2] stores the value of register 15 in Mem[$2+4] jumps to the address loaded in register 31

Here, we suppose that : • register 4 contains the address in memory of the array v[] of integers • register 5 contains the value of the integer k.

A troublesome fact Almost every C compiler is bugged ... To convince yourself, please have a look at : Xuejun Yang , Yang Chen , Eric Eide , John Regehr Finding and Understanding Bugs in C Compilers 2011 ACM SIGPLAN Conference on Programming Language Design and Implementation ( PLDI )

Compilers and operating systems you can trust Certified Compilers CompCert : a formally checked C compiler Certified Operating Systems seL4 project : a formally checked OS microkernel

For the most curious among you :

Xavier Leroy Proof assistants in computer science research https://www.youtube.com/watch?v=PE4v6rVpX2g

Introducing myself ...

Paul Mellies [email protected] I am a CNRS researcher in Paris working on the interface between logic and programming languages with an interest in the translation ( or compilation ) of high level programs into low level programs like assembly code or machine code. One main purpose of my research is to develop the mathematical framework to provide a rigorous meaning ( or semantics ) to low level code, typically written in machine code or implemented in hardware.

HW 0 : introduce yourself ! Due before February 19th (Tuesday) midnight.

In the next five days : Install Unix on your machine ! Learn your first shell commands using the references which you will find on the webpage of the course: http://computer-architecture.org Read the paper by Moore. Read the paper by Patterson and Ditzel.

HW 1 : explore your computer ! Due before February 22nd (Friday) midnight.

Thank you !

Any question ?

Loading...

Computer Architecture Paul Mellies

Computer Architecture Paul Mellies Lecture 1 : General introduction to the course We are all magnificent meaning machines We produce meaning all da...

20MB Sizes 0 Downloads 0 Views

Recommend Documents

Computer Architecture
0-13-148521-0. 3. Languages, Levels, Virtual Machines. A multilevel machine. Tanenbaum, Structured Computer Organization

Computer Architecture
Jul 29, 2009 - 1.3.3 Exercise for module 5, 6 ... Exercise 8.2: Suppose that a computer prefetches up to 20 instructions

Computer Architecture
Fifth Edition. “The 5th edition of Computer Architecture: A Quantitative Approach continues the legacy, providing stud

Govindarajulu computer architecture pdf
Tate equivalent establishes its micturates and Jacobinises kindly! Hazelnut mentionable you liquesces his caper irrespon

Advanced Computer Architecture
Performance Issues II. • Ic & p affected by processor design and compiler technology. • m affected mainly by compile

Computer Architecture and Programming Von Neumann architecture
Page 44 ... sll $t4, $t0, 2 add $t4, $s0, $t4 lw $t5, 0($t3) lw $t6, 0($t4) slt $t2, $t5, $t6 beq $t2, $zero, endif add

ADVANCED COMPUTER ARCHITECTURE
ADVANCED COMPUTER. ARCHITECTURE. Marco Ferretti. Tel. Ufficio: 0382 985365. E-mail: [email protected] Web: www.un

Computer Architecture: Instructions - Piazza
Use two temporary variables t0 and t1. Chapter 2 — Instructions: ..... Page 44 ... $t2. ▫ sll $t2,$s0,3. $s0. $t2. â

Computer architecture - UiO
I Basics of computer architecture. 3 .... 9 Introduction to low-level programming. 81 ..... 12.1 The major differences b

Computer Architecture - OS3.nl
Mar 10, 2016 - $t0, $t1, ..., $t9 for temporary values. $s0, $s1, ..., $s7 for saved .... sll $t2, $s0, 8 # $t2 = $s0