Introduction to CMOS VLSI Design
Lecture 7B: Logical Effort
Lecture by Jay Brockman, University of Notre Dame, Fall 2008
Modified by Peter Kogge, Fall 2009
Based on lecture slides by David Harris, Harvey Mudd College, http://www.cmosvlsi.com/coursematerials.html
Outline
– Introduction
– Delay in a Logic Gate
– Multistage Logic Networks
– Choosing the Best Number of Stages
– Example
– Summary
Review
Assume gate G1 is driving some number of other gates G2
– Fanout = number of such gates being driven
Delay of gate G1 = parasitic delay + effort delay
Parasitic delay = delay if gate G1 is driving 0 load
– Function of diffusion capacitance in gate
– Delay seen when G1 drives no other circuits
Effort delay: delay due to capacitance of circuits driven by G1; function of
– the number of gates of type G2 being driven
– the input capacitance presented by a G2 gate
CMOS VLSI Design
5: Logical Effort
Normalized Delay
Normalized delay: gate delay relative to a
– "fanout of 1" inverter (drives one inverter)
– with no parasitic capacitance
– of value 3RC
  • R = equivalent resistance of unit nMOS transistor in saturation
  • C = equivalent capacitance of gate of unit nMOS transistor
Normalized Parasitic Delay
Inverter has 3 units of diffusion capacitance
– 2 from pMOS
– 1 from nMOS
Parasitic delay of inverter is τ = 3RC
Normalized parasitic delay of a gate is p/3RC
– p is parasitic delay from Elmore model
To convert to ps, multiply by the parasitic delay of an inverter in the chosen technology
Normalized Effort Delay
h = fanout or electrical effort = property of circuit
– = number of equivalent G1 gates being driven by G1
– = Cout/Cin where
  • Cout = total load capacitance presented by G2 inputs
  • Cin = input capacitance G1 presents to its sources
– When gates G2 (those being driven) are of type G1, then
  • h = number of copies
g = logical effort of gate G1
– effort required to drive G1 vs. a perfect inverter with the same output drive
– how many equivalent inverters one G1 gate looks like to its driver
– = (input cap of G1 gate)/(input cap of inverter with same drive)
– = (input cap of G1 gate)/3 for a gate with the drive of a unit inverter
Inverter has logical effort = 1
g_i = logical effort of a gate of type i = its input cap / input cap of an inverter with the same drive
h_i = fanout of gates of type i = load cap / input cap
Question: if delay through one gate is p + hg, can we write delay through a multistage path as some P + HG?
Scaling Transistors
What if all transistors in gate G got wider by k?
– Denote as gate "G(k)"
Parasitic delay of G(k): delay of unloaded gate
– Diffusion capacitance increases by k
– Resistance decreases by k
– Result: no change
Effort delay: ratio of load cap to input cap
– If it drives the same number of G(k) gates as before, no change
– If it drives the same number of G(1) gates as before, effort delay decreases by a factor of k
Result: for the same effort delay, fanout to type G(1) gates can increase by k
Overall Delay
$\text{delay} = \sum_i \text{delay}(i)$
– where delay(i) = delay through the i-th "stage" of logic
$\text{delay}(i) = p_i + h_i g_i$
– $p_i$: function only of gate type at stage i
– $g_i$: function only of gate type at stage i
  • input cap / cap of inverter
– $h_i$: depends on gates at stage i+1
  • total load on gate i / input cap of gate i
$\text{delay} = \sum_i (p_i + h_i g_i) = \sum_i p_i + \sum_i h_i g_i$
Can we write $\sum_i h_i g_i$ as some $H \cdot G$?
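A minimal Python sketch of the stage-by-stage sum above. The per-stage values are assumptions chosen for illustration (a NAND2, NAND3, and NOR2 stage with their usual logical efforts and parasitic delays, plus hand-picked electrical efforts); the `Stage` class and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    """One stage of a path; all values are normalized (illustrative)."""
    name: str  # gate type label
    p: float   # parasitic delay
    g: float   # logical effort
    h: float   # electrical effort = total load cap / input cap


def stage_delay(s: Stage) -> float:
    # delay(i) = p_i + h_i * g_i
    return s.p + s.h * s.g


def path_delay(stages) -> float:
    # delay = sum over stages of delay(i)
    return sum(stage_delay(s) for s in stages)


# Assumed example path: values chosen for illustration only.
stages = [
    Stage("NAND2", p=2.0, g=4 / 3, h=3.75),
    Stage("NAND3", p=3.0, g=5 / 3, h=3.0),
    Stage("NOR2", p=2.0, g=5 / 3, h=3.0),
]

P = sum(s.p for s in stages)          # sum of p_i
D_F = sum(s.h * s.g for s in stages)  # sum of h_i * g_i
print(f"P = {P:.1f}, sum(h*g) = {D_F:.1f}, delay = {path_delay(stages):.1f}")
```

The two partial sums are exactly the split on the slide: the parasitic term depends only on gate types, while the effort term is what the path-level quantities G, B, and H have to account for.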
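Branching Effort
Introduce branching effort
– Accounts for branching between stages in path
$b = \frac{C_{\text{on-path}} + C_{\text{off-path}}}{C_{\text{on-path}}}$, $B = \prod_i b_i$
Note: $\prod_i h_i = BH$
Path effort: $F = GBH$
Path effort delay: $D_F = \sum_i f_i$
Path parasitic delay: $P = \sum_i p_i$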
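To see why branching effort is needed, the sketch below computes each stage's electrical effort h_i and branching effort b_i from capacitances, then checks that $\prod h_i = BH$ and that $\prod f_i = GBH$. The capacitances, branch counts, and logical efforts are assumed for illustration; `stage_efforts` is a hypothetical helper. Exact fractions avoid floating-point noise in the comparison.

```python
from fractions import Fraction
from math import prod


def stage_efforts(c_in, copies_driven, c_load):
    """Return lists (h_i, b_i), one entry per stage.

    c_in[i]: input capacitance of the on-path gate at stage i
    copies_driven[i]: identical gates driven by stage i (1 on-path + off-path)
    c_load: capacitance at the end of the path
    """
    hs, bs = [], []
    for i, cin in enumerate(c_in):
        c_on = c_in[i + 1] if i + 1 < len(c_in) else c_load
        c_total = c_on * copies_driven[i]  # C_on-path + C_off-path
        hs.append(c_total / cin)           # h_i = C_out / C_in
        bs.append(c_total / c_on)          # b_i = (C_on + C_off) / C_on
    return hs, bs


# Assumed illustrative path: three stages with branching of 3 and 2.
c_in = [Fraction(8), Fraction(10), Fraction(15)]
copies_driven = [3, 2, 1]
c_load = Fraction(45)
gs = [Fraction(4, 3), Fraction(5, 3), Fraction(5, 3)]

hs, bs = stage_efforts(c_in, copies_driven, c_load)
H = c_load / c_in[0]  # path electrical effort
B = prod(bs)          # path branching effort
G = prod(gs)          # path logical effort
F = G * B * H

print(prod(hs), B * H)                          # 135/4 135/4
print(F, prod(g * h for g, h in zip(gs, hs)))  # 125 125
```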
Designing Fast Circuits
$D = \sum_i d_i = D_F + P$
Delay is smallest when each stage bears the same effort: $\hat{f} = g_i h_i = F^{1/N}$
Thus the minimum delay of an N-stage path is $D = N F^{1/N} + P$
This is a key result of logical effort
– To find fastest possible delay
– Doesn't require calculating gate sizes
Gate Sizes
How wide should the gates be for least delay?
$\hat{f} = gh = g \frac{C_{out}}{C_{in}} \Rightarrow C_{in_i} = \frac{g_i C_{out_i}}{\hat{f}}$
Working backward, apply capacitance transformation to find input capacitance of each gate given load it drives.
Check work by verifying input cap spec is met.
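The two results above translate directly into code. This is a minimal sketch assuming a path described by per-stage logical efforts and branch counts (a hypothetical input format), demonstrated on an assumed chain of three inverters driving a load of 64 from an input capacitance of 1.

```python
def min_delay(F: float, N: int, P: float) -> float:
    """Least delay of an N-stage path: D = N * F**(1/N) + P."""
    return N * F ** (1.0 / N) + P


def size_backward(gs, branches, c_out_path, f_hat):
    """Work backward from the load using C_in_i = g_i * C_out_i / f_hat.

    gs[i]: logical effort of stage i (input to output)
    branches[i]: identical copies of stage i+1 driven by stage i
    Returns the input capacitance of each stage, input to output.
    """
    n = len(gs)
    sizes = [0.0] * n
    c_out = c_out_path
    for i in range(n - 1, -1, -1):
        sizes[i] = gs[i] * c_out / f_hat
        if i > 0:
            # Stage i-1 drives branches[i-1] copies of stage i.
            c_out = branches[i - 1] * sizes[i]
    return sizes


# Assumed example: 3 inverters (g = 1, p = 1) from C_in = 1 to a load of 64.
gs, branches = [1.0, 1.0, 1.0], [1, 1, 1]
F = 1.0 * 1 * 64 / 1  # G * B * H
N = 3
f_hat = F ** (1.0 / N)
sizes = size_backward(gs, branches, 64.0, f_hat)
print(round(f_hat, 3), [round(c, 3) for c in sizes], round(min_delay(F, N, 3.0), 3))
# prints: 4.0 [1.0, 4.0, 16.0] 15.0
```

The first entry of `sizes` reproduces the specified input capacitance of 1, which is the check the slide recommends.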
Example: 3-stage path
Select gate sizes x and y for least delay from A to B
[Figure: input A (input capacitance 8) drives a NAND2; the NAND2 drives three NAND3 gates of size x; each NAND3 drives two NOR2 gates of size y; the on-path NOR2 drives output B with a load of 45]
$G = (4/3)(5/3)(5/3) = 100/27$
$H = 45/8$
$B = 3 \cdot 2 = 6$
$F = GBH = 125$
$\hat{f} = \sqrt[3]{F} = 5$
$P = 2 + 3 + 2 = 7$
$D = 3 \cdot 5 + 7 = 22 = 4.4$ FO4
Work backward for sizes
$y = 45 \cdot (5/3) / 5 = 15$
$x = (15 \cdot 2) \cdot (5/3) / 5 = 10$
Transistor widths:
– gate at A (input cap 8): P: 4, N: 4
– gates of size x: P: 4, N: 6
– gates of size y: P: 12, N: 3
Choosing Best # of Stages
Goal: estimate delay & choose transistor sizes
Many different topologies (combinations of gate types) implement the same function
We know in general
– NANDs better than NORs
– Gates with fewer inputs better than more inputs
Typical shortcut: estimate delay by # of stages
– Assuming constant "gate delay"
– and thus shorter paths are faster
THIS IS NOT ALWAYS TRUE!
– Adding inverters at end with increasing sizes can speed up circuit, esp. when high load
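The arithmetic of the 3-stage example can be checked with a short script. It assumes the gate types implied by the slide's logical efforts and parasitic delays (NAND2, NAND3, NOR2) and, for the FO4 conversion, an FO4 inverter delay of 5 (effort 4 plus p_inv = 1). Exact fractions are used for G, B, H, and F.

```python
import math
from fractions import Fraction

g = [Fraction(4, 3), Fraction(5, 3), Fraction(5, 3)]  # NAND2, NAND3, NOR2
p = [2, 3, 2]                                         # parasitic delays
branch = [3, 2]                                       # copies driven by stages 1 and 2

G = math.prod(g)       # 100/27
H = Fraction(45, 8)    # C_out / C_in
B = math.prod(branch)  # 6
F = G * B * H          # 125 (exact)

N = len(g)
f_hat = float(F) ** (1 / N)  # best stage effort, about 5
P = sum(p)                   # 7
D = N * f_hat + P            # about 22
FO4 = 5.0                    # assumed FO4 delay: g*h + p = 1*4 + 1
print(G, F, round(D, 2), round(D / FO4, 2))

# Work backward from the load: C_in = g * C_out / f_hat
y = 45 * g[2] / f_hat            # NOR2 size
x = (2 * y) * g[1] / f_hat       # NAND3 size (drives two NOR2s)
c_in_a = (3 * x) * g[0] / f_hat  # should recover the input spec of 8
print(round(y, 2), round(x, 2), round(c_in_a, 2))
```

Recovering 8 at the input confirms the sizes meet the input capacitance specification.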
Best Number of Stages
Example (p. 178)
How many stages should a path use?
– Minimizing number of stages is not always fastest
Example: drive 64-bit datapath with unit inverter
$D = N F^{1/N} + P = N (64)^{1/N} + N$
[Figure: initial driver (size 1) drives a datapath load of 64 through N = 1, 2, 3, or 4 inverter stages; stage sizes 1 (N = 1); 1, 8 (N = 2); 1, 4, 16 (N = 3); 1, 2.8, 8, 23 (N = 4)]
N:  1    2    3    4
f:  64   8    4    2.8
D:  65   18   15   15.3
Fastest: N = 3
General Derivation
Consider adding inverters to end of an $n_1$-stage path
– How many give least delay?
[Figure: logic block of $n_1$ stages with path effort F, followed by $N - n_1$ extra inverters]
$D = N F^{1/N} + \sum_{i=1}^{n_1} p_i + (N - n_1) p_{inv}$
N total stages with $(N - n_1)$ inverters
• do not change logical effort
• do add parasitic delay
$\frac{\partial D}{\partial N} = -F^{1/N} \ln F^{1/N} + F^{1/N} + p_{inv} = 0$
Define best stage effort $\rho = F^{1/N}$:
$p_{inv} + \rho (1 - \ln \rho) = 0$
Best Stage Effort
$p_{inv} + \rho (1 - \ln \rho) = 0$ has no closed-form solution
For $p_{inv} = 1$, solve numerically for ρ = 3.59
Again,
– these ρ values are the best effort per stage
– when you have $\hat{N} = \log_\rho F$ stages
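A sketch reproducing the datapath table and the numerical solution for ρ. It assumes p_inv = 1 for every inverter, as the slides do, and uses plain bisection (a swapped-in root finder, not necessarily how 3.59 was obtained); bisection works because $p_{inv} + \rho(1 - \ln\rho)$ decreases for ρ > 1. The function names are hypothetical.

```python
import math


def path_delay(F: float, N: int, P: float) -> float:
    """D = N * F**(1/N) + P."""
    return N * F ** (1.0 / N) + P


def best_stage_effort(p_inv: float, tol: float = 1e-9) -> float:
    """Solve p_inv + rho * (1 - ln rho) = 0 for rho > 1 by bisection."""
    def residual(rho: float) -> float:
        return p_inv + rho * (1.0 - math.log(rho))

    lo, hi = 1.0, 100.0  # residual(lo) > 0 and residual(hi) < 0 for 0 <= p_inv < 360
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0:
            lo = mid  # root lies above mid (residual decreases with rho)
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Datapath example: unit inverter driving a load of 64, all stages inverters.
F = 64.0
table = {N: path_delay(F, N, P=N * 1.0) for N in range(1, 5)}
for N, D in table.items():
    print(N, round(F ** (1 / N), 1), round(D, 1))
best_N = min(table, key=table.get)

rho = best_stage_effort(1.0)
N_hat = math.log(F) / math.log(rho)  # best number of stages, log_rho F
print(best_N, round(rho, 2), round(N_hat, 2))
```

Setting `p_inv = 0` gives ρ = e ≈ 2.718, since the equation then reduces to ρ(1 − ln ρ) = 0.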
Sensitivity Analysis
How sensitive is delay to using exactly the best number of stages?
[Figure: D(N)/D(N̂) plotted against N/N̂ (actual N vs. optimal N); the region from ρ = 2.4 to ρ = 6 is marked]
2.4 < ρ < 6 gives delay within 15% of optimal
– We can be sloppy!
– I like ρ = 4
1st Example, Revisited
Ben Bitdiddle is the memory designer for the Motoroil 68W86, an embedded automotive processor. Help Ben design the decoder for a register file.
[Figure: address inputs A[3:0] drive a 4:16 decoder that selects one of the 16 words (32 bits each) of a register file]
Decoder specifications:
– 16 word register file
– Each word is 32 bits wide
– Each bit presents load of 3 unit-sized transistors
– True and complementary address inputs A[3:0]
– Each input may drive 10 unit-sized transistors
Ben needs to decide:
– How many stages to use?
– How large should each gate be?
– How fast can decoder operate?
What Does This Mean?
16 word register file
– There are 16 separate row lines
– Branching factor of 16 at end
Each word is 32 bits wide & each bit presents load of 3 unit-sized transistors
– The load on each row line is 32 × 3 = 96
True and complementary address inputs A[3:0]
– Each address input (true or complement) is needed by only 8 of the 16 row lines
Each input may drive 10 unit-sized transistors
– Total input capacitance of the 1st-stage gates on each input = 10
Number of Stages
Decoder effort is mainly electrical and branching
Electrical effort: $H = (32 \cdot 3)/10 = 9.6$
Branching effort: $B = 8$
If we neglect logical effort (assume G = 1)
Path effort: $F = GBH = 76.8$
Number of stages: $N = \log_4 F = 3.1$
Try a 3-stage design
3-Stage Gate Sizes & Delay
[Figure: address input A[0] and its complement each drive a first-stage inverter of input capacitance 10; NAND4 gates of size y drive output inverters of size z, which drive word[0] through word[15], each with 96 units of wordline capacitance]
Logical effort: $G = 1 \cdot (6/3) \cdot 1 = 2$
Path effort: $F = GBH = 154$
$\hat{f} = F^{1/3} = 5.36$
$D = 3\hat{f} + 1 + 4 + 1 = 22.1$
Gate sizes: $z = 96 \cdot 1/5.36 = 18$
$y = 18 \cdot 2/5.36 = 6.7$
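The decoder numbers can be reproduced with a few lines of Python. This sketch assumes the INV-NAND4-INV structure with g = 1, 2, 1 and p = 1, 4, 1, as on the slide; the variable names are illustrative.

```python
import math

C_word = 32 * 3    # 32 bits x 3 unit transistors per bit = 96
C_in = 10          # each address input may drive 10 units
B = 8              # each true/complement input feeds 8 of the 16 rows
H = C_word / C_in  # 9.6

# Stage-count estimate neglecting logical effort (G = 1)
F_est = 1 * B * H           # 76.8
N_est = math.log(F_est, 4)  # about 3.1

# 3-stage INV-NAND4-INV design
g = [1.0, 2.0, 1.0]     # INV, NAND4 (6/3), INV
p = [1.0, 4.0, 1.0]
G = math.prod(g)        # 2
F = G * B * H           # 153.6
f_hat = F ** (1 / 3)    # about 5.36
D = 3 * f_hat + sum(p)  # about 22.1

# Work backward from the 96-unit wordline load
z = C_word * g[2] / f_hat          # output inverter, about 18
y = z * g[1] / f_hat               # NAND4, about 6.7
c_in_check = B * y * g[0] / f_hat  # first inverter drives 8 NAND4 inputs

print(round(N_est, 1), round(f_hat, 2), round(D, 1), round(z), round(y, 1), round(c_in_check, 1))
```

The last value recovers the 10-unit input limit, confirming the sizes satisfy the specification.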
Comparison
Compare many alternatives with a spreadsheet
Design                         N   G      P   D
NAND4-INV                      2   2      5   29.8
NAND2-NOR2                     2   20/9   4   30.1
INV-NAND4-INV                  3   2      6   22.1
NAND4-INV-INV-INV              4   2      7   21.1
NAND2-NOR2-INV-INV             4   20/9   6   20.5
NAND2-INV-NAND2-INV            4   16/9   6   19.7
INV-NAND2-INV-NAND2-INV        5   16/9   7   20.4
NAND2-INV-NAND2-INV-INV-INV    6   16/9   8   21.6
Review of Definitions
Term: stage; path
number of stages: 1; $N$
logical effort: $g$; $G = \prod g_i$
electrical effort: $h = C_{out}/C_{in}$; $H = C_{out\text{-}path}/C_{in\text{-}path}$
branching effort: $b = (C_{on\text{-}path} + C_{off\text{-}path})/C_{on\text{-}path}$; $B = \prod b_i$
effort: $f = gh$; $F = GBH$
effort delay: $f$; $D_F = \sum f_i$
parasitic delay: $p$; $P = \sum p_i$
delay: $d = f + p$; $D = \sum d_i = D_F + P$
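The comparison spreadsheet can be reproduced with $F = GBH$ and $D = N F^{1/N} + P$, assuming the decoder values B = 8 and H = 9.6 for every design; only N, G, and P change from row to row. The `delay` helper is a hypothetical name.

```python
from fractions import Fraction

B, H = 8, 9.6  # decoder branching and electrical effort (assumed for all rows)

# (design, N, G, P) as listed in the comparison table
designs = [
    ("NAND4-INV", 2, Fraction(2), 5),
    ("NAND2-NOR2", 2, Fraction(20, 9), 4),
    ("INV-NAND4-INV", 3, Fraction(2), 6),
    ("NAND4-INV-INV-INV", 4, Fraction(2), 7),
    ("NAND2-NOR2-INV-INV", 4, Fraction(20, 9), 6),
    ("NAND2-INV-NAND2-INV", 4, Fraction(16, 9), 6),
    ("INV-NAND2-INV-NAND2-INV", 5, Fraction(16, 9), 7),
    ("NAND2-INV-NAND2-INV-INV-INV", 6, Fraction(16, 9), 8),
]


def delay(N: int, G: Fraction, P: int) -> float:
    F = float(G) * B * H  # path effort
    return N * F ** (1 / N) + P


for name, N, G, P in designs:
    print(f"{name:<28} {N} {str(G):>5} {P} {delay(N, G, P):5.1f}")

fastest = min(designs, key=lambda d: delay(d[1], d[2], d[3]))
print("fastest:", fastest[0])
```

With these inputs the four-stage NAND2-INV-NAND2-INV design comes out fastest, matching the table.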
Method of Logical Effort
1) Compute path effort: $F = GBH$
2) Estimate best number of stages: $N = \log_4 F$
3) Sketch path with N stages
4) Estimate least delay: $D = N F^{1/N} + P$
5) Determine best stage effort: $\hat{f} = F^{1/N}$
6) Find gate sizes: $C_{in_i} = \frac{g_i C_{out_i}}{\hat{f}}$
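The six steps can be chained into one helper. This is a minimal sketch, assuming the path has already been sketched (step 3) as lists of per-stage logical effort, parasitic delay, and branch count; the function name and input format are hypothetical. The 3-stage example path (g = 4/3, 5/3, 5/3; p = 2, 3, 2; branching 3 and 2; input 8; load 45) serves as the test case.

```python
import math


def logical_effort_method(gs, ps, branches, c_in, c_out):
    """Apply the method of logical effort to an already-sketched path.

    gs, ps: logical effort and parasitic delay of each stage, input to output
    branches[i]: copies of stage i+1 driven by stage i (last entry unused)
    c_in, c_out: path input capacitance and output load
    """
    G = math.prod(gs)
    B = math.prod(branches[:-1])
    H = c_out / c_in
    F = G * B * H                   # 1) path effort
    n_est = math.log(F, 4)          # 2) estimated best number of stages
    N = len(gs)                     # 3) stages in the sketched path
    D = N * F ** (1 / N) + sum(ps)  # 4) least delay
    f_hat = F ** (1 / N)            # 5) best stage effort
    sizes = [0.0] * N               # 6) gate sizes, working backward
    load = c_out
    for i in range(N - 1, -1, -1):
        sizes[i] = gs[i] * load / f_hat
        if i > 0:
            load = branches[i - 1] * sizes[i]
    return {"F": F, "N_est": n_est, "D": D, "f_hat": f_hat, "sizes": sizes}


result = logical_effort_method(
    gs=[4 / 3, 5 / 3, 5 / 3], ps=[2, 3, 2], branches=[3, 2, 1], c_in=8, c_out=45
)
print({k: (round(v, 2) if isinstance(v, float) else [round(c, 2) for c in v])
       for k, v in result.items()})
```

The first returned size equals the specified input capacitance (8), which is the "verify the input cap spec" check from the gate-sizing step.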
Summary
Logical effort is useful for thinking of delay in circuits
– Numeric logical effort characterizes gates
– NANDs are faster than NORs in CMOS
– Paths are fastest when effort delays are ~4
– Path delay is weakly sensitive to stages, sizes
– But using fewer stages doesn't mean faster paths
– Delay of path is about $\log_4 F$ FO4 inverter delays
– Inverters and NAND2 best for driving large caps
Provides language for discussing fast circuits
– But requires practice to master
Limits of Logical Effort
Chicken and egg problem
– Need path to compute G
– But don't know number of stages without G
Simplistic delay model
– Neglects input rise time effects
Interconnect
– Iteration required in designs with wire
Maximum speed only
– Not minimum area/power for constrained delay