
An Energy-Efficient 32-bit Multiplier Architecture in 90-nm CMOS Master thesis performed in division of Electronic Devices by

Nasir Mehmood Thesis No: LiTH-ISY-EX--06/3852--SE Linköping Date: 2006-09-26


An Energy-Efficient 32-bit Multiplier Architecture in 90-nm CMOS Master thesis in Electronic Devices at Linköping Institute of Technology by

Nasir Mehmood LiTH-ISY-EX--06/3852--SE

Supervisor: Prof. Dr. Atila Alvandpour Co-supervisor: Tekn.Lic Martin Hansson Examiner: Prof. Dr. Atila Alvandpour Linköping 2006-09-26


Presentation date: 2006-09-15

Division of Electronic Devices, Department of Electrical Engg.

Language: English
Number of pages: 72
Type of publication: Degree thesis
ISRN: LiTH-ISY-EX--06/3852--SE
URL, electronic version: http://www.ep.liu.se
Publication title: Energy-efficient 32-bit multiplier architecture in 90-nm CMOS
Author: Nasir Mehmood

Abstract A fast and energy-efficient multiplier is always needed in the electronics industry, especially in DSP, image processing and the arithmetic units of microprocessors. The multiplier is an important element that contributes substantially to the total power consumption of a system. At the VLSI level, area also becomes important, since more area means higher system cost. Speed is another key parameter when designing a multiplier for a specific application. These three parameters, i.e. power, area and speed, are always traded off against one another. In DSP processors, the area and speed of the MAC unit are the most important factors. However, increasing speed often also increases power consumption, so there is an upper bound on speed for a given power budget. For battery-operated portable multimedia devices, low-power and fast multiplier designs are more important than area. The design of a low-power, high-speed and area-efficient multiplier is thus the goal of my thesis work. The plan is to take a good existing design, modify it for low power and high speed, and prepare its layout in 90-nm technology using Cadence®. For that purpose, a number of research papers, listed in Section 7, were studied, and one of the architectures presented by Jung-Yup Kang and Jean-Luc Gaudiot was selected. They presented a unique technique for power reduction in Wallace tree multipliers: a method to calculate the 2’s complement of the multiplicand for the final Partial Product Row (PPR) when the MBE technique is used. This method has been used in the design for speed enhancement and power reduction. The ultimate purpose is to arrive at an architecture that is more energy- and area-efficient than a conventional multiplier at the same performance level. This report describes the design and evaluation of a new energy-efficient 32-bit multiplier architecture by comparing its power, performance and chip area with those of a conventional 32-bit multiplier.
The report explains the basic principles and methods of binary multiplication and the power consumption issues related to multipliers. The new algorithm, which reduces the last negative signal in the partial product row, is discussed in order to develop the new architecture. A power-performance comparison is shown. The simulation results show that the new architecture is 46% more energy-efficient than a conventional multiplier at the same performance level. It also uses 34% fewer transistors and occupies 25% less chip area in 90-nm CMOS technology.

Key Words: Modified Booth-encoding, carry save adder, multiplier, partial products, CMOS power, two’s complement, power-delay product


Team Involved

Prof. Dr. Atila Alvandpour
Supervisor
Division of Electronic Devices, Department of Electrical Engineering
Linköping University, SE-581 83 Linköping, Sweden
Phone: +46 (0)13-285818
Mobile: +46 (0)708-485818
Fax: +46 (0)13-139282
Email: [email protected]

Tekn. Lic. Martin Hansson
Ph.D. Student, Lic. Eng., M.Sc.EE
Co-Supervisor
Division of Electronic Devices, Department of Electrical Engineering
Linköping University
Office 3A:515, B building
Office phone: +46(0)13 282859
Cell phone: +46(0)70 3444716
Fax: +46(0)13 139282
E-mail: [email protected]

Nasir Mehmood
B.E Electrical (Pak)
Student, MS SoCware
Office 3A:529, B building
Cell phone: +46(0)70 4789464
Fax: +46(0)13 139282
E-mail: [email protected] [email protected]


Abbreviations

MBE: Modified Booth Encoding
PPR: Partial Product Row
PPG: Partial Product Generation
MSB: Most Significant Bit
LSB: Least Significant Bit
PP: Partial Product
CSAT: Carry save adder tree
CLA: Carry look-ahead adder
CMOS: Complementary MOS


Preface This master thesis concerns the design and layout of a 32-bit low-power, fast multiplier. Dr. Atila Alvandpour is the project supervisor and Martin Hansson is the co-supervisor of my thesis work. The master thesis comprises 20 Swedish points, equivalent to 30 ECTS points. The thesis work ran from May 2006 to September 2006, approximately 5 months. The scope of the thesis work includes searching for a fast Wallace tree-based multiplier, understanding basic multiplier concepts and low-power techniques, selecting an appropriate architecture, system-level design, transistor-level design, layout using 90-nm gate-length transistors, simulation and performance verification, and comparison of power, performance and chip area with a conventional 32-bit multiplier. Section 1 describes basic multiplier concepts such as generation of partial products, partial product reduction techniques, Modified Booth Encoding (MBE), the carry save adder tree and the related theory of signed/unsigned multiplier operation. Section 2 describes signed multiplication and various methods of representing signed binary numbers. Section 3 covers some well-known multiplier types used in today’s electronic products: array-based multipliers, tree-based multipliers, and bit-serial and bit-parallel multipliers. The multiplier accumulator (MAC) is described at the end. Section 4 deals with power-related issues in VLSI circuits in general and in multipliers in particular. It also describes various low-power techniques used for power reduction in VLSI circuits. Section 5 describes the architecture of the 32-bit energy-efficient tree multiplier developed for this master thesis.
This section covers previous work by other researchers, gate-level designs of all sub-modules, the sign extension problem, critical path calculations, and comparisons of power, performance and chip area with a conventional multiplier. Section 6 presents the simulation setup and results at transistor and layout levels. Section 7 lists all the references consulted in preparing the thesis report.


Acknowledgments I would like to express my sincere gratitude to my supervisor Prof. Atila Alvandpour and co-supervisor Tekn.Lic. Martin Hansson for their esteemed guidance and assistance in my thesis work. I also thank all the staff of the Electronic Devices division for being so kind and helping me during my thesis work. I am also thankful to Henrik Fredriksson and Stefan Andersson for their assistance. I would also like to express my heartiest thanks to my father Muhammad Hussain, my mother Kaneez Akhtar, my brothers Amir and Adeel, my sisters and my wife Sumaira for their affection and encouragement to complete my studies in Sweden. In particular, it is my pleasure to express sincere gratitude to all my teachers at Govt. High School Bhakral, Fauji Foundation College, UET Taxila and Linköping University, who gave me the knowledge, wisdom and inspiration that enabled me to achieve my MS degree. And of course I cannot forget my son Areeb, whose innocent eyes are always staring at me even from thousands of miles away.


Table of Contents

List of figures
1. Multiplier Background
  1.1. Basic binary multiplier
  1.2. Partial product generation
  1.3. Booth Encoding
  1.4. Modified Booth Encoding (MBE)
  1.5. Carry Save Adder Tree (CSAT)
  1.6. Fast Adders
    1.6.1. Carry look-ahead adder (CLA)
    1.6.2. Simple carry skip adder
    1.6.3. Multilevel carry skip adder
    1.6.4. Carry select adder
    1.6.5. Conditional sum adder
  1.7. Compressors
    1.7.1. [3:2] Compressor
    1.7.2. [4:2] Compressor
    1.7.3. More compression
2. Signed multiplication
  2.1. Signed number representations
    2.1.1. Sign-magnitude representation
    2.1.2. Complement representation
      2.1.2.1. 1’s complement representation
      2.1.2.2. 2’s complement representation
      2.1.2.3. Binary offset representation
    2.1.3. Signed-digit code
    2.1.4. Canonic signed-digit code
3. Multiplier types
  3.1. Array multipliers
    3.1.1. Simple array multiplier
    3.1.2. Double array multiplier
    3.1.3. Higher-order array
  3.2. Tree multipliers
    3.2.1. Binary Trees
    3.2.2. Balanced-Delay tree
    3.2.3. Overturned-Staircase tree
    3.2.4. Wallace Tree
  3.3. Serial multipliers
    3.3.1. Serial/parallel multiplier
    3.3.2. Transposed serial/parallel multiplier
  3.4. Serial/Parallel Multiplier Accumulator
4. Power consumption issues
  4.1. CMOS power consumption
    4.1.1. Static power
    4.1.2. Short circuit power
    4.1.3. Dynamic switching power
  4.2. Low power techniques
    4.2.1. Delay balancing
    4.2.2. Clock gating
    4.2.3. Voltage scaling
    4.2.4. Transition activity reduction
    4.2.5. Pipelining
    4.2.6. Interleaving
5. Energy-efficient 32 bit multiplier architecture
  5.1. Previous related work
  5.2. Architecture description
    5.2.1. Multiplication Algorithm
    5.2.2. Sign extension problem and solution
    5.2.3. Fast method for 2’s complement
  5.3. Development of full architecture
  5.4. Critical path calculations
6. Simulation results in Cadence® 90nm
  6.1. Power, performance comparison
  6.2. Chip area comparison
  6.3. Future work direction
  6.4. Conclusions
7. References


List of figures

Figure 1: Basic binary multiplication
Figure 2: Signed multiplication algorithm
Figure 3: Partial product generation logic
Figure 4: Multiplier bit grouping according to Booth Encoding
Figure 5: Bit positions in multiplier
Figure 6: Rearranging bits in multiplier
Figure 7: 9-input reduction tree
Figure 8: 4-bit carry look ahead (CLA)
Figure 9: 4-bit ripple carry adder
Figure 10: 4-bit carry bypass adder
Figure 11: Pass transistor implementation of bypass adder
Figure 12: Multilevel Carry skip adder
Figure 13: Carry select module
Figure 14: 4-bit conditional sum adder
Figure 15: A generic compressor
Figure 16: Gate level design of [3:2] compressor
Figure 17: [4:2] compressor logic diagram [15]
Figure 18: [4:2] compressor using [3:2] compressor
Figure 19: A [6:2] compressor
Figure 20: A [9:2] compressor
Figure 21: Array multiplier mechanism
Figure 22: Simple array layout
Figure 23: Partial products addition using Double array
Figure 24: Partial products addition using (6, 6, 8, 10) array
Figure 25: Partial product addition using tree topology
Figure 26: Binary tree formed by 4:2 compressors
Figure 27: Overturned-staircase tree with 18 PPs
Figure 28: Typical Wallace tree
Figure 29: Serial/parallel multiplier
Figure 30: Transposed serial/parallel multiplier
Figure 31: S/P multiplier accumulator
Figure 32: Sources of leakage currents in CMOS [17]
Figure 33: Short circuit current in CMOS circuits
Figure 34: Equivalent circuit for load capacitance [18]
Figure 35: Delay balancing example
Figure 36: Clock gating technique for power saving
Figure 37: Pipelining example
Figure 38: Interleaved circuit for multiplication
Figure 39: Multiplication algorithm with negative encoding
Figure 40: Multiplication with sign extension [5]
Figure 41: Two’s complement conversion
Figure 42: Finding conversion signals for 8 bit number
Figure 43: Fast logic for 2’s complement computation [5]
Figure 44: New proposed algorithm
Figure 45: New proposed multiplier architecture
Figure 46: MBE logic
Figure 47: 5-1 selector
Figure 48: 3-5 decoder
Figure 49: Partial product generation
Figure 50: Critical path of conventional multiplier
Figure 51: Critical path of new architecture
Figure 52: Simulation setup
Figure 53: Power Vs delay plot
Figure 54: Chip area comparison
Figure 55: Simulation results @ 1.0V
Figure 56: Simulation results @ 1.2V


PART I BASICS OF MULTIPLICATION


1. Multiplier Background

1.1. Basic binary multiplier

The operation of multiplication is rather simple in digital electronics. It originates from the classical algorithm for the product of two binary numbers, which uses addition and shift-left operations to calculate the product. Two examples are presented below.

10 x 8 = 80

      1 0 1 0
    x 1 0 0 0
      0 0 0 0
    0 0 0 0
  0 0 0 0
1 0 1 0
1 0 1 0 0 0 0

-6 x 4 = -24

        1 0 1 0
      x 0 1 0 0
        0 0 0 0
      0 0 0 0
1 1 1 0 1 0
0 0 0 0 0
1 1 1 0 1 0 0 0

Figure 1: Basic binary multiplication

The first example (10 x 8) shows the multiplication of two unsigned binary numbers, while the second (-6 x 4) shows signed multiplication. The first number is called the multiplicand and the second the multiplier. The only difference between signed and unsigned multiplication is that the sign bit must be extended in the signed case, as shown in PP row 3 of the second example. Based on the above procedure, we can deduce an algorithm for any kind of multiplication, which is shown in Figure 2. Here, we assume that the MSB represents the sign of the number.

Figure 2: Signed multiplication algorithm
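As a rough illustration of the shift-and-add procedure with sign extension, the following Python sketch reproduces both examples of Figure 1. It is not the algorithm of Figure 2 itself: the function name, the default 4-bit width and the restriction to a non-negative multiplier are assumptions made for this example only.

```python
def signed_shift_add(multiplicand, multiplier, width=4):
    """Shift-and-add multiplication with sign-extended partial products.

    Assumptions (illustrative, not from the thesis): the multiplier is
    non-negative and fits in `width` bits, and the product fits in a
    signed number of 2 * width bits.
    """
    out_width = 2 * width
    mask = (1 << out_width) - 1

    # Masking a negative Python int yields its two's-complement pattern,
    # which is exactly the multiplicand sign-extended to the product width.
    extended = multiplicand & mask

    acc = 0
    for i in range(width):
        if (multiplier >> i) & 1:
            # Partial product row i: the multiplicand shifted left by i.
            acc = (acc + (extended << i)) & mask

    # Interpret the accumulated bit pattern as a signed value (MSB = sign).
    if acc >> (out_width - 1):
        acc -= 1 << out_width
    return acc


print(signed_shift_add(10, 8))   # unsigned example of Figure 1
print(signed_shift_add(-6, 4))   # signed example of Figure 1
```

Only the multiplier bit at position 2 is set in the second call, so the single non-zero row is the sign-extended multiplicand shifted by two positions, matching PP row 3 of the figure.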


1.2. Partial product generation

Partial product generation is the very first step in a binary multiplier. Partial products are the intermediate terms generated from the bits of the multiplier. If the multiplier bit is ‘0’, the partial product row is also zero; if it is ‘1’, the multiplicand is copied as it is. From the second multiplier bit onwards, each partial product row is shifted one position to the left, as shown in the example above. In signed multiplication, the sign bit is also extended to the left. The partial product generator of a conventional multiplier consists of a row of logic AND gates, as shown in Figure 3.

[Figure 3 signal labels: inputs X7 to X0 and Yi; outputs PPi7 to PPi0]

Figure 3: Partial product generation logic
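The AND-gate generator can be sketched in a few lines of Python. This is a behavioural model for unsigned operands only; the function names and the LSB-first bit lists are illustrative choices, not taken from the thesis.

```python
def partial_product_row(x_bits, y_i):
    """One row of AND gates: every multiplicand bit ANDed with bit y_i."""
    return [x & y_i for x in x_bits]


def partial_products(x, y, width=8):
    """Return `width` partial product rows, each LSB first and unshifted."""
    x_bits = [(x >> k) & 1 for k in range(width)]  # X0 .. X(width-1)
    return [partial_product_row(x_bits, (y >> i) & 1) for i in range(width)]


def sum_rows(rows):
    """Add the rows, giving row i a left shift of i positions."""
    return sum(bit << (i + k)
               for i, row in enumerate(rows)
               for k, bit in enumerate(row))


rows = partial_products(10, 8, width=4)
print(rows)            # rows 0..2 are all zero, row 3 copies the multiplicand
print(sum_rows(rows))  # product of the unsigned operands
```

Each row is either all zeros or a copy of the multiplicand, which is the behaviour described in the paragraph above; the shift is applied only when the rows are summed.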

Careful optimization of the partial-product generation can lead to substantial delay and area reduction [1].

1.3. Booth Encoding

Booth encoding is a method for reducing the number of partial products, proposed by A.D. Booth in 1950. A binary number X consisting of m bits represented in 2’s complement format can be described as [13]

$$X = -2^{m} X_{m} + 2^{m-1} X_{m-1} + 2^{m-2} X_{m-2} + \dots \qquad \text{(Eq 1.1)}$$

Rewriting Eq 1.1 using $2^{a} = 2^{a+1} - 2^{a}$ leads to

$$X = 2^{m}(X_{m-1} - X_{m}) + 2^{m-1}(X_{m-2} - X_{m-1}) + 2^{m-2}(X_{m-3} - X_{m-2}) + \dots \qquad \text{(Eq 1.2)}$$

Considering the first 3 bits of X, we can determine whether to add Y, 2Y or 0 to the partial product. The grouping of the X bits is shown in Figure 4.
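As a quick numerical check of the bit-pair recoding above, in which position i contributes (X_{i-1} - X_i) times 2^i, the following Python sketch recodes a 2’s complement number and rebuilds its value. The function names, the LSB-first digit list and the convention that the bit below the LSB is 0 are assumptions made for this example.

```python
def booth_radix2_digits(x, width):
    """Recode a two's-complement number into digits X_{i-1} - X_i.

    `x` must fit in `width` bits as a signed value. The bit to the right
    of the LSB (X_{-1}) is taken as 0. Digits are returned LSB first and
    each is one of -1, 0, +1.
    """
    digits = []
    prev = 0  # X_{-1}
    for i in range(width):
        bit = (x >> i) & 1
        digits.append(prev - bit)
        prev = bit
    return digits


def digits_to_value(digits):
    """Weight digit i by 2**i and sum."""
    return sum(d * (1 << i) for i, d in enumerate(digits))


digits = booth_radix2_digits(-6, 4)
print(digits)                   # [0, -1, 1, -1]
print(digits_to_value(digits))  # -6
```

Runs of identical bits produce zero digits, which is the property Booth encoding exploits to skip additions.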


Figure 4: Multiplier bit grouping according to Booth Encoding

The multiplier X is segmented into groups of three bits (Xi+1, Xi, Xi-1) and each group of bits is associated with its own partial product row using Table 1.

Xi+1  Xi  Xi-1  Increment
0     0   0     0
0     0   1     Y
0     1   0     Y
0     1   1     2Y
1     0   0     -2Y
1     0   1     -Y
1     1   0     -Y
1     1   1     0

Table 1: Booth encoding table
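The use of Table 1 can be sketched in Python as follows. The table entries are taken from Table 1; the function names, the overlapping groups taken at every second bit position, the implicit 0 below the LSB and the requirement of an even bit width are assumptions made for this sketch.

```python
# Table 1 as a lookup: (X_{i+1}, X_i, X_{i-1}) -> multiple of Y to add.
BOOTH_TABLE = {
    (0, 0, 0): 0, (0, 0, 1): 1, (0, 1, 0): 1, (0, 1, 1): 2,
    (1, 0, 0): -2, (1, 0, 1): -1, (1, 1, 0): -1, (1, 1, 1): 0,
}


def booth_groups(x, width):
    """Split the multiplier into overlapping 3-bit groups for i = 0, 2, 4, ...

    `width` must be even and `x` must fit in `width` bits as a signed
    value. bits[0] holds the implicit X_{-1} = 0, so bits[j + 1] is X_j.
    """
    bits = [0] + [(x >> i) & 1 for i in range(width)]
    return [(bits[i + 2], bits[i + 1], bits[i]) for i in range(0, width, 2)]


def booth_multiply(x, y, width=8):
    """Sum the partial product rows selected by Table 1.

    Group k has weight 4**k, so only width / 2 rows are needed.
    """
    return sum(BOOTH_TABLE[group] * y * 4 ** k
               for k, group in enumerate(booth_groups(x, width)))


print(booth_groups(-6, 4))              # [(1, 0, 0), (1, 0, 1)]
print(booth_multiply(-6, 4, width=4))   # -24
```

For a 4-bit multiplier only two rows are produced instead of four, which illustrates the reduction in the number of partial products.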

1.4. Modified Booth Encoding (MBE)

Modified Booth encoding was invented by O.L. MacSorley in 1961. MBE is an enhanced form of Booth encoding. A binary number X = x_{m-1}, x_{m-2}, …, x_0 consisting of m bits represented in 2’s complement form can be expressed mathematically as

$$X = -2^{m-1} x_{m-1} + \sum_{i=0}^{m-2} x_{i} 2^{i}$$
