Time efficient signed Vedic multiplier using ... - IET Digital Library [PDF]

The carry-save multiplier and Wallace tree multiplier are based on the arrangement of adders aiming reduction in overall

0 downloads 6 Views 658KB Size

Report

Download PDF

PNG Network

Recommend Stories

Implementation of Power Efficient Vedic Multiplier

What you seek is seeking you. Rumi

Efficient Computing Techniques using Vedic Mathematics Sutras

Don't watch the clock, do what it does. Keep Going. Sam Levenson

Novel High Speed Vedic Mathematics Multiplier using Compressors

The happiest people don't have the best of everything, they just make the best of everything. Anony

Building Digital Library using DSpace

Don't ruin a good today by thinking about a bad yesterday. Let it go. Anonymous

Survey on Performance of Vedic Multiplier

Come let us be friends for once. Let us make life easy on us. Let us be loved ones and lovers. The earth

performance evaluation of reversible vedic multiplier

Be like the sun for grace and mercy. Be like the night to cover others' faults. Be like running water

digital library digital library services services

Suffering is a gift. In it is hidden mercy. Rumi

Area Efficient Architecture for Convolution Using Vedic Mathematics

You miss 100% of the shots you don’t take. Wayne Gretzky

An Efficient Design of Dadda Multiplier Using Compression Techniques

If you want to become full, let yourself be empty. Lao Tzu

Vedic Maths Books Pdf

Don't be satisfied with stories, how things have gone with others. Unfold your own myth. Rumi

Idea Transcript

Time efﬁcient signed Vedic multiplier using redundant binary representation Ranjan Kumar Barik, Manoranjan Pradhan, Rutuparna Panda Department of Electronics and Telecommunication Engineering, VSS University of Technology, Burla, Sambalpur, India E-mail: [email protected] Published in The Journal of Engineering; Received on 19th December 2016; Accepted on 3rd February 2017

Abstract: This study presents a high-speed signed Vedic multiplier (SVM) architecture using redundant binary (RB) representation in Urdhva Tiryagbhyam (UT) sutra. This is the ﬁrst ever effort towards extension of Vedic algorithms to the signed numbers. The proposed multiplier architecture solves the carry propagation issue in UT sutra, as carry free addition is possible in RB representation. The proposed design is coded in VHDL and synthesised in Xilinx ISE 14.4 of various FPGA devices. The proposed SVM architecture has better speed performances as compared with various state-of-the-art conventional as well as Vedic architectures.

1

Introduction

The multiplication operation is one of the core components in digital signal processing (DSP) used in applications such as the discrete Fourier transform, fast Fourier transform, convolution or digital ﬁlters [1]. Thus, there has been a continuous research for reﬁning predeﬁned multipliers and developing new approaches for an efﬁcient multiplier architecture. There are many existing multiplier architectures such as array multiplier, Booth multiplier, (CS) multiplier, Wallace tree multiplier and Modiﬁed Booth multiplier [1–5]. Array multiplier is the simplest among all; however, large numbers of gates are required making time inefﬁcient architecture. The main advantage of Baugh Wooley multiplier (BWM) algorithm is that it can be implemented entirely with the conventional full adders [1]. However, this algorithm requires an irregular format array for implementation. The reorganised partial products (PPs) array of an N-bits BWM is achieved by inverting of most signiﬁcant bits (MSBs) of all PPs rows along with all bits of ﬁnal PP, adding extra ‘1’ bit to nth column of PPs. Thus, high performance multipliers are preferred with regular structures or reduced numbers of PPs. The carry-save multiplier and Wallace tree multiplier are based on the arrangement of adders aiming reduction in overall critical path of the multiplication operation. Although, CSA method provides easier layout design as compared with the Wallace tree method, carry signal can only be proceed bit wise manner. However, the Wallace tree method provides high-speed operation, but the irregularity in design make it difﬁcult for layout design. The modiﬁed Booth method of multiplication is based on the concept of reducing the number of PPs by processing more than one bit for the generation of PPs. This method of multiplier is considered as standard signed numbers multiplication. Booth’s encoding is mostly employed for designing of fast multipliers as it reduces the number of PPs in multiplication process. However, for higher radix i.e. more than 4, Booth’s encoding has difﬁculties in generating hard multiples. In [6], authors have reported a novel RB signed digit Booth’s encoding for a fast multiplier. This encoding technique generates PPs in RB forms without using any hard multipliers. In recent years, various decimal algorithms of ancient Vedic mathematics are efﬁciently extended to binary number system. The larger calculation is reduced to a smaller calculation by using the algorithms of Vedic mathematics [7]. Many signiﬁcant amount of works has been published using the concept of Vedic mathematics, obtaining efﬁcient multiplier, divider, squaring and cube architectures [8–19]. The controversies regarding the book ‘Vedic mathematics’ is mainly associated to the term ‘Vedic’ and its origin. Many opponents have argued that the term ‘Vedic

mathematics’ is entirely misleading and literally incorrect [20]. However, the opponents have never disagreed that the book brings true knowledge, different from the modern western method and provides secret techniques for solving various problems of mathematics. Thus, the advantages of these techniques are never undervalued as continuous research works have been carried out for its applications. In recent years, various decimal algorithms of ancient Vedic mathematics are efﬁciently extended to binary number system. An efﬁcient technique for squaring operation is reported using Vedic sutra without multiplication operation [8]. The area and delay parameters are compared with booth’s algorithms on Xilinx Virtex 4vlx15sf36-12. A high-speed, multiplier less squaring method is reported in [9]. For square operation of n-bit operand, squaring unit, i.e. Sethi and Panda [9] uses one (n−1) bit squaring circuit, a 2n bit binary adder and two numbers of n bit binary adder/ subtractor. Sethi and Panda [9], claim efﬁciency in delay and device utilisation as compared with booth multiplier and square unit reported in [8]. A time efﬁcient multiplier architecture using Nikhilam sutra of Vedic mathematics is reported in [10]. The efﬁciency of multiplier [10] is based on the two factors (i) larger operation is reduced to the smaller operation (ii) the speed of addition operation in accumulation of PPs is increased using CS adders. The multiplier architecture is found to be efﬁcient as compared with the state-of-the-art Vedic multipliers and straightforward CSA multiplier. As an application of Vedic algorithms, a time and area efﬁcient circuit for factorial calculation is reported [11]. This factorial calculation is based on parallel implementation using Vedic multiplication operations. The authors claim that, the reduction in the number of stages in a Vedic multiplication process leads to signiﬁcant reduction in propagation delay along with switching power consumption [12]. In recent time, apart from multiplier architecture [10], we have worked on dedicated and efﬁcient square and cube architectures using algorithms of Vedic mathematics [13, 14]. These proposed designs convert the square or cube of a large magnitude number into a smaller magnitude number along with some arithmetic operation. In both of the works [13, 14] decimal algorithm is successfully extended to binary radix-2 number system considering digital platforms. Again, the design performances of these square and cube architectures are found better as compared with various state-of-the-art conventional and Vedic architectures. Bansal and Madhu [15] suggested a high speed 16 × 16 bits Vedic multiplication using compressor adders. The multiplication operation is also based on Urdhva Tiryagbhyam (UT) algorithm of Vedic mathematics. Authors have claimed that their proposed multiplier has better speed performance as compared with the

This is an open access article published by the IET under the Creative Commons Attribution -NonCommercial License (http://creativecommons.org/licenses/by-nc/ 3.0/)

J Eng, 2017, Vol. 2017, Iss. 3, pp. 60–68 doi: 10.1049/joe.2016.0376

array, Wallace tree and Booth multiplier architectures. Apart from these, Ramalatha et al. [17] have reported a speed and energy efﬁcient arithmetic logic unit (ALU), using UT sutra of Vedic multiplication technique. The ASIC level implementation of a complex multiplier design using Vedic mathematics is presented in [18]. The authors claim speed improvement over previously reported complex multiplier architectures using parallel adder and DA based implementation. In [19], Authors have suggested both single and double precision format ﬂoating point operations considering the UT method of Vedic mathematics. The authors claim speed and area efﬁciency as compared with ﬂoating point multipliers using Booth and Karatsuba algorithm. However, this multiplication operation is only limited to the unsigned number system. The motivation for the proposed work is that, all the previously reported Vedic architectures are only limited to unsigned numbers. In this paper, we have extended the scope of Vedic mathematics to signed numbers using the concept of the redundant binary (RB) number system in the UT algorithm of Vedic multiplier. The proposed signed Vedic multiplier (SVM) also solves the issue related to the carry propagation in UT algorithm using RB adder [12, 16]. The presented approach is completely unlike from the previously reported Vedic architectures in the sense that (i) the RB representation extends Vedic architecture to signed numbers (ii) the use of RB adder reduce propagation delay signiﬁcantly. Finally, it is believed that proposed SVM architecture may set fresh beginning for future research for signed number multiplication. The rest of the paper is organised as follows: Section 2 discusses the background of redundant number system and UT method of multiplication. Section 3 describes the proposed SVM using redundant number system. Section 4 provides the result and discussion of the proposed SVM architecture. The paper concludes in Section 5. 2

Background

In this section, we have discussed the theoretical background of redundant representation and the existing multiplier using UT sutra of Vedic mathematics. The RB number system is introduced in the Section 2.1 along with its property of carry free addition. The multiplication procedure for decimal and binary numbers using UT algorithm is explained in Section 2.2. 2.1 Redundant binary number representation In the year 1960, the class of signed digits number system with symmetric digit sets [− a, a] and radix r . 2 are deﬁned by Avizienis, restricting the value of α is to r/2 + 1 ≤ a ≤ r − 1 [21]. In [1], redundant number systems with general, asymmetric digit sets of the set [− a, b] and lower redundancy with r = 2 are discussed as Generalised Signed-Digit (GSD) numbers Table 1 There are two RB encoding scheme presentation using two binary bits m+ i

m− i

RB digit mi

Usdr : Vsdr Wsdr For SDR, Zsd = {z′ € Z: − a ≤ z ≤ b}, where ‘Z’ denotes to the set of integers and α, β > 0. As discussed, value of ‘α’ and ‘β’ deﬁne redundancy index (ρ) and redundancy only exist if

a + b + 1 ≥ r. The range of number for SDR (Wsdr ) can be deﬁned as

Wsdr

0 1 1 0

0 −1 1 0

− (b) RB coding scheme using (m+ i − mi ) 0 0 0 1 1 0 1 1

1 0 0 1

−1 0 0 1

rl − 1 rl − 1 , b∗ = −a ∗ r−1 r−1

> Z

For RBR, r = 2 and α and β are ﬁxed to 1. So Zrb is modiﬁed to Zrb = {z′ € Z: − 1 ≤ z ≤ 1} resulting Zrb = {−1, 0, 1} and Wrb as Wrb = −(2l − 1), 2l − 1 > Z. where l = MSB of the number There are three digits in Zrb , the encoding of these digits (mi ) set needs at least two bits. As, RB representation consists of digit set {1, 0, 1}, these RB digits requires at least two binary bits − + − (m+ i , mi ) for representation of single RB digit. The mi and mi + are referred as positively weighted bit (mi x1) of mi and similarly, − m− i is designated as negatively weighted bit (mi x (−1)) of mi . There are two RB encoding scheme presentation using two binary bits i.e. Tables 1a and b. The encoding scheme for − Table 1a follows as (m+ i − mi ), similarly, the Table 1b follows + − (mi − mi ), respectively. For example, the representation of 1 − using encoding scheme in second row of Table 1a, (m+ i − mi ) = (0 − 1) = (−1) = 1. In other encoding scheme as shown in ﬁrst − = row of Table 1b, 1 is represented as (00) = m+ i − mi 0 − 0 = (0 − 1) = −1 = 1. 2.1.1 Carry free addition in RBR: The main advantage of the RBR is that addition of two RB numbers can be computed in a ﬁxed time independent of the operand size. This is accomplished in two steps, Table 2 Computational rule for generation of intermediate sum and intermediate carry for RB addition Addend digit(Yi)

Digits at prior lower bit position (Xi−1, Yi−1)

Intermediate carry (Ci)

Intermediate sum (Si)

1 1

1 0

1 1

0 1

0 0 1 1 0

1 0 1 1 1

0 0 0 0 0

1 0 0 0 1

1 1

0 1

any value both are Non-negative otherwise any value any value any value both are Non-negative otherwise any value

1 1

1 0

Augend digit (Xi)

Value of mi

− (a) RB coding scheme using (m+ i − mi ) 0 0 0 1 1 0 1 1

J Eng, 2017, Vol. 2017, Iss. 3, pp. 60–68 doi: 10.1049/joe.2016.0376

system. In GSD number system, the redundancy index (r) is deﬁned as r = a + b + 1 and based on value of ‘r’, the redundant and non-redundant number system are classiﬁed. RB number representation (RBR) [1, 21] is the special case of signed- digit number representation (SDR). Let ‘W’ be the range of number and ‘V’ be the set of representation having interpretation function ‘U’, so for SDR

This is an open access article published by the IET under the Creative Commons Attribution -NonCommercial License (http://creativecommons.org/licenses/by-nc/ 3.0/)

in the ﬁrst step intermediate carry (ci [ {1, 0, 1}) and sum (si [ {1, 0, 1} at position ‘i’ are generated in such a way that xi + yi = si + 2ci (Table 2) where xi and yi are characterised as two digits, respectively. Then in second step, the ﬁnal sum Zi at each position is determined by the addition of intermediate sum Si and intermediate carry Ci−1 from preceding lower position without generating any carry. This shows a fully digit parallel addition can be possible using RB representation independent of word size of the operands. Table 2 shows the computational rule for generation of intermediate sum and intermediate carry for RB addition. An example of addition of two RB numbers is presented in Fig. 1. Let X = 10101001 Y = 11100111 are two RB numbers where X is the 1st number and Y is the 2nd number. For the addition of two number 2rd value of intermediate sum and intermediate carry is as per second row of Table 2 when digits at prior lower bit positions (Xi−1, Yi−1) are both negatives. Hence, the corresponding intermediate sum digit (S2) and intermediate carry digit (C3) are obtained as ‘1’ and ‘0’. However, addition of two number 3rd value of intermediate sum and intermediate carry is as per second row of Table 2 when digits at prior lower bit positions (Xi−1, Yi−1) are both non-negative. Hence, the corresponding intermediate sum digit (S3) and intermediate carry digit (C4) are now obtained as ‘1’ and ‘1’. Note that, the digits values of X and Y for i = 2 and i = 3 are same however the intermediate sum and carry digits are different as per the digits at prior lower bit position. Therefore, addition of two number 3rd value of intermediate sum and intermediate carry matches as per Table 2. 2.2 UT sutra for multiplication operation The UT sutra of Vedic mathematics is considered to be the general method for multiplication. The meaning of UT sutra is ‘Vertically and cross-wise’ [7]. The multiplication result of two numbers ax + b and cx + d i.e. x2 ac + x(ad + bc) + bd can be derived using principle of UT sutra as: † The coefﬁcient of x2 is the vertical multiplication of a and c. † The coefﬁcient of x is the by the cross wise multiplication of a with d and b with c, along with an addition operation of both products. † The self-determining term is the vertical multiplication of b with d. The multiplication operation of higher digits’ terms using UT sutra can also be obtained using sets of reduced vertically and crosswise multiplication. Therefore, primary motto behind utilisation of Vedic arithmetic i.e. ‘complex computation is diminished to less complex computation’ is likewise held. For decimal number system the value of x = 10. The decimal example showing UT process of vertical and cross-wise multiplication as 22 X 14 = {(2 X 1), (2X 4 + 2X 1), (2 X 4)} = {2, 10, 8} = 2, (101 + 0), 8 = (2 + 1), 0, 8 = 308. Mathmatical proof:

For two digits numbers, Let X = ax + b and Y = cx + d, Hence, multiplication of X and Y X ∗ Y = (ax + b) ∗ (cx + d ) = ax(cx + d ) + b(cx + d ) = ax2 + adx + bcx + bd = acx2 + (ad + bc)x + bd 2 Similarly, 2 for three digits numbers, Let X = a1 x + b1 x + c1 and y = a2 x + b2 x + c2 , multiplication of X and Y X ∗ Y = a1 x2 + b1 x + c1 ∗ a2 x2 + b2 x + c2 = a1 x2 a2 x2 + b2 x + c2 + b1 x a2 x2 + b2 x + c2 + c1 a2 x2 + b2 x + c2 = a1 a2 x4 + a1 b2 x3 + a2 b1 x3 + a1 c2 x2 + b1 b2 x2 + a2 c1 x2 + b1 c2 x + b2 c1 x + c1 c2 = x4 a1 a2 + x3 a1 b2 + a2 b1 + x2 a1 c2 + b1 b2 + a2 c1 + x b1 c2 + b2 c1 + c1 c2 Similarly, the multiplication operation using UT algorithm can also be extended to the binary number system. Let A (A3 A2 A1 A0) and B (B3 B2 B1 B0) are two 4 bits’ binary numbers. This 4 × 4 multiplication operation using UT algorithm is reduced to four numbers of 2 × 2 multiplication i.e. M1 (A1 A0 X B1 B0), M2 (A1 A0 X B3 B2), M3 (A3 A2 X B1 B0), M4 (A3 A2 X B3 B2). Fig. 2 shows the generalised structure for 4 × 4 multiplication using UT sutra of Vedic algorithm. The all 2 × 2 multiplication operation are carried out simultaneously improving speed of the operation. The ﬁnal result of the multiplication operation is obtained using three parts i.e. part1 (P1), part2 (P2), part3 (P3). The P1 is the lower two bits of M1 result i.e. S01 S00. The P3 is higher two bits M4 result i.e. S33 S32. The P2 is obtained as the middle part of multiplication operation using a multioperand addition. For 4 × 4 multiplication, The P2 is obtained through the four operands (O4, O3, O2, O1) adders each having bit size of four i.e. O4 (S31 S30 0 0), O3 (S23 S22 S21 S20), O2 (S13 S12 S11 S10), O1(0 0 S03 S02). Note that the bolded index in P2 are appended zeroes. Multiplication operation of A (1001) with B (0111) using UT sutra is shown in Fig. 3. 3

Proposed signed SVM architecture

As discussed in Section 1, the Vedic architecture provides efﬁcient digital arithmetic operations, however; these are only suitable for unsigned numbers. Considering the fact that signed numbers are essential for various applications, in this section, we have proposed a SVM using RB number system. The proposed SVM not only extend the scope of Vedic architecture to signed numbers, it also provides a high-speed multiplier architecture using property of the RB number system. The entire proposed multiplier architecture is subdivided into four stages of operation as (i) Partition of inputs (multiplicand and multiplier) (ii) Generation of 2’s complement representation (iii) Generation of RB partial products (RBPPs) using RB signed digit Booth encoding (RBSDE) and (iv) Adjustment in partial products followed by RB adders. Each stage of the proposed SVM is explained as follows. 3.1 Partition of inputs

Fig. 1 Addition of two RB number using computational rules as per the Table 2

The Section 2.2 describes the multiplication operation using UT sutra for decimal as well as binary numbers. The efﬁciency in time is achieved in UT sutra is due to the fact that, the larger

This is an open access article published by the IET under the Creative Commons Attribution -NonCommercial License (http://creativecommons.org/licenses/by-nc/ 3.0/)

J Eng, 2017, Vol. 2017, Iss. 3, pp. 60–68 doi: 10.1049/joe.2016.0376

Fig. 2 Structure for 4 × 4 multiplication using UT sutra of Vedic algorithm

operands multiplication is subdivided into smaller operands multiplications and its parallel execution. The ﬁrst stage of proposed SVM also follows the same technique, partitioning inputs into reduced operand size followed by its parallel execution. The stage1 of the proposed SVM partition inputs (A, B) to four parts, i.e. higher part of A (AH), lower part of A (AL), higher part of B (BH) and lower part of B (BL). The conventional UT sutra for multiplication operation is explained in Figs. 2 and 3 (in Section 2.2). For 4 × 4 bits multiplication operation using UT algorithm, the operand is reduced and leaded to four numbers of 2 × 2 bits multiplication i.e. M1 (A1 A0 X B1 B0), M2 (A1 A0 X B3 B2), M3 (A3 A2 X B1 B0), M4 (A3 A2 X B3 B2). The implementation of 4 × 4 bits multiplication is achieved by using four numbers of dedicated 2 × 2 bits multipliers and executing these in parallel. Similarly, for 8 × 8 bits multiplication the architecture is going to use four numbers of 4 × 4 bits multiplication operation each containing four numbers 2 × 2 bits multiplication blocks. The 32 × 32 bits multiplications can also be implemented using four numbers of 16 × 16 bits multiplication operations. Each 16 × 16 bits multiplication utilises four 8 × 8 bits multiplications. Thus, for 32 × 32 bits multiplication architecture using two 2 bits numbers, we do not have to append 30 zeroes in front of the operands. The architecture calculates the valid number of bits. Similarly, the proposed method of multiplication i.e. SVM, can also be designed breaking the operands into smaller modules and executing the algorithm. However, the proposed SVM is intended for RB representation the input operands to the proposed designs are also coded with binary bits of RBR. The only difference to the proposed SVM and conventional UT method of multiplication is that, the proposed SVM eliminates the need of multiplication of reduced bits operands and so its corresponding addition operations.

The design of N × N digits proposed SVM requires recursive utilisation of 2 × 2 digits RB PPs generation blocks (RBPPs GBs). Fig. 5a shows the block diagram for proposed 2 × 2 digits RBPPs GB. This 2 digits RBPPGB consists two operations (i) RB digits to two’s complement representation (ii) RBSDE for the generation of RB PPs. The block diagram for 4 digits proposed SVM using four basic 2 digits RBPPs GB is shown in Fig. 5b. The RB adders generated RB PPs from each two digits RBPPs GBs are added using RB adders. 3.2 Dedicated RB to two’s complement converter The reduced operands from stage 1 are still in RBR. As discussed in Section 1, a fast multiplier can be designed using a RB signed digit encoding (RBSDE). This encoding technique generates PPs in RB forms without using any hard multipliers. However, the implementation of this RBSDE is suitable for conventional binary representation. Thus, this section (Section 3.2) explains a dedicated two digits RB to two’s complement conversion circuit considering input carry as zero. Note that, the input carries are always zero as partitioned inputs are independent to each other’s. Again, the reason for two digits RB to two’s complement circuit is that it is the basic hardware element for the proposed SVM. The general architecture for proposed multipliers of any digits’ size is ultimately structured to two digits’ multipliers requiring two digits RB to two’s complement operation. Table 3 shows the dedicated two digits RB to 2’s complement converter. For conversion of RB digit pair ‘01’ (ri+1 = 0, ri = 1) in second row of Table 3, to its two’s complement representation is processed as follows: The digit ‘0’ is encoded as Xi+1 = 0, Yi+1 = 0, similarly, for digit 1, the encoded bits are Xi = 0, Yi = 1. The two’s complemented output (C0 , bi+1 , bi ) as per circuit diagram shown

Fig. 3 Multiplication example of A (1001) and B (0111) using UT sutra of Vedic algorithm J Eng, 2017, Vol. 2017, Iss. 3, pp. 60–68 doi: 10.1049/joe.2016.0376

This is an open access article published by the IET under the Creative Commons Attribution -NonCommercial License (http://creativecommons.org/licenses/by-nc/ 3.0/)

Table 3 Two digits RB to 2’s complement converter and RB partial product generation Two digits RB

RB encoding bits of ri+1

RB encoding bits of ri

Two’s complement representation

ri+1

ri

Xi+1

Yi+1

Xi

Yi

Co

bi+1

bi

0 0 0 1 1 1 1 1 1

0 1 1 0 1 1 0 1 1

0 0 0 0 0 0 1 1 1

0 0 0 1 1 1 0 0 0

0 0 1 0 0 1 0 0 1

0 1 0 0 1 0 0 1 0

0 1 0 1 1 1 0 0 0

0 1 0 1 0 1 1 0 1

0 1 1 0 1 1 0 1 1

Fig. 4 Circuit diagram for dedicated two digits RB to 2’s complement converter used in proposed SVM

Fig. 5 Block diagram a Proposed 2 digits RBPP GB b Proposed 4 digits SVM using four basic RBPPGBs This is an open access article published by the IET under the Creative Commons Attribution -NonCommercial License (http://creativecommons.org/licenses/by-nc/ 3.0/)

J Eng, 2017, Vol. 2017, Iss. 3, pp. 60–68 doi: 10.1049/joe.2016.0376

Table 4 Generation of RB partial product from two’s complemented multiplier and multiplicand Multiplier (mtper)

Value

RB partial product

Co 0 1 0 1 1

bi+1 0 1 0 1 0

bi 0 1 1 0 1

— 0 −1 +1 −2 −3

Pos(+) multiple 0000 mtpnd(2) & ‘000’ ‘0’ & mtpnd mtpnd(2)& ‘000’ mtpnd(2)& mtpnd

1 0

1 1

1 0

−1 +2

0 0

0 1

1 1

+1 +3

mtpnd(2)& ‘000 ‘0’& mtpnd (1 down to 0) & ’0’ ‘0’ & mtpnd Mtpnd (1 down to 0) & ‘00

Neg(−) multiple 0000 ‘0’ & mtpnd; mtpnd(2)& ‘000’ mtpnd & ‘0’ Mtpnd (1 down to 0) & ‘00’ ‘0’ & mtpnd Mtpnd (2) & ‘000 Mtpnd (2) & ‘000 mtpnd(2)& mtpnd

RBSDE for PMs. Similarly, for NMs, MSB of multiplicand is assigned to most signiﬁcant negative (m+). The advantages of RBR are well explained in Section 2. The carry free addition of PPs in a multiplication operation is possible, where PPs are need to be in RB form. We have used RBSDE reported in [6] for the generation of PPs of the proposed SVM in RB forms. Hence, overall this section has two main advantages (i) eliminate need of multiplication of reduced bits’ operand (ii) offer a choice to use RB adder providing PPs in RB forms. Table 4 shows the generation of RB partial product from two’s complemented multiplier (mtper) and multiplicand (mtpnd). Example 3.1 and 3.2 show the generation of RB PPs from respective mtper and mtpnd. Example 3.1, multiplication of two numbers, A1 = −2 (10) with B1 = −3 (11)As per Table 3, the two’s complement representation of A1 and B1 are obtained as ‘110’ and ‘101’ resulting mtpnd = 110 and mtper = 101. Hence, as per Table 4 Pos( + ) multiple Neg(−) multiple RB result

in Fig. 4 are obtained. bi = xi OR yi = ‘0’OR ‘1’ = 1 bi+1 = yi XOR (xi+1 OR yi+1 ) = ‘1’XOR (‘0’ OR ‘0’) = 1′ XOR ‘0’ = 1 c0 = (NOT (xi+1 ) )AND (yi OR yi+1 ) = NOT (‘0’)AND(‘0’ OR ‘1’) = ‘1’ AND ‘1’ = 1 Hence, for RB digit pair 01, the corresponding two’s complement output is obtained as ‘111’. In Table 3, ri+1 and ri are two digits of the RB inputs. The term weighted bit is described as per Tables 1a and b. Each RB digit is − + encoded using two binary bits (m+ i , mi ). The mi is referred as − positively weighted bit of mi and mi is designated as negatively weighted bit of mi. Xi+1 and Yi+1 are positively and negatively weighted encoding bits of ri+1. Similarly, Xi and Yi are positively and negatively weighted encoding bits of digit ri. The converted two’s complemented three bits from RB digit pair (ri+1 ri) are C0, bi+1, bi where C0 represents the sign bit. Note that, there is no effect of carry in or carry out in conversion process as all digits’ pairs (ri+1 ri) of any inputs are operated individually and in parallel manner. Fig. 4 shows the circuit diagram for dedicated two digits RB to 2’s complement converter used in proposed SVM. The stage2 of the proposed SVM proceeds outputs of stage1 i.e. AH, AL, BH and BL and obtain two’s complemented outputs AHT, ALT, BHT, BLT, respectively. 3.3 Generation of RB partial products using RB signed digit Booth encoder In Section 3.2, the basic hardware element of the proposed SVM i.e. two digits RB to three bits’ two’s complement representation is explained. In this section the generated two’s complemented form is fed as input to RB signed digit Booth encoding. The encoding of modiﬁed Booth’s algorithm is same as the RBSDE. However, the hard multiple is more convenient to obtain in case of RBSDE. A positive multiple (PM), not having a hard multiple is obtained by assigning positive bits (m+) of RBSDE PP of the multiple and negative bits (m−) as logic zero. Similarly, a negative multiple (NM), not having a hard multiple can be obtained by assigning negative bits (m−) for RBSDE PP of the multiple and positive bits (m+) to logic zero. In case of hard multiples, proper multiple is assigned to both positive (m+) and negative (m−) parts of PPs. Note that, in the proposed SVM, the inputs to the RBSDE is always in two’s complement form. Thus, the MSB of multiplicand is connected to most signiﬁcant negative (m−) bit of J Eng, 2017, Vol. 2017, Iss. 3, pp. 60–68 doi: 10.1049/joe.2016.0376

mtpnd(2)& mtpnd mtpnd(1 downto 0) & ‘00’

1110 1000 0110 = 6

Similarly for example 3.2, multiplication of mtpnd = −2 (10) with mtper = 2 (10) As per Table 3, the two’s complement representation of mtpnd and mtper are obtained as ‘110’ and ‘010’, respectively. Hence, as per Table 4 Pos( + ) multiple Neg(−) multiple RB result

‘0’& mtpnd (1 down to 0) & ‘0’ mtpnd(2) & ‘000

0100 1000 1100 = (−4)

3.4 Adjustment in partial products followed by RB adders The output of RB signed digit Booth encoding are in redundant representation. For obtaining the ﬁnal multiplication result, the generated least signiﬁcant RBPP (RB PP1) and most signiﬁcant RBPP (RBPP4) are appended with zeroes similar to multiplication operation as explained in Section 2.2. Then the obtained RB PPs (addend1, addend2, addend3, addend4) are added in constant time as explained in Section 2.1. Example 3.3 is the multiplication operation of A (10 1 1 ) = (5)10 and B (10 10) = (−10)10 using proposed method of multiplication. All the stages (stage1, stage2, stage3 and stage4) are carried out as discussed in Sections 3.1, 3.2, 3.3 and 3.4, respectively. Fig. 6 shows multiplication operation of example 3.3, containing all four stages for the proposed SVM. Final multiplication result (r) is obtained as 0 1 1 100 10 = 0X 27 + 1X 26 + 1X 25 + 1X 24 + 0X 23 + 0X 22 + 1X 21 + 0X 20 = 0 − 64 + 32 − 16 + 0 + 0 − 2 + 0 = −50, which is correct result for multiplication of 5 and −10. 4

Result and discussion

The design of Proposed SVM using in UT algorithm of Vedic mathematics consists of design entry, synthesis, simulation and implementation in Virtex-4: vlx15sf363-12 (V4), Spartan3e: xc3s500efg320 (S3) and Virtex 2p:xc2vp2:-7(V2) FPGA devices. The VHDL code [22] for the proposed design is written in the design entry stage. Then the VHDL code is checked for error. When there is no error, then the code is synthesised using the Xilinx14.4 software synthesis tool. For simulation of the proposed SVM architecture, a test bench ﬁle is also simulated using Xilinx 14.4 ISE simulation tool. FPGAs are basically semiconductor integrated circuits having arrays of conﬁgurable logic blocks

This is an open access article published by the IET under the Creative Commons Attribution -NonCommercial License (http://creativecommons.org/licenses/by-nc/ 3.0/)

Fig. 6 Example showing all four stages of multiplication using proposed SVM

linked via programmable interconnections. The level of functionality and cost of the particular FPGA device relies on the process technology used for manufacturing of the device. For example, the FPGA V2 device uses 0.15 µm/0.12 µm CMOS eight-layer metal process technology [23], while the FPGA V4 device uses 90 nm copper CMOS process technology [24]. The V4 FPGA hard-IP core blocks includes the PowerPC architecture while it is IP-Immersion architecture for V2 FPGA device. The V4 family of FPGA supports 18 × 18, two’s complement, signed multiplier. Distributed arithmetic (DA) have a signiﬁcant role in implementing DSP functions in the Xilinx FPGA devices [24]. DA, along with Modulo arithmetic, are computation algorithms that perform multiplication with look-up table based schemes. DA mainly implement the sum of products computation that includes DSP ﬁltering and frequency transforming functions. The multiplier

LogiCORE™ of FPGA can generate parallel multipliers and constant coefﬁcient multipliers (CCMs), both with differing implementation styles. The CCMs reduce FPGA resource by using constant-speciﬁc optimisation techniques. The different device families are utilised for the comparison of proposed SVM, with other previously reported designs considering same process technology and working environment. Table 5 contains the comparison data for the Proposed SVM in V4 device of FPGA family. This table compares proposed SVM with various state-of-the-art multiplier and square architectures, i.e. Booth multiplier, square reported in [8] and square reported in [9]. Note that, among these architectures, proposed SVM and

Table 5 Comparison table for the proposed SVM in V4 device of FPGA family

16 bit multiplier

Signed/ unsigned multiplier

Combinational delay(ns)

Percentage of improvement, %

array multiplier Wallace tree multiplier Booth multiplier Vedic multiplier reported in [16] Vedic multiplier using compressor adders reported in [15] proposed SVM

unsigned signed signed unsigned

43.946 46.046 37.041 37.50

31.49 30.5 13.5 14.6

unsigned

32

6.25

signed

30.16

—

Device: Virtex 4 vlx15sf363-12 operand size

4-bit 8-bit 16-bit 32-bit

Booth multiplier (in ns)

Vedic squaring unit reported in [8] (in ns)

Squaring unit reported in [9] (in ns)

Proposed SVM

8.154 15.718 36.657 74.432

4.993 14.256 33.391 68.125

4.993 8.948 18.564 37.345

10.43 15.22 19.58 22.32

Table 6 Comparison table for the proposed SVM in S3e device of FPGA family

This is an open access article published by the IET under the Creative Commons Attribution -NonCommercial License (http://creativecommons.org/licenses/by-nc/ 3.0/)

J Eng, 2017, Vol. 2017, Iss. 3, pp. 60–68 doi: 10.1049/joe.2016.0376

Table 7 Comparison table for the proposed SVM in V2 device of FPGA family 8 × 8 multiplier

16 × 16 multiplier

15.815 15.680 16.20

36.071 23.064 20.96

booth multiplier vedic multiplier reported in [16] proposed SVM

Table 8 Comparison table of resource utilisation (four input LUTs) for the proposed SVM with booth multiplier Device: Virtex 4 vlx15sf363-12 operand size booth multiplier proposed SVM

4-bit, %

8-bit, %

16-bit, %

32-bit, %

0.2 1

1.5 4.0

7.2 15

23 62

Booth multiplier are only useful for a multiplication operation of signed numbers. Considering 32 × 32 multiplier architectures proposed SVM have almost 70, 67 and 40.2% of speed improvement as compared with Booth multiplier, Square unit reported in [8] and Square unit reported in [9], respectively. Similarly, Table 6 shows delay comparison of the proposed SVM in S3e device of FPGA family with array multiplier, Wallace tree multiplier, Booth multiplier, Multiplier reported in [15] and Multiplier reported in [16]. The 16 × 16 proposed SVM multiplier is proofed to be efﬁcient in terms of speed. Considering the same device platform, 16 × 16 proposed SVM have almost 31.4, 30.5, 13.5, 14.6 and 6.25% of speed improvement as compared with array multiplier,

Fig. 7 Delay comparison of Proposed SVM with various state-of-the-art multiplier and square a In V4 FPGA family b For 16-bit multiplier in S3e FPGA family c For 8-bit and 16-bit multiplier in V2 FPGA family J Eng, 2017, Vol. 2017, Iss. 3, pp. 60–68 doi: 10.1049/joe.2016.0376

Wallace tree multiplier, Booth multiplier, Vedic Multiplier reported in [16] and Vedic Multiplier reported in [15], respectively. Table 7 shows the delay comparison of proposed SVM 8-bits and 16-bits multipliers in V2 FPGA family with Booth multiplier and Vedic multiplier reported in [16]. Note that, the Proposed SVM and Booth multiplier are only suitable for signed numbers with almost same delay parameter for 8 × 8-multiplier architecture. However, the proposed SVM achieves signiﬁcant improvement in delay for higher operand size i.e. 41.8% of improvement in delay for 16-bits operand size as compared with Booth multiplier. This is achieved due to the fact that, the RB adder is more effective for larger operand size. Table 8 shows the penalty in resource utilisation (four input LUTs) for the proposed SVM as compared with Booth multiplier. As shown in comparison results Tables 5 and 7, the hardware supported by RCA for smaller sizes operands is faster than the addition/mapping technique presented in Table 2. This is because of presences of specialised carry chain resources in FPGA devices [23, 24]. However, a RB adder is also preferred due to its constant time addition independent of operand size. We have utilised the property of RB representation, providing signiﬁcant reduction in propagation delay for proposed SVM architecture for higher order operands. Thus, the proposed designs can also be suitable for efﬁcient implementation in ASIC platform in absence of specialised carry chain resources. Fig. 7 shows the delay comparison of proposed SVM with various state-of-the-art multiplier and square architectures. The histogram representation of the delay parameter in V4 FPGA family is shown in Fig. 7a. Similarly, Figs. 7b and c displays

Fig. 8 Architecture level structure of proposed 4digits SVM

This is an open access article published by the IET under the Creative Commons Attribution -NonCommercial License (http://creativecommons.org/licenses/by-nc/ 3.0/)

delay parameter for various multiplier architectures in S3e and V2 FPGA family, respectively. The synthesis result shows the proposed SVM has high-speed performance than booth multiplier for operand size more or equal to 8 bits in the FPGA V4 device. The proposed SVM is more efﬁcient in term of delay as compared with [8, 9] for operand size more or equal to 16 bits and 32 bits, respectively. Note that, for the comparison Fig. 7, only Proposed SVM, Booth multiplier and Wallace tree multiplier supports multiplication of two signed numbers. Considering same working environment i.e. FPGA S3e device, proposed SVM has speed improvement of 31.49 and 30.5% as compared with Booth multiplier and Wallace tree multiplier, respectively. Fig. 8 shows the architecture level structure of proposed 4 × 4 digits SVM. 5

Conclusion

In this paper, a new method of signed digit multiplication is presented. The proposed design is based on UT sutra Vedic multiplication. First, the scope of Vedic mathematics is extended to signed numbers using the concept of RB number system. Again, the property of carry free addition of RB representation is efﬁciently used for solving issue related to the carry propagation in UT Vedic multiplier. The proposed design is found to have highspeed performance as compared with various state-of-the-art conventional as well as Vedic multiplier. 6

Acknowledgments

All the simulations and implementation are carried out in Laboratory at VSS University of Technology, India. Authors also gratefully acknowledge the University Grants Commission, Government of India (UGC), New Delhi. 7

References

[1] Parhami B.: ‘Computer arithmetic and hardware designs’ (Oxford University Press, 2000) [2] Booth A.D.: ‘A signed binary multiplication technique’, Q. J. Mech. Appl. Math., 1951, 4, (2), pp. 236–240 [3] Wallace C.S.: ‘A suggestion for a fast multiplier’, IEEE Trans. Electron. Comput., 1964, (1), pp. 14–17 [4] Hu J., Wang L., Xu T.: ‘A low-power adiabatic multiplier based on modiﬁed Booth algorithm’. 2007 Int. Symp. on Integrated Circuits IEEE, 2007, pp. 489–492 [5] Kuang S.-R., Wang J.-P., Guo C.-Y.: ‘Modiﬁed booth multipliers with a regular partial product array’, IEEE Trans. Circuits Syst. II Express Briefs, 2009, 56, (5), pp. 404–408

[6] Besli N., Deshmukh R.G.: ‘A novel redundant binary signed-digit (RBSD) Booth’s encoding’. SoutheastCon, 2002. Proc. IEEE IEEE, 2002, pp. 426–431 [7] Tirtha S.B.K., Agrawala V.S., Agrawala V.S.: ‘Vedic mathematics’ (Motilal Banarsidass Publ., 1992) [8] Kasliwal P.S., Patil B.P., Gautam D.K.: ‘Performance evaluation of squaring operation by Vedic mathematics’, IETE J. Res., 2011, 57, (1), pp. 39–41 [9] Sethi K., Panda R.: ‘Multiplier less high-speed squaring circuit for binary numbers’, Int. J. Electron., 2015, 102, (3), pp. 433–443 [10] Pradhan M., Panda R.: ‘High speed multiplier using Nikhilam Sutra algorithm of Vedic mathematics’, Int. J. Electron., 2014, 101, (3), pp. 300–307 [11] Saha P., Banerjee A., Dandapat A., ET AL.: ‘ASIC design of a high speed low power circuit for factorial calculation using ancient Vedic mathematics’, Microelectronics J., 2011, 42, (12), pp. 1343–1352 [12] Mehta P., Gawali D.: ‘Conventional versus Vedic mathematical method for hardware implementation of a multiplier’. Advances in Computing, Control, & Telecommunication Technologies, 2009. ACT’09. Int. Conf. on IEEE, 2009, pp. 640–642 [13] Barik R.K., Pradhan M.: ‘Area-time efﬁcient square architecture’, AMSE J., Adv. D, 2015, 20, (1), pp. 21–34 [14] Barik R.K., Pradhan M.: ‘Efﬁcient ASIC and FPGA implementation of cube architecture’, IET Comput. Digit. Tech., 2017, 11, (1), pp. 43–49 [15] Bansal Y., Madhu C.: ‘A novel high-speed approach for 16 × 16 Vedic multiplication with compressor adders’, Comput. Electr. Eng., 2016, 49, pp. 39–49 [16] Pushpangadan R., Sukumaran V., Innocent R., ET AL.: ‘High speed vedic multiplier for digital signal processors’, IETE J. Res., 2009, 55, (6), pp. 282–286 [17] Ramalatha M., Dayalan K.D., Dharani P., ET AL.: ‘High speed energy efﬁcient ALU design using vedic multiplication techniques’. Advances in Computational Tools for Engineering Applications, 2009. ACTEA’09. Int. Conf. on IEEE, 2009, pp. 600–603 [18] Saha P., Banerjee A., Bhattacharyya P., ET AL.: ‘High speed ASIC design of complex multiplier using Vedic mathematics’. ‘Students’ Technology Symp. (TechSym), 2011 IEEE IEEE, 2011, pp. 237–241 [19] Kodali R.K., Boppana L., Yenamachintala S.S.: ‘FPGA implementation of vedic ﬂoating point multiplier’. Signal Processing, Informatics, Communication and Energy Systems (SPICES), 2015 IEEE Int. Conf. on IEEE, 2015, pp. 1–4 [20] Kandasamy W.B.V., Smarandache F.: ‘Vedic Mathematics, ‘Vedic’ or ‘Mathematics’: A Fuzzy & Neutrosophic Analysis: A Fuzzy and Neutrosophic Analysis’ (Inﬁnite Study, 2006) [21] Avizienis A.: ‘Signed-digit number representations for fast parallel arithmetic’, IRE Trans. Electron. Comput., 1961, (3), pp. 389–400 [22] Pedroni V.A.: ‘Circuit design with VHDL’ (MIT press, 2004) [23] Xilinx V.-I.: ‘Platform FPGA User Guide’UG002 (V2. 1) 2007, 28, pp. 1–502 [24] UG070 U.G.: ‘Virtex-4 FPGA user guide’ (Xilinx Inc, 2008)

This is an open access article published by the IET under the Creative Commons Attribution -NonCommercial License (http://creativecommons.org/licenses/by-nc/ 3.0/)

J Eng, 2017, Vol. 2017, Iss. 3, pp. 60–68 doi: 10.1049/joe.2016.0376

Time efficient signed Vedic multiplier using ... - IET Digital Library [PDF]

Recommend Stories

Idea Transcript

Helpful Links

Smile Life

Get in touch