an efficient hardware implementation of pipelined fft architecture using [PDF]

evaluation of DFT. It reduces the computations by taking advantage of the fact that the calculation of the coefficients

3 downloads 13 Views 265KB Size

Recommend Stories


An Implementation of Pipelined Radix-4 FFT Architecture on FPGAs
We must be willing to let go of the life we have planned, so as to have the life that is waiting for

Low Power R4SDC Pipelined FFT Processor Architecture
I cannot do all the good that the world needs, but the world needs all the good that I can do. Jana

Efficient Instruction Scheduling for a Pipelined Architecture
Do not seek to follow in the footsteps of the wise. Seek what they sought. Matsuo Basho

Design of Efficient Pipelined Radix-2 Single Path Delay Feedback FFT
Open your mouth only if what you are going to say is more beautiful than the silience. BUDDHA

An Efficient Hardware Implementation of Target Recognition Algorithms and Investigation of
We may have all come on different ships, but we're in the same boat now. M.L.King

Subtractor and Floating Point Multiplier for FFT Architecture Using
Live as if you were to die tomorrow. Learn as if you were to live forever. Mahatma Gandhi

Efficient Hardware Realization of Truncated Multipliers using FPGA
The best time to plant a tree was 20 years ago. The second best time is now. Chinese Proverb

Design and Hardware Implementation of Graphical Display Systems using PIC24FJ256DA210
In the end only three things matter: how much you loved, how gently you lived, and how gracefully you

Hardware Implementation of Transmission Line and Its Study Using MATLAB
Almost everything will work again if you unplug it for a few minutes, including you. Anne Lamott

FPGA Implementation of Point Processing Operation using Hardware Simulation
Don't be satisfied with stories, how things have gone with others. Unfold your own myth. Rumi

Idea Transcript


Journal of Innovative Research and Solutions (JIRAS) A unit of UIIRS Print ISSN: 2320 1932 / Online ISSN – 2348 3636 Volume No.1, Issue No.1, Page No: 217 - 224 JULY- 2015

JJT-017-2015

AN EFFICIENT HARDWARE IMPLEMENTATION OF PIPELINED FFT ARCHITECTURE USING RECONFIGURABLE CONSTANT MULTIPLIER FOR RADIX – 4/8 STRUCTURES N. Dakshinavarthini1, A. Manimaran2, J.R. Dinesh kumar3 1 PG Scholar , Associate Professor2,3 1,2,3 Karpaga Vinayaga College of Engineering and Technology,GST Road, Chinna kolambakkam , Maduranthagam (T.K.,), Kancheepuram-603308 1 Email ID:[email protected] 2 Email ID:[email protected]

Abstract Discrete Fourier transform (DFT) is a very important technique in modern digital signal processing (DSP) and telecommunications. The Fast Fourier transform is a highly efficient procedure for computing the DFT of a finite series and requires less number of computations than that of direct evaluation of DFT. In this work the efficient implementation of a pipeline FFT processor is presented. . FFT reduces the computations by taking advantage of the fact that the calculation of the coefficients of the DFT can be carried out iteratively. Due to this, FFT computation technique is used in digital spectral analysis, filter simulation, autocorrelation and pattern recognition. The proposed architecture design adopts a single path delay feedback (SDF) and this SDF pipeline requires less memory space and its multiplication utilization is less. Such implementations are very much use full in OFDM architecture advantageous to low power design. Keywords: SDF, DFT, ROM less Structure. Introduction The Fast Fourier transform is a highly efficient procedure for computing the DFT of a finite series and requires less number of computations than that of direct evaluation of DFT. It reduces the computations by taking advantage of the fact that the calculation of the coefficients

of the DFT can be carried out iteratively. Due to this, FFT computation technique is used in digital spectral analysis, filter simulation, autocorrelation and pattern recognition. Real-time or fast execution is an important criterion for many applications. For example, wide- band Orthogonal Frequency Division

Authors: N. Dakshinavarthini1, A. Manimaran2, J.R. Dinesh kumar3 Period from (Jan –Jun 2015)

217

Journal of Innovative Research and Solutions (JIRAS) A unit of UIIRS Print ISSN: 2320 1932 / Online ISSN – 2348 3636 Volume No.1, Issue No.1, Page No: 217 - 224 JULY- 2015 Modulation (OFDM) systems, biomedical instrumentation and radar . Memory based architecture is commonly used to design FFT processor because of its low hardware cost and low power consumption. But it suffer problem of long latency and long throughput. The main contribution this work is demonstration of low power memory less pipeline FFT processor. FFT reduces the computation time required to compute a discrete fourier transform and improves the performance by a factor 100 or more over direct evaluation of the DFT The Discrete Fourier Transform is a procedure, or process, that can analyze the data points of a “digitized” time domain function to determine a series of sinusoids which, when summed together, reproduce the data points of the original function. Mathematically denoted as X(k) = , Where

0≤k≤N- 1

called as twiddle factor

The Fast Fourier transform reduces the number of complex multiplications required to perform DFT from N2 to N/2log2 N. In other words , for N=1024, this implies about 5000 instead of 106 multiplications- a reduction factor of 200. FFT Architecture The basic computation block in the diagram is the “butterfly” in which two inputs are combined to give two outputs.

Xm(p) P Xm+1(P)=Xm(P)+ WNK Xm(q)

q WNK Xm(q) Xm+1(P)=Xm(P)- WNK Xm(q) Figure 1. Flow graph of Basic Butterfly diagram for DIT Algorithm If we use m to represent the stage and p and q t represent nodes in the stage, each butterfly can be represented as shown in fig where the power k of WN is variable and depends on the position of the butterfly. The nodes p and q represent memory locations. At the input nodes p and q the input values Xm(p) and Xm(q) are stored. After the outputs Xm+1(p) and Xm+1(q) are calculated, the same memory location is used to store the new values in place of the input values. An algorithm that uses the same Locations to store both input and output sequences is called an inplace algorithm. The FFT algorithm reduces the number of computations. The total number of complex multiplications required for calculating DIT-FFT = N/2 log2 N and the total number of complex additions for evaluating a DFT using DITFFT is N log2 N. Before the development of FFT architecture it was difficult to process the data for digital signal processing circuits. The different architecture were developed to improve the performance of DFT in

Authors: N. Dakshinavarthini1, A. Manimaran2, J.R. Dinesh kumar3 Period from (Jan –Jun 2015)

218

Journal of Innovative Research and Solutions (JIRAS) A unit of UIIRS Print ISSN: 2320 1932 / Online ISSN – 2348 3636 Volume No.1, Issue No.1, Page No: 217 - 224 JULY- 2015 terms of FFT . But each FFT architecture has a drawback that it need More ROM to store the twiddle .For each cycle of operation the number of hardware components increased and the power consumption problem arises and Mainly the FFT architecture of olden days does not follows Pipelined mechanism and the radix level power is very High, so this kind of circuits take more latency and produce the throughput after the long time. The FFT architecture consists of sequential processor, pipelined processor and parallel interactive processor. The basic sequential processor consists of a processing element (PE) that can compute a butterfly. The same memory can be used to store the data, intermediate results and the twiddle factors. The amount of hardware involved is very small and it takes (N/2)log2N sequential operations to compute the FFT. To improve the performance of the sequential processor, parallelism can be introduced by using a separate arithmetic unit for each stage of the FFT. This increases the throughput by a factor of log2 N when the different units are pipelined. This architecture is also known as cascaded FFT architecture and will be used in our proposed design. By adding more processing elements to the processor in each sequential pipeline stage, performance can be improved even further. The butterflies can then be

computed in parallel in any stage. The total execution time for the parallel iterative processor is log2N cycles. Pipeline FFT Architecture consists of a series of computational blocks each composed of delay lines, co efficient storage, commutator, multipliers, and adders. Multi-path Delay Commutator is the most classical approach for pipeline implementation of radix-2 FFT algorithm in that input sequence has been broken into two parallel data stream flowing and forward, with correct distance between the data elements and entering to the butterfly scheduled by proper delays.

Figure 2.R2MDC (N=16) Radix-2 Single-path Delay Feedback uses more registers to storing one output of each butterfly in feedback shift registers. A single data stream goes through the multiplier at every stage. It has same number of butterfly units and multipliers as in R2MDC approach, but with reduced memory requirement and N - 1 registers. Its data throughput is (1/x) times that of the corresponding RxMDC architecture.

Authors: N. Dakshinavarthini1, A. Manimaran2, J.R. Dinesh kumar3 Period from (Jan –Jun 2015)

219

Journal of Innovative Research and Solutions (JIRAS) A unit of UIIRS Print ISSN: 2320 1932 / Online ISSN – 2348 3636 Volume No.1, Issue No.1, Page No: 217 - 224 JULY- 2015 Drawbacks: i. ii. iii. iv.

More ROM utilized to store each twiddle factor. More Latency and delayed Throughput. No pipelined architecture which increase the number of components. No reconfigurable memory.

Figure 4.Proposed radix-4 64-point pipeline FFT/IFFT processor Figure 3.R2SDF (N=16) Pipelined FFT Low Power Processor A reconfigurable complex constant multiplier is used to eliminate the twiddle factor ROM. This component plays an important role in reducing area and hardware cost. The architecture consists of processing elements which is divided into three stages and each stage is responsible to perform 8 point radix -2 FFT algorithms. a) Processing Elements The proposed architecture is composed of three different types of processing elements (PEs), a complex constant multiplier and delay-line (DL) buffers and using single path delay feedback pipeline architecture.

PE3 stage is used to implement a simple radix-2 butterfly structure and also serves as the sub modules of the PE2 and PE1 stages.

Figure 5. Proposed PE3 structure As for the PE2 stage, it is required to compute the multiplication by –j or 1. Note that the multiplication by-1 in Figure 6 is practically to take the 2’s complement of its input value.

Authors: N. Dakshinavarthini1, A. Manimaran2, J.R. Dinesh kumar3 Period from (Jan –Jun 2015)

220

1

Journal of Innovative Research and Solutions (JIRAS) A unit of UIIRS Print ISSN: 2320 1932 / Online ISSN – 2348 3636 Volume No.1, Issue No.1, Page No: 217 - 224 JULY- 2015 Figure8.Reconfigurable constant multiplier

Figure 6.Proposed PE2 structure The proposed Architecture works based on two widely used algorithms .It works on radix -8 which is sub classified as two radix-4 or four radix-2 structure. Here the pipelined architecture is proposed in order to reduce the delay and reconfiguration multipliers are used to reduce the ROM usage .

Figure 7.Proposed PE1 structure

complex

Figure 9.Butterfly for Radix8 Algorithm The multiplication by can employ a bit parallel multiplier to replace the word length multiplier and square root evaluation for chip area reduction The realization of complex multiplication by using a radix-2 butterfly structure with its both outputs commonly multiplied by .Based on this reconfigurable low-complexity complex constant multiplier for computing is designed. This structure of this complex multiplier also adopts a cascaded scheme to achieve low-cost hardware. Meaning of two input signal (Iin and Iout ) and two output signals(Qint and Qout) are the same as the signals in the PE1 stage a). Radix -4/8 Algorithms The proposed system is designed based on radix 8 TheRadix-22 algorithm has the same multiplicative complexity of Radix-4 algorithms but has a signal flow graph similar to the Radix-2 algorithm.

Authors: N. Dakshinavarthini1, A. Manimaran2, J.R. Dinesh kumar3 Period from (Jan –Jun 2015)

221

Journal of Innovative Research and Solutions (JIRAS) A unit of UIIRS Print ISSN: 2320 1932 / Online ISSN – 2348 3636 Volume No.1, Issue No.1, Page No: 217 - 224 JULY- 2015 It is mathematically expressed as shown Grouping 4 samples together gives the Radix-4 Algorithm.

.The reduction in area can be found by calculating the no of gates or slices used by the top level module, the summary of synthesis report is shown in Fig 11.

Radix-8 Algorithm is given by Grouping 8 samples together gives the Radix-8 Algorithm. Mathematically, It is expressed as shown in Figure 10.Simulation result of Top Level Design

Simulation Results The proposed FFT architecture uses three stages of processing elements .The entire structure is simulated in Model simulator which creates the way to run top level entity or modules .Here the radix-8 structure is divided in to sub modules and each module is simulated and every program work together as top level modeling .And the verilog Language is used since it has industry standards .The simulation result for top level module is shown in Fig 9. The tool Model simulator is used here to run the program in simulator mode. So the Xilinx ISE 11.1 is used to analyze the power level and area

The XPower Analyser tool is used to measure the amount of power consumed during the opertions .Here its used to calcultae both the static and dynamic power . The power level can be increased or modified based on the swithching activities of CMOS in hardware .So its necessary to chose the good SDK tool to implement this .The screen shots of power simulation is shown below.

Figure 11.Synthesis report of Top Level Design

Authors: N. Dakshinavarthini1, A. Manimaran2, J.R. Dinesh kumar3 Period from (Jan –Jun 2015)

222

Journal of Innovative Research and Solutions (JIRAS) A unit of UIIRS Print ISSN: 2320 1932 / Online ISSN – 2348 3636 Volume No.1, Issue No.1, Page No: 217 - 224 JULY- 2015 The Register transfer level is to create high-level representations of a circuit, from which lower-level representations and ultimately actual wiring can be derived, This RTL is used to identify the Components involved or present in the processing elements. An RTL description is usually converted to a gate-level description of the circuit by a logic synthesis tool. The synthesis results are then used by placement and routing tools to create a physical layout. Logic simulation tools may use a design's RTL description to verify its correctness .The RTL view of top level module is shown in Fig 12.

Scope of the Study By using an FFT processor with less ROM Structure its observed that the area is reduces and also in- crease the speed in terms of reducing the processing speed. The proposed architecture is expected to be provided low complexity, high speed and also can reduce the area. In future this low complexity FFT processor will be used to implement Radix-16, Radix-32 FFT. Conclusion By using the reconfigurable techniques for constant multiplier the complexity to perform complex conjugate multiplication in coefficients of FFT can be reduced and there by consumable amount of memory usage to store the data is reduced. In future the same reconfigurable techniques can be implement at the processing blocks or computational blocks of FFT Architecture to reduce latency and increase throughput.

Figure 12. RTL view of Top level Design References [1] Ahmad Salehi, Rasoul Amirfattahi, and Keshab K. Parhi (2013) ‘Pipelined Architectures for Real-Valued FFT and Hermitian-Symmetric IFFT With Real Data paths’ IEEE transactions on circuits and systems.

Figure 13 . Power Consumption of Top level Design

[2]Earl E. Swartzlander and SungwookYu (2010) ‘A NewPipelined Implementation of the Fast Fourier Transform’IEEE transactions on circuits and systems.

Authors: N. Dakshinavarthini1, A. Manimaran2, J.R. Dinesh kumar3 Period from (Jan –Jun 2015)

223

Journal of Innovative Research and Solutions (JIRAS) A unit of UIIRS Print ISSN: 2320 1932 / Online ISSN – 2348 3636 Volume No.1, Issue No.1, Page No: 217 - 224 JULY- 2015 [3]Hanho Lee and Minhyeok Shin(2010) ‘A High-Speed Low-Complexity TwoParallel Radix-2FFT/IFFT Processor for UWB Applications’IEEE Asian Solid-State Circuits Conference [4]A. T. Erdogan ,M. Hasanand Wei Han T. Arslan(2008) ‘A novel low power pipelined fft based on sub expression sharing for wireless lan apllications’ IEEE Transactions On Very Large Scale Integration (Vlsi) System. [5]Hanho Lee and TaesangCho(2009) ‘A High-Speed Low-Complexity Modified FFT Processor for High Rate WPAN Applications’ IEEE Asian Solid-State Circuits Conference [6]T. Arslan, M. Hasanand J.S. Thompson(2008)‘A Novel Coefficient Ordering based Low Power Pipelined Radix-2/4 FFT Processor for Wireless LAN Applications’ IEEE Transactions on Consumer Electronics, Vol. 49,No. 1 [7]Jaeseok Kim and Yunho Jung, Hongil Yoon(2005) ‘New efficient FFT algorithm and Pipeline Implementation Results for OFDM/DMT Application’ IEEE International Transactions on Consumer Electronics,Vol. 49,No. 1 [8]Chein-Wei Jen and Wen-Chang Yeh(2005) ‘High-Speed and Low-Power Split-Radix FFT’ IEEE transactions on signal processing, Vol.51,.

Authors: N. Dakshinavarthini1, A. Manimaran2, J.R. Dinesh kumar3 Period from (Jan –Jun 2015)

224

Smile Life

When life gives you a hundred reasons to cry, show life that you have a thousand reasons to smile

Get in touch

© Copyright 2015 - 2024 PDFFOX.COM - All rights reserved.