ACCELERATING THE CANNY EDGE DETECTION [PDF]

The interpolation in the Non-Maximum Suppression step is used to improve the quality of edge detection in the image. GPU

6 downloads 6 Views 324KB Size

Recommend Stories


contactless palm vein recognition using a canny edge detection algorithm
Those who bring sunshine to the lives of others cannot keep it from themselves. J. M. Barrie

Deteksi Tepi Canny Edge Detection Method Deteksi Tepi
Nothing in nature is unbeautiful. Alfred, Lord Tennyson

HARDWARE REALIZATION OF CANNY EDGE DETECTION ALGORITHM FOR UNDERWATER
We can't help everyone, but everyone can help someone. Ronald Reagan

Improved Entropic Edge-Detection
I cannot do all the good that the world needs, but the world needs all the good that I can do. Jana

(Edge Detection) Deteksi tepi
Knock, And He'll open the door. Vanish, And He'll make you shine like the sun. Fall, And He'll raise

GaN Canny
Stop acting so small. You are the universe in ecstatic motion. Rumi

Gradient Based Image Edge Detection
Learning never exhausts the mind. Leonardo da Vinci

[PDF] The Hedge Fund Edge
It always seems impossible until it is done. Nelson Mandela

Statistical Edge Detection: Learning and Evaluating Edge Cues
Life is not meant to be easy, my child; but take courage: it can be delightful. George Bernard Shaw

Idea Transcript


ACCELERATING THE CANNY EDGE DETECTION ALGORITHM WITH CUDA/GPU Bárbaro Maykel López-Portilla Vigil Telecommunications and Electronics Department, University of Pinar del Río, Cuba. [email protected]

ABSTRACT This paper presents an efficient implementation of the Canny edge detection algorithm on GPU using CUDA. Otsu algorithm for automatic calculation of the low and high thresholds of the Canny edge detection algorithm is employed. The interpolation in the Non-Maximum Suppression step is used to improve the quality of edge detection in the image. GPU implementation is compared with two CPU implementations (OpenCV & C ++ and Matlab). These tests are performed with multiple images of different sizes. Experimental results show that this GPU implementation achieves a better speedup over the other CPU implementations. This speedup value increases as increase the size of the image.

KEYWORDS: Algorithm, Canny, CUDA, detection, edge, GPU, image. I. INTRODUCCIÓN Recently, performances of programmable Graphics Processing Units have rapidly developed. GPU has possessed powerful parallel computational capability. At present, Flops of advanced single-chip GPU has reached 1Tflops/s and the memory bandwidth is up to 141GB/s which have exceeded that of mainstream CPU more than 10 times, in other words, the advanced GPU can be equal to a small computer cluster [1]. In addition, modern GPUs are equipped to support high-level languages. This significantly increases user programmability and facilitates use of GPUs for general purposes, also known as general-purpose computation on GPU (GPGPU) [2], [3]. Most of image processing and computer vision tasks perform the same computation on a number of pixels, typical data parallel operations. Thus, they can exploit the single instruction multiple data (SIMD) architecture and be effectively parallelized on GPU. Edge detection is the first step in many computer vision algorithms. It is used to identify sharp discontinuities in an image, such as changes in luminosity or in the intensity due to changes in scene structure. Edge detection has been researched extensively. A lot of edge detector algorithms have been proposed, such as Robert, Prewitt, Kirsch, Sobel,

Gauss-Laplace and Canny. Among all the above algorithms, Canny algorithm is the most widely used due to its good performance and its ability to extract optimally edges even in images that are contaminated by gaussian noise. Canny algorithm has the ability to achieve a low error-rate by eliminating almost all non-edges and improving the localization of all identified edges [1], [4]. In this paper, GPU is utilized to implement parallel processing of Canny edge detection algorithm with the aim of achieving a remarkable speedup and good accuracy, compared with sequential method based on CPU. rest of this paper is organized as follows: Section II provides a brief introduction on the Canny edge detecting algorithm. The proposed methodology for implementation on the GPU using CUDA of the previous algorithm is presented in Section III. The experimental results are discussed in Section IV. Finally, the conclusions are show in the Section V. II. CANNY EDGE DETECTION ALGORIITHM The Canny edge detection algorithm [5] follows a list of criteria to improve current methods of edge detection. The first and most obvious is low error rate. It is important that edges occurring in images should not be missed and that there be no responses to non-edges. The second criterion is that the edge points be well localized. In other words, the distance between the edge pixels as found by the detector and the actual edge is to be at a minimum. A third criterion is to have only one response to a single edge [6], [7]. This algorithm consists of four steps as shown in Figure 1. Each step is explained separately in the following subsections

Figure 1. Canny algorithm steps.

A. Gaussian Smoothing Filter Image Smoothing is the process of blending local intensity values in an image. This process can reduce the impact of noise signals and is usually undertaken before the real edge detection operations. Most common smoothing operator is the Gaussian. Gaussian smoothing is performed by convolving the image with two dimensional Gaussian filter (mask), which is equivalent to convolving the image successively with two one-dimensional Gaussian filter in the x and y direction [8]. The one dimensional Gaussian function is defined by the Equation (1).

−x2

1 2 G (x ) = e 2σ σ 2π

(1)

Parameters σ stands for the width of the Gaussian filter, meaning smoothness. The larger the width of the Gaussian mask, lower is the detector sensitivity to noise. The localization error in the detected edges also increases Parameters σ can be adjusted according to the different images. B. Sobel Edge Detection After smoothing, the next step is the calculation of the image gradient. The gradient calculation leads to the detection of the possible edge strength and direction. This is executed by another convolution with a gradient operator. The most commonly used gradient operators is the Sobel operator. The operator performs a 2-D spatial gradient measurement on an image. The Sobel edge detector uses a pair of 3x3 convolution masks, one estimating the gradient in the x direction (columns) and the other estimating the gradient in the y direction (rows) [8]. As in the previous case, these masks can be performed by convolving the image successively with two one-dimensional Sobel filter in the x and y direction. The Sobel filter used for edge detection along x and y can be decomposed as indicated by Equations (2) and (3) respectively. 1 0 − 1 1  G X = 2 0 − 2 = 2 [1 0 − 1] 1 0 − 1 1 

(2)

 1 2 1   1 GY =  0 0 0  =  0  [1 2 1] − 1 − 2 − 1 − 1

(3)

The gradient magnitude and direction are shown in Equations (4) and (5) respectively. G = GX2 + GY2

G θ = arc −1  Y  GX

  

(4)

(5)

C. Non-Maximum Suppression Although Equation (4) is used in the classical Canny algorithm to determine the gradient direction in nonmaximum suppression, better accuracy edge detection is obtained with another variant, as described below [9], [10], [11]. In performing the improved non-maximum suppression method, the gradient direction is divided into eight regions as shown in Figure 2, and every direction region contains a range of a 45 degrees angles. Depending on whether the gradient is positive or negative and its absolute value difference in the horizontal and vertical directions, the approximate gradient direction of the current pixel can be determined.

Figure 2. Eight gradient directions division diagram.

Once the approximate gradient direction is known, the linear interpolation is used to calculate the values of the two neighboring pixels in the gradient direction. Finally, it is checked whether the gradient magnitude pixel value is maximum or not along the direction of the pixels gradient according to the interpolation result. If so, this pixel keeps its value and is considered as part of a possible edge. Otherwise, the pixel takes the value 0. D. Hysteresis Thresholding After non-maximum suppression the image may still contain some spurious responses. These responses which are called streaking can be eliminated by the use of hysterisis thresholding. Firstly, the high (Th) and low (Tl) thresholds are computed using the Otsu algorithm [12] as proposed in [6], [13]. Subsequently, it scans the whole image to detect any pixels that are marked as candidate edge points. If the gradient magnitude of analyzed pixel is greater than the high threshold Th, then it is absolutely an edge pixel, however, if the gradient magnitude of

analyzed pixel is lower than the low threshold Tl, then it is not absolutely an edge pixel. The remaining pixels are considered as the suspected edge pixels and examine their connectivity. If their adjacent pixels have edge pixels, then they are also considered as edge pixels, otherwise, they are non-edge pixels [4], [6]. III. GPU IMPLEMENTATION This section aims to describe how the four steps of Canny edge detection algorithm (described in Section II) are implemented on the GPU using CUDA C/C++. The used images are in grayscale and have the same number of rows and columns. A Matlab script file is responsible for reading these images and writes your content in a text file. So initially the content of the text file is read and copied to the CPU memory and later send it to the GPU global memory. Once processing is complete, the reverse process is performed, so that other Matlab script file is required. Of this way, the use of libraries to manipulate images is avoided. As shown in Sections II-A and II-B, the Sobel and Gaussian filters are separable functions and as such will be used. Moreover, in this way the computational efficiency is increased [8], [10]. This is because generally, a nonseparable filter of window size M×M computes operations per pixel, whereas for a separable filters are used, the cost would be reduced to computing operations. The implementation the Sobel and Gaussian filters consists of performing a convolution between the mask and the image, which process is divided into two steps. First, the convolution is performed in the x direction, and thereafter the convolution is performed in the y direction. These two steps are carried out by a single kernel function that executes inside the GPU, which is reused twice. To select the direction of the convolution and the mask, a boolean variable is used. The algorithm used in these kernel functions is similar to [14]. This algorithm divides the output image into small blocks, and splits the processing required to compute each block to different processors. Each block of output pixels depends upon the data in the corresponding input pixels, as well as neighboring input pixels in an apron of width R. This apron only extends in the direction the filter is operating. If the size of each output block is N rows by N columns, then for the column filter, the size of the input data block is (N+2R) by (N+2R) by N. Although the input data required by neighboring output blocks overlaps, there is no data dependency between the outputs of different blocks. In the case of the Gaussian filter is taken R=2, N=16, σ = 1.4 and a mask of 5 elements, so that according to Equation (1), the normalized mask is as follows: m = [0.1102 0.2369 0.3058 0.2369 0.1102]

(6)

Due to the symmetry properties of the Gaussian filter, the same mask is used in both directions. In the case of the Sobel filter is taken R=1 and N=16. The values of all masks are stored in the GPU constant memory. Subsequently, a kernel function is used to calculate the gradient magnitude, as shown in Equation (4). Two kernel functions are used to implement the non-maximum suppression step. The first kernel function performs the interpolation, for each image pixel two additional pixels are obtained, which are used to create two respective images, keeping the same position of the pixel analyzed. Thus, the second kernel function uses these three images to determine if pixel takes the value 0 or maintains its value. In this second kernel function, each block of output pixels only depends upon the data in the corresponding input pixels, Finally, to make the Hysteresis Thresholding step is necessary to calculate the two thresholds required. As mentioned before, for this, the Otsu algorithm is used, which is based on the normalized image histogram. A kernel function is created to calculate the normalized histogram using the shared memory and the atomic operation similarly to [15]. Subsequently, other simple kernel functions that perform the remaining of the Otsu algorithm are implemented. Once the high threshold is calculated, this is multiplied by 0.4 and the low threshold is obtained. With these two thresholds values calculated, the image pixels are classified into definite edges, possible edges or non-edges. A kernel function is used to perform this operation. A CPU-GPU hybrid approach is used to determine whether the pixels previously classified as possible edges become definitive border or not. So, a kernel function configured as in the case Sobel algorithm is created, the neighborhood of each possible edge pixel is examined to determine their definitive condition. If at least one pixel belonging to the neighborhood is a definitive edge, then the pixel is will be too, otherwise keeps its value. The kernel function is called within a loop implemented in the CPU. The only information that travels between the GPU to CPU, or viceversa, is a flag variable that indicates whether at least one pixel of the image to be moved from possible edge to definitive edge, which is used as a rupture (break) criterion. The rest of the possible edges are defined as non-edges through another kernel function. IV. RESULTS AND DISCUSSIONS The hardware used is a CPU (Intel Core 2 Duo E6750 (2.66 GHz) 4 GB) and a GPU (NVIDIA GeForce GTX970 (1664 CUDA Cores) 4GB [16]). In addition, grayscale images with different sizes (256x256, 512x512, 1024x1024 and 2048x2048) and various edge conditions (Lena, Mandrill, Peppers) are taken as the examples of the experiments. Moreover, two

conventional software running on a PC are implemented. The first is based on C ++ language and OpenCV library, while the second is developed using the Matlab software. The effect of the edge detection in the Lena and Peppers images with the size of 512x512 is shown in Figure 3a and Figure 3b respectively.

a)

b) Figure 3. Detection effect on CPU and GPU of the a) Lena image b) Peppers image.

The performance tests aim to evaluate the execution time of the three variants mentioned above. In Table 1, the execution time of each variant is shown using different images and sizes. Furthermore, the speedup achieved by the Canny algorithm implemented in this paper with respect to the other two variants is also shown. Image Lena

Mandrill

Peppers

Size 256x256 512x512 1024x1024 2048x2048 256x256 512x512 1024x1024 2048x2048 256x256 512x512 1024x1024 2048x2048

OpenCV & C++ 13.58 49.50 192.46 741.77 13.88 51.53 203.48 797.01 13.12 49.32 189.14 733.61

Matlab 53.37 218.13 879.12 3471.55 58.54 238.69 945.38 3656.02 58.34 234.11 940.57 3597.38

GPU1 5.94 8.49 14.95 35.36 5.62 9.84 27.61 54.19 6.50 10.90 26.05 82.17

Speedup1 2.29 5.83 12.87 20.98 2.47 5.24 7.37 14.71 2.02 4.52 7.26 8.93

Speedup2 8.98 25.69 58.80 98.18 10.42 24.26 34.24 67.47 8.98 21.48 36.11 43.78

GPU2 6.23 9.20 19.71 45.21 6.06 10.76 29.55 63.56 6.76 11.27 30.90 92.06

Table 1: Comparison of execute time in milliseconds.

Speedup3 2.18 5.38 9.76 16.41 2.29 4.79 6.89 12.54 1.94 4.38 6.12 7.97

Speedup4 8.57 23.71 44.60 76.79 9.66 22.18 31.99 57.52 8.63 20.77 30.44 39.08

The speedup of Canny algorithm implemented in the GPU (GPU1 column) with respect to its equivalent in C++ & OpenCV and Matlab are shown in Speedup1 and Speedup2 columns respectively. In this case, the data transfer times of input and output images are obviated. The speedup of Canny algorithm implemented in the GPU (GP21 column) with respect to its equivalent in C++ & OpenCV and Matlab are shown in Speedup3 and Speedup4 columns respectively. In this case, the data transfer times of input and output images are considered. Each measurement that shown in Table 1 is the result of averaging the execution times obtained from 100 tests on equal terms. The execution times of the three implementations of the Canny edge detector to increase as increasing the size of the images, as shown in Table 1. The execution time obtained in the algorithm implemented in OpenCV is smaller than that obtained in the algorithm implemented in Matlab, this is because in the first of these, the linear interpolation method (Non-Suppression) is not used. Better performance in terms of relative speedup between the implemented version on the GPU and the other two versions implemented in CPU are shown in all cases. Thus, the greater speedup are achieved with 2048x2048 size images, being in all the cases greater than 7 and 39 times with respect to C++ & OpenCV and Matlab implementations respectively. If the data transfer time is not included, GPU implementation achieved a better speedup. The percentage of execution time occupying each step of the Canny algorithm implemented in the GPU in relation to the total execution time are shown in Figures 4a and 4b. The data transfer time of input and output images are obviated as show in Figure 4a. The data transfer time of input and output images are considered as show in Figure 4b. Graph bars are in logarithmic scale to allow better comparison. For this test, the Lena image of size 1024x1024 is employed.

a)

b)

Figure 4. Execution time for Canny algorithm steps a) without I/O data transfers b) with I/O data transfers.

The Hysteresis Thresholding step (in both Figures) occupies a large percentage of the total runtime. This is not unexpected because the main kernel function involved in this step is called multiple times. Furthermore, the importance of avoiding the transfer of data between the CPU and GPU (and viceversa) can be seen in Figure 6, in this case this transfer takes up more than fifth of the total execution time. V. CONCLUSIONS The Canny edge detection algorithm on GPU using CUDA is implemented. Otsu algorithm for automatic calculation of the low and high thresholds of the Canny edge detection algorithm is employed. Besides, the interpolation in the Non-Maximum Suppression step used to improve the quality of edge detection in the image. Experimental results show that with the Canny algorithm implemented in GPU a acceptable speedup are achieved regarding a CPU implementations (OpenCV & C ++ and Matlab). This speedup value increases as increase the size of the processed image. Thus, the great advantages of GPU devices for processing images in real time are shown. The Hysteresis Thresholding step occupies a considerable percentage in the total execution time, so it would be advisable to consider other approaches for this implementation, and thus, improve the implementation done in this paper. VI. REFERENCES [1]

N. Zhang, Y. Wang and J. Wang, "Image parallel processing based on GPU," 2nd International

Conference on Advanced Computer Control, Shenyang, China, pp. 367-370, 2010. [2]

J.D. Owens, D. Luebke, N. Govindaraju, M. Harris, J. Krger, A.E. Lefohn, and T.J. Purcell, “A Survey of

General-Purpose Computation on Graphics Hardware,” Computer Graphics Forum, vol. 26, no. 1, pp. 80-113, Mar. 2007. [3]

I. K. Park, N. Singhal, M. H. Lee, S. Cho, and C. W. Kim, "Design and Performance Evaluation of Image

Processing Algorithms on GPUs," IEEE Transactions on Parallel and Distributed Systems, pp. 91-104, 2011. [4]

C. Gentsos, C. Sotiropoulou, S. Nikolaidis and N. Vassiliadis, “Real-time canny edge detection parallel

implementation for FPGAs,” 17th IEEE International Conference on Electronics Circuits and Systems (ICECS), pp. 499 –502, Dec. 2010.

[5]

J.F. Canny, "A computation approach to edge detection," IEEE Transactions on Pattern Analysis and

Machine Intelligence, vol. 8, no. 6, pp. 769-798, Nov. 1986. [6]

J. Gao and N. Liu, “An improved adaptive threshold canny edge detection algorithm,” International

Conference on Computer Science and Electronics Engineering, vol. 1, Zhejiang, China, pp. 164–168, 2012. [7]

M. Ramu and T.V.S. Adinarayana, “A Hardware/Software Co-Design Architecture Implementation of

Canny Edge Detection using FPGA and Matlab,” International Journal of Software & Hardware Research in Engineering, vol. 1, no. 4, pp. 69-73, Dec. 2013. [8]

I. Young, J. Gerbrands and L. Van Vliet, Fundamentals of Image Processing. Delft University of

Technology, Netherlands, 2009. [9]

Z. Yan, L. Tao and L. Qing-Ling, “Detection of Foreign Bodies and Bubble Defects in Tire Radiography

Images Based on Total Variation and Edge Detection,” Chinese Physics Letters, vol. 30, no. 8, 2013. [10]

B. Li, U. Söderström, S. Ur Réhman and H. Li, “Restricted Hysteresis Reduce Redundancy in Edge

Detection,” Journal of Signal and Information Processing, vol. 4 no. 3B, pp. 158-163, Aug. 2013. [11]

D.A. Forsyth and J. Ponce. Computer Vision: A Modern Approach, Prentice Hall, 2003.

[12]

N. Otsu, “A threshold selection method from gray-level histogram,” IEEE Transactions on Systems, Man

and Cybernetics, vol. 9, no. 1, pp. 62-66, 1979. [13]

M, Fang, G.X. Yue and Q.C. Yu, "The Study on An Application of Otsu Method in Canny Operator",

International Symposium on Information Process, Huangshan, China, pp. 109-112, Aug. 2009. [14]

V.

Podlozhnyuk.

(2007,

Jun.).

“Image

convolution

Nov.).

“Histogram

with

CUDA,”

Available

in:

in

CUDA,”

Available

in:

http://developer.download.nvidia.com [15]

V.

Podlozhnyuk.

(2007,

calculation

http://developer.download.nvidia.com [16]

NVIDIA Corporation. (2015, Jan.). “GeForce GTX 970 Specifications GeForce," Available in:

http://www.geforce.com/

Smile Life

When life gives you a hundred reasons to cry, show life that you have a thousand reasons to smile

Get in touch

© Copyright 2015 - 2024 PDFFOX.COM - All rights reserved.