Image Classification using Convolutional Neural Networks (December 2016) Kerolos Nashed 

Abstract — This project uses deep convolutional neural networks to classify a small subset of images from the Caltech image database. Ten classes of images were taken from the database: brain, calculator, CD, coffee mug, diamond ring, faces, license plates, shirts, top hat, and watches. The training and test sets contained 1487 and 50 images respectively. A seven-layer deep convolutional network was used as the classification algorithm. However, the network did not converge to an acceptable error, so the images in the test set were not classified: the training error never went below 50 percent. The stagnation at 50 percent error could have been caused by the small training set and/or the complexity of the images. The convolutional neural network toolbox was also outdated [1]: it offered only a sigmoid activation function rather than the modern ReLU or a cross-entropy cost function. Neural networks built from auto encoders were also attempted on the dataset as an alternative to the convolutional neural network, but due to the complexity of the images the first auto encoder did not converge in the available time. The difficulty of reducing these images to a lower dimension in a reasonable amount of time may indicate why the convolutional neural network stagnated. In conclusion, classifying images such as those in the Caltech image database requires significant time and resources for a neural network to converge, and a modern convolutional neural network toolbox that is fast (uses multiple GPUs) and more flexible should be used when tackling image classification problems.

I. INTRODUCTION

There has been significant progress in the field of object recognition using deep convolutional neural networks. Most major companies, such as Google [2], use them in their image search tools. This project focuses on implementing image recognition on a small subset of image classes from the Caltech image database: http://www.vision.caltech.edu/Image_Datasets/Caltech256/.

II. CHOICE OF IMAGES

The first step was to choose the image classes. The ten classes chosen were brain, calculator, CD, coffee mug, diamond ring, faces, license plates, shirts, top hat, and watches; a sample of the chosen images is shown in Figure 1 below. These classes were picked because there are many noticeable differences between them that the neural network can exploit. Outlier images were removed from their respective classes. For example, packs of CDs were removed, since the majority of CD images contained just one CD, and images in which another object obstructed the object to be classified were also removed. Enlarging the training set by rotating the images, flipping them left to right, or zooming in on them was considered but not done.


Table 1: image labels

Figure 1: Sample images from the chosen classes

III. PREPROCESSING

The next step was to preprocess all the images so that the neural network receives the same size and color format from every image. The MATLAB command imresize, which resizes an image using bicubic interpolation, was used to resize the images. The code was written flexibly enough that different image sizes could be experimented with in the convolutional neural network. The images were then converted to grayscale by averaging the RGB values, and the mean image was subtracted from all the images to help with training. A sample of the preprocessed images is shown below in Figure 2. A function was also written to randomly pick n images from each image class for the test set, and the training images were fed into the network in random order.
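The preprocessing steps above (resize, grayscale by channel averaging, mean-image subtraction, random test split) were done in MATLAB; the original code is not included, so the sketch below is a rough NumPy equivalent. The array shapes, the nearest-neighbor resize (standing in for imresize's bicubic interpolation), and the `n_test` parameter are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def preprocess(images, size=64):
    """images: list of HxWx3 uint8 arrays -> (N, size*size) float array."""
    out = []
    for img in images:
        gray = img.mean(axis=2)                      # average the RGB channels
        # Nearest-neighbor resize; the paper used MATLAB imresize (bicubic).
        rows = np.linspace(0, gray.shape[0] - 1, size).astype(int)
        cols = np.linspace(0, gray.shape[1] - 1, size).astype(int)
        out.append(gray[np.ix_(rows, cols)].ravel())
    X = np.asarray(out, dtype=float)
    return X - X.mean(axis=0)                        # subtract the mean image

def split(X, n_test, rng=rng):
    """Randomly hold out n_test images for the test set."""
    idx = rng.permutation(len(X))
    return X[idx[n_test:]], X[idx[:n_test]]
```

Feeding the rows of the shuffled training matrix to the network in order then matches the paper's randomized training presentation.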

IV. CONVOLUTIONAL NEURAL NETWORK AND MNIST DATA

The convolutional neural network used in this project is similar to that of Figure 3. To implement it, a third-party MATLAB deep learning toolbox was used [1]. The toolbox was first tested on the MNIST database of handwritten digits.

Figure 3: Convolutional Neural Network Example

Figure 2: Images after preprocessing

In addition, labels for each image class were created. Each label has ten values associated with it, one per image class. Table 1 below shows which image corresponds to which label.
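With ten classes, each label described above is naturally a one-hot vector of length ten. A minimal sketch follows; the class ordering here is an assumption for illustration, since the paper's Table 1 defines the actual mapping.

```python
import numpy as np

# Class order is assumed; Table 1 in the paper gives the real assignment.
CLASSES = ["brain", "calculator", "CD", "coffee mug", "diamond ring",
           "faces", "license plates", "shirts", "top hat", "watches"]

def one_hot(name):
    """Return a length-10 label vector with a 1 at the class's position."""
    v = np.zeros(len(CLASSES))
    v[CLASSES.index(name)] = 1.0
    return v
```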

The MNIST database consists of images of handwritten digits from 0 to 9, so it was an appropriate yet simple way to test and learn the deep learning toolbox. The MNIST data was run through a five-layer network consisting of an input layer, two convolutional layers, and two pooling layers, trained with a batch size of 50 for 10 epochs. The convolutional neural network worked flawlessly: it converged to a squared error of 0.0358% (Figure 4) and classified the test images correctly 97% of the time. Training for more than 10 epochs would probably improve the results further.


Although this convolutional neural network worked well on the MNIST database, it did not converge on the new database. Multiple factors could be responsible, since the images from the Caltech database are significantly more complicated and larger than the MNIST images.

Figure 4: MNIST data results

V. IMAGE CLASSIFICATION RESULTS

The exact same network was used on the image classes created from the Caltech image database, and the neural network did not converge. The image size was varied from 256x256 to 128x128 to 64x64, and the error curve stayed essentially the same. The final test was a seven-layer convolutional neural network: an input layer, then alternating convolutional and pooling layers (convolutional, pooling, convolutional, pooling, convolutional, pooling). The layer sizes used were 128x128x5, 124x124x12, 62x62x12, 58x58x24, 29x29x24, 24x24x36, and 12x12x36. The seven-layer network was trained with a batch size of 5 for 100 epochs (9.2 hours), but the squared error would not go down; it stayed stagnant at 50%, as shown in Figure 5.
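The reported feature-map widths are consistent with valid (no-padding) convolutions and 2x2 non-overlapping pooling. The kernel sizes below (5, 5, and 6) are inferred from those widths, not stated in the paper.

```python
def conv_out(n, k):
    """Output width of a 'valid' convolution with a k x k kernel."""
    return n - k + 1

def pool_out(n, p):
    """Output width of non-overlapping p x p pooling."""
    return n // p

size = 128
for op, k in [("conv", 5), ("pool", 2), ("conv", 5),
              ("pool", 2), ("conv", 6), ("pool", 2)]:
    size = conv_out(size, k) if op == "conv" else pool_out(size, k)
    print(op, size)   # 124, 62, 58, 29, 24, 12
```

This reproduces the 124, 62, 58, 29, 24, and 12 widths listed for layers two through seven.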

VI. NEURAL NETWORKS USING AUTO ENCODERS

An alternative to convolutional neural networks is a neural network built with auto encoders. Auto encoders compute a lower-dimensional feature space for the input: a neural network with one hidden layer is trained to replicate its input on its output, as shown in Figure 6 below. Because the output can be replicated, the hidden layer, which has a smaller dimension, must contain all the information. Multiple auto encoders can be stacked to reduce the dimension step by step. For this image classification task, three auto encoders were to be used to reduce the images from 4096 dimensions (a 64x64 image) to 50: the first from 4096 to 1000, the second from 1000 to 200, and the third from 200 to 50.
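The paper's auto encoders were built with the same MATLAB toolbox; as a minimal NumPy sketch of the idea, the code below trains a single-hidden-layer auto encoder with tied weights and sigmoid activations on toy low-rank data. The layer sizes (64 to 16 rather than 4096 to 1000), learning rate, and data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy low-rank data standing in for flattened images.
n_samples, n_in, n_hidden = 200, 64, 16
X = sigmoid(rng.random((n_samples, 8)) @ rng.normal(0, 1, (8, n_in)))

# Tied weights: encode with W, decode with W transpose.
W = rng.normal(0, 0.1, (n_in, n_hidden))
b_h = np.zeros(n_hidden)
b_o = np.zeros(n_in)

lr, mses = 0.5, []
for epoch in range(200):
    H = sigmoid(X @ W + b_h)        # encoder: 64 -> 16 features
    Y = sigmoid(H @ W.T + b_o)      # decoder: 16 -> 64, replicates the input
    err = Y - X
    mses.append(np.mean(err ** 2))
    d_out = err * Y * (1 - Y)               # delta at the output layer
    d_hid = (d_out @ W) * H * (1 - H)       # delta at the hidden layer
    W -= lr * (X.T @ d_hid + d_out.T @ H) / n_samples
    b_o -= lr * d_out.mean(axis=0)
    b_h -= lr * d_hid.mean(axis=0)
```

After training, the hidden activations H are the lower-dimensional features; stacking a second auto encoder on H would continue the step-by-step reduction the paper describes.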

Figure 6: Auto encoder diagram

Figure 5: 7 layer convolutional network error vs. number of batches

The first auto encoder started with a very large mean squared error that decreased very slowly. Even after 4.42 hours (Figure 7) and 4000 epochs, the auto encoder was nowhere close to converging (the mean squared error was still too high). Due to time constraints, the auto encoder never finished creating the lower-dimensional features. It might have converged after a few days, since the mean squared error was still decreasing over time (Figure 8). The slowness may be due to the difficulty of creating good features from such a complex data set.

Figure 7: Auto Encoder information

Figure 8: MSE vs. Epoch

VII. POSSIBLE EXPLANATION FOR THE STAGNANT RESULTS

The length of time needed by the auto encoder to create good features may indicate why the CNN stayed stagnant. In addition, the deep learning toolbox is outdated [1]. It is limited to the sigmoid activation function, which is an issue because a sigmoid takes a very long time to learn (the vanishing gradient problem) [2] [3]. Most modern convolutional neural networks instead use either a Rectified Linear Unit (ReLU) [2] [4] or a cross-entropy cost function [3] to learn faster. The toolbox is also limited to only two types of layers, convolutional and pooling; other layers, such as feature normalization [4], are also very helpful in allowing the network to learn.

VIII. CONCLUSION AND FUTURE WORK

In conclusion, object and image classification needs time and major computing resources for a convolutional neural network and/or auto encoder network to converge. In addition, a toolbox with activation functions such as the ReLU, or a cross-entropy cost, should be used when dealing with image classification for learning to happen; deep learning was not really feasible until the ReLU came along [2]. Future work can be done with another deep convolutional neural network toolbox. For example, MatConvNet is currently the toolbox of choice for implementing a convolutional neural network in MATLAB. It supports the ReLU activation function [5] and different types of normalization layers to help with learning [4], and it can use a GPU to make learning quicker. The only downside is that it can take a while to compile properly in MATLAB. There may also not have been enough training images for learning to be successful; more images may be required for convergence. Finally, using the RGB color channels instead of averaging the images to grayscale could also help with learning.

REFERENCES

[1] R. B. Palm, "DeepLearnToolbox," https://github.com/rasmusbergpalm/DeepLearnToolbox
[2] GoogLeNet presentation, http://imagenet.org/challenges/LSVRC/2014/slides/GoogLeNet.pptx, last retrieved 19-09-2014.
[3] M. Nielsen, Neural Networks and Deep Learning, vol. 1, 2015.
[4] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks."


[5] A. Vedaldi and K. Lenc, "MatConvNet - Convolutional Neural Networks for MATLAB," Proc. of the ACM Int. Conf. on Multimedia, 2015.

To run the program, run these scripts in order:
loadimages.m – loads the images
randomize_data.m – randomizes the data
randomize_data2.m – randomizes the data for only two image classes
test_example_CNN.m – subtracts the mean from the images and runs the images through the network
Autoencoder.m – run this code after running randomize_data2.m
