A COMPARATIVE STUDY OF NEURAL NETWORK ARCHITECTURES FOR IMAGE RECOGNITION TASKS

Authors

  • Venkata Sai Swaroop Reddy, Senior Software Engineer, Twitter Inc., USA
  • Nallapa Reddy

Keywords:

Neural Network Architectures, CNN, VGG-19

Abstract

Convolutional Neural Networks (CNNs) have become a powerful tool for image recognition, with successful applications in areas such as computer vision, medical image analysis, and autonomous vehicles. Choosing the right CNN architecture and training method, however, can be difficult when working with large datasets. This paper compares CNN architectures and training strategies to determine which is most effective for image recognition. The literature review surveys earlier work on CNN designs and training techniques for image recognition, outlining the strengths and weaknesses of popular architectures including LeNet, AlexNet, VGG, and ResNet, along with common training methods such as stochastic gradient descent (SGD), Adam, and batch normalisation (BN). The experiments used the CIFAR-10 dataset, which contains 60,000 32×32 colour images in 10 classes. As a first step in data preparation, the pixel values were normalised and the training set was augmented with random rotations and flips. Seven CNN architectures (LeNet, AlexNet, VGG-16, VGG-19, ResNet-50, ResNet-101, and ResNet-152) were built and trained with three training methods: SGD, Adam, and SGD with BN. ResNet-152 proved to be the best architecture for CIFAR-10, reaching 94.7% accuracy; ResNet-101 and VGG-19 followed, both at 93.7%. The 152-layer ResNet-152 outperformed the 19-layer VGG-19, suggesting that deeper networks can outperform shallower ones on this task. Incorporating BN into SGD training further improved the CNN architectures, yielding faster convergence and higher accuracy.
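The data preparation described above (normalising pixel values, then augmenting the training set with random flips and rotations) can be sketched in NumPy. This is a minimal illustrative sketch, not the paper's implementation: the flip probability and the restriction to 90-degree rotations are assumptions, since the abstract does not state the exact augmentation parameters.

```python
import numpy as np

def preprocess(images):
    """Scale uint8 pixel values (0-255) to floats in [0, 1]."""
    return images.astype(np.float32) / 255.0

def augment(images, rng):
    """Randomly flip each image horizontally, then rotate it by a
    multiple of 90 degrees. Both the 0.5 flip probability and the
    right-angle rotations are assumptions for illustration."""
    out = []
    for img in images:
        if rng.random() < 0.5:
            img = np.flip(img, axis=1)          # horizontal flip
        img = np.rot90(img, k=rng.integers(4))  # 0/90/180/270 degrees
        out.append(img)
    return np.stack(out)

# A toy batch with CIFAR-10 shaped images: 32x32 pixels, 3 channels.
rng = np.random.default_rng(0)
batch = rng.integers(0, 256, size=(8, 32, 32, 3), dtype=np.uint8)
x = augment(preprocess(batch), rng)
```

Augmentation of this kind only transforms the training set; the test set would be normalised but left unaugmented.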
This study sheds light on the relative merits of various CNN architectures and training methods for image recognition. The findings show that high accuracy in image recognition tasks depends on careful choice of both the architecture and the training technique: adding BN to SGD improved performance, highlighting the role of the training strategy. These results have practical consequences, as they may guide practitioners designing CNNs for image recognition. Future research could examine how well these architectures and training methods perform on other datasets, and investigate further approaches such as adversarial training and transfer learning. Such work could broaden the applications of CNNs in computer vision, medical image processing, and autonomous vehicles.
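The batch normalisation step that the study credits for faster convergence standardises each feature over the mini-batch, then applies a learned scale and shift. A minimal NumPy sketch of the training-mode forward pass, with assumed parameter shapes (the paper does not specify its implementation):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Batch normalisation forward pass: normalise each feature
    over the batch dimension, then scale by gamma and shift by beta.
    eps guards against division by zero for low-variance features."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

# A toy batch of 64 examples with 10 features, deliberately
# off-centre and widely spread before normalisation.
rng = np.random.default_rng(1)
x = rng.normal(loc=5.0, scale=3.0, size=(64, 10))
y = batch_norm(x, gamma=np.ones(10), beta=np.zeros(10))
```

With gamma initialised to ones and beta to zeros, each output feature has approximately zero mean and unit variance over the batch, which is what stabilises the gradients during SGD training.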

References

Tra, V., Kim, J., Khan, S., & Kim, J. (2017). Bearing fault diagnosis under variable speed using convolutional neural networks and the stochastic diagonal Levenberg-Marquardt algorithm. Sensors, 17(12), 2834. doi:10.3390/s17122834.

Han, X., Zhong, Y., Cao, L., & Zhang, L. (2017). Pre-trained AlexNet architecture with pyramid pooling and supervision for high spatial resolution remote sensing image scene classification. Remote Sensing, 9(8), 848. doi:10.3390/rs9080848.

Algarni, A. (2020). Efficient object detection and classification of heat emitting objects from infrared images based on deep learning. Multimedia Tools and Applications, 79. doi:10.1007/s11042-020-08616-z.

Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in neural information processing systems (pp. 1097-1105).

Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.

He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 770-778).

Szegedy, C., Ioffe, S., Vanhoucke, V., & Alemi, A. (2017). Inception-v4, Inception-ResNet and the impact of residual connections on learning. In Thirty-first AAAI conference on artificial intelligence.

Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., ... & Adam, H. (2017). MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861.

Simonyan, K., Vedaldi, A., & Zisserman, A. (2013). Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034.

Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In European conference on computer vision (pp. 818-833).

He, K., Zhang, X., Ren, S., & Sun, J. (2016). Identity mappings in deep residual networks. In European conference on computer vision (pp. 630-645).

Huang, G., Liu, Z., Van Der Maaten, L., & Weinberger, K. Q. (2017). Densely connected convolutional networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 4700-4708).

Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning (Vol. 1). MIT press.

Chollet, F. (2018). Deep learning with Python. Manning Publications.

LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436-444.

Géron, A. (2017). Hands-on machine learning with Scikit-Learn and TensorFlow: concepts, tools, and techniques to build intelligent systems. O'Reilly Media, Inc.

Nair, V., & Hinton, G. E. (2010). Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th international conference on machine learning (ICML-10) (pp. 807-814).

"New computer vision challenge wants to teach robots to see in 3D". New Scientist. 7 April 2017. Retrieved 3 February 2018.

Markoff, John (19 November 2012). "For Web Images, Creating New Technology to Seek and Find". The New York Times. Retrieved 3 February 2018.

"ImageNet". 7 September 2020. Archived from the original on 7 September 2020.

"From not working to neural networking". The Economist. 25 June 2016. Retrieved 3 February 2018.

Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision, 115(3), 211-252.

Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2017). ImageNet classification with deep convolutional neural networks. Communications of the ACM, 60(6), 84–90. doi:10.1145/3065386.

"Machines 'beat humans' for a growing number of tasks". Financial Times. 30 November 2017. Retrieved 3 February 2018.

Gershgorn, D. (18 June 2018). "The inside story of how AI got good enough to dominate Silicon Valley". Quartz. Retrieved 10 December 2018.

Hempel, J. (13 November 2018). "Fei-Fei Li's Quest to Make AI Better for Humanity". Wired. Retrieved 5 May 2019.

Gershgorn, D. (26 July 2017). "The data that transformed AI research—and possibly the world". Quartz. Retrieved 26 July 2017.

Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). doi:10.1109/CVPR.2009.5206848.

Li, F.-F. (2015, March 23). How we're teaching computers to understand pictures. Retrieved 16 December 2018.

Ridnik, T., Ben-Baruch, E., Noy, A., & Zelnik-Manor, L. (2021). ImageNet-21K Pretraining for the Masses. arXiv:2104.10972 [cs.CV].

Robbins, M. (6 May 2016). "Does an AI need to make love to Rembrandt's girlfriend to make art?". The Guardian. Retrieved 22 June 2016.

Markoff, J. (10 December 2015). "A Learning Advance in Artificial Intelligence Rivals Human Abilities". The New York Times. Retrieved 22 June 2016.

Aron, J. (21 September 2015). "Forget the Turing test – there are better ways of judging AI". New Scientist. Retrieved 22 June 2016.

Gershgorn, D. (10 September 2017). "The Quartz guide to artificial intelligence: What is it, why is it important, and should we be afraid?". Quartz. Retrieved 3 February 2018.

Published

2021-05-30

How to Cite

Venkata Sai Swaroop Reddy, & Nallapa Reddy. (2021). A COMPARATIVE STUDY OF NEURAL NETWORK ARCHITECTURES FOR IMAGE RECOGNITION TASKS. INTERNATIONAL JOURNAL OF ADVANCED RESEARCH IN ENGINEERING AND TECHNOLOGY (IJARET), 12(05), 234-245. https://lib-index.com/index.php/IJARET/article/view/IJARET_12_05_022