REVIEW OF MODERN DEEP LEARNING ALGORITHMS FOR IMAGE CLASSIFICATION

Authors

S. V. Shvets

Keywords

image classification, convolutional neural networks, transformers, hybrid architectures, artificial neural networks, attention mechanism, computational complexity, deep learning

Abstract

The task of image classification remains one of the most relevant challenges in modern computer vision and has long been a central focus of scientific research. Since the introduction of artificial neural networks (ANNs), significant progress has been made in their development and adaptation to image classification problems. The advent of convolutional neural networks (CNNs) gave a substantial impetus to the advancement of this field. Subsequently, the integration of attention mechanisms led to new architectures that combine CNNs with attention modules, enabling attention-based models to be effectively adapted to image classification tasks. Increasing attention is now being paid to transformer-based models, as well as to hybrid architectures that fuse CNNs with transformers; the hybrid approach is currently regarded as one of the most promising directions in the development of image classification algorithms. This article presents a comprehensive analysis of the core neural network architectures designed for image classification, comparing their efficiency and computational complexity. It also outlines the key trends that shape ongoing research and summarizes the primary differences and applications of these models. The scientific novelty of this study lies in an analysis of the performance and computational complexity of modern state-of-the-art classification architectures, with particular emphasis on transformer-based and hybrid models. The analysis reveals that hybrid architectures, which integrate convolutional neural networks with attention mechanisms, represent a promising direction for solving image classification problems. The comparative overview of the different architectures highlights the prevailing trends in the development of classification methods and their applicability to real-world tasks.
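
As a concrete illustration of the hybrid approach highlighted in the abstract, the sketch below pairs a small convolutional stem (local feature extraction with downsampling) with a single transformer encoder layer (global self-attention over the resulting spatial tokens). This is a minimal PyTorch sketch under assumed hyperparameters; the class name, dimensions, and pooling choice are illustrative and do not correspond to any specific architecture surveyed in the article.

import torch
import torch.nn as nn

class HybridClassifier(nn.Module):
    """Hypothetical hybrid model: convolutional stem + transformer encoder layer."""
    def __init__(self, num_classes: int = 10, dim: int = 64):
        super().__init__()
        # Convolutional stem: local features, 4x spatial downsampling overall.
        self.stem = nn.Sequential(
            nn.Conv2d(3, dim, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(dim),
            nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(dim),
            nn.ReLU(inplace=True),
        )
        # Transformer encoder layer: global self-attention over spatial tokens.
        self.encoder = nn.TransformerEncoderLayer(
            d_model=dim, nhead=4, dim_feedforward=4 * dim, batch_first=True
        )
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.stem(x)                       # (B, dim, H/4, W/4)
        tokens = feats.flatten(2).transpose(1, 2)  # (B, N, dim), N = H*W/16
        tokens = self.encoder(tokens)              # global self-attention mixing
        return self.head(tokens.mean(dim=1))       # mean-pool tokens, classify

# Usage: a batch of two 32x32 RGB images yields logits of shape (2, 10).
model = HybridClassifier(num_classes=10)
logits = model(torch.randn(2, 3, 32, 32))

A note on why this composition is attractive: self-attention over n tokens costs O(n^2 * d) time and memory, whereas a k x k convolution scales linearly in n. Letting the convolutional stem downsample the input before attention is applied keeps n small, which is the efficiency argument behind hybrid CNN-transformer designs.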

Published

2025-05-29

How to Cite

Швець, С. В. (2025). REVIEW OF MODERN DEEP LEARNING ALGORITHMS FOR IMAGE CLASSIFICATION. Taurida Scientific Herald. Series: Technical Sciences, (2), 233–258. Retrieved from http://journals.ksauniv.ks.ua/index.php/tech/article/view/888

Issue

No. 2 (2025)

Section

COMPUTER SCIENCE AND INFORMATION TECHNOLOGY