
      Performance analysis of hybrid deep learning framework using a vision transformer and convolutional neural network for handwritten digit recognition

      research-article


          Abstract

Digitization has created a demand for highly efficient handwritten document recognition systems. A handwritten document consists of digits, text, symbols, diagrams, etc., and digits are an essential element of such documents. Accurate recognition of handwritten digits is vital for effective communication and data analysis. Various researchers have attempted to address this problem with modern convolutional neural network (CNN) techniques. Although CNNs achieve high recognition accuracy, their filter weights are fixed once training ends, so the network cannot flexibly adapt to changes in the input. Hence, computer vision researchers have recently become interested in Vision Transformers (ViTs) and Multilayer Perceptrons (MLPs). The shortcomings of CNNs gave rise to hybrid models that combine the strengths of both approaches. This paper analyzes how a hybrid convolutional ViT model affects the ability to recognize handwritten digits. Because real-world data contain noise, distortions, and varying writing styles, both cleaned and uncleaned handwritten digit images are used for evaluation. The accuracy of the proposed method is compared with state-of-the-art techniques, and the results show that the proposed model achieves the highest recognition accuracy. Possible approaches to recognizing other aspects of handwritten documents are also discussed.

          • Analyzed the effect of convolutional vision transformer on cleaned and real-time handwritten digit images.

• The model's performance improved with the application of cross-validation and hyper-parameter tuning.

          • The results show that the proposed model is robust, feasible, and effective on cleaned and uncleaned handwritten digits.
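The hybrid idea summarized above, a convolutional token embedding feeding a Transformer-style self-attention block, can be illustrated with a minimal NumPy sketch. All shapes, filter sizes, and random weights below are illustrative assumptions, not the authors' architecture or trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv_embed(img, kernel, stride):
    """Convolutional token embedding: slide a filter bank over the image
    and emit one embedding vector (token) per window."""
    H, W = img.shape
    k, _, d = kernel.shape
    tokens = []
    for i in range(0, H - k + 1, stride):
        for j in range(0, W - k + 1, stride):
            patch = img[i:i + k, j:j + k]
            tokens.append(np.tensordot(patch, kernel, axes=([0, 1], [0, 1])))
    return np.stack(tokens)                      # (num_tokens, d)

def self_attention(x, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over the tokens."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(x.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v

d = 8                                            # hypothetical embedding size
img = rng.standard_normal((28, 28))              # one MNIST-sized digit image
kernel = rng.standard_normal((7, 7, d)) * 0.1    # 7x7 filters, stride 7
tokens = conv_embed(img, kernel, stride=7)       # 4x4 grid -> 16 tokens
Wq, Wk, Wv = [rng.standard_normal((d, d)) * 0.1 for _ in range(3)]
attended = self_attention(tokens, Wq, Wk, Wv)
logits = attended.mean(axis=0) @ rng.standard_normal((d, 10))  # pool, classify
print(tokens.shape, attended.shape, logits.shape)  # (16, 8) (16, 8) (10,)
```

Because the attention weights are computed from the input itself, the mixing of tokens changes with every image, which is exactly the input-dependent behaviour that fixed CNN filters lack.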

          Graphical abstract


Most cited references (12)


          An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited. In vision, attention is either applied in conjunction with convolutional networks, or used to replace certain components of convolutional networks while keeping their overall structure in place. We show that this reliance on CNNs is not necessary and a pure transformer applied directly to sequences of image patches can perform very well on image classification tasks. When pre-trained on large amounts of data and transferred to multiple mid-sized or small image recognition benchmarks (ImageNet, CIFAR-100, VTAB, etc.), Vision Transformer (ViT) attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train. Fine-tuning code and pre-trained models are available at https://github.com/google-research/vision_transformer.
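The "16x16 words" in the title refer to flattening fixed-size image patches into a token sequence. A small sketch of that patch-splitting step (image and patch sizes are assumed for illustration):

```python
import numpy as np

def patchify(img, p=16):
    """Split an image into non-overlapping p x p patches and flatten each
    patch into one vector, yielding the token sequence a ViT consumes."""
    H, W, C = img.shape
    assert H % p == 0 and W % p == 0
    # (H/p, p, W/p, p, C) -> (H/p, W/p, p, p, C) -> (num_patches, p*p*C)
    grid = img.reshape(H // p, p, W // p, p, C).transpose(0, 2, 1, 3, 4)
    return grid.reshape(-1, p * p * C)

img = np.zeros((224, 224, 3))   # a typical ImageNet-sized RGB input
seq = patchify(img)
print(seq.shape)                # (196, 768): 14x14 patches, 16*16*3 values each
```

Each row of `seq` would then be linearly projected to the model dimension and combined with a position embedding before entering the Transformer encoder.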

            CvT: Introducing Convolutions to Vision Transformers


              EDEN: Evolutionary deep networks for efficient machine learning


                Author and article information

Journal: MethodsX (Elsevier); ISSN 2215-0161
Published online: 05 January 2024; issue: June 2024
Volume 12, article number 102554
Affiliations
[a] Department of Computer Science and Information Technology, Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, Maharashtra, India
[b] NIMS Institute of Computing, Artificial Intelligence and Machine Learning, NIMS University Rajasthan, Jaipur, India
[c] Symbiosis Centre for Applied Artificial Intelligence (SCAAI), Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, Maharashtra, India
[d] UCSI University, Kuala Lumpur 56000, Malaysia
                Article
PII: S2215-0161(24)00009-8
DOI: 10.1016/j.mex.2024.102554
PMCID: PMC10825681
PMID: 38292314
                © 2024 The Authors. Published by Elsevier B.V.

                This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

History
Received: 6 October 2023
Accepted: 4 January 2024
                Categories
                Engineering

Keywords: convolutional neural network, vision transformer, handwritten digit recognition, machine learning, computer vision, convolutional vision transformer
