
      Performance analysis of hybrid deep learning framework using a vision transformer and convolutional neural network for handwritten digit recognition

      research-article


          Abstract

Digitization has created a demand for highly efficient handwritten document recognition systems. A handwritten document consists of digits, text, symbols, diagrams, etc., and digits are an essential element of such documents. Accurate recognition of handwritten digits is vital for effective communication and data analysis. Various researchers have attempted to address this problem with modern convolutional neural network (CNN) techniques. Although CNNs achieve high recognition accuracy, their filter weights are fixed once training ends, so the network cannot flexibly adapt to changes in the input. Hence, computer vision researchers have recently become interested in Vision Transformers (ViTs) and Multilayer Perceptrons (MLPs). The shortcomings of CNNs gave rise to hybrid models that combine the strengths of both approaches. This paper analyzes how a hybrid convolutional ViT model affects the ability to recognize handwritten digits. Because real-world data contain noise, distortions, and varying writing styles, both cleaned and uncleaned handwritten digit images are used for evaluation. The accuracy of the proposed method is compared with state-of-the-art techniques, and the results show that the proposed model achieves the highest recognition accuracy. Possible approaches to recognizing other aspects of handwritten documents are also discussed.

          • Analyzed the effect of convolutional vision transformer on cleaned and real-time handwritten digit images.

• The model's performance improved with the application of cross-validation and hyper-parameter tuning.

          • The results show that the proposed model is robust, feasible, and effective on cleaned and uncleaned handwritten digits.
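The hybrid idea summarized above, a convolutional token embedding feeding a Transformer-style self-attention block, can be illustrated with a minimal NumPy sketch. All shapes, filter sizes, and random weights below are illustrative assumptions, not the authors' architecture or trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv_embed(img, kernel, stride):
    """Convolutional token embedding: slide a filter bank over the image
    and emit one embedding vector (token) per window."""
    H, W = img.shape
    k, _, d = kernel.shape
    tokens = []
    for i in range(0, H - k + 1, stride):
        for j in range(0, W - k + 1, stride):
            patch = img[i:i + k, j:j + k]
            tokens.append(np.tensordot(patch, kernel, axes=([0, 1], [0, 1])))
    return np.stack(tokens)                      # (num_tokens, d)

def self_attention(x, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over the tokens."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(x.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v

d = 8                                            # hypothetical embedding size
img = rng.standard_normal((28, 28))              # one MNIST-sized digit image
kernel = rng.standard_normal((7, 7, d)) * 0.1    # 7x7 filters, stride 7
tokens = conv_embed(img, kernel, stride=7)       # 4x4 grid -> 16 tokens
Wq, Wk, Wv = [rng.standard_normal((d, d)) * 0.1 for _ in range(3)]
attended = self_attention(tokens, Wq, Wk, Wv)
logits = attended.mean(axis=0) @ rng.standard_normal((d, 10))  # pool, classify
print(tokens.shape, attended.shape, logits.shape)  # (16, 8) (16, 8) (10,)
```

Because the attention weights are computed from the input itself, the mixing of tokens changes with every image, which is exactly the input-dependent behaviour that fixed CNN filters lack.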

          Graphical abstract


Most cited references (12)


          An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited. In vision, attention is either applied in conjunction with convolutional networks, or used to replace certain components of convolutional networks while keeping their overall structure in place. We show that this reliance on CNNs is not necessary and a pure transformer applied directly to sequences of image patches can perform very well on image classification tasks. When pre-trained on large amounts of data and transferred to multiple mid-sized or small image recognition benchmarks (ImageNet, CIFAR-100, VTAB, etc.), Vision Transformer (ViT) attains excellent results compared to state-of-the-art convolutional networks while requiring substantially fewer computational resources to train. Fine-tuning code and pre-trained models are available at https://github.com/google-research/vision_transformer.
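The "16x16 words" in the title refer to flattening fixed-size image patches into a token sequence. A small sketch of that patch-splitting step (image and patch sizes are assumed for illustration):

```python
import numpy as np

def patchify(img, p=16):
    """Split an image into non-overlapping p x p patches and flatten each
    patch into one vector, yielding the token sequence a ViT consumes."""
    H, W, C = img.shape
    assert H % p == 0 and W % p == 0
    # (H/p, p, W/p, p, C) -> (H/p, W/p, p, p, C) -> (num_patches, p*p*C)
    grid = img.reshape(H // p, p, W // p, p, C).transpose(0, 2, 1, 3, 4)
    return grid.reshape(-1, p * p * C)

img = np.zeros((224, 224, 3))   # a typical ImageNet-sized RGB input
seq = patchify(img)
print(seq.shape)                # (196, 768): 14x14 patches, 16*16*3 values each
```

Each row of `seq` would then be linearly projected to the model dimension and combined with a position embedding before entering the Transformer encoder.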

            CvT: Introducing Convolutions to Vision Transformers


              EDEN: Evolutionary deep networks for efficient machine learning


                Author and article information

Journal: MethodsX (Elsevier); ISSN 2215-0161
Published online: 05 January 2024; issue: June 2024
Volume 12, article number 102554
Affiliations
[a] Department of Computer Science and Information Technology, Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, Maharashtra, India
[b] NIMS Institute of Computing, Artificial Intelligence and Machine Learning, NIMS University Rajasthan, Jaipur, India
[c] Symbiosis Centre for Applied Artificial Intelligence (SCAAI), Symbiosis Institute of Technology, Symbiosis International (Deemed University), Pune, Maharashtra, India
[d] UCSI University, Kuala Lumpur 56000, Malaysia
                Article
PII: S2215-0161(24)00009-8
DOI: 10.1016/j.mex.2024.102554
PMCID: PMC10825681
PMID: 38292314
                © 2024 The Authors. Published by Elsevier B.V.

                This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

History
Received: 6 October 2023
Accepted: 4 January 2024
                Categories
                Engineering

Keywords: convolutional neural network, vision transformer, handwritten digit recognition, machine learning, computer vision, convolutional vision transformer
