
      Towards better laparoscopic video segmentation: A class‐wise contrastive learning approach with multi‐scale feature extraction


          Abstract

The task of segmentation is integral to computer‐aided surgery systems. Given the privacy concerns associated with medical data, collecting a large amount of annotated data for training is challenging. Unsupervised learning techniques, such as contrastive learning, have shown powerful capabilities in learning image‐level representations from unlabelled data. This study leverages classification labels to enhance the accuracy of a segmentation model trained on limited annotated data. The method uses a multi‐scale projection head to extract image features at various scales. The partitioning of positive sample pairs is then improved so that contrastive learning performed on the features at each scale better captures the differences between positive and negative samples. Furthermore, the model is trained simultaneously with both segmentation labels and classification labels, which enables it to extract features more effectively for each segmentation target class and accelerates convergence. The method was validated on the publicly available CholecSeg8k dataset for comprehensive abdominal cavity surgical segmentation. Compared to selected existing methods, the proposed approach significantly enhances segmentation performance even with a small labelled subset (1–10%) of the dataset, achieving a superior intersection over union (IoU) score.
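As a rough illustration of these two ingredients, the sketch below (PyTorch) pairs a multi‐scale projection head with a class‐wise contrastive loss in which pixel embeddings sharing a segmentation class form positive pairs; the module names, channel sizes, and temperature are illustrative assumptions, not the authors' released implementation.

```python
# Hedged sketch: multi-scale projection head + class-wise contrastive loss.
# Names and hyperparameters are illustrative assumptions only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiScaleProjectionHead(nn.Module):
    """Projects encoder feature maps from several scales into one embedding space."""

    def __init__(self, in_channels=(256, 512, 1024), embed_dim=128):
        super().__init__()
        self.heads = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(c, embed_dim, kernel_size=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(embed_dim, embed_dim, kernel_size=1),
            )
            for c in in_channels
        ])

    def forward(self, feature_maps):
        # One L2-normalised embedding map per scale.
        return [F.normalize(head(f), dim=1) for head, f in zip(self.heads, feature_maps)]


def class_wise_contrastive_loss(embeddings, labels, temperature=0.1):
    """Pixel embeddings of the same class are treated as positive pairs.

    embeddings: (N, D) L2-normalised pixel embeddings sampled from one scale.
    labels:     (N,) segmentation class index of each embedding.
    """
    sim = embeddings @ embeddings.t() / temperature                 # pairwise similarities
    sim = sim - sim.max(dim=1, keepdim=True).values.detach()        # numerical stability
    self_mask = torch.eye(len(labels), device=labels.device)
    pos_mask = (labels[:, None] == labels[None, :]).float() * (1 - self_mask)
    exp_sim = torch.exp(sim) * (1 - self_mask)                      # exclude self-pairs
    log_prob = sim - torch.log(exp_sim.sum(dim=1, keepdim=True) + 1e-12)
    mean_log_prob_pos = (pos_mask * log_prob).sum(1) / pos_mask.sum(1).clamp(min=1)
    return -mean_log_prob_pos.mean()
```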


          This paper proposes a novel method for sample pair division and optimizes the training framework of the segmentation model by integrating it with classification and contrastive learning tasks. This approach enhances the model's segmentation training effectiveness with limited annotated data.
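A minimal sketch of how such a joint objective could be assembled on top of the pieces above; the loss weights and the model interface are assumed here for illustration only, not the authors' exact framework.

```python
# Hedged sketch of one joint training step combining segmentation,
# classification, and contrastive objectives.
import torch.nn.functional as F


def training_step(model, images, seg_masks, cls_labels, lambda_cls=0.5, lambda_con=0.1):
    # Assumed model interface: segmentation logits, image-level classification
    # logits, and sampled pixel embeddings with their class labels.
    seg_logits, cls_logits, pixel_embeds, pixel_labels = model(images)

    seg_loss = F.cross_entropy(seg_logits, seg_masks)         # pixel-wise segmentation loss
    cls_loss = F.cross_entropy(cls_logits, cls_labels)        # image-level classification loss
    con_loss = class_wise_contrastive_loss(pixel_embeds, pixel_labels)  # from the sketch above

    return seg_loss + lambda_cls * cls_loss + lambda_con * con_loss
```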


Most cited references (29)


          DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs

In this work we address the task of semantic image segmentation with Deep Learning and make three main contributions that are experimentally shown to have substantial practical merit. First, we highlight convolution with upsampled filters, or 'atrous convolution', as a powerful tool in dense prediction tasks. Atrous convolution allows us to explicitly control the resolution at which feature responses are computed within Deep Convolutional Neural Networks. It also allows us to effectively enlarge the field of view of filters to incorporate larger context without increasing the number of parameters or the amount of computation. Second, we propose atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales. ASPP probes an incoming convolutional feature layer with filters at multiple sampling rates and effective fields-of-view, thus capturing objects as well as image context at multiple scales. Third, we improve the localization of object boundaries by combining methods from DCNNs and probabilistic graphical models. The commonly deployed combination of max-pooling and downsampling in DCNNs achieves invariance but takes a toll on localization accuracy. We overcome this by combining the responses at the final DCNN layer with a fully connected Conditional Random Field (CRF), which is shown both qualitatively and quantitatively to improve localization performance. Our proposed "DeepLab" system sets the new state-of-the-art at the PASCAL VOC-2012 semantic image segmentation task, reaching 79.7% mIOU in the test set, and advances the results on three other datasets: PASCAL-Context, PASCAL-Person-Part, and Cityscapes. All of our code is made publicly available online.
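For readers unfamiliar with atrous convolution and ASPP, a minimal PyTorch illustration follows; the dilation rates and channel sizes are arbitrary examples rather than the exact DeepLab configuration.

```python
# Minimal illustration of atrous (dilated) convolution and an ASPP-style block;
# rates and channel sizes are arbitrary examples, not DeepLab's exact settings.
import torch
import torch.nn as nn


class ASPP(nn.Module):
    """Atrous Spatial Pyramid Pooling: parallel dilated 3x3 convolutions."""

    def __init__(self, in_ch=256, out_ch=256, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=r, dilation=r) for r in rates
        ])
        self.project = nn.Conv2d(out_ch * len(rates), out_ch, kernel_size=1)

    def forward(self, x):
        # Each branch covers a different effective field of view at the same resolution.
        return self.project(torch.cat([branch(x) for branch in self.branches], dim=1))


features = torch.randn(1, 256, 32, 32)    # dummy encoder output
print(ASPP()(features).shape)             # torch.Size([1, 256, 32, 32])
```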

            EndoNet: A Deep Architecture for Recognition Tasks on Laparoscopic Videos.

Surgical workflow recognition has numerous potential medical applications, such as the automatic indexing of surgical video databases and the optimization of real-time operating room scheduling, among others. As a result, surgical phase recognition has been studied in the context of several kinds of surgeries, such as cataract, neurological, and laparoscopic surgeries. In the literature, two types of features are typically used to perform this task: visual features and tool usage signals. However, the visual features used are mostly handcrafted. Furthermore, the tool usage signals are usually collected via a manual annotation process or by using additional equipment. In this paper, we propose a novel method for phase recognition that uses a convolutional neural network (CNN) to automatically learn features from cholecystectomy videos and that relies solely on visual information. In previous studies, it has been shown that the tool usage signals can provide valuable information in performing the phase recognition task. Thus, we present a novel CNN architecture, called EndoNet, that is designed to carry out the phase recognition and tool presence detection tasks in a multi-task manner. To the best of our knowledge, this is the first work proposing to use a CNN for multiple recognition tasks on laparoscopic videos. Experimental comparisons to other methods show that EndoNet yields state-of-the-art results for both tasks.
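A rough sketch of the multi-task idea, with one shared backbone feeding a phase classifier and a multi-label tool-presence head; the backbone, layer sizes, and class counts here are illustrative assumptions, not EndoNet's actual architecture.

```python
# Sketch of a multi-task CNN in the spirit of EndoNet: one shared backbone
# feeding a surgical-phase classifier and a tool-presence detector.
# Backbone, layer sizes, and class counts are illustrative assumptions.
import torch
import torch.nn as nn
import torchvision.models as models


class PhaseAndToolNet(nn.Module):
    def __init__(self, num_phases=7, num_tools=7):
        super().__init__()
        backbone = models.resnet18(weights=None)
        self.features = nn.Sequential(*list(backbone.children())[:-1])  # global-pooled features
        self.phase_head = nn.Linear(512, num_phases)   # one phase per frame (softmax over phases)
        self.tool_head = nn.Linear(512, num_tools)     # independent tool presence (sigmoid per tool)

    def forward(self, frames):
        f = self.features(frames).flatten(1)
        return self.phase_head(f), self.tool_head(f)


frames = torch.randn(2, 3, 224, 224)                   # two dummy video frames
phase_logits, tool_logits = PhaseAndToolNet()(frames)
```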

              Supervised Contrastive Learning

              Contrastive learning applied to self-supervised representation learning has seen a resurgence in recent years, leading to state of the art performance in the unsupervised training of deep image models. Modern batch contrastive approaches subsume or significantly outperform traditional contrastive losses such as triplet, max-margin and the N-pairs loss. In this work, we extend the self-supervised batch contrastive approach to the fully-supervised setting, allowing us to effectively leverage label information. Clusters of points belonging to the same class are pulled together in embedding space, while simultaneously pushing apart clusters of samples from different classes. We analyze two possible versions of the supervised contrastive (SupCon) loss, identifying the best-performing formulation of the loss. On ResNet-200, we achieve top-1 accuracy of 81.4% on the ImageNet dataset, which is 0.8% above the best number reported for this architecture. We show consistent outperformance over cross-entropy on other datasets and two ResNet variants. The loss shows benefits for robustness to natural corruptions and is more stable to hyperparameter settings such as optimizers and data augmentations. Our loss function is simple to implement, and reference TensorFlow code is released at https://t.ly/supcon.
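For reference, the widely cited form of the supervised contrastive loss averages, for each anchor, the log-probability of its same-class positives; the notation below is assumed: normalised embeddings z_i, temperature τ, positives P(i) sharing the anchor's label, and all other batch samples A(i).

```latex
% Supervised contrastive (SupCon) loss over a batch I of normalised embeddings.
\mathcal{L}^{\mathrm{sup}}
  = \sum_{i \in I} \frac{-1}{|P(i)|} \sum_{p \in P(i)}
    \log \frac{\exp(z_i \cdot z_p / \tau)}
              {\sum_{a \in A(i)} \exp(z_i \cdot z_a / \tau)}
```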

                Author and article information

                Contributors
                kensaku@is.nagoya-u.ac.jp
Journal
Healthcare Technology Letters (Healthc Technol Lett), ISSN 2053-3713
John Wiley and Sons Inc. (Hoboken)
Journal DOI: 10.1049/(ISSN)2053-3713
Published: 13 January 2024; issue: Apr-Jun 2024
Volume 11, Issue 2-3, Special Issue: Papers from the 17th Joint Workshop on Augmented Environments for Computer Assisted Interventions at MICCAI 2023 (doi: 10.1049/htl2.v11.2-3)
Pages: 126-136
Affiliations
[1] Graduate School of Informatics, Nagoya University, Nagoya, Aichi, Japan
[2] Information and Communications, Nagoya University, Nagoya, Aichi, Japan
[3] Research Center of Medical Bigdata, National Institute of Informatics, Tokyo, Japan
Author notes
[*] Correspondence

Kensaku Mori, Graduate School of Informatics, Nagoya University, Furo-cho, Chikusa-ku, 464-8601 Nagoya, Aichi, Japan.

Email: kensaku@is.nagoya-u.ac.jp

                Author information
                https://orcid.org/0009-0002-4427-5113
                https://orcid.org/0000-0001-7714-422X
                https://orcid.org/0000-0002-0100-4797
Article
Article ID: HTL212069
DOI: 10.1049/htl2.12069
PMCID: PMC11022235
PMID: 38638491
                © 2024 The Authors. Healthcare Technology Letters published by John Wiley & Sons Ltd on behalf of The Institution of Engineering and Technology.

This is an open access article under the terms of the Creative Commons Attribution‐NonCommercial‐NoDerivs 4.0 License (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits use and distribution in any medium, provided the original work is properly cited, the use is non‐commercial and no modifications or adaptations are made.

History
03 December 2023; 05 December 2023
                Page count
                Figures: 4, Tables: 2, Pages: 11, Words: 7373
                Funding
                Funded by: JST Moonshot R&D
                Award ID: JPMJMS2214
                Funded by: MEXT/JSPS KAKENHI
                Award ID: 21K19898
                Award ID: 17H00867
                Award ID: 26108006
                Funded by: JST CREST
                Award ID: JPMJCR20D5
Categories
Letter

Keywords: computer vision, convolutional neural nets, endoscopes, medical image processing
