      Is Open Access

      Data harmonisation for information fusion in digital healthcare: A state-of-the-art systematic review, meta-analysis and future research directions

      research-article


          Highlights

          • This work surveys computational data harmonisation approaches in digital healthcare.

          • A comprehensive checklist summarises common practices for data harmonisation.

          • A meta-analysis is conducted to explore harmonisation studies in various modalities.

          • A critique of existing harmonisation strategies is presented for future research.

          Abstract

          Removing the bias and variance of multicentre data has always been a challenge in large-scale digital healthcare studies, which require the ability to integrate clinical features extracted from data acquired by different scanners and protocols to improve stability and robustness. Previous studies have described various computational approaches to fuse single-modality multicentre datasets. However, these surveys rarely focused on evaluation metrics and lacked a checklist for computational data harmonisation studies. In this systematic review, we summarise the computational data harmonisation approaches for multi-modality data in the digital healthcare field, including harmonisation strategies and evaluation metrics based on different theories. In addition, a comprehensive checklist that summarises common practices for data harmonisation studies is proposed to guide researchers in reporting their research findings more effectively. Finally, flowcharts presenting possible ways to select methodologies and metrics are proposed, and the limitations of different methods are surveyed for future research.


                Author and article information

                Contributors
                Journal
                Inf Fusion
                An International Journal on Information Fusion
                Elsevier
                1566-2535
                1872-6305
                1 June 2022
                June 2022
                Volume: 82
                Pages: 99-122
                Affiliations
                [a] National Heart and Lung Institute, Imperial College London, London, UK
                [b] Cardiovascular Research Centre, Royal Brompton Hospital, London, UK
                [c] School of Biomedical Engineering & Imaging Sciences, King's College London, London, UK
                [d] Department of Communications Engineering, University of the Basque Country UPV/EHU, Bilbao 48013, Spain
                [e] TECNALIA, Basque Research and Technology Alliance (BRTA), Derio 48160, Spain
                [f] Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, UK
                [g] Oncology R&D, AstraZeneca, Cambridge, UK
                [h] Department of Radiology, University of Cambridge, Cambridge, UK
                [i] Clinical Data Interchange Standards Consortium, Austin, TX, United States of America
                [j] University Hospital of Liège (CHU Liège), Respiratory Medicine Department, Liège, Belgium
                [k] University of Liège, Department of Clinical Sciences, Pneumology-Allergology, Liège, Belgium
                [l] QUIBIM, Valencia, Spain
                [m] Technische Hochschule Ingolstadt, Ingolstadt, Germany
                [n] GE Healthcare GmbH, Munich, Germany
                [o] Radiomics (Oncoradiomics SA), Liège, Belgium
                [p] Thirona, Nijmegen, The Netherlands
                [q] Department of Precision Medicine, Maastricht University, Maastricht, The Netherlands
                [r] Medical Imaging Department, Hospital Universitari i Politècnic La Fe, Valencia, Spain
                [s] Department of Computer Sciences and Artificial Intelligence, Andalusian Research Institute in Data Science and Computational Intelligence (DaSCI), University of Granada, Granada, Spain
                [t] Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia
                Author notes
                [#] Francisco Herrera and Guang Yang are co-last authors of this work.

                Article
                Publisher ID (PII): S1566-2535(22)00015-X
                DOI: 10.1016/j.inffus.2022.01.001
                PMC ID: 8878813
                PubMed ID: 35664012
                © 2022 The Author(s)

                This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

                History
                Received: 24 October 2021
                Revised: 22 December 2021
                Accepted: 7 January 2022
                Categories
                Article

                information fusion, data harmonisation, data standardisation, domain adaptation, reproducibility
