
      Caching and Reproducibility: Making Data Science Experiments Faster and FAIRer

      brief-report


          Abstract

          Small to medium-scale data science experiments often rely on research software developed ad hoc by individual scientists or small teams. Often there is no time to make the research software fast, reusable, and open access. The consequence is twofold. First, subsequent researchers must spend significant work hours building upon the proposed hypotheses or experimental framework. In the worst case, others cannot reproduce the experiment and reuse the findings for subsequent research. Second, if the ad hoc research software fails during long-running, computationally expensive experiments, the overall effort to iteratively improve the software and rerun the experiments creates significant time pressure on the researchers. We suggest making caching an integral part of the research software development process, even before the first line of code is written. This article outlines caching recommendations for developing research software in data science projects. Our recommendations provide a perspective to circumvent common problems such as proprietary dependencies and slow execution. At the same time, caching contributes to the reproducibility of experiments in the open science workflow. Concerning the four guiding principles, i.e., Findability, Accessibility, Interoperability, and Reusability (FAIR), we foresee that including the proposed recommendations in research software development will make the data related to that software FAIRer for both machines and humans. We exhibit the usefulness of some of the proposed recommendations on our recently completed research software project in mathematical information retrieval.
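The abstract does not spell out a concrete caching mechanism, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes results are JSON-serializable and stores each one on disk, keyed by a hash of the function name and arguments, so that rerunning a failed experiment skips steps that already completed. The names `disk_cached`, `expensive_step`, and `calls` are hypothetical and chosen purely for illustration.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def disk_cached(cache_dir):
    """Hypothetical decorator: store each result as a JSON file on disk."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)

    def decorator(func):
        def wrapper(*args, **kwargs):
            # Key the entry on the function name and its arguments, so a
            # rerun with identical inputs finds the stored result.
            payload = json.dumps([func.__name__, args, kwargs], sort_keys=True)
            key = hashlib.sha256(payload.encode("utf-8")).hexdigest()
            path = cache_dir / f"{key}.json"
            if path.exists():
                return json.loads(path.read_text(encoding="utf-8"))
            result = func(*args, **kwargs)
            # Write to a temporary file first, then rename, so a crash
            # mid-write does not leave a truncated cache entry behind.
            tmp = path.with_suffix(".tmp")
            tmp.write_text(json.dumps(result), encoding="utf-8")
            tmp.replace(path)
            return result

        return wrapper

    return decorator


calls = []  # records how often the expensive step actually runs


@disk_cached(tempfile.mkdtemp(prefix="experiment-cache-"))
def expensive_step(n):
    # Stand-in for a long-running computation (assumption for illustration).
    calls.append(n)
    return sum(i * i for i in range(n))


first = expensive_step(1000)   # computed and written to the cache
second = expensive_step(1000)  # read back from disk, not recomputed
```

Because the cache entries are plain files with deterministic names, they can in principle be published alongside the code, which is one way such a cache could support the reproducibility goal described above.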


                Author and article information

                Contributors
                Journal
                Front Res Metr Anal
                Frontiers in Research Metrics and Analytics
                Frontiers Media S.A.
                ISSN: 2504-0537
                Published: 22 April 2022
                Volume: 7
                Article: 861944
                Affiliations
                [1] Chair for Data and Knowledge Engineering, University of Wuppertal, Wuppertal, Germany
                [2] FIZ Karlsruhe - Leibniz Institute for Information Infrastructure, Berlin, Germany
                [3] Digital Content and Media Sciences Research Division, National Institute of Informatics, Tokyo, Japan
                [4] Chair for Scientific Information Analytics, University of Göttingen, Göttingen, Germany
                Author notes

                Edited by: Corina Pascu, European Union Agency for Cybersecurity, Greece

                Reviewed by: Andrea Mannocci, Istituto di Scienza e Tecnologie dell'informazione “Alessandro Faedo” (ISTI), Italy

                *Correspondence: Moritz Schubotz, moritz.schubotz@fiz-karlsruhe.de

                This article was submitted to Scholarly Communication, a section of the journal Frontiers in Research Metrics and Analytics

                Article
                DOI: 10.3389/frma.2022.861944
                PMC ID: 9075102
                PubMed ID: 35531060
                ScienceOpen ID: 212c0dd2-d1b9-4857-90f0-6d74578d159f
                Copyright © 2022 Schubotz, Satpute, Greiner-Petter, Aizawa and Gipp.

                This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

                History
                Received: 25 January 2022
                Accepted: 25 March 2022
                Page count
                Figures: 0, Tables: 1, Equations: 0, References: 13, Pages: 6, Words: 4564
                Categories
                Research Metrics and Analytics
                Perspective

                Keywords: caching, data science (DS), reproducibility of results, open science, research software
