
      Lexibank, a public repository of standardized wordlists with computed phonological and lexical features

      data-paper


          Abstract

          The past decades have seen substantial growth in digital data on the world’s languages. At the same time, the demand for cross-linguistic datasets has been increasing, as witnessed by numerous studies devoted to diverse questions on human prehistory, cultural evolution, and human cognition. Unfortunately, most published datasets lack standardization, which makes them difficult to compare. Here, we present a new approach to increase the comparability of cross-linguistic lexical data. We have designed workflows for the computer-assisted lifting of datasets to Cross-Linguistic Data Formats, a collection of standards that make these datasets more Findable, Accessible, Interoperable, and Reusable (FAIR). We test the Lexibank workflow on 100 lexical datasets, from which we derive an aggregated database of wordlists in unified phonetic transcriptions covering more than 2000 language varieties. We illustrate the benefits of our approach by showing how phonological and lexical features can be automatically inferred, complementing and expanding existing cross-linguistic datasets.
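
          The abstract states that phonological and lexical features can be inferred automatically once wordlists share a unified, segmented transcription. The sketch below illustrates that idea under stated assumptions; it is not the Lexibank code or its feature set. It assumes a CSV table with CLDF-style FormTable columns (Language_ID, Parameter_ID, Form, Segments, with Segments space-separated). The languages, concepts and forms are invented, and the two example features (segment inventory size per language, and whether HAND and ARM share an identical form) are hypothetical illustrative choices.

```python
import csv
import io
from collections import defaultdict

# Toy rows in the spirit of a CLDF FormTable. Column names follow CLDF
# conventions; the languages and forms are invented for illustration only.
FORM_TABLE = """ID,Language_ID,Parameter_ID,Form,Segments
1,langA,HAND,ruka,r u k a
2,langA,ARM,ruka,r u k a
3,langA,FOOT,noga,n o g a
4,langB,HAND,hand,h a n d
5,langB,ARM,arm,a r m
6,langB,FOOT,fut,f u t
"""


def read_forms(text):
    """Parse a CSV FormTable into a list of dicts (one per form)."""
    return list(csv.DictReader(io.StringIO(text)))


def segment_inventories(rows):
    """Hypothetical phonological feature: distinct segments per language."""
    inventories = defaultdict(set)
    for row in rows:
        # Segments are space-separated, so splitting yields the sound units.
        inventories[row["Language_ID"]].update(row["Segments"].split())
    return dict(inventories)


def colexifies(rows, concept_a, concept_b):
    """Hypothetical lexical feature: same segmented form for both concepts?"""
    forms = defaultdict(lambda: defaultdict(set))
    for row in rows:
        forms[row["Language_ID"]][row["Parameter_ID"]].add(row["Segments"])
    result = {}
    for lang, by_concept in forms.items():
        a = by_concept.get(concept_a)
        b = by_concept.get(concept_b)
        # Missing data for either concept yields None rather than False.
        result[lang] = None if not a or not b else bool(a & b)
    return result


rows = read_forms(FORM_TABLE)
inventories = segment_inventories(rows)
print({lang: len(segs) for lang, segs in sorted(inventories.items())})
# {'langA': 7, 'langB': 9}
print(colexifies(rows, "HAND", "ARM"))
# {'langA': True, 'langB': False}
```

          Returning None for a missing concept keeps "no data" distinct from "no colexification". The distinction matters when wordlists of different sizes are aggregated.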

          Measurement(s): expressions
          Technology Type(s): data aggregation
          Factor Type(s): none
          Sample Characteristic - Organism: human language
          Sample Characteristic - Location: global scale


                Author and article information

                Contributors
                mattis_list@eva.mpg.de
                robert_forkel@eva.mpg.de
                Journal
                Scientific Data (Sci Data)
                Nature Publishing Group UK (London)
                ISSN: 2052-4463
                Published: 16 June 2022
                Volume: 9
                Article number: 316
                Affiliations
                [1] Department of Linguistic and Cultural Evolution, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany (GRID grid.419518.0, ISNI 0000 0001 2159 1813)
                [2] Institut für Orientalistik, Indogermanistik, Ur- und Frühgeschichtliche Archäologie, Friedrich-Schiller University, Jena, Germany (GRID grid.9613.d, ISNI 0000 0001 1939 2794)
                [3] ARC Centre of Excellence for the Dynamics of Language, Australian National University, Canberra, Australia (GRID grid.1001.0, ISNI 0000 0001 2180 7477)
                [4] School of Psychology, University of Auckland, Auckland, New Zealand (GRID grid.9654.e, ISNI 0000 0004 0372 3343)
                Author information
                http://orcid.org/0000-0003-2133-8919
                http://orcid.org/0000-0003-1081-086X
                http://orcid.org/0000-0001-7832-6156
                http://orcid.org/0000-0002-6165-0440
                http://orcid.org/0000-0002-4746-5070
                http://orcid.org/0000-0002-9858-0191
                Article 1432
                DOI: 10.1038/s41597-022-01432-0
                9203750
                35013360
                ef318e63-e6ee-4f8a-9db3-f25327787820
                © The Author(s) 2022

                Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

                History
                Received: 14 October 2021
                Accepted: 26 May 2022
                Funding
                Funded by: European Research Council, H2020 Excellent Science (EC | EU Framework Programme for Research and Innovation H2020); FundRef https://doi.org/10.13039/100010663
                Award ID: 15618
                Funded by: (1) Australian Research Council's Discovery Projects funding scheme (project number DE 120101954); (2) ARC Centre of Excellence for the Dynamics of Language grant (CE140100041)
                Categories
                Data Descriptor
                Custom metadata

                social anthropology, interdisciplinary studies
