
      NSP-SCD: A corpus construction protocol for child-directed print in understudied languages

      research-article


          Abstract

          Child-directed print corpora enable systematic psycholinguistic investigations, but this research infrastructure is not available in many understudied languages. Moreover, researchers of understudied languages are dependent on manual tagging because precise automatized parsers are not yet available. One plausible way forward is to limit the intensive work to a small-sized corpus. However, with little systematic enquiry about approaches to corpus construction, it is unclear how robust a small corpus can be made. The current study examines the potential of a non-sequential sampling protocol for small corpus development (NSP-SCD) through a cross-corpora and within-corpus analysis. A corpus comprising 17,584 words was developed by applying the protocol to a larger corpus of 150,595 words from children’s books for 3-to-10-year-olds. While the larger corpus will by definition have more instances of unique words and unique orthographic units, still, the selectively sampled small corpus approximated the larger corpus for lexical and orthographic diversity and was equivalent for orthographic representation and word length. Psycholinguistic complexity increased by book level and varied by parts of speech. Finally, in a robustness check of lexical diversity, the non-sequentially sampled small corpus was more efficient compared to a same-sized corpus constructed by simply using all sentences from a few books (402 books vs. seven books). If a small corpus must be used then non-sequential sampling from books stratified by book level makes the corpus statistics better approximate what is found in larger corpora. Overall, the protocol shows promise as a tool to advance the science of child language acquisition in understudied languages.
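The abstract does not spell out the individual steps of the protocol, so the following is only a minimal sketch of the general idea it describes: drawing sentences non-sequentially from many books, stratified by book level, and then checking lexical diversity. The function names, the per-level word budget, and the use of random shuffling are all assumptions made for illustration; they are not taken from the article, and a plain type-token ratio stands in for whatever diversity measures the authors used.

```python
import random
from collections import defaultdict


def sample_non_sequential(books, words_per_level, seed=0):
    """Sketch: sample sentences across books, stratified by book level.

    books: list of (level, sentences) pairs, where each sentence is a
           list of word strings (hypothetical input format).
    words_per_level: assumed word budget for each book level.
    """
    rng = random.Random(seed)
    by_level = defaultdict(list)
    for level, sentences in books:
        # Pool sentences from every book at this level.
        by_level[level].extend(sentences)

    sample = {}
    for level, pool in by_level.items():
        rng.shuffle(pool)  # break the original sentence order
        picked, n_words = [], 0
        for sentence in pool:
            if n_words + len(sentence) > words_per_level:
                continue  # skip sentences that would exceed the budget
            picked.append(sentence)
            n_words += len(sentence)
        sample[level] = picked
    return sample


def type_token_ratio(sentences):
    """Unique words divided by total words (0.0 for an empty sample)."""
    tokens = [w.lower() for s in sentences for w in s]
    return len(set(tokens)) / len(tokens) if tokens else 0.0
```

Comparing `type_token_ratio` for a sample built this way against one built from all sentences of a few books mirrors, in spirit only, the robustness check the abstract mentions.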

          Supplementary Information

          The online version contains supplementary material available at 10.3758/s13428-024-02339-x.


                Author and article information

                Contributors
                sonali.nag@education.ox.ac.uk
                Journal
                Behav Res Methods
                Behavior Research Methods
                Springer US (New York)
                ISSN 1554-351X, 1554-3528
                15 February 2024
                2024; 56(4): 2751-2764
                Affiliations
                [1] Department of Education, University of Oxford, Oxford, UK (GRID grid.4991.5, ISNI 0000 0004 1936 8948)
                [2] Department of Speech and Hearing, Manipal College of Health Professions, Manipal Academy of Higher Education, Manipal, India (https://ror.org/02xzytt36)
                [3] NeuroSpin, CEA, Gif-sur-Yvette, France (GRID grid.457334.2, ISNI 0000 0001 0667 2738)
                [4] The Promise Foundation, Bangalore, India
                Author information
                http://orcid.org/0000-0002-9557-4431
                http://orcid.org/0000-0002-1448-4718
                http://orcid.org/0000-0001-8320-4516
                Article
                2339
                10.3758/s13428-024-02339-x
                11133114
                38361097
                fdbea2fc-ecb3-47d9-87e8-fb63a96bbbee
                © The Author(s) 2024

                Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

                History
                : 9 January 2024
                Categories
                Original Manuscript
                Custom metadata
                © The Psychonomic Society, Inc. 2024

                Clinical Psychology & Psychiatry
                Keywords: written language, child-directed print corpus, lexical diversity, akshara, phoneme length
