      Is Open Access

      Zipf's law holds for phrases, not words

      Preprint


          Abstract

          Although Zipf's law was originally and most famously observed for word frequency, it is surprisingly limited in its applicability to human language, holding over no more than three to four orders of magnitude before hitting a clear break in scaling. Here, building on the simple observation that phrases of one or more words comprise the most coherent units of meaning in language, we show empirically that Zipf's law for phrases extends over as many as nine orders of magnitude in rank. In doing so, we develop a principled and scalable statistical mechanical method of random text partitioning, which opens up a rich frontier of rigorous text analysis via a rank ordering of mixed-length phrases.
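As a rough illustration of the idea in the abstract, the sketch below splits a word sequence into mixed-length phrases and rank-orders them by frequency. It rests on an assumption the abstract does not spell out: each gap between adjacent words is cut independently with probability q. The names random_partition, rank_frequency, and q are hypothetical and chosen only for this example; this is not the authors' implementation.

```python
import random
from collections import Counter


def random_partition(words, q=0.5, rng=None):
    """Split words into phrases, cutting after each word with probability q.

    Assumption for illustration: cuts are independent coin flips.
    """
    rng = rng or random.Random()
    phrases, current = [], []
    for word in words:
        current.append(word)
        if rng.random() < q:
            # Close the current phrase at this word boundary.
            phrases.append(" ".join(current))
            current = []
    if current:
        # Keep any trailing words that were never closed by a cut.
        phrases.append(" ".join(current))
    return phrases


def rank_frequency(items):
    """Return (rank, item, count) triples, most frequent first."""
    counts = Counter(items)
    # Ties are broken alphabetically so the ordering is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, item, count)
            for rank, (item, count) in enumerate(ranked, start=1)]


if __name__ == "__main__":
    text = "the cat sat on the mat and the cat sat on the hat".split()
    phrases = random_partition(text, q=0.5, rng=random.Random(0))
    for rank, phrase, count in rank_frequency(phrases):
        print(rank, count, phrase)
```

With q = 1 every word becomes its own phrase, which recovers the ordinary word-frequency ranking. With q = 0 the whole text is a single phrase. Intermediate values yield phrases of mixed length. Plotting count against rank on log-log axes for a large corpus is the usual way to inspect Zipf-like scaling.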

          Most cited references (1)

          Is Open Access

          Measuring the happiness of large-scale written expression: Songs, Blogs, and Presidents

          The importance of quantifying the nature and intensity of emotional states at the level of populations is evident: we would like to know how, when, and why individuals feel as they do if we wish, for example, to better construct public policy, build more successful organizations, and, from a scientific perspective, more fully understand economic and social phenomena. Here, by incorporating direct human assessment of words, we quantify happiness levels on a continuous scale for a diverse set of large-scale texts: song titles and lyrics, weblogs, and State of the Union addresses. Our method is transparent, improvable, capable of rapidly processing Web-scale texts, and moves beyond approaches based on coarse categorization. Among a number of observations, we find that the happiness of song lyrics trends downward from the 1960's to the mid 1990's while remaining stable within genres, and that the happiness of blogs has steadily increased from 2005 to 2009, exhibiting a striking rise and fall with blogger age and distance from the equator.

            Author and article information

            Journal
            2014-06-19
            2015-03-04
            Article
            1406.5181
            6c498e29-073b-4619-ba24-6991494315c5

            http://arxiv.org/licenses/nonexclusive-distrib/1.0/

            History
            Custom metadata
            Manuscript: 6 pages, 3 figures; Supplementary Information: 8 pages, 18 tables
            cs.CL physics.soc-ph

            General physics, Theoretical computer science