The meta book and size-dependent properties of written language

Sebastian Bernhardsson, Luis Enrique Correa da Rocha, Petter Minnhagen

Evidence is presented for a systematic text-length dependence of the power-law index γ of a single book. The estimated γ values are consistent with a monotonic decrease from 2 to 1 with increasing text length. A direct connection to an extended Heaps' law is explored. The infinite-book limit is, as a consequence, proposed to be given by γ = 1 instead of the value γ = 2 expected if Zipf's law is universally applicable. In addition, we explore the idea that the systematic text-length dependence can be described by a meta book concept, which is an abstract representation reflecting the word-frequency structure of a text. According to this concept, the word-frequency distribution of a text of a given length written by a single author has the same characteristics as that of a text of the same length extracted from an imaginary, complete, infinite corpus written by the same author.
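The two quantities discussed in the abstract, the word-frequency distribution and the growth of the vocabulary with text length (the quantity that Heaps' law describes), can be measured directly from a tokenized text. The following is a minimal sketch, not the authors' procedure: the function names `vocabulary_growth` and `frequency_distribution`, the whitespace tokenization, and the toy sentence are all assumptions made for illustration.

```python
from collections import Counter


def vocabulary_growth(words):
    """Return N(M) for M = 1..len(words): distinct words among the first M words."""
    seen = set()
    growth = []
    for w in words:
        seen.add(w)
        growth.append(len(seen))
    return growth


def frequency_distribution(words):
    """Return a dict mapping k to the number of distinct words occurring exactly k times."""
    occurrences = Counter(words)           # word -> number of occurrences
    return dict(Counter(occurrences.values()))  # k -> number of words with k occurrences


# Toy input (hypothetical); a real analysis would use a full book.
text = "the cat sat on the mat and the dog sat on the rug"
words = text.split()  # naive whitespace tokenization, assumed for illustration

growth = vocabulary_growth(words)
dist = frequency_distribution(words)
```

On a real book, fitting the tail of `dist` to a power law would give an estimate of γ for that text length, and repeating the measurement on sections of different lengths would probe the size dependence described above. The fitting step is omitted here because the abstract does not specify how it is done.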

Article number: 123015
Journal: New Journal of Physics
Publication status: Published - 10 Dec 2009

