The meta book and size-dependent properties of written language

Sebastian Bernhardsson, Luis Enrique Correa da Rocha, Petter Minnhagen

Research output: Contribution to journal › Article › peer-review

Abstract

Evidence is presented for a systematic text-length dependence of the power-law index γ of a single book. The estimated γ values are consistent with a monotonic decrease from 2 to 1 with increasing text length. A direct connection to an extended Heaps' law is explored. The infinite book limit is, as a consequence, proposed to be given by γ = 1 instead of the value γ = 2 expected if Zipf's law is universally applicable. In addition, we explore the idea that the systematic text-length dependence can be described by a meta book concept, which is an abstract representation reflecting the word-frequency structure of a text. According to this concept, the word-frequency distribution of a text of a certain length written by a single author has the same characteristics as that of a text of the same length extracted from an imaginary complete infinite corpus written by the same author.
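The two quantities the abstract relates can be computed directly from a tokenized text: the word-frequency distribution (how many distinct words occur exactly k times, the quantity whose power-law index is γ) and the vocabulary growth curve of Heaps' law (the number of distinct words N among the first M words). The following is a minimal sketch, assuming plain whitespace tokenization. The function names and the toy sentence are illustrative and are not taken from the paper, and no fit of γ is attempted here.

```python
from collections import Counter


def frequency_distribution(tokens):
    """Map k -> number of distinct words that occur exactly k times."""
    word_counts = Counter(tokens)               # word -> number of occurrences
    return dict(Counter(word_counts.values()))  # occurrences k -> number of words


def vocabulary_growth(tokens):
    """Return N(M) for M = 1..len(tokens): distinct words among the first M tokens."""
    seen = set()
    growth = []
    for token in tokens:
        seen.add(token)
        growth.append(len(seen))
    return growth


def first_section(tokens, fraction):
    """Take the leading fraction of a text, i.e. a shorter text from the same source."""
    return tokens[: int(len(tokens) * fraction)]


# Toy input (an assumption for illustration; a real study would use a full book).
tokens = "the cat sat on the mat and the dog sat on the log".split()

print(frequency_distribution(tokens))
print(vocabulary_growth(tokens))
print(frequency_distribution(first_section(tokens, 0.5)))
```

Comparing `frequency_distribution` for sections of increasing length taken from one long text is one way to look for the size dependence described above, although a meaningful estimate of γ needs texts far longer than this toy sentence.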

Original language: English
Article number: 123015
Journal: New Journal of Physics
Volume: 11
Publication status: Published - 10 Dec 2009
