The meta book and size-dependent properties of written language

Sebastian Bernhardsson, Luis Enrique Correa da Rocha, Petter Minnhagen

Research output: Contribution to journal, Article, Peer-reviewed

Abstract

Evidence is presented for a systematic text-length dependence of the power-law index γ of a single book. The estimated γ values are consistent with a monotonic decrease from 2 to 1 with increasing text length. A direct connection to an extended Heaps' law is explored. The infinite book limit is, as a consequence, proposed to be given by γ = 1 instead of the value γ = 2 expected if Zipf's law is universally applicable. In addition, we explore the idea that the systematic text-length dependence can be described by a meta book concept, which is an abstract representation reflecting the word-frequency structure of a text. According to this concept, the word-frequency distribution of a text of a certain length written by a single author has the same characteristics as a text of the same length extracted from an imaginary complete infinite corpus written by the same author.
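
The two quantities the abstract relates can be measured from any plain-text book. The Python sketch below is illustrative only and is not the authors' procedure: the helper names `tokenize`, `heaps_curve`, and `frequency_histogram` are hypothetical, and the tokenizer is a deliberately crude assumption (lowercase runs of letters and apostrophes). It counts the number of distinct words N among the first M words, which is the Heaps-type curve, and tabulates how many distinct words occur exactly k times, which is the distribution whose power-law index the abstract calls γ.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens (crude, assumed tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def heaps_curve(words, step):
    """Return (M, N) pairs: N distinct words among the first M words.

    A point is recorded every `step` words and at the end of the text.
    """
    seen = set()
    curve = []
    for i, word in enumerate(words, start=1):
        seen.add(word)
        if i % step == 0 or i == len(words):
            curve.append((i, len(seen)))
    return curve


def frequency_histogram(words):
    """Map k to the number of distinct words that occur exactly k times."""
    occurrences = Counter(words)          # word -> number of occurrences
    return Counter(occurrences.values())  # k -> number of such words


if __name__ == "__main__":
    sample = "the cat and the dog and the bird"
    tokens = tokenize(sample)
    print(heaps_curve(tokens, step=4))
    print(dict(frequency_histogram(tokens)))
```

Applying both functions to prefixes of increasing length from the same book is one simple way to inspect how the histogram changes with text length; estimating γ itself would need a fitting step that this sketch leaves out.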

Original language: English
Article number: 123015
Journal: New Journal of Physics
Volume: 11
Publication status: Published - 10 Dec 2009
