Size-dependent word frequencies and translational invariance of books

Sebastian Bernhardsson, Luis Enrique Correa da Rocha, Petter Minnhagen

Research output: Contribution to journal › Article › Peer-reviewed

Abstract

It is shown that a real novel shares many characteristic features with a null model in which the words are randomly distributed throughout the text. One such common feature is a certain translational invariance of the text. Another is that the functional form of the word-frequency distribution of a novel depends on the length of the text in the same way as in the null model. This means that an approximate power-law tail ascribed to the data will have an exponent that changes with the size of the text section that is analyzed. A further consequence is that a novel cannot be described by text-evolution models such as the Simon model. The size transformation of a novel is found to be well described by a specific Random Book Transformation. This size transformation also enables a more precise determination of the functional form of the word-frequency distribution. The implications of the results are discussed.
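The comparison between a real text and the null model can be illustrated with a minimal Python sketch. It is not the authors' Random Book Transformation, whose explicit form is not given in this record. It only shows the two kinds of section being compared: a contiguous section of a text, and a section of the same length drawn at random from the whole text. All function names and the toy "book" are hypothetical and chosen for illustration.

```python
import random
from collections import Counter


def frequency_distribution(words):
    """Return {k: number of distinct words that occur exactly k times}."""
    occurrences = Counter(words)                 # word -> number of occurrences
    return dict(Counter(occurrences.values()))   # k -> number of words with k occurrences


def random_section(words, n, rng):
    """Null model: draw n word tokens uniformly at random, without replacement."""
    return rng.sample(words, n)


def contiguous_section(words, n, start):
    """Real text: take n consecutive word tokens beginning at index `start`."""
    return words[start:start + n]


def demo():
    """Compare distinct-word counts of sections of a toy book (illustrative only)."""
    rng = random.Random(0)
    # Toy "book" (an assumption, not real data): word i occurs about 200 // i times.
    book = [f"w{i}" for i in range(1, 201) for _ in range(max(1, 200 // i))]
    rng.shuffle(book)  # a shuffled book is itself an instance of the null model
    n = len(book) // 4
    for label, section in [
        ("contiguous, start", contiguous_section(book, n, 0)),
        ("contiguous, end", contiguous_section(book, n, len(book) - n)),
        ("random sample", random_section(book, n, rng)),
    ]:
        dist = frequency_distribution(section)
        print(label, "distinct words:", sum(dist.values()),
              "words occurring once:", dist.get(1, 0))


if __name__ == "__main__":
    demo()
```

With a real novel, `book` would be the tokenized text in its original order. Translational invariance in the sense of the abstract would then show up as contiguous sections from different positions giving similar distributions to each other and to the random sample of the same size.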

Original language: English
Pages (from-to): 330-341
Number of pages: 12
Journal: Physica A: Statistical Mechanics and its Applications
Volume: 389
Issue number: 2
Publication status: Published - 15 Jan 2010
