TY - JOUR
T1 - Size-dependent word frequencies and translational invariance of books
AU - Bernhardsson, Sebastian
AU - Correa da Rocha, Luis Enrique
AU - Minnhagen, Petter
PY - 2010/1/15
Y1 - 2010/1/15
AB - It is shown that a real novel shares many characteristic features with a null model in which the words are randomly distributed throughout the text. Such a common feature is a certain translational invariance of the text. Another is that the functional form of the word-frequency distribution of a novel depends on the length of the text in the same way as the null model. This means that an approximate power-law tail ascribed to the data will have an exponent which changes with the size of the text-section which is analyzed. A further consequence is that a novel cannot be described by text-evolution models such as the Simon model. The size-transformation of a novel is found to be well described by a specific Random Book Transformation. This size transformation in addition enables a more precise determination of the functional form of the word-frequency distribution. The implications of the results are discussed.
KW - Random book transformation
KW - Text evolution models
KW - Word frequency distributions
UR - http://www.scopus.com/inward/record.url?scp=70349728767&partnerID=8YFLogxK
U2 - 10.1016/j.physa.2009.09.022
DO - 10.1016/j.physa.2009.09.022
M3 - Article
AN - SCOPUS:70349728767
SN - 0378-4371
VL - 389
SP - 330
EP - 341
JO - Physica A: Statistical Mechanics and its Applications
JF - Physica A: Statistical Mechanics and its Applications
IS - 2
ER -