Word statistics in Blogs and RSS feeds: Towards empirical universal evidence

R. Lambiotte; M. Ausloos; M. Thelwall

doi:10.1016/j.joi.2007.07.001

Research output: Contribution to journal › Article › peer-review

Abstract

We focus on the statistics of word occurrences and of the waiting times between such occurrences in Blogs. Due to the heterogeneity of words' frequencies, the empirical analysis is performed by studying classes of "frequently-equivalent" words, i.e. by grouping words depending on their frequencies. Two limiting cases are considered: the dilute limit, i.e. for those words that are used less than once a day, and the dense limit for frequent words. In both cases, extreme events occur more frequently than expected from the Poisson hypothesis. These deviations from Poisson statistics reveal non-trivial time correlations between events that are associated with bursts of activities. The distribution of waiting times is shown to behave like a stretched exponential and to have the same shape for different sets of words sharing a common frequency, thereby revealing universal features.
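The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the data are synthetic, the stretched exponential is assumed here to describe the survival function P(wait > t), and the values of `tau`, `beta`, the sample size and the threshold of five mean waiting times are illustrative choices only.

```python
import math
import random


def waiting_times(timestamps):
    """Return the gaps between consecutive occurrence times (sorted first)."""
    ordered = sorted(timestamps)
    return [b - a for a, b in zip(ordered, ordered[1:])]


def poisson_survival(t, mean_wait):
    """P(wait > t) for a Poisson process with the given mean waiting time."""
    return math.exp(-t / mean_wait)


def stretched_exponential_survival(t, tau, beta):
    """P(wait > t) = exp(-(t / tau) ** beta); beta = 1 recovers the Poisson case."""
    return math.exp(-((t / tau) ** beta))


def empirical_survival(waits, t):
    """Fraction of observed waiting times strictly longer than t."""
    return sum(1 for w in waits if w > t) / len(waits)


def sample_stretched_waits(n, tau, beta, rng):
    """Draw n waits by inverting the stretched-exponential survival function."""
    # 1 - rng.random() lies in (0, 1], so the logarithm is always defined.
    return [tau * (-math.log(1.0 - rng.random())) ** (1.0 / beta) for _ in range(n)]


rng = random.Random(0)
# Synthetic waiting times; tau and beta are assumed values, not fitted ones.
waits = sample_stretched_waits(20000, tau=1.0, beta=0.5, rng=rng)

# Rebuild occurrence times from the waits, then recover the waits again.
times = [0.0]
for w in waits:
    times.append(times[-1] + w)
recovered = waiting_times(times)

mean_wait = sum(recovered) / len(recovered)
threshold = 5.0 * mean_wait
print("empirical P(wait > 5 * mean):", empirical_survival(recovered, threshold))
print("Poisson   P(wait > 5 * mean):", poisson_survival(threshold, mean_wait))
```

With `beta` below 1 the tail of the waiting-time distribution is heavier than the exponential tail of a Poisson process with the same mean, so long gaps between occurrences are more common than the Poisson hypothesis predicts.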

Original language	English
Pages (from-to)	277-286
Number of pages	10
Journal	Journal of Informetrics
Volume	1
Issue number	4
DOIs	https://doi.org/10.1016/j.joi.2007.07.001
Publication status	Published - 1 Jan 2007

Cite this

@article{b475e0e30cc140c2af3c3853c2b7a414,
title = "Word statistics in Blogs and RSS feeds: Towards empirical universal evidence",
abstract = "We focus on the statistics of word occurrences and of the waiting times between such occurrences in Blogs. Due to the heterogeneity of words' frequencies, the empirical analysis is performed by studying classes of {"}frequently-equivalent{"} words, i.e. by grouping words depending on their frequencies. Two limiting cases are considered: the dilute limit, i.e. for those words that are used less than once a day, and the dense limit for frequent words. In both cases, extreme events occur more frequently than expected from the Poisson hypothesis. These deviations from Poisson statistics reveal non-trivial time correlations between events that are associated with bursts of activities. The distribution of waiting times is shown to behave like a stretched exponential and to have the same shape for different sets of words sharing a common frequency, thereby revealing universal features.",
author = "R. Lambiotte and M. Ausloos and M. Thelwall",
year = "2007",
month = jan,
day = "1",
doi = "10.1016/j.joi.2007.07.001",
language = "English",
volume = "1",
pages = "277--286",
journal = "Journal of Informetrics",
issn = "1751-1577",
publisher = "Elsevier",
number = "4",
}

TY - JOUR

T1 - Word statistics in Blogs and RSS feeds

T2 - Towards empirical universal evidence

AU - Lambiotte, R.

AU - Ausloos, M.

AU - Thelwall, M.

PY - 2007/1/1

Y1 - 2007/1/1

N2 - We focus on the statistics of word occurrences and of the waiting times between such occurrences in Blogs. Due to the heterogeneity of words' frequencies, the empirical analysis is performed by studying classes of "frequently-equivalent" words, i.e. by grouping words depending on their frequencies. Two limiting cases are considered: the dilute limit, i.e. for those words that are used less than once a day, and the dense limit for frequent words. In both cases, extreme events occur more frequently than expected from the Poisson hypothesis. These deviations from Poisson statistics reveal non-trivial time correlations between events that are associated with bursts of activities. The distribution of waiting times is shown to behave like a stretched exponential and to have the same shape for different sets of words sharing a common frequency, thereby revealing universal features.

AB - We focus on the statistics of word occurrences and of the waiting times between such occurrences in Blogs. Due to the heterogeneity of words' frequencies, the empirical analysis is performed by studying classes of "frequently-equivalent" words, i.e. by grouping words depending on their frequencies. Two limiting cases are considered: the dilute limit, i.e. for those words that are used less than once a day, and the dense limit for frequent words. In both cases, extreme events occur more frequently than expected from the Poisson hypothesis. These deviations from Poisson statistics reveal non-trivial time correlations between events that are associated with bursts of activities. The distribution of waiting times is shown to behave like a stretched exponential and to have the same shape for different sets of words sharing a common frequency, thereby revealing universal features.

UR - http://www.scopus.com/inward/record.url?scp=34848834796&partnerID=8YFLogxK

U2 - 10.1016/j.joi.2007.07.001

DO - 10.1016/j.joi.2007.07.001

M3 - Article

AN - SCOPUS:34848834796

SN - 1751-1577

VL - 1

SP - 277

EP - 286

JO - Journal of Informetrics

JF - Journal of Informetrics

IS - 4

ER -
