2008.08.08

Two More Years of Wikipedia Data

Following a study that my colleague Panagiotis Louridas and I published in the August 2008 issue of the Communications of the ACM, Victor Grishchenko gave me a copy of a complete Wikipedia dump covering 2006 and 2007 (enwiki-20080103-pages-meta-history.xml.7z). Over the past four days I reran the study on this new data set.

A related Wikipedia discussion remarked that

being based only on February 2006 data means that [the study] missed the growth-mode transition that happened around September 2006, when the exponential phase of growth ended.
Fortunately, the new results I obtained from the data set that ends in January 2008 do not appear to differ from those based on the study's 2006 data set. Here are the corresponding charts; refer to the article for the original charts and explanations.

Coverage of Wikipedia articles

Number of entries with a given difference between the time of the first reference to the entry and the addition of its definition

Number of references to an entry at the time of its definition

Quantile-quantile plot of the expected and actual number of references added each month to each article

Frequency distributions of the expected and actual number of references added each month to each article

Last modified: Friday, August 8, 2008 11:30 pm
Unless otherwise expressly stated, all original material on this page created by Diomidis Spinellis is licensed under a Creative Commons Attribution-Share Alike 3.0 Greece License.