A Tale of Four Kernels
How does the software development process affect quality attributes
of the source code?
This page contains supporting material for a conference paper
that examines this question:
Diomidis Spinellis.
A tale of four kernels.
In Wilhelm Schäfer, Matthew B. Dwyer, and Volker Gruhn, editors,
ICSE '08: Proceedings of the 30th International Conference on
Software Engineering, pages 381–390, New York, May 2008. Association for Computing Machinery.
In this paper I analyze the source code of four operating
system kernels (FreeBSD, Linux, Solaris, and Windows)
by collecting metrics in the areas of file organization,
code structure, code style, the use of the C preprocessor,
and data organization.
Databases
The data, source code, and CCFinder results are permanently archived at DOI
10.5281/zenodo.2526915.
All database files are MySQL SQL dumps.
You can also find the queries used for extracting the metrics
from each database here.
You can find the schema of the databases described here, and a diagram of the logical schema below.
A Perl script, which can be downloaded from here,
will process the duplication results found by
CCFinder and then
calculate and print the percentage of cloned tokens in a project.
The script takes as an argument the base name of a .ccfxd file
produced by CCFinder.
It will print on its standard output
the name of the project,
the number of files,
the number of tokens,
the number of clones,
and the percentage of cloned tokens.
Based on the DRY (don't repeat yourself) principle,
this last number can be used as one quality indicator for the project.
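To make the last figure concrete, here is a minimal Python sketch of how a percentage of cloned tokens could be derived. It is not the Perl script mentioned above and it does not read .ccfxd files. The function name `cloned_token_percentage` and its input format (per-file token counts plus half-open token ranges for each clone fragment) are assumptions made for illustration. The sketch counts a token once even when several clone fragments cover it, which is one plausible reading of "cloned tokens".

```python
from collections import defaultdict


def cloned_token_percentage(token_counts, clone_fragments):
    """Return the percentage of tokens covered by at least one clone fragment.

    token_counts: dict mapping a (hypothetical) file id to its token count.
    clone_fragments: iterable of (file_id, start, end) tuples, where
        start..end is a half-open range of token positions in that file.
    """
    # Group the clone fragments by the file they belong to.
    by_file = defaultdict(list)
    for file_id, start, end in clone_fragments:
        by_file[file_id].append((start, end))

    cloned = 0
    for ranges in by_file.values():
        # Merge overlapping ranges so that no token is counted twice.
        ranges.sort()
        cur_start, cur_end = ranges[0]
        for start, end in ranges[1:]:
            if start <= cur_end:
                cur_end = max(cur_end, end)
            else:
                cloned += cur_end - cur_start
                cur_start, cur_end = start, end
        cloned += cur_end - cur_start

    total = sum(token_counts.values())
    return 100.0 * cloned / total if total else 0.0


if __name__ == "__main__":
    # Two files of 100 tokens each; file 1 has two overlapping fragments.
    counts = {1: 100, 2: 100}
    fragments = [(1, 0, 10), (1, 5, 20), (2, 50, 60)]
    print(f"{cloned_token_percentage(counts, fragments):.1f}% cloned tokens")
```

In the example, the two overlapping fragments in file 1 merge into a single range of 20 tokens, file 2 contributes 10, and the result is 30 cloned tokens out of 200.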