Some links lead to content that does not contain navigation links.
Following such a link may open a new browser window.
Contents
See also the software posted on my blog
and my repositories on GitHub.
Contents — Home
Research Software and Data
-
The evolution of the Unix system architecture
Through this link you can obtain
the replication package, tools, and data files used in the following paper.
- Diomidis Spinellis and Paris Avgeriou.
Evolution of the Unix system architecture: An exploratory case study.
IEEE Transactions on Software Engineering, 2019.
doi:10.1109/TSE.2019.2892149
-
The evolution of programming practices in the Unix operating system
Through this link you can obtain
the scripts and data files used for tracking the evolution of C
programming practices in the following paper.
The study also uses the
cqmetrics tool and the
Unix history version-control repository listed below.
-
The evolution of the Unix operating system
is made available as a
version-control repository,
covering the period from its inception in 1972
as a five thousand line kernel, to 2015 as a widely-used 26 million line
system.
The repository contains 659 thousand commits and 2306 merges. The
repository employs the commonly used Git system for its storage, and is
hosted on the popular GitHub archive.
It has been created by synthesizing
with custom software 24 snapshots of systems developed at Bell Labs,
Berkeley University, and the 386BSD team, two legacy repositories, and
the modern repository of the open source FreeBSD system. In total, 850
individual contributors are identified, the early ones through primary
research.
The data set can be used for empirical research in software
engineering, information systems, and software archaeology.
Through the following links you can find the
repository data
and the
software used to build it.
The repository's contents and development process are described in
the following papers,
which are made available as a preprint in
HTML and
PDF form,
as well as an associated
poster.
- cqmetrics
provides raw figures and diverse quality
metrics associated with C code.
These include the number of functions, lines, and statements;
the number of occurrences of various keywords;
the use of comments and preprocessing;
the number and length of identifiers;
the Halstead and cyclomatic complexity per function;
the use of spacing for indentation;
a measure of style inconsistency; and
numbers associated with probable style infractions.
- dgsh,
the directed graph shell, provides an expressive way to construct
sophisticated and efficient big data processing pipelines
using existing Unix tools as well as custom-built components. It is a
Unix-style shell allowing the specification of pipelines with non-linear
scatter-gather operations. These form a directed acyclic process graph,
which is typically executed by multiple processor cores, thus increasing
the operation's processing throughput.
-
The carbon footprint of conference papers
Through the DOI link 10.5281/zenodo.2526381 you can obtain
the scripts and data files used for geolocating conference locations
and authors and calculating the CO2 emissions associated with
the corresponding conference travel.
This is described in the following paper.
- CScout
is a source code analyzer and refactoring browser for
collections of C programs.
It can process workspaces of multiple projects
mapping the complexity introduced
by the C preprocessor back into the original C source code files.
CScout takes advantage of modern hardware advances (fast processors
and large memory capacities) to analyze C source code beyond the level
of detail and accuracy provided by current compilers and linkers.
See the following articles.
- UMLGraph
allows the declarative specification and drawing of a number of UML
diagrams.
See also the article titled
On the declarative specification of models.
IEEE Software, 20(2):94-96, March/April 2003.
-
Wikipedia growth contains the data and source
code of a
longitudinal study of Wikipedia's evolution showing that although
Wikipedia's scope is increasing, its coverage is not deteriorating,
and demonstrating the creation of a large real world scale-free graph through a combination of incremental growth and preferential attachment.
The results of this study were published in the article:
Diomidis Spinellis and Panagiotis Louridas.
The collaborative organization of knowledge.
Communications of the ACM, 51(8):68–73, August 2008.
(doi:10.1145/1378704.1378720)
The data are permanently archived at DOI
10.5281/zenodo.2526703
and the source code at DOI
10.5281/zenodo.2526733.
-
Comparative Language Fuzz Testing
contains the source code as well as test and processing scripts
for comparing how various language implementations
allow the detection of simple errors at compile or at run time.
The study is based on a diverse corpus of programs written in several programming languages systematically perturbed using a mutation-based fuzz generator.
The results obtained prove that languages with weak type systems are significantly
likelier than languages that enforce strong typing to let fuzzed programs
compile and run, and, in the end, produce erroneous results.
The results of this study were published in the following paper.
Diomidis Spinellis, Vassilios Karakoidas,
and Panagiotis Louridas.
Comparative
language fuzz testing: Programming languages vs. fat fingers.
In PLATEAU 2012: 4th Annual International Workshop on Evaluation and
Usability of Programming Languages and Tools—Systems, Programming,
Languages and Applications: Software for Humanity (SPLASH 2012). ACM,
October 2012.
-
A tale of Four Kernels
contains the data and SQL queries used in the ICSE paper comparing the code
quality attributes of FreeBSD, Linux, Solaris, and the Windows Research Kernel.
A tale of four kernels.
In Wilhelm Schäfer, Matthew B. Dwyer, and Volker Gruhn, editors,
ICSE '08: Proceedings of the 30th International Conference on
Software Engineering, pages 381–390, New York, May 2008. Association for Computing Machinery.
Further details appear in the book chapter
Quality wars: Open
source versus proprietary software.
In Andy Oram and Greg Wilson, editors, Making Software: What Really
Works, and Why We Believe It, chapter 15, pages 259–293. O'Reilly and
Associates, Sebastopol, CA, 2010.
The data and source code are permanently archived at DOI
10.5281/zenodo.2526915.
- ameso is
a complete emulation of the Antikythera mechanism on the Squeak EToys
environment. I have developed this as a prototype base for educational
activities on the XO machine of the One Laptop per Child initiative.
See also the article
The Antikythera mechanism: A computer science perspective.
IEEE Computer, 41(5):22–27, May 2008.
(doi:10.1109/MC.2008.166)
- The Information Train is a winning entry at the
"Wizards of Science 2009" scientific experiment contest.
The entry demonstrates how computers communicate with each other by setting up
a network in which a model train transfers a picture's pixels from
one computer (a normal laptop) to the other (an OLPC XO-1).
The software is implemented in
Processing and
Squeak EToys;
hardware schematics and construction details are also included.
See also the book chapter
The information train.
In Newton Lee, editor, Digital Da Vinci: Computers in
the Arts and Sciences, chapter 7, pages 129–142. Springer, 2014.
(doi:10.1007/978-1-4939-0965-0_7)
- api-verify demonstrates a framework design
for bundling static verification code with Java classes.
It extends the FindBugs static analysis tool with a plugin that checks
the corresponding method invocations.
See also the article
Diomidis Spinellis and Panagiotis Louridas.
A
framework for the static verification of API calls.
Journal of Systems and Software, 80(7):1156–1168, July 2007.
(doi:10.1016/j.jss.2006.09.040)
- ckjm
calculates Chidamber and Kemerer object-oriented metrics in Java programs
by processing the bytecode of the compiled Java files.
- The Decay and Failures of URL References
Data and programs used in the article
The Decay and Failures of Web References.
Communications of the ACM, 46(1):71-77, January 2003.
- e-democracy
is a simple e-voting application.
The goal is to keep the application's source code short and
easy to understand, so that its correctness and
trustworthiness (in terms of integrity and anonymity)
can be readily verified and audited.
- User-level operating system transactions
allow system administrators
and ordinary users to perform a sequence of file
operations and then
commit them as a group, or abort them without leaving any
trace behind.
The page contains the source code corresponding to the article
User-level operating system transactions.
Software: Practice & Experience, 39(14):1215–1233, September 2009.
(doi:10.1002/spe.935).
- The GTWeb
system demonstrates how trip diaries can be created and presented by
exploiting the synergies of integrating different information appliances
and publicly accessible databases.
A GTWeb
site consists of a trip overview, timelines, maps,
and annotated photographs.
The site is created by
processing a user's GPS track log and digital camera pictures,
linking them with a gazetteer database, topography, and coastline data.
The software used for creating the site is accessible from
this page.
A related article describes the technical aspects of the approach:
Position-annotated photographs: A geotemporal web.
IEEE Pervasive Computing, 2(2):72-79, April-June 2003.
- Information Furnace
Code samples for setting up
an Information Furnace, a
basement-installed PC-type device that
integrates existing consumer home-control, infotainment, security,
and communication technologies to transparently provide user-friendly
access and value-added services.
See also
The information furnace: Consolidated home control.
Personal and Ubiquitous Computing, 7(1):53-69, 2003, and
The information furnace: User-friendly home control.
In Proceedings of the 3rd International System Administration and
Networking Conference SANE 2002, pages 145-174, Maastricht, The
Netherlands, May 2002.
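The Wikipedia growth entry above mentions a scale-free graph arising from incremental growth combined with preferential attachment. The following is a minimal, generic sketch of that mechanism, not the study's own code: nodes are added one at a time, and each new node links to existing nodes with probability proportional to their current degree. The names grow_graph and degree_counts and the parameters edges_per_node and seed are hypothetical, chosen only for illustration.

```python
import random
from collections import Counter


def grow_graph(n_nodes, edges_per_node=2, seed=0):
    """Grow an undirected graph one node at a time (illustrative sketch).

    Returns a list of (node, neighbour) edges. Each new node attaches to
    edges_per_node distinct existing nodes, chosen with probability
    proportional to their current degree.
    """
    core = edges_per_node + 1
    if n_nodes < core:
        raise ValueError("n_nodes must be at least edges_per_node + 1")
    rng = random.Random(seed)
    # Start from a small fully connected core.
    edges = [(i, j) for i in range(core) for j in range(i)]
    # Every node appears here once per incident edge, so a uniform draw
    # from this list selects a node in proportion to its degree.
    endpoints = [node for edge in edges for node in edge]
    for new_node in range(core, n_nodes):
        targets = set()
        while len(targets) < edges_per_node:
            targets.add(rng.choice(endpoints))
        for target in sorted(targets):
            edges.append((new_node, target))
            endpoints.extend((new_node, target))
    return edges


def degree_counts(edges):
    """Map each node to its degree."""
    return Counter(node for edge in edges for node in edge)


if __name__ == "__main__":
    degrees = degree_counts(grow_graph(1000))
    print("highest degrees:", sorted(degrees.values(), reverse=True)[:5])
```

Running the script prints the largest node degrees; in graphs grown this way a few early nodes typically accumulate far more links than the rest.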
Text Processing Tools
- bib2xhtml
is a program that converts BibTeX files into
HTML (specifically XHTML 1.0).
The conversion is mostly done by specialized BibTeX style files,
derived from a converted bibliography style template.
This ensures that the original BibTeX styles are faithfully
reproduced.
- Rtfbtx
BibTeX style files that create Microsoft Word RTF format output instead
of LaTeX output.
- Grconv
converts between around 100 character sets, encodings, transcription,
and transliteration methods that are used to represent Greek text.
Unix Tools
- Socketpipe
Socketpipe
connects a remote command to local input and output processes without
the data copy and context switching overhead of the customary
rsh or ssh pipelines.
- Fileprune
Fileprune
will delete files from the specified set, targeting a given distribution
of the files within time as well as size, number, and age constraints.
Its main purpose is to keep a set of daily-created backup files
at a manageable size,
while still providing reasonable access to older versions.
Specifying a size, file number, or age constraint will
simply remove files starting from the oldest, until the
constraint is met.
The distribution specification (exponential, Gaussian (normal), or Fibonacci)
provides finer control of the files to delete,
allowing the retention of recent copies and the increasingly
aggressive pruning of the older files.
You can read more in the article
Organized pruning of file sets.
;login:, 28(3):39-42, June 2003.
- pmonitor
The pmonitor command allows you to monitor a job's progress
by specifying the name of the corresponding command, its process id,
or the file being processed.
- Deltac
Byte difference compression method:
a very simple lossless compressor that does better than LZW on greymap images.
Usenet newsgroup comp.sources.misc v13i48.
This program is currently used by the CEDAR
USPS Office of Advanced Technology
Database of Handwritten Cities, States, ZIP Codes, Digits,
and Alphabetic Characters.
http://www.cedar.buffalo.edu/Databases/CDROM1/
- Pbmtochar, chartopbm
Bitmap to ASCII:
convert a bitmap to ASCII line printer art using font pattern matching.
Usenet newsgroup comp.sources.misc v11i80.
- Stat
Shell interface for the stat(2) system call.
Allows Unix shell scripts to process file characteristics.
Usenet newsgroup comp.sources.misc v10i82.
- Sed
Unix sed(1) stream editor
A POSIX-compliant re-implementation of the Unix sed(1) stream editor.
Part of Apple's Mac OS X,
4.4BSD,
NetBSD,
FreeBSD, OpenBSD.
- RCS Tools
Check RCS checked out files, remove files that can be safely checked out,
and create a project state list.
- Tarfix
Convert and fix tar(1) archives.
Convert filenames in tar(1) files between different operating system
conventions.
Usenet newsgroup alt.sources Message-ID: 1990May21.134807.17537@cc.ic.ac.uk.
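The Fileprune entry above says that a size, file number, or age constraint removes files starting from the oldest until the constraint is met. A minimal sketch of that rule follows. It works on in-memory records rather than real files, it does not cover the distribution-based mode, and it is not Fileprune's code or command-line interface; FileInfo, prune_oldest, and all parameter names are hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple


class FileInfo(NamedTuple):
    name: str
    size: int       # bytes
    mtime: float    # modification time, in seconds


def prune_oldest(
    files: List[FileInfo],
    max_count: Optional[int] = None,
    max_total_size: Optional[int] = None,
    max_age: Optional[float] = None,
    now: float = 0.0,
) -> Tuple[List[str], List[str]]:
    """Return (kept, removed) file names; nothing is deleted on disk.

    Files are dropped oldest first until every given constraint holds.
    """
    remaining = sorted(files, key=lambda f: f.mtime)  # oldest first
    removed = []

    def violated() -> bool:
        if max_count is not None and len(remaining) > max_count:
            return True
        if (max_total_size is not None
                and sum(f.size for f in remaining) > max_total_size):
            return True
        # Only the oldest remaining file needs checking for the age limit.
        if max_age is not None and now - remaining[0].mtime > max_age:
            return True
        return False

    while remaining and violated():
        removed.append(remaining.pop(0))
    return [f.name for f in remaining], [f.name for f in removed]


if __name__ == "__main__":
    backups = [FileInfo("mon", 10, 1.0), FileInfo("tue", 10, 2.0),
               FileInfo("wed", 10, 3.0), FileInfo("thu", 10, 4.0)]
    print(prune_oldest(backups, max_count=2))
```

Because removal always starts from the oldest file, a plain constraint cannot keep an occasional old copy; the distribution specification described above is what adds that finer control.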
Windows and DOS tools and ports
- Outwit
A tool suite that provides console-based access to the Windows clipboard,
registry, databases, document properties, and links.
See also
Outwit: Unix tool-based programming meets the Windows world.
In USENIX 2000 Technical Conference Proceedings, pages 149-158,
San Diego, CA, USA, June 2000. Usenix Association.
- Rtfbtx
BibTeX style files that create Microsoft Word RTF format output instead
of LaTeX output.
- The puttyclip patch to
Simon Tatham's
putty
Win32 Telnet and SSH client
allows you to easily copy remote files and the output of remote Unix programs
to your local Windows clipboard.
- MS-DOS Perl 3.0
The original port of Perl 3.0 to MS-DOS.
- Trace
Trace MS-DOS system calls.
Executes a program logging the system calls it makes.
Part of the SIMTEL archive.
See also
Trace: A tool for logging operating system call transactions.
Operating Systems Review, 28(4):56-63, October 1994.
- Popen for MS-DOS
Functions popen(3S), pclose(3S) under MS-DOS.
An implementation of popen(3S), pclose(3S) (open stdio pipes) under MS-DOS.
Usenet newsgroup alt.sources Message-ID: 1669@gould.doc.ic.ac.uk.
Library functions
The following pages do not contain navigation links.
Following a link will open a new browser window.
- Credit card checksum verifier (C source code).
- Zopen
Compress library routines:
read and write compressed files opened with the zopen function call (C source code).
Part of 4.4BSD,
NetBSD,
FreeBSD, OpenBSD.
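The credit card checksum verifier listed above is distributed as C source, and this page does not say which checksum it implements. Payment card numbers conventionally end in a Luhn (mod 10) check digit, so the sketch below assumes that scheme. It is written in Python for brevity and is not a transcription of the linked code; the function name luhn_valid is hypothetical.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn mod-10 check.

    Characters other than the digits 0-9 (for example spaces or hyphens
    used as separators) are ignored.
    """
    digits = [int(ch) for ch in number if ch in "0123456789"]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit and
    # subtract 9 when the doubled value has two digits.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


if __name__ == "__main__":
    print(luhn_valid("4111 1111 1111 1111"))
```

The check catches any single mistyped digit and most swaps of adjacent digits, but it says nothing about whether the card actually exists.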
Greek language tools and resources
- The Greek classifier
will process lines of ASCII text, and will print those lines that (probably) match a Greek surname.
- Grconv
converts between around 100 character sets, encodings, transcription,
and transliteration methods that are used to represent Greek text.
- Greek ispell
Greek dictionary and related files for the ispell spell checker.
- Elvis Greek
Character maps and digraphs for editing Greek text in Windows CP1253 using
the elvis vi editor clone.
Diversions
Down Memory Lane
-
TI-99/4A
Source code and screen dumps of programs written for the
TI-99/4A home computer in TI BASIC (1982-1983).
-
Pascal programs
Source code and executable code for programs written in Turbo Pascal on the
original IBM-PC (1985-1986).
The programs are a TMS-9900 family assembler, a universal parser, and a
configurable database system.
-
C Profiler
An execution profiler designed for programs compiled using the Microsoft C
16-bit compilers on the 8086 family of processors (1988).
-
Ericsson GH-388 interfacing
Programs that
interface to an Ericsson GH-388 mobile phone at the frame protocol level.
(Please read the relevant portion of the
frequently asked questions before
sending me mail concerning this program.)
- RC
Universal remote control for the HP-100 LX palmtop PC.
Provides a configuration script language.
See also
Palmtop programmable appliance controls.
Personal Technologies (Personal and Ubiquitous Computing),
2(1):11-17, March 1998.
- Progcalc
Programmer's calculator for the HP-100/HP-200LX palmtop PCs.
- Hpsound
Sound play program for the HP-95 LX palmtop PC.
Utilises the D/A converter of the HP-95 LX to provide
realistic sound effects.
-
MP3 file mixer
Create, edit, and play mixed sequences of MP3 (MPEG layer 3) files.