Documentation
Diomidis Spinellis
Department of Management Science and Technology
Athens University of Economics and Business
Athens, Greece
dds@aueb.gr
Documentation Types
- System specification
- Software requirements specification
- Design specification
- Test specification
- User documentation
- Functional description
- Installation instructions
- Introductory guide, tutorial
- Reference manual
- Administrator manual
Often the only documentation available!
Shortcut for Code Understanding
Code
line = gobble = 0;
for (prev = '\n'; (ch = getc(fp)) != EOF; prev = ch) {
    if (prev == '\n') {
        if (ch == '\n') {
            if (sflag) {
                if (!gobble && putchar(ch) == EOF)
                    break;
                gobble = 1;
                continue;
            }
            [...]
        }
    }
    gobble = 0;
    [...]
}
Documentation
-s Squeeze multiple adjacent empty lines, causing the output to be
single spaced.
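The documented behavior can be tried in isolation. The following is a minimal sketch, not the cat implementation: the function name squeeze and its string-based interface are assumptions made for illustration, while the prev and gobble variables follow the excerpt above.

```c
#include <stddef.h>

/*
 * Copy src to dst, squeezing each run of adjacent empty lines into a
 * single empty line (the documented effect of the -s option).
 * dst must have room for strlen(src) + 1 bytes.
 * Returns the number of characters written, excluding the final NUL.
 * Hypothetical helper: the name and interface are not from cat.
 */
size_t
squeeze(const char *src, char *dst)
{
    size_t n = 0;
    int prev = '\n';    /* Treat the start of the input as a line start */
    int gobble = 0;     /* Set once an empty line has been output */

    for (; *src != '\0'; prev = *src++) {
        if (prev == '\n' && *src == '\n') {
            /* Empty line: keep the first one, drop the rest of the run */
            if (!gobble)
                dst[n++] = *src;
            gobble = 1;
            continue;
        }
        gobble = 0;
        dst[n++] = *src;
    }
    dst[n] = '\0';
    return n;
}
```

Given the input "a\n\n\n\nb\n" the function writes "a\n\nb\n": three adjacent empty lines become one.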
Specifications for Code Inspection
Code
(Apache)
switch (*method) {
case 'H':
    if (strcmp(method, "HEAD") == 0)
        return M_GET;   /* see header_only in request_rec */
    break;
case 'G':
    if (strcmp(method, "GET") == 0)
        return M_GET;
    break;
case 'P':
    if (strcmp(method, "POST") == 0)
        return M_POST;
    if (strcmp(method, "PUT") == 0)
        return M_PUT;
    if (strcmp(method, "PATCH") == 0)
        return M_PATCH;
Specification
(RFC-2068)
The Method token indicates the method to be performed on the
resource identified by the Request-URI. The method is
case-sensitive.
    Method         = "OPTIONS"                ; Section 9.2
                   | "GET"                    ; Section 9.3
                   | "HEAD"                   ; Section 9.4
                   | "POST"                   ; Section 9.5
                   | "PUT"                    ; Section 9.6
                   | "DELETE"                 ; Section 9.7
                   | "TRACE"                  ; Section 9.8
                   | extension-method
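One way to inspect code against such a specification is to write the grammar down as a table and compare it with the cases the code handles. The sketch below is an illustration only; the names rfc2068_methods and lookup_method are hypothetical and do not come from Apache.

```c
#include <stddef.h>
#include <string.h>

/* Method tokens listed in the RFC 2068 grammar quoted above */
static const char *const rfc2068_methods[] = {
    "OPTIONS", "GET", "HEAD", "POST", "PUT", "DELETE", "TRACE",
};

/*
 * Return the index of method in rfc2068_methods, or -1 if it is not
 * one of the listed tokens (it would then be an extension-method).
 * strcmp makes the comparison case-sensitive, as the text requires.
 * Hypothetical helper written for this sketch.
 */
int
lookup_method(const char *method)
{
    size_t i;
    size_t n = sizeof(rfc2068_methods) / sizeof(rfc2068_methods[0]);

    for (i = 0; i < n; i++)
        if (strcmp(method, rfc2068_methods[i]) == 0)
            return (int)i;
    return -1;
}
```

With this table, "GET" is found, "get" is not, and "PATCH" (handled by the code excerpt above) falls outside the listed tokens, so it can only be an extension-method.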
Obtain System Structure
Sendmail Files
arpadate.c, clock.c, collect.c, conf.c, convtime.c, daemon.c, deliver.c,
domain.c, envelope.c, err.c, headers.c, macro.c, main.c, map.c, mci.c,
mime.c, parseaddr.c, queue.c, readcf.c, recipient.c, safefile.c,
savemail.c, srvrsmtp.c, stab.c, stats.c, sysexits.c, trace.c, udb.c,
usersmtp.c, util.c, version.c,
Sendmail Documentation Headings
Section | Heading                 | Source files          |
2.5.    | Configuration file      | readcf.c              |
3.3.1.  | Aliasing                | alias.c               |
3.4.    | Message collection      | collect.c             |
3.5.    | Message delivery        | deliver.c             |
3.6.    | Queued messages         | queue.c               |
3.7.    | Configuration           | conf.c                |
3.7.1.  | Macros                  | macro.c               |
3.7.2.  | Header declarations     | headers.c, envelope.c |
3.7.4.  | Address rewriting rules | parseaddr.c           |
Understand complicated algorithms
Code
for (arcp = memp->parents ; arcp ; arcp = arcp->arc_parentlist) {
    [...]
    if ( headp -> npropcall ) {
        headp -> propfraction += parentp -> propfraction
                                * ( ( (double) arcp -> arc_count )
                                  / ( (double) headp -> npropcall ) );
    }
}
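Read on its own, the loop body computes a weighted sum: each parent contributes its own propagation fraction, scaled by the share of the propagated calls that arrived over its arc. The sketch below works through that arithmetic under this reading. The struct and function names are hypothetical simplifications, not the profiler's data structures.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for one incoming call-graph arc */
struct arc {
    double parent_propfraction; /* The calling parent's propfraction */
    long count;                 /* Number of calls made over this arc */
};

/*
 * Sum the parents' fractions, each weighted by count / npropcall.
 * Returns 0 when npropcall is 0, mirroring the guard in the excerpt.
 */
double
propfraction(const struct arc *arcs, size_t narcs, long npropcall)
{
    double sum = 0.0;
    size_t i;

    if (npropcall == 0)
        return 0.0;
    for (i = 0; i < narcs; i++)
        sum += arcs[i].parent_propfraction *
            ((double)arcs[i].count / (double)npropcall);
    return sum;
}
```

For example, with four propagated calls, three from a parent with fraction 1.0 and one from a parent with fraction 0.5, the result is 1.0 * 3/4 + 0.5 * 1/4 = 0.875.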
Documentation
Obtain the Meaning of Source Code Identifiers
Code
#define TCPS_ESTABLISHED 4 /* established */
(Note the useless comment: it only repeats the identifier.)
Documentation
RFC-793
ESTABLISHED - represents an open connection, data received can be delivered to the user. The normal state for the data transfer phase of the connection.
                              +---------+ ---------\      active OPEN
                              |  CLOSED |            \    -----------
                              +---------+<---------\   \   create TCB
                                |     ^              \   \  snd SYN
                   passive OPEN |     |   CLOSE        \   \
                   ------------ |     | ----------       \   \
                    create TCB  |     | delete TCB         \   \
                                V     |                      \   \
                              +---------+            CLOSE    |    \
                              |  LISTEN |          ---------- |     |
                              +---------+          delete TCB |     |
                   rcv SYN      |     |     SEND              |     |
                  -----------   |     |    -------            |     V
 +---------+      snd SYN,ACK  /       \   snd SYN          +---------+
 |         |<-----------------           ------------------>|         |
 |   SYN   |                    rcv SYN                     |   SYN   |
 |   RCVD  |<-----------------------------------------------|   SENT  |
 |         |                    snd ACK                     |         |
 |         |------------------           -------------------|         |
 +---------+   rcv ACK of SYN  \       /  rcv SYN,ACK       +---------+
   |           --------------   |     |   -----------
   |                  x         |     |     snd ACK
   |                            V     V
   |  CLOSE                   +---------+
   | -------                  |  ESTAB  |
   | snd FIN                  +---------+
   |                   CLOSE    |     |    rcv FIN
   V                  -------   |     |    -------
 +---------+          snd FIN  /       \   snd ACK          +---------+
 |  FIN    |<-----------------           ------------------>|  CLOSE  |
 | WAIT-1  |------------------                              |   WAIT  |
 +---------+          rcv FIN  \                            +---------+
   | rcv ACK of FIN   -------   |                            CLOSE  |
   | --------------   snd ACK   |                           ------- |
   V        x                   V                           snd FIN V
 +---------+                  +---------+                   +---------+
 |FINWAIT-2|                  | CLOSING |                   | LAST-ACK|
 +---------+                  +---------+                   +---------+
   |                rcv ACK of FIN |                 rcv ACK of FIN |
   |  rcv FIN       -------------- |    Timeout=2MSL -------------- |
   |  -------              x       V    ------------        x       V
    \ snd ACK                 +---------+delete TCB         +---------+
     ------------------------>|TIME WAIT|------------------>| CLOSED  |
                              +---------+                   +---------+
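The diagram gives the identifier its meaning by showing how a connection enters and leaves the state. The sketch below encodes only the four transitions that touch ESTAB. The enum and function names are hypothetical, every other transition in the diagram is deliberately left out, and unmodeled combinations simply leave the state unchanged.

```c
/* States and events named after the diagram; identifiers are hypothetical */
enum state { SYN_RCVD, SYN_SENT, ESTAB, FIN_WAIT_1, CLOSE_WAIT };
enum event { RCV_ACK_OF_SYN, RCV_SYN_ACK, CLOSE, RCV_FIN };

/*
 * Return the state following s when event e occurs, for the
 * transitions into and out of ESTAB only.
 * Simplification: any other combination returns s unchanged.
 */
enum state
next_state(enum state s, enum event e)
{
    switch (s) {
    case SYN_RCVD:
        if (e == RCV_ACK_OF_SYN)
            return ESTAB;
        break;
    case SYN_SENT:
        if (e == RCV_SYN_ACK)
            return ESTAB;       /* The diagram also sends an ACK here */
        break;
    case ESTAB:
        if (e == CLOSE)
            return FIN_WAIT_1;  /* The diagram also sends a FIN here */
        if (e == RCV_FIN)
            return CLOSE_WAIT;  /* The diagram also sends an ACK here */
        break;
    default:
        break;
    }
    return s;
}
```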
Rationale Behind Nonfunctional Requirements
Code
if (newdp->d_cred > dp->d_cred) {
    /* better credibility.
     * remove the old datum.
     */
    goto delete;
}
Documentation
(P. Vixie's BIND Security Paper)
5.1. Cache Tagging
BIND now maintains for each cached RR a "credibility" level showing
whether the data came from a zone, an authoritative answer, an authority
section, or additional data section. When a more credible RRset comes in,
the old one is completely wiped out. Older BINDs blindly aggregated data
from all sources, paying no attention to the maxim that some sources
are better than others.
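The rule described in the paper can be sketched in a few lines. This is not BIND's code: the enum names, the numeric ordering of the levels (taken from the order in which the paper lists them), and the single-slot cache are all assumptions made for illustration. Only the d_cred field name and the comparison come from the excerpt above.

```c
/*
 * Hypothetical credibility levels, least credible first. The ordering
 * is an assumption based on the order of the list in the paper.
 */
enum cred {
    CRED_ADDITIONAL = 1,    /* Additional data section */
    CRED_AUTHORITY,         /* Authority section */
    CRED_AUTH_ANSWER,       /* Authoritative answer */
    CRED_ZONE               /* Zone data */
};

struct datum {
    enum cred d_cred;
    int value;              /* Stand-in for the resource record data */
};

/*
 * Replace *cached with *newd only when newd is strictly more credible.
 * Returns 1 if the cached datum was replaced, 0 if it was kept.
 * Simplification: data of equal credibility is ignored here.
 */
int
cache_update(struct datum *cached, const struct datum *newd)
{
    if (newd->d_cred > cached->d_cred) {
        *cached = *newd;    /* Better credibility: wipe out the old datum */
        return 1;
    }
    return 0;
}
```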
Design Intelligence
- System requirements
- Architecture
- Implementation
- Rejected alternatives
Case
Pike and Thompson on adopting UTF over 16-bit Unicode representation
in Plan 9:
Unicode defines an adequate character set but an unreasonable
representation. The Unicode standard states that all characters are 16
bits wide and are communicated in 16-bit units.... To adopt Unicode,
we would have had to convert all text going into and out of Plan
9 between ASCII and Unicode, which cannot be done. Within a single
program, in command of all its input and output, it is possible to
define characters as 16-bit quantities; in the context of a networked
system with hundreds of applications on diverse machines by different
manufacturers, it is impossible.
[...]
The UTF encoding has several good properties. By far the most important
is that a byte in the ASCII range 0-127 represents itself in UTF. Thus
UTF is backward compatible with ASCII.
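The backward-compatibility property is easy to see in an encoder. The sketch below assumes the byte layout that was later standardized as UTF-8 and handles only 16-bit character values. The function name utf_encode is hypothetical, and the code makes no attempt to reject values that a full implementation would treat as invalid.

```c
#include <stddef.h>

/*
 * Encode the 16-bit character value c into buf (at least 3 bytes).
 * Returns the number of bytes written.
 * Hypothetical helper; assumes c <= 0xFFFF.
 */
size_t
utf_encode(unsigned int c, unsigned char *buf)
{
    if (c < 0x80) {         /* ASCII range: the byte represents itself */
        buf[0] = (unsigned char)c;
        return 1;
    }
    if (c < 0x800) {        /* Two bytes: 110xxxxx 10xxxxxx */
        buf[0] = (unsigned char)(0xC0 | (c >> 6));
        buf[1] = (unsigned char)(0x80 | (c & 0x3F));
        return 2;
    }
    /* Three bytes: 1110xxxx 10xxxxxx 10xxxxxx */
    buf[0] = (unsigned char)(0xE0 | ((c >> 12) & 0x0F));
    buf[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
    buf[2] = (unsigned char)(0x80 | (c & 0x3F));
    return 3;
}
```

The value 0x41 ('A') is encoded as the single byte 0x41, whereas 0xE9 becomes the two bytes 0xC3 0xA9. No byte of a multibyte sequence falls in the range 0-127, so existing ASCII text needs no conversion.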
Internal Programming Interfaces
Examples:
- Hsqldb HTML code documentation
- Perlguts
- FreeBSD kernel interfaces: manual volume 9
Test Cases and Examples of Actual Use
Examples from the tcpdump documentation:
To print all ftp traffic through internet gateway snup: (note that the
expression is quoted to prevent the shell from (mis-)interpreting the
parentheses):
tcpdump 'gateway snup and (port ftp or ftp-data)'
To print the start and end packets (the SYN and FIN packets) of each
TCP conversation that involves a non-local host:
tcpdump 'tcp[13] & 3 != 0 and not src and dst net localnet'
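The second example relies on the layout of the TCP header: octet 13 holds the flag bits, with FIN as 0x01 and SYN as 0x02, so masking with 3 selects packets that have either flag set. The sketch below performs the same test in C on a raw header. The function name is hypothetical, and the macro names are chosen for this sketch.

```c
#include <stddef.h>

#define TH_FIN 0x01 /* FIN flag bit in the TCP flags octet */
#define TH_SYN 0x02 /* SYN flag bit in the TCP flags octet */

/*
 * Return nonzero if the TCP header starting at tcp has SYN or FIN set.
 * Mirrors the first part of the filter, tcp[13] & 3 != 0.
 * Hypothetical helper; len is the number of bytes available at tcp.
 */
int
is_syn_or_fin(const unsigned char *tcp, size_t len)
{
    if (len < 14)
        return 0;   /* Too short to contain the flags octet */
    return (tcp[13] & (TH_FIN | TH_SYN)) != 0;
}
```

In C the parentheses around the masking operation are required, because != binds more tightly than &.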
Implementation Problems and Bugs
at: limitations
At and batch as presently implemented are not suitable when users are
competing for resources. If this is the case for your site, you might
want to consider another batch system, such as nqs.
cat: caveats
Because of the shell language mechanism used to perform output
redirection, the command "cat file1 file2 > file1" will cause the
original data in file1 to be destroyed! This is performed by the shell
before cat is run.
strftime: humor
There is no conversion specification for the phase of the moon.
ctags: bugs
Recognition of functions, subroutines and procedures for FORTRAN and
Pascal is done in a very simpleminded way. No attempt is made to deal
with block structure; if you have two Pascal procedures in different
blocks with the same name you lose.
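The cat caveat above can be reproduced without a shell. The sketch below assumes that the redirection amounts to opening the file for writing, which truncates it, before the reading program starts. The function name size_after_redirect is hypothetical.

```c
#include <stdio.h>

/*
 * Mimic the effect of "> path": open the file for writing, which
 * truncates it, and then report the size a later reader would see.
 * Returns the size in bytes, or -1 on error.
 */
long
size_after_redirect(const char *path)
{
    FILE *f;
    long size;

    /* Opening with mode "w" truncates the file to zero length */
    if ((f = fopen(path, "w")) == NULL)
        return -1;
    fclose(f);

    /* Any program that now reads the file finds it empty */
    if ((f = fopen(path, "rb")) == NULL)
        return -1;
    if (fseek(f, 0L, SEEK_END) != 0) {
        fclose(f);
        return -1;
    }
    size = ftell(f);
    fclose(f);
    return size;
}
```

Calling the function on a file that contains data returns 0: the original contents are gone before any reader has had a chance to look at them.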
Development and Execution Environment Problems
// The following function is not inline, to avoid build (template
// instantiation) problems with Sun C++ 4.2 patch 104631-07/SunOS 5.6.
(Such comments are often harsher.)
Trouble Spots
2001-09-17 Urban [...]
* proc.c: Go back to the interruptible sleep as reconnects
seem to handle it now.
[...]
2001-07-09 Jochen [...]
* proc.c, ioctl.c: Allow smbmount to signal failure to reconnect
with a NULL argument to SMB-IOC-NEWCONN (speeds up error
detection).
[...]
2001-04-21 Urban [...]
* dir.c, proc.c: replace tests on conn-pid with tests on state
to fix smbmount reconnect on smb_retry timeout and up the
timeout to 30s.
[...]
2000-08-14 Urban [...]
* proc.c: don't do interruptable_sleep in smb_retry to avoid
signal problem/race.
[...]
1999-11-16 Andrew [...]
* proc.c: don't sleep every time with win95 on a FINDNEXT
Undocumented Features
Why?
- Not officially supported
- Provided only as a support mechanism for suitably trained engineers
- Experimental or intended for a future release
- Used by the product's vendor to gain an advantage over the competition
- Badly implemented
- A security threat
- Intended only for a subset of the users or product versions
- A Trojan horse, time bomb, or back door
- Oversight
Additional Documentation Sources
- Comments
- Standards
- Publications
- Test cases
- Mailing lists
- Newsgroups
- Revision logs
- Issue-tracking databases
- Marketing material
- The source code
Common Open-Source Documentation Formats
- troff
- texinfo
- DocBook
- javadoc
- Doxygen (C, C++, and Java source into HTML and LaTeX)
Important: properly typeset the documentation for printing.
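As an illustration of the last two formats, here is a short sketch of a C function documented with a Javadoc-style comment of the kind Doxygen processes. The function clamp is a made-up example, not taken from any of the systems discussed.

```c
/**
 * @brief Clamp a value to a closed range.
 *
 * @param value The value to clamp.
 * @param lo    The lowest value that may be returned.
 * @param hi    The highest value that may be returned; assumed >= lo.
 * @return      lo if value < lo, hi if value > hi, value otherwise.
 */
int
clamp(int value, int lo, int hi)
{
    if (value < lo)
        return lo;
    if (value > hi)
        return hi;
    return value;
}
```

The comment stays next to the code it describes, and the tool extracts it into separately typeset reference documentation.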
Further Reading
- Jon Louis Bentley, Donald E. Knuth, and Douglas McIlroy.
A literate program.
Communications of the ACM, 29(6):471–483, June 1986.
- Rohan T. Douglas.
Error message management.
Dr. Dobb's Journal, 15(1):48–51, January 1990.
- Narain Gehani.
Document Formatting and Typesetting on the UNIX System.
Silicon Press, Summit, NJ, second edition, 1987.
- Eric Hamilton.
Literate programming—expanding generalized regular expressions.
Communications of the ACM, 31(12):1376–1385, December 1988.
- David R. Hanson.
Literate programming—printing common words.
Communications of the ACM, 30(7):594–599, July 1987.
- Robert A. Heinlein.
Stranger in a Strange Land.
G. P. Putnam's Sons, New York, 1961.
- Michael A. Jackson.
Literate programming—processing transactions.
Communications of the ACM, 30(12):1000–1010, December 1987.
- Brian W. Kernighan.
A typesetter-independent TROFF.
Computer Science Technical Report 97, Bell Laboratories, Murray Hill, NJ, 1982.
Available online at http://cm.bell-labs.com/cm/cs/cstr.
- Donald E. Knuth and Silvio Levy.
The CWEB System of Structured Documentation.
Addison-Wesley, Reading, MA, 1993.
- Donald E. Knuth.
The TeXbook.
Addison-Wesley, Reading, MA, 1989.
- Donald E. Knuth.
Literate Programming.
CSLI Lecture Notes Number 27. Stanford University Center for the Study of
Language and Information, Stanford, CA, 1992.
Distributed by the University of Chicago Press.
- Mark F. Komarinski, Jorge Godoy, and David C. Merrill.
LDP author guide.
Available online http://www.linuxdoc.org/LDP/LDP-Author-Guide.pdf (January
2002), December 2001.
- Leslie Lamport.
LaTeX: A Document Preparation System.
Addison-Wesley, Reading, MA, second edition, 1994.
- Samuel J. Leffler, Marshall Kirk McKusick, Michael J. Karels, and John S. Quarterman.
The Design and Implementation of the 4.3BSD Unix Operating System.
Addison-Wesley, Reading, MA, 1988.
- J. F. Ossanna.
NROFF/TROFF user's manual.
In Unix Programmer's Manual [Unix Programmer's Manual, 1979].
Also available online http://plan9.bell-labs.com/7thEdMan/.
- Rob Pike and Ken Thompson.
Hello world.
In Dan Geer, editor, USENIX Technical Conference Proceedings,
pages 43–50, Berkeley, CA, Winter 1993. Usenix Association.
- Eric S. Raymond.
The New Hacker's Dictionary.
MIT Press, Cambridge, third edition, 1996.
- Diomidis Spinellis.
Code Reading: The Open Source Perspective, pages 241–266.
Effective Software Development Series. Addison-Wesley, Boston, MA, 2003.
- Diomidis Spinellis.
Code documentation.
IEEE Software, 27(4):18–19, July/August 2010.
(doi:10.1109/MS.2010.95 (http://dx.doi.org/10.1109/MS.2010.95))
- UNIX Programmer's Manual. Volume 2—Supplementary Documents.
Bell Telephone Laboratories, Murray Hill, NJ, seventh edition, 1979.
Also available online http://plan9.bell-labs.com/7thEdMan/.
- Norman Walsh and Leonard Muellner, editors.
DocBook: The Definitive Guide.
O'Reilly and Associates, Sebastopol, CA, 1999.
- Christopher J. Van Wyk and Donald C. Lindsay.
Literate programming: A file difference program.
Communications of the ACM, 32(6):740–755, June 1989.
Exercises and Discussion Topics
- Select three large projects from the course's reference source code
and classify the available documentation.
- Comment on the applicability of the documentation types
we described to open-source development efforts.
- Present an overview of the source organization of the Apache Web server
by examining the provided documentation.
- Locate one instance of a published algorithm reference
in the course's reference source code.
Map the published version of the algorithm against its implementation.
- Categorize and tabulate the types of problems described in the
Bugs section of the Unix manual pages and sort them
according to their frequency.
Discuss the results you obtained.
- The course's reference source code
contains over 40 references to undocumented behavior.
Locate them and discuss the most commonly observed cause of documentation
discrepancies.
- Compare the documentation formats we described in terms of usability,
readability, features provided, and amenability to automated
processing by ad hoc tools.
- Locate each of the different documentation formats
available in the course's reference source code
and typeset it on a high-quality output device
in your local environment.
Discuss the difficulties you encountered.