This document
is a modified version of a document from
a committee formed at AT&T's Indian Hill labs to establish
a common set of coding standards and recommendations for the
Indian Hill community.
The scope of this work is C coding style.
Good style should encourage consistent layout, improve
portability, and reduce errors.
This work does not cover functional organization, or general
issues such as the use of gotos.
We
have tried to combine previous work [1,6,8] on C style into a uniform
set of standards that should be appropriate for any project using C,
although parts are biased towards particular systems.
The opinions in this document
do not reflect the opinions of all authors.
Please direct comments and suggestions to
the last author.
Of necessity, these standards cannot cover all situations.
Experience and informed judgement count for much.
Programmers who encounter unusual situations should
consult either
experienced C programmers or code written by experienced C
programmers (preferably following these rules).
Ultimately, the goal of these standards is to increase portability, reduce maintenance, and above all improve clarity.
Many of the style choices here are somewhat arbitrary. Mixed coding style is harder to maintain than bad coding style. When changing existing code it is better to conform to the style (indentation, spacing, commenting, naming conventions) of the existing code than it is to blindly follow this document. This is particularly relevant when coding Microsoft Windows programs which depend on the Microsoft style of declarations and coding.
``To be clear is professional; not to be clear is unprofessional.'' - Sir Ernest Gowers.
A file consists of various sections that should be separated by several blank lines. Although there is no maximum length limit for source files, files with more than about 1000 lines are cumbersome to deal with. The editor may not have enough temp space to edit the file, compilations will go more slowly, etc. Many rows of asterisks, for example, present little information compared to the time it takes to scroll past, and are discouraged. Lines longer than 79 columns are not handled well by all terminals or windows and should be avoided if possible. Excessively long lines which result from deep indenting are often a symptom of poorly-organized code.
File names are made up of a base name, and an optional period and suffix. The first character of the name should be a letter and all characters (except the period) should be lower-case letters and numbers. The base name should be eight or fewer characters and the suffix should be three or fewer characters (four, if you include the period). These rules apply to both program files and default files used and produced by the program (e.g., ``rogue.sav'').
Some compilers and tools require certain suffix conventions for names of files [5]. The following suffixes are required:
The following conventions are universally followed:
In addition, it is conventional to use ``Makefile'' for the control file for make (for systems that support it) and ``README'' for a summary of the contents of the directory or directory tree.
The suggested order of sections for a program file is as follows:
/*
 * bitmap -- Routines that operate on square bitmaps
 *
 * (C) Copyright Yoyodyne Enterprises. All rights reserved.
 *
 * Author: John Smith
 *
 * $Header$
 *
 */
Header files are files that are included in other files prior to compilation by the C preprocessor. Some, such as stdio.h, are defined at the system level and must be included by any program using the standard I/O library. Header files are also used to contain data declarations and defines that are needed by more than one program. Header files should be functionally organized, i.e., declarations for separate subsystems should be in separate header files. Also, if a set of declarations is likely to change when code is ported from one machine to another, those declarations should be in a separate header file.
Avoid private header filenames that are the same
as library header filenames.
The statement
#include "math.h"
will include the standard library math header file
if the intended one is not
found in the current directory.
If this is what you want to happen,
comment this fact.
Don't use absolute pathnames for header files.
Use the
<name>
construction for getting them from a standard
place, or define them relative to the current directory.
The ``include-path'' option of the C compiler
(-I on many systems)
is the best way to handle
extensive private libraries of header files; it permits reorganizing
the directory structure without having to alter source files.
Header files that declare functions or external variables should be included in the file that defines the function or variable. That way, the compiler can do type checking and the external declaration will always agree with the definition.
Defining variables in a header file is often a poor idea. Frequently it is a symptom of poor partitioning of code between files. Also, some objects like typedefs and initialized data definitions cannot be seen twice by the compiler in one compilation. On some systems, repeating uninitialized declarations without the extern keyword also causes problems. Repeated declarations can happen if include files are nested and will cause the compilation to fail.
Header files should not be nested. The prologue for a header file should, therefore, describe what other headers need to be #included for the header to be functional. In extreme cases, where a large number of header files are to be included in several different source files, it is acceptable to put all common #includes in one include file.
It is common to put the following into each
.h
file
to prevent accidental double-inclusion.
#ifndef EXAMPLE_H
#define EXAMPLE_H
/* body of example.h file */
/* ... */
#endif /* EXAMPLE_H */
This double-inclusion mechanism should not be relied upon, particularly to perform nested includes.
It is conventional to have a file called ``README'' to document both ``the bigger picture'' and issues for the program as a whole. For example, it is common to include a list of all conditional compilation flags and what they mean. It is also common to list files that are machine dependent, etc.
``When the code and the comments disagree, both are probably wrong.'' - Norm Schryer
The comments should describe what is happening, how it is being done, what parameters mean, which globals are used and which are modified, and any restrictions or bugs. Avoid, however, comments that are clear from the code, as such information rapidly gets out of date. Comments that disagree with the code are of negative value. Short comments should be what comments, such as ``compute mean value'', rather than how comments such as ``sum of values divided by n''. C is not assembler; putting a comment at the top of a 3-10 line section telling what it does overall is often more useful than a comment on each line describing micrologic.
Comments should justify offensive code. The justification should be that something bad will happen if unoffensive code is used. Just making code faster is not enough to rationalize a hack; the performance must be shown to be unacceptable without the hack. The comment should explain the unacceptable behavior and describe why the hack is a ``good'' fix.
Comments that describe data structures, algorithms, etc., should be
in block comment form with the opening
/*
in columns 1-2, a
*
in column 2 before each line of comment text,
and the closing
*/
in columns 2-3.
/*
 * Here is a block comment.
 * The comment text should be tabbed or spaced over uniformly.
 * The opening slash-star and closing star-slash are alone on a line.
 */
Note that grep '^.\*' will catch all block comments in the
file.
Some automated program-analysis
packages use different characters before comment lines as
a marker for lines with specific items of information.
In particular, a line with a `-' in a comment preceding a function
is sometimes assumed to be a one-line summary of the function's
purpose.
Very long block comments such as drawn-out discussions and copyright
notices often start with
/*
in columns 1-2, no leading
*
before lines of text, and the closing
*/
in columns 1-2.
Block comments inside a function are appropriate, and
they should be tabbed over to the same tab setting as the code that
they describe.
One-line comments alone on a line should be indented to the tab
setting of the code that follows.
if (argc > 1) {
	/* Get input file from command line. */
	if (freopen(argv[1], "r", stdin) == NULL) {
		perror (argv[1]);
	}
}
Very short comments may appear on the same line as the code they describe, and should be tabbed over to separate them from the statements. If more than one short comment appears in a block of code they should all be tabbed to the same tab setting.
if (a == EXCEPTION) {
	b = TRUE;		/* special case */
} else {
	b = isprime(a);		/* works only for odd a */
}
Global declarations should begin in column 1.
All external data declarations should be preceded by the extern keyword.
If an external variable is an array that is defined with an explicit
size, then the array bounds must be repeated in the extern
declaration unless the size is always encoded in the array
(e.g., a read-only character array that is always null-terminated).
Repeated size declarations are
particularly beneficial to someone picking up code written by another.
The ``pointer'' qualifier, `*', should be with the variable name rather
than with the type.
	char	*s, *t, *u;
instead of
	char*	s, t, u;
which is wrong, since `t' and `u' do not get declared as pointers.
Unrelated declarations, even of the same type,
should be on separate lines.
A comment describing the role of the object being declared should be
included, with the exception
that a list of #defined constants does not need comments
if the constant names are sufficient documentation.
The names, values, and comments
are usually
tabbed so that they line up underneath each other.
Use the tab character rather than blanks (spaces).
For structure and union template declarations,
each element should be alone on a line
with a comment describing it.
The opening brace ({) should be on the same line as the structure
tag, and the closing brace (}) should be in column 1.
struct boat {
	int	wllength;	/* water line length in meters */
	int	type;		/* see below */
	long	sailarea;	/* sail area in square mm */
};
/* defines for boat.type */
#define KETCH (1)
#define YAWL (2)
#define SLOOP (3)
#define SQRIG (4)
#define MOTOR (5)
These defines are sometimes put right after the declaration of type,
within the struct declaration, with enough tabs after the `#' to indent
define one level more than the structure member declarations.
When the actual values are unimportant,
the
enum
facility is better.
enum bt { KETCH=1, YAWL, SLOOP, SQRIG, MOTOR };
struct boat {
	int	wllength;	/* water line length in meters */
	enum bt	type;		/* what kind of boat */
	long	sailarea;	/* sail area in square mm */
};
Any variable whose initial value is important should be
explicitly initialized, or at the very least should be commented
to indicate that C's default initialization to zero
is being relied upon.
The empty initializer, ``{}'', should never be used.
Structure
initializations should be fully parenthesized with braces.
Constants used to initialize longs should be explicitly long.
Use capital letters; for example two long ``2l'' looks a lot like ``21'',
the number twenty-one.
int x = 1;
char *msg = "message";
struct boat winner[] = {
	{ 40, YAWL, 6000000L },
	{ 28, MOTOR, 0L },
	{ 0 },
};
In any file which is part of a larger whole rather than a self-contained
program, maximum use should be made of the
static
keyword to make functions and variables local to single files.
Variables in particular should be accessible from other files
only when there is a clear
need that cannot be filled in another way.
Such usage should be commented to make it clear that another file's
variables are being used; the comment should name the other file.
If your debugger hides static objects you need to see during
debugging,
declare them as
STATIC
and #define
STATIC
as needed.
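One way to arrange this, sketched here under assumptions, is to route the keyword through a macro that a debugging build can blank out. The DEBUG_STATICS flag and the names call_count and bump are hypothetical and exist only for this illustration.

```c
/*
 * STATIC expands to `static' in normal builds.  Compiling with
 * -DDEBUG_STATICS (a hypothetical flag) makes the objects global
 * so that a debugger which hides statics can see them.
 */
#ifdef DEBUG_STATICS
#	define STATIC	/* nothing: visible to the debugger */
#else
#	define STATIC	static
#endif

STATIC int	call_count;	/* calls to bump(); relies on default zero */

/*
 * Count one more call and return the new total.
 */
STATIC int
bump(void)
{
	return (++call_count);
}
```

Because only the definition of STATIC changes between builds, the source files themselves need no editing when switching between the two.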
The most important types should be highlighted by typedeffing them, even if they are only integers, as the unique name makes the program easier to read (as long as there are only a few things typedeffed to integers!). Avoid typedeffing structures and unions, as this hides the fact that an object is composite from the code reader.
The return type of functions should always be declared.
Always use function prototypes.
One common mistake is to omit the
declaration of external math functions that return double.
The compiler then assumes that
the return value is an integer and the bits are dutifully
converted into a (meaningless) floating point value.
``C takes the point of view that the programmer is always right.'' - Michael DeCorte
Each function should be preceded by a block comment prologue that gives a short description of what the function does and (if not clear) how to use it. Discussion of non-trivial design decisions and side-effects is also appropriate. Avoid duplicating information clear from the code.
The function return type should be alone on a line,
(optionally) indented one stop.
``Tabstops'' can be blanks (spaces) inserted by your editor in clumps
of 2, 4, or 8.
Do not default to int;
if the function does not return a value then it should be given
return type void.
If the value returned requires a long explanation,
it should be given in the prologue;
otherwise it can be on the same line as the return type, tabbed over.
The function name
(and the formal parameter list)
should be alone on a line, in column 1.
Destination (return value) parameters
should generally be first (on the left).
All local declarations and code within the function body
should be tabbed over one stop.
The opening brace of the function body should be alone on a line
beginning in column 1.
Each parameter should be declared (do not default to int).
In general the role of each variable in the function should be described.
This may either be done in the function comment or, if each declaration
is on its own line, in a comment on that line.
Loop counters called ``i'', string pointers called ``s'',
and integral types called ``c'' and used for characters
are typically excluded.
If a group of functions all have a like parameter or local variable,
it helps to call the repeated variable by the same name in all
functions.
(Conversely, avoid using the same name for different purposes in
related functions.)
Like parameters should also appear in the same place in the various
argument lists.
Comments for parameters and local variables should be tabbed so that they line up underneath each other. Local variable declarations should be separated from the function's statements by a blank line.
Be careful when you use or declare functions that take a variable number of arguments (``varargs''). Always use the ``stdarg.h'' header definitions and do not rely on item order on the stack.
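As a minimal sketch of the stdarg.h approach, the function below walks its unnamed arguments with va_start, va_arg, and va_end. The function name sum_ints and its calling convention (a count followed by that many int values) are assumptions made for the example.

```c
#include <stdarg.h>

/*
 * Sum `count' int arguments.  The caller must pass exactly
 * `count' values of type int after the count itself.
 */
int
sum_ints(int count, ...)
{
	va_list	ap;		/* walks the unnamed arguments */
	int	total = 0;	/* running sum */
	int	i;

	va_start(ap, count);
	for (i = 0; i < count; i++) {
		total += va_arg(ap, int);
	}
	va_end(ap);
	return (total);
}
```

The function never inspects the stack itself; the macros hide how the machine lays out the arguments.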
If the function uses any external variables (or functions)
that are not declared globally in the file,
these should have their
own declarations in the function body using the
extern
keyword.
Avoid local declarations that override declarations at higher levels. In particular, local variables should not be redeclared in nested blocks. Although this is valid C, the potential confusion is enough that lint will complain about it when given the -h option.
int i;main(){for(;i["]<i;++i){--i;}"];read('-'-'-',i+++"hell\
o, world!\n",'/'/'/'));}read(j,i,p){write(j/p+p,i---j,i/i);}
- Dishonorable mention, Obfuscated C Code Contest, 1984.
Author requested anonymity.
Use vertical and horizontal whitespace generously. Indentation and spacing should reflect the block structure of the code; e.g., there should be at least 2 blank lines between the end of one function and the comments for the next.
A long string of conditional operators should be split onto separate lines.
	if (foo->next==NULL && totalcount<needed && needed<=MAX_ALLOT
		&& server_active(current_input)) { ...
might be better as
	if (foo->next == NULL
		&& totalcount < needed && needed <= MAX_ALLOT
		&& server_active(current_input))
	{
		...
Similarly, elaborate for loops should be split onto different lines.
	for (curr = *listp, trail = listp;
		curr != NULL;
		trail = &(curr->next), curr = curr->next )
	{
		...
Other complex expressions, particularly those using the ternary ?: operator,
are best split on to several lines, too.
	c = (a == b)
		? d + f(a)
		: f(b) - d;
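Keywords that are followed by expressions in parentheses should be
separated from the left parenthesis by a blank.
(The sizeof operator is an exception.)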
Blanks should also appear after commas in argument lists to help
separate the arguments visually.
On the other hand, macro definitions with arguments must
not have a blank between the name and the left parenthesis,
otherwise the C preprocessor will not recognize the argument list.
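The effect of the blank can be seen in a small sketch. SQUARE, BADSQUARE, and area_of_square are hypothetical names; the broken definition is left in a comment so that the example still compiles.

```c
/* Correct: no blank between the macro name and `('. */
#define SQUARE(x)	((x) * (x))

/*
 * Wrong: with a blank after the name, BADSQUARE becomes a macro
 * with no parameters whose replacement text begins with "(x)",
 * so a use such as BADSQUARE(3) would not compile.
 *
 * #define BADSQUARE (x)	((x) * (x))
 */

/*
 * Return the area of a square with the given side.
 */
int
area_of_square(int side)
{
	return (SQUARE(side));
}
```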
/*
 * Determine if the sky is blue by checking that it isn't night.
 * CAVEAT: Only sometimes right. May return TRUE when the answer
 * is FALSE. Consider clouds, eclipses, short days.
 * NOTE: Uses `hour' from `hightime.c'. Returns `int' for
 * compatibility with the old version.
 */
int					/* true or false */
skyblue(void)
{
	extern int	hour;		/* current hour of the day */

	return (hour >= MORNING && hour <= EVENING);
}
/*
 * Find the last element in the linked list
 * pointed to by nodep and return a pointer to it.
 * Return NULL if there is no last element.
 */
node_t *
tail(node_t *nodep)
{
	node_t	*np;		/* advances to NULL */
	node_t	*lp;		/* follows one behind np */

	if (nodep == NULL)
		return (NULL);
	for (np = lp = nodep; np != NULL; lp = np, np = np->next)
		;	/* VOID */
	return (lp);
}
There should be only one statement per line unless the statements are very closely related.
case FOO: oogle (zork); boogle (zork); break;
case BAR: oogle (bork); boogle (zork); break;
case BAZ: oogle (gork); boogle (bork); break;
The null body of a for or while loop should be alone on a line and commented
so that it is clear that the null body is intentional
and not missing code.
	while (*dest++ = *src++)
		;	/* VOID */
Do not default the test for non-zero, i.e.
	if (f() != FAIL)
is better than
	if (f())
even though FAIL may have the value 0 which C considers to be false.
An explicit test will help you out later when somebody decides that a
failure return should be -1 instead of 0.
Explicit comparison should be used even if the comparison value will
never change; e.g., ``if (!(bufsize % sizeof(int)))''
should be written instead as ``if ((bufsize % sizeof(int)) == 0)''
to reflect the numeric (not boolean) nature of the test.
A frequent trouble spot is using
strcmp
to test for string equality, where the result should never
ever be defaulted.
The preferred approach is to define a macro STREQ.
#define STREQ(a, b) (strcmp((a), (b)) == 0)
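A short usage sketch shows why the macro helps. The helper is_yes and the replies it accepts are hypothetical; only strcmp and the STREQ definition come from the text above.

```c
#include <string.h>

#define STREQ(a, b)	(strcmp((a), (b)) == 0)

/*
 * Return 1 if reply is an affirmative answer, 0 otherwise.
 */
int
is_yes(const char *reply)
{
	/*
	 * A defaulted test such as "if (strcmp(reply, ...))" would be
	 * true when the strings DIFFER, because strcmp returns 0 on
	 * a match.  STREQ reads the way the test is meant.
	 */
	if (STREQ(reply, "yes") || STREQ(reply, "y")) {
		return (1);
	}
	return (0);
}
```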
The non-zero test is often defaulted for predicates and other functions or expressions which meet the following restrictions:
It is common practice to declare a boolean type ``bool''
in a global include file.
The special names improve readability immensely.
	typedef int	bool;
	#define FALSE	0
	#define TRUE	1
or
	typedef enum { NO=0, YES } bool;
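Even with these declarations, do not check a boolean value for equality with 1 (TRUE, YES, etc.); instead test for inequality with 0 (FALSE, NO, etc.). Most functions are guaranteed to return 0 if false, but only non-zero if true. Thus,
	if (func() == TRUE) { ...
must be written
	if (func() != FALSE) { ...
It is even better (where possible) to rename the function or rewrite the expression so that the meaning is obvious without a comparison to true or false:
	if (isvalid()) { ...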
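The danger of comparing against TRUE can be made concrete with a predicate that returns a bit rather than 1. This is only a sketch; PERM_WRITE and can_write are hypothetical names.

```c
#define FALSE	0
#define TRUE	1

#define PERM_WRITE	0x02	/* hypothetical permission bit */

/*
 * Return non-zero if the write bit is set in mode.  The value
 * returned is the bit itself (2), not TRUE (1).
 */
int
can_write(int mode)
{
	return (mode & PERM_WRITE);
}
```

With the write bit set, can_write(mode) == TRUE is false even though the predicate holds, while can_write(mode) != FALSE gives the intended answer.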
There is a time and a place for embedded assignment statements. In some constructs there is no better way to accomplish the results without making the code bulkier and less readable.
	while ((c = getchar()) != EOF) {
		/* process the character */
	}
The ++ and -- operators count as assignment statements.
So, for many purposes, do functions with side effects.
Using embedded assignment statements to improve run-time performance
is also possible.
However, one should consider the tradeoff between increased speed and
decreased maintainability that results when embedded assignments are
used in artificial places.
For example,
	a = b + c;
	d = a + r;
should not be replaced by
	d = (a = b + c) + r;
Goto statements should be used sparingly, as in any well-structured
code.
The main place where they can be usefully employed is to break out
of several levels of switch, for, and while nesting,
although the need to do such a thing may indicate
that the inner constructs should be broken out into
a separate function, with a success/failure return code.
	for (...) {
		while (...) {
			...
			if (disaster)
				goto error;
		}
	}
	...
error:
	/* clean up the mess */
When a goto is necessary the accompanying label should be alone
on a line and tabbed one stop to the left of the
code that follows.
The goto should be commented (possibly in the block header)
as to its utility and purpose.
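A concrete, compilable version of the multi-level exit might look like the sketch below. The function grid_find, the constants NROWS and NCOLS, and the label name are assumptions chosen for illustration.

```c
#define NROWS	3
#define NCOLS	4

/*
 * Search grid for target.  Return 1 and store the position in
 * *rowp and *colp if found; otherwise return 0.
 * A goto is used to leave both loops at once when a match is seen.
 */
int
grid_find(int *rowp, int *colp, int grid[NROWS][NCOLS], int target)
{
	int	r;	/* current row */
	int	c;	/* current column */

	for (r = 0; r < NROWS; r++) {
		for (c = 0; c < NCOLS; c++) {
			if (grid[r][c] == target)
				goto found;
		}
	}
	return (0);

found:
	*rowp = r;
	*colp = c;
	return (1);
}
```

The same search could be written without goto by moving the inner loop into its own function that returns a success code.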
Continue
should be used sparingly and near the top of the loop.
Break
is less troublesome.
A compound statement is a list of statements enclosed by braces. There are many common ways of formatting the braces. Please be consistent with our local standard. When editing someone else's code, always use the style used in that code.
control {
	statement;
	statement;
}
The style above is called ``K&R style'', and is
preferred if you haven't already got a favorite.
With K&R style, the
else
part of an
if-else statement
and the
while
part of a do-while statement
should appear on the same line as the close brace.
With most other styles, the braces are always alone on a line.
When a block of code has several labels
(unless there are a lot of them),
the labels are placed on separate lines.
The fall-through feature of the C switch statement,
(that is, when there is no
break
between a code segment and the next
case
statement)
must be commented for future maintenance.
A lint-style comment/directive is best.
switch (expr) {
case ABC:
case DEF:
	statement;
	break;
case UVW:
	statement;
	/*FALLTHROUGH*/
case XYZ:
	statement;
	break;
}
Here, the last
break
is unnecessary, but is required
because it prevents a fall-through error if another
case
is added later after the last one.
The
default
case, if used, should be last and does not require a
break
if it is last.
Whenever an
if-else
statement has a compound statement for either the
if
or
else
section, the statements of both the
if
and
else
sections should both be enclosed in braces
(called fully bracketed syntax).
if (expr) {
	statement;
} else {
	statement;
	statement;
}
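Braces are also essential in if-if-else sequences with no second else,
such as the following, which will be parsed incorrectly if the brace after (ex1)
and its mate are omitted: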
if (ex1) {
	if (ex2) {
		funca();
	}
} else {
	funcb();
}
An if-else with else if should be written with the else conditions left-justified.
if (STREQ (reply, "yes")) {
	statements for yes
	...
} else if (STREQ (reply, "no")) {
	...
} else if (STREQ (reply, "maybe")) {
	...
} else {
	statements for default
	...
}
Do-while
loops should always have braces around the body.
Forever loops should be coded using the
for
(;;)
construct, and not the
while
(1)
construct.
Do not use braces for single statement blocks.
for (;;)
	function();
Sometimes an if causes an unconditional control transfer
via break, continue, goto, or return.
The
else
should be implicit and the code should not be indented.
if (level > limit)
	return (OVERFLOW);
normal();
return (level);
Unary operators should not be separated from their single operand.
Generally, all binary operators except `.' and `->'
should be separated from their operands by blanks.
Some judgement is called for in the case of complex expressions,
which may be clearer if the ``inner'' operators are not surrounded
by spaces and the ``outer'' ones are.
If you think an expression will be hard to read, consider breaking it across lines. Splitting at the lowest-precedence operator near the break is best. Since C has some unexpected precedence rules, expressions involving mixed operators should be parenthesized. Too many parentheses, however, can make a line harder to read because humans aren't good at parenthesis-matching.
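One of the unexpected precedence rules is that == binds more tightly than &. The sketch below parenthesizes a mixed expression accordingly; MODE_MASK, MODE_RAW, and is_raw are hypothetical names.

```c
#define MODE_MASK	0x0F	/* hypothetical field mask */
#define MODE_RAW	0x04	/* hypothetical field value */

/*
 * Return non-zero if the mode field of flags equals MODE_RAW.
 *
 * Without the inner parentheses the test would read
 *	flags & MODE_MASK == MODE_RAW
 * which C parses as flags & (MODE_MASK == MODE_RAW).
 */
int
is_raw(unsigned flags)
{
	return ((flags & MODE_MASK) == MODE_RAW);
}
```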
There is a time and place for the binary comma operator,
but generally it should be avoided.
The comma operator is most useful
to provide multiple initializations or operations,
as in for statements.
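As an illustration of that use, the sketch below advances two indices in one for statement. The function name reverse is an assumption; strlen is the standard library routine.

```c
#include <string.h>

/*
 * Reverse the string s in place and return s.
 */
char *
reverse(char *s)
{
	size_t	i;	/* index moving up from the start */
	size_t	j;	/* index moving down from the end */
	char	tmp;	/* holds one character during the swap */

	if (s[0] == '\0')
		return (s);	/* nothing to do; avoids strlen(s) - 1 wrapping */
	for (i = 0, j = strlen(s) - 1; i < j; i++, j--) {
		tmp = s[i];
		s[i] = s[j];
		s[j] = tmp;
	}
	return (s);
}
```

Both the initialization and the step use the comma operator, which keeps the two indices visibly in lockstep.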
Complex expressions,
for instance those with nested ternary
?:
operators,
can be confusing and should be avoided if possible.
There are some macros like
getchar
where both the ternary
operator and comma operators are useful.
The logical expression operand before the
?:
should be parenthesized and both return values must be the same type.
Individual projects will no doubt have their own naming conventions. There are some general rules however.
Some macros (such as getchar and putchar) are in lower case
since they may also exist as functions.
Lower-case macro names are only acceptable if the macros behave
like a function call,
that is, they evaluate their parameters exactly once and
do not assign values to named parameters.
Sometimes it is impossible to write a macro that behaves like a
function even though the arguments are evaluated exactly once.
In general, global names (including enums) should have a
common prefix identifying the module that they belong with.
Globals may alternatively be grouped in a global structure.
Typedeffed names often have ``_t'' appended to their name.
Avoid names that might conflict with various standard library names. Some systems will include more library code than you want. Also, your program may be extended someday.
Also note the following (from [15]):
``Length is not a virtue in a name; clarity of expression is. A global variable rarely used may deserve a long name, maxphysaddr say. An array index used on every line of a loop needn't be named any more elaborately than i. Saying index or elementnumber is more to type (or calls upon your text editor) and obscures the details of the computation. When the variable names are huge, it's harder to see what's going on. This is partly a typographic issue; consider
	for(i=0 to 100)
		array[i]=0
vs.
	for(elementnumber=0 to 100)
		array[elementnumber]=0;
The problem gets worse fast with real examples. Indices are just notation, so treat them as such.''
``Pointers also require sensible notation. np is just as mnemonic as nodepointer if you consistently use a naming convention from which np means ``node pointer'' is easily derived.''
``As in all other aspects of readable programming, consistency is important in naming. If you call one variable maxphysaddr, don't call its cousin lowestaddress.''
``Finally, I prefer minimum-length but maximum-information names, and then let the context fill in the rest. Globals, for instance, typically have little context when they are used, so their names need to be relatively evocative. Thus I say maxphysaddr (not MaximumPhysicalAddress) for a global variable, but np not NodePointer for a pointer locally defined and used. This is largely a matter of taste, but taste is relevant to clarity. I eschew embedded capital letters in names; to my prose-oriented eyes, they are too awkward to read comfortably. They jangle like bad typography.''
``Procedure names should reflect what they do; function names should reflect what they return. Functions are used in expressions, often in things like if's, so they need to read appropriately.
	if(checksize(x))
is unhelpful because we can't deduce whether checksize returns true on error or non-error; instead
	if(validsize(x))
makes the point clear and makes a future mistake in using the routine less likely.''
Numerical constants should not be coded directly.
The
#define
feature of the C preprocessor should be used to
give constants meaningful names.
Symbolic constants make the code easier to read.
Defining the value in one place
also makes it easier to administer large programs since the
constant value can be changed uniformly by changing only the
define.
The enumeration data type is a better way to declare variables
that take on only a discrete set of values, since
additional type checking is often available.
At the very least, any directly-coded numerical constant must have a
comment explaining the derivation of the value.
Constants should be defined consistently with their use;
e.g. use
540.0
for a float instead of
540
with an implicit float cast.
There are some cases where the constants 0 and 1 may appear as
themselves instead of as defines.
For example if a for loop indexes through an array, then
	for (i = 0; i < ARYBOUND; i++)
is reasonable while the code
	door_t *front_door = opens(door[i], 7);
	if (front_door == 0)
		error("can't open %s\n", door[i]);
is not.
In the last example front_door is a pointer.
When a value is a pointer it should be compared to NULL instead of 0.
NULL is defined in the standard I/O library's header file stdio.h,
and also in stdlib.h.
Even simple values like 1 or 0 are often better expressed using
defines like
TRUE
and
FALSE
(sometimes
YES
and
NO
read better).
Simple character constants should be defined as character literals
rather than numbers.
Non-text characters are discouraged as non-portable.
If non-text characters are necessary,
particularly if they are used in strings,
they should be written using an escape sequence of three octal digits
rather than one
(e.g., `\007').
Even so, such usage should be considered machine-dependent and treated
as such.
Complex expressions can be used as macro parameters, and operator-precedence problems can arise unless all occurrences of parameters have parentheses around them. There is little that can be done about the problems caused by side effects in parameters except to avoid side effects in expressions (a good idea anyway) and, when possible, to write macros that evaluate their parameters exactly once. There are times when it is impossible to write macros that act exactly like functions.
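The precedence problem and its usual cure can be sketched as follows. BAD_DOUBLE, DOUBLE, and double_it are hypothetical names introduced for the example.

```c
/* Unsafe: neither the parameter nor the body is parenthesized. */
#define BAD_DOUBLE(x)	x + x

/* Safer: every use of the parameter, and the whole body, is parenthesized. */
#define DOUBLE(x)	((x) + (x))

/*
 * Even the parenthesized macro evaluates x twice, so an argument
 * with side effects (for example DOUBLE(i++)) must be avoided.
 * A function evaluates its argument exactly once.
 */
int
double_it(int x)
{
	return (x + x);
}
```

Here 3 * BAD_DOUBLE(2) expands to 3 * 2 + 2, which is 8, whereas 3 * DOUBLE(2) yields the intended 12.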
Some macros also exist as functions (e.g.,
getc
and
fgetc
).
The macro should be used in implementing the function
so that changes to the macro
will be automatically reflected in the function.
Care is needed when interchanging macros and functions since function
parameters are passed by value, while macro parameters are passed by
name substitution.
Carefree use of macros requires that they be declared carefully.
Macros should avoid using globals, since the global name may be hidden by a local declaration. Macros that change named parameters (rather than the storage they point at) or may be used as the left-hand side of an assignment should mention this in their comments. Macros that take no parameters but reference variables, are long, or are aliases for function calls should be given an empty parameter list, e.g.,
#define OFF_A() (a_global+OFFSET)
#define BORK() (zork())
#define SP3() if (b) { int x; av = f (&x); bv += x; }
Macros save function call/return overhead, but when a macro gets long, the effect of the call/return becomes negligible, so a function should be used instead.
In some cases it is appropriate to make the compiler ensure that a macro is terminated with a semicolon.
if (x==3)
	SP3();
else
	BORK();
If the semicolon is omitted after the call to SP3, then the else
will (silently!) become associated with the if in the SP3 macro.
With the semicolon, the else doesn't match any if!
The macro
SP3
can be written safely as
#define SP3() \
	do { if (b) { int x; av = f (&x); bv += x; }} while (0)
Writing out the enclosing do-while by hand is awkward and some compilers and tools
may complain that there is a constant in the ``while'' conditional.
A macro for declaring statements may make programming easier.
#ifdef lint
static int ZERO;
#else
# define ZERO 0
#endif
#define STMT( stuff ) do { stuff } while (ZERO)
Declare SP3 with
#define SP3() \
	STMT( if (b) { int x; av = f (&x); bv += x; } )
Using STMT will help prevent small typos from silently changing programs.
Except for type casts, sizeof, and hacks such as the above,
macros should contain keywords only if the entire
macro is surrounded by braces.
Conditional compilation is useful for things like
machine-dependencies,
debugging,
and for setting certain options at compile-time.
Beware of conditional compilation.
Various controls can easily combine in unforeseen ways.
If you #ifdef machine dependencies,
make sure that when no machine is specified,
the result is an error, not a default machine.
(Use ``#error'' and indent it so it works with older compilers.)
If you #ifdef optimizations,
the default should be the unoptimized code
rather than an uncompilable program.
Be sure to test the unoptimized code.
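The two rules above might be sketched as follows. The names MACHINE_PDP11, MACHINE_VAX, WORD_BITS, FAST_SUM and sum_ints are hypothetical, and the #define at the top only stands in for a flag that would normally be given on the compiler command line.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: normally supplied as -DMACHINE_VAX, not written here. */
#define MACHINE_VAX 1

#if defined(MACHINE_PDP11)
# define WORD_BITS 16
#elif defined(MACHINE_VAX)
# define WORD_BITS 32
#else
    #error "no machine selected: define MACHINE_PDP11 or MACHINE_VAX"
#endif

/*
 * Sum the `n' ints in `v'.  The plain loop is the default; the
 * unrolled loop is compiled only when FAST_SUM is requested.
 */
long
sum_ints (const int *v, size_t n)
{
    long total = 0;
    size_t i = 0;

    assert (v != NULL || n == 0);
#ifdef FAST_SUM
    for (; i + 4 <= n; i += 4)
        total += (long) v[i] + v[i + 1] + v[i + 2] + v[i + 3];
#endif
    for (; i < n; i++)
        total += v[i];
    return (total);
}
```

Building once with and once without -DFAST_SUM and comparing results is one way to keep the unoptimized path tested.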
Note that the text inside of an #ifdeffed section may be scanned
(processed) by the compiler, even if the #ifdef is false.
Thus, even if the #ifdeffed part of the file never gets compiled
(e.g., ``#ifdef COMMENT''), it cannot be arbitrary text.
Put #ifdefs in header files instead of source files when possible.
Use the #ifdefs to define macros
that can be used uniformly in the code.
For instance, a header file for checking memory allocation
might look like (omitting definitions for REALLOC and FREE):
#ifdef DEBUG
extern void *mm_malloc();
# define MALLOC(size) (mm_malloc(size))
#else
extern void *malloc();
# define MALLOC(size) (malloc(size))
#endif
Conditional compilation should generally be on a feature-by-feature basis. Machine or operating system dependencies should be avoided in most cases.
#ifdef BSD4
long t = time ((long *)NULL);
#endif
Instead, use define symbols such as TIME_LONG and TIME_STRUCT,
and define the appropriate one
in a configuration file such as config.h.
The following are some excerpts from [15] relevant to program structure and organisation.
Most programs are too complicated - that is, more complex than they need to be to solve their problems efficiently. Why? Mostly it's because of bad design, but I will skip that issue here because it's a big one. But programs are often complicated at the microscopic level, and that is something I can address here.
Rule 1. You can't tell where a program is going to spend its time. Bottlenecks occur in surprising places, so don't try to second guess and put in a speed hack until you've proven that's where the bottleneck is.
Rule 2. Measure. Don't tune for speed until you've measured, and even then don't unless one part of the code overwhelms the rest.
Rule 3. Fancy algorithms are slow when n is small, and n is usually small. Fancy algorithms have big constants. Until you know that n is frequently going to be big, don't get fancy. (Even if n does get big, use Rule 2 first.) For example, binary trees are always faster than splay trees for workaday problems.
Rule 4. Fancy algorithms are buggier than simple ones, and they're much harder to implement. Use simple algorithms as well as simple data structures.
The following data structures are a complete list for almost all practical programs:
Rule 5. Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming. (See Brooks p. 102.)
Rule 6. There is no Rule 6.
Algorithms, or details of algorithms,
can often be encoded compactly, efficiently and expressively as data
rather than, say, as lots of if statements.
The reason is that the complexity of the job at hand, if it is due to a combination of
independent details, can be encoded.
A classic example of this is parsing tables,
which encode the grammar of a programming language
in a form interpretable by a fixed, fairly simple
piece of code.
Finite state machines are particularly amenable to this
form of attack, but almost any program that involves
the `parsing' of some abstract sort of input into a sequence
of some independent `actions' can be constructed profitably
as a data-driven algorithm.
Perhaps the most intriguing aspect of this kind of design is that the tables can sometimes be generated by another program - a parser generator, in the classical case. As a more earthy example, if an operating system is driven by a set of tables that connect I/O requests to the appropriate device drivers, the system may be `configured' by a program that reads a description of the particular devices connected to the machine in question and prints the corresponding tables.
One of the reasons data-driven programs are not common, at least among beginners, is the tyranny of Pascal. Pascal, like its creator, believes firmly in the separation of code and data. It therefore (at least in its original form) has no ability to create initialized data. This flies in the face of the theories of Turing and von Neumann, which define the basic principles of the stored-program computer. Code and data are the same, or at least they can be. How else can you explain how a compiler works? (Functional languages have a similar problem with I/O.)
Another result of the tyranny of Pascal is that beginners don't use function pointers. (You can't have function-valued variables in Pascal.) Using function pointers to encode complexity has some interesting properties.
Some of the complexity is passed to the routine pointed to. The routine must obey some standard protocol - it's one of a set of routines invoked identically - but beyond that, what it does is its business alone. The complexity is distributed.
There is this idea of a protocol, in that all functions used similarly must behave similarly. This makes for easy documentation, testing, growth and even making the program run distributed over a network - the protocol can be encoded as remote procedure calls.
I argue that clear use of function pointers is the heart of object-oriented programming. Given a set of operations you want to perform on data, and a set of data types you want to respond to those operations, the easiest way to put the program together is with a group of function pointers for each type. This, in a nutshell, defines class and method. The O-O languages give you more of course - prettier syntax, derived types and so on - but conceptually they provide little extra.
Combining data-driven programs with function pointers leads to an astonishingly expressive way of working, a way that, in my experience, has often led to pleasant surprises. Even without a special O-O language, you can get 90% of the benefit for no extra work and be more in control of the result. I cannot recommend an implementation style more highly. All the programs I have organized this way have survived comfortably after much development - far better than with less disciplined approaches. Maybe that's it: the discipline it forces pays off handsomely in the long run.
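A minimal sketch of the combination described above: a table of names and function pointers interpreted by one small, fixed routine. The names op_fn, op_table and apply_op are hypothetical and are not taken from [15].

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The protocol: every operation takes two longs and returns a long. */
typedef long (*op_fn) (long, long);

static long op_add (long a, long b) { return (a + b); }
static long op_sub (long a, long b) { return (a - b); }
static long op_mul (long a, long b) { return (a * b); }

/* The table carries the detail: one row per command. */
static const struct op_entry {
    const char *name;
    op_fn fn;
} op_table[] = {
    { "add", op_add },
    { "sub", op_sub },
    { "mul", op_mul },
};

/*
 * Look `name' up in the table and apply it to `a' and `b'.
 * Returns 0 and stores the answer in `*result' on success,
 * -1 if `name' is not in the table.
 */
int
apply_op (const char *name, long a, long b, long *result)
{
    size_t i;

    assert (name != NULL && result != NULL);
    for (i = 0; i < sizeof op_table / sizeof op_table[0]; i++) {
        if (strcmp (name, op_table[i].name) == 0) {
            *result = op_table[i].fn (a, b);
            return (0);
        }
    }
    return (-1);
}
```

Adding a command means adding one function and one table row; apply_op itself does not change.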
``C Code. C code run. Run, code, run... PLEASE!!!'' - Barbara Tongue
If you use enums,
the first enum constant should have a non-zero value,
or the first constant should indicate an error.
typedef enum { STATE_ERR, STATE_START, STATE_NORMAL, STATE_END } state_t;
typedef enum { VAL_NEW=1, VAL_NORMAL, VAL_DYING, VAL_DEAD } value_t;
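One plausible reading of this rule (the rationale is an assumption, not something stated above) is that zero-filled storage then shows up as an error rather than passing for a valid state. A small sketch, in which struct conn, conn_clear and conn_is_valid are hypothetical names:

```c
#include <assert.h>
#include <string.h>

typedef enum { STATE_ERR, STATE_START, STATE_NORMAL, STATE_END } state_t;

struct conn {               /* hypothetical record holding a state */
    state_t state;
    int fd;
};

/* Zero-fill `c', as calloc or a static initializer would. */
void
conn_clear (struct conn *c)
{
    assert (c != NULL);
    memset (c, 0, sizeof *c);
}

/* Returns 1 if `c' holds a usable state, 0 if it was never set. */
int
conn_is_valid (const struct conn *c)
{
    assert (c != NULL);
    return (c->state != STATE_ERR);
}
```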
Check for error return values, even from functions that ``can't''
fail.
Consider that close() and fclose()
can and do fail, even when all prior file operations have succeeded.
Write your own functions so that they test for errors
and return error values or abort the program in a well-defined way.
Include a lot of debugging and error-checking code
and leave most of it in the finished product.
Check even for ``impossible'' errors. [8]
Use the assert facility to insist that
each function is being passed well-defined values,
and that intermediate results are well-formed.
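As an illustration only, a function might state its assumptions like this; max_of is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the largest of the `n' values in `v'.
 * `v' must be non-NULL and `n' must be at least 1.
 */
int
max_of (const int *v, size_t n)
{
    size_t i;
    int best;

    assert (v != NULL);     /* arguments are well-defined */
    assert (n > 0);
    best = v[0];
    for (i = 1; i < n; i++)
        if (v[i] > best)
            best = v[i];
    assert (best >= v[0]);  /* intermediate result is well-formed */
    return (best);
}
```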
Build in the debug code using as few #ifdefs as possible.
For instance, if ``mm_malloc'' is a debugging memory allocator, then
MALLOC will select the appropriate allocator,
avoid littering the code with #ifdefs,
and make clear the difference between allocation calls being debugged
and extra memory that is allocated only during debugging.
#ifdef DEBUG
# define MALLOC(size) (mm_malloc(size))
#else
# define MALLOC(size) (malloc(size))
#endif
Check bounds even on things that ``can't'' overflow.
A function that writes to variable-sized storage
should take an argument maxsize
that is the size of the destination.
If there are times when the size of the destination is unknown,
some `magic' value of maxsize
should mean ``no bounds checks''.
When bound checks fail,
make sure that the function does something useful
such as abort or return an error status.
/*
* INPUT: A null-terminated source string `src' to copy from and
* a `dest' string to copy to. `maxsize' is the size of `dest'
* or UINT_MAX if the size is not known. `src' and `dest' must
* both be shorter than UINT_MAX, and `src' must be no longer than
* `dest'.
* OUTPUT: The address of `dest' or NULL if the copy fails.
* `dest' is modified even when the copy fails.
*/
char *
copy (char *dest, size_t maxsize, char *src)
{
    char *dp = dest;

    while (maxsize-- > 0)
        if ((*dp++ = *src++) == '\0')
            return (dest);
    return (NULL);
}
In all, remember that a program that produces wrong answers twice as fast is infinitely slower. The same is true of programs that crash occasionally or clobber valid data.
``C combines the power of assembler with the portability of assembler.''
- Anonymous, alluding to Bill Thacker.
The advantages of portable code are well known. This section gives some guidelines for writing portable code. Here, ``portable'' means that a source file can be compiled and executed on different machines with the only change being the inclusion of possibly different header files and the use of different compiler flags. The header files will contain #defines and typedefs that may vary from machine to machine. In general, a new ``machine'' is different hardware, a different operating system, a different compiler, or any combination of these. Reference [1] contains useful information on both style and portability. The following is a list of pitfalls to be avoided and recommendations to be considered when designing portable code:
type | pdp11 series | VAX/11 | 68000 family | Cray-2 | Unisys 1100 | Harris H800 | 80386-Pentium |
char | 8 | 8 | 8 | 8 | 9 | 8 | 8 |
short | 16 | 16 | 8/16 | 64(32) | 18 | 24 | 8/16 |
int | 16 | 32 | 16/32 | 64(32) | 36 | 24 | 16/32 |
long | 32 | 32 | 32 | 64 | 36 | 48 | 32 |
char* | 16 | 32 | 32 | 64 | 72 | 24 | 16/32/48 |
int* | 16 | 32 | 32 | 64(24) | 72 | 24 | 16/32/48 |
int(*)() | 16 | 32 | 32 | 64 | 576 | 24 | 16/32/48 |
Type | Minimum # Bits | No Smaller Than |
char | 8 | |
short | 16 | char |
int | 16 | short |
long | 32 | int |
float | 24 | |
double | 38 | float |
any * | 14 | |
char * | 15 | any * |
void * | 15 | any * |
The void* type is guaranteed to have enough bits
of precision to hold a pointer to any data object.
The void(*)() type is guaranteed to be able to hold a pointer to any function.
Use these types when you need a generic pointer.
Be sure to cast pointers back to the correct type before using them.
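The standard qsort interface is a familiar use of generic pointers. In this sketch, cmp_int and sort_ints are hypothetical names; the point is the cast back to the real element type before the pointer is dereferenced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * qsort hands each element over as a generic `const void *';
 * cast back to the real element type before dereferencing.
 */
static int
cmp_int (const void *a, const void *b)
{
    const int *ia = (const int *) a;
    const int *ib = (const int *) b;

    return ((*ia > *ib) - (*ia < *ib));
}

/* Sort the `n' ints in `v' into ascending order. */
void
sort_ints (int *v, size_t n)
{
    assert (v != NULL);
    qsort (v, n, sizeof v[0], cmp_int);
}
```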
Even when, say, an int* and a char* are the same size, they may have different formats.
For example, the following will fail on some machines that have
sizeof(int*) equal to sizeof(char*).
The code fails because free expects a char* and gets passed an int*.
int *p = (int *) malloc (sizeof(int));
free (p);
The Cray-2, for example, may use 64 bits to store an int,
but a long cast into an int and back to a long
may be truncated to 32 bits.
The integer constant zero may be cast to any pointer type.
The resulting pointer is called a null pointer
for that type, and is different from any other pointer of that type.
A null pointer always compares equal to the constant zero.
A null pointer might not compare equal with a variable
that has the value zero.
Null pointers are not always stored with all bits zero.
Null pointers for two different types are sometimes different.
A null pointer of one type cast into a pointer of another
type will be cast into the null pointer for that second type.
((int *) 2 )
((int *) 3 )
extern int x_int_dummy; /* in x.c */
#define X_FAIL (NULL)
#define X_BUSY (&x_int_dummy)
#define X_FAIL (NULL)
#define X_BUSY MD_PTR1 /* MD_PTR1 from "machdep.h" */
A double may have less range or precision than a float.
A double may be a float with similar value.
Do not depend on this.
For example, array[c] won't work if c
is supposed to be positive and is instead signed and negative.
If you must assume signed or unsigned characters, comment them as SIGNED or UNSIGNED.
Unsigned behavior can be guaranteed with unsigned char.
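A sketch of the array[c] case, assuming a table indexed by character value; count_chars is a hypothetical name. The cast to unsigned char keeps the index non-negative whether plain char is signed or not.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Count how often each character value occurs in the string `s'.
 * `counts' must have UCHAR_MAX + 1 elements.
 */
void
count_chars (const char *s, size_t counts[])
{
    size_t i;

    assert (s != NULL && counts != NULL);
    for (i = 0; i <= UCHAR_MAX; i++)
        counts[i] = 0;
    for (; *s != '\0'; s++)
        counts[(unsigned char) *s]++;   /* UNSIGNED: never negative */
}
```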
Be particularly careful if your program will deal with above-127 character
values (e.g. to use strings containing Greek characters).
Also note that some compilers (e.g. Watcom C 9.1) produce wrong results
when strings containing above-127 characters are passed to the strcmp function.
(Use the facilities in <ctype.h>
where possible, but beware that their behavior varies considerably
between C implementations.
For instance, if c is not an upper-case letter,
tolower(c) may return c or garbage.)
If Greek strings will be processed, the system-provided ctype.h may not
work correctly.
If you must assume, document and localize.
Remember that characters may hold (much) more than 8 bits.
Unsigned types other than ``unsigned int'' are highly compiler-dependent.
If a simple loop counter is being used where either 16 or 32 bits will
do, then use int,
since it will get the most efficient (natural)
unit for the current machine.
c = foo (getchar(), getchar());
char
foo (char c1, char c2, char c3)
{
char bar = *(&c1 + 1);
return (bar); /* often won't return c2 */
}
A char might be passed as an int, for instance.
Arguments may be pushed left-to-right, right-to-left,
in arbitrary order, or passed in registers (not pushed at all).
The order of evaluation may differ from the order in which
they are pushed.
One compiler may use several (incompatible) calling conventions.
On some machines, the null character pointer ``((char*)0)''
is treated the same way as a pointer to a null string.
Do not depend on this.
s = "/dev/tty??";
strcpy (&s[8], ttychars);
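The fragment above writes into a string constant. One portable alternative, sketched here with the hypothetical name make_tty_name, builds the name in storage that the caller owns and checks the size first.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Build "/dev/tty" followed by the two characters in `ttychars'
 * in the caller's buffer `buf' of `bufsize' bytes.
 * Returns `buf', or NULL if the buffer is too small or `ttychars'
 * is not exactly two characters long.
 */
char *
make_tty_name (char *buf, size_t bufsize, const char *ttychars)
{
    static const char prefix[] = "/dev/tty";

    assert (buf != NULL && ttychars != NULL);
    if (strlen (ttychars) != 2 || bufsize < sizeof prefix + 2)
        return (NULL);
    strcpy (buf, prefix);                       /* writable storage */
    strcpy (buf + sizeof prefix - 1, ttychars); /* overwrite the NUL */
    return (buf);
}
```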
Only the == and != comparisons are defined for all pointers of a given type.
It is only portable to use <, <=, >, or >=
to compare pointers when they both point into
(or to the first element after) the same array.
It is likewise only portable to use arithmetic operators on pointers
that both point into the same array or the first element afterwards.
x &= 0177770
x &= ~07
a[i] = b[i++];
In the above example, we know only that the subscript into b
has not been incremented.
The index into a could be the value of i
either before or after the increment.
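Splitting the statement so that each line has a single side effect removes the ambiguity. This sketch assumes the intent was to copy element i and then advance; copy_and_advance is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Copy b[i] into a[i], then advance the index held in `*ip'.
 * Each statement has one side effect, so the result does not
 * depend on the compiler's order of evaluation.
 */
void
copy_and_advance (int a[], const int b[], size_t *ip)
{
    assert (a != NULL && b != NULL && ip != NULL);
    a[*ip] = b[*ip];
    (*ip)++;
}
```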
struct bar_t { struct bar_t *next; } *bar, *tmp;
bar->next = bar = tmp;
Here, the address of ``bar->next'' may be computed before the value is assigned to ``bar''.
bar = bar->next = tmp;
Here, bar can be assigned before bar->next.
Although this appears to violate the rule that
``assignment proceeds right-to-left'', it is a legal interpretation.
Consider the following example:
long i;
short a[N];
i = old;
i = a[i] = new;
The value that ``i'' is assigned must be a value that is typed as if assignment
proceeded right-to-left.
However, ``i'' may be assigned the value ``(long)(short)new''
before ``a[i]'' is assigned to.
Compilers do differ.
-W3.
On Unix systems use lint.
switch or goto outside the block.
The inter-procedural goto, longjmp, should be used with caution.
Many implementations ``forget'' to restore values in registers.
Declare critical values as volatile
if you can or comment them as VOLATILE.
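A small sketch of the volatile advice, assuming a retry loop built on setjmp; retry_env, attempt and attempts_needed are hypothetical names. The counter is changed between the setjmp and the longjmp, so it is declared volatile to keep its value across the jump.

```c
#include <assert.h>
#include <setjmp.h>

static jmp_buf retry_env;

/* Hypothetical operation that fails until the third attempt. */
static void
attempt (int tries)
{
    if (tries < 3)
        longjmp (retry_env, 1);
}

/* Returns the number of attempts that were needed. */
int
attempts_needed (void)
{
    volatile int tries = 0; /* modified between setjmp and longjmp */

    (void) setjmp (retry_env);  /* control resumes here after a longjmp */
    tries++;
    attempt (tries);
    return (tries);
}
```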
Modern C compilers support the ANSI standard C [16]. Write code to run under standard C, and use features such as function prototypes, constant storage, and volatile storage. Standard C improves program performance by giving better information to optimizers. Standard C improves portability by ensuring that all compilers accept the same input language and by providing mechanisms that try to hide machine dependencies or emit warnings about code that may be machine-dependent.
Note that under ANSI C, the `#' for a preprocessor directive must be the first non-whitespace character on a line. Use this feature to improve the formatting of your files.
An ``#ifdef NAME'' should end with either ``#endif'' or ``#endif /* NAME */'',
not with ``#endif NAME''.
The comment should not be used on short #ifdefs,
as it is clear from the code.
ANSI trigraphs may cause programs with strings containing ``??''
to break mysteriously.
The style for ANSI C is the same as for regular C, with two notable exceptions: storage qualifiers and parameter lists.
Because const and volatile have strange binding rules,
each const or volatile object should have a separate declaration.
int const *s; /* YES */
int const *s, *t; /* NO */
Prototyped functions merge parameter declaration and definition into one list. Parameters should be commented in the function comment.
/*
* `bp': boat trying to get in.
* `stall': a list of stalls, never NULL.
* returns stall number, 0 => no room.
*/
int
enter_pier (boat_t const *bp, stall_t *stall)
{
...
Pragmas are used to introduce machine-dependent code in a controlled way. Obviously, pragmas should be treated as machine dependencies. Unfortunately, the syntax of ANSI pragmas makes it impossible to isolate them in machine-dependent headers.
Pragmas are of two classes.
Optimizations
may safely be ignored.
Pragmas that change the system behavior (``required pragmas'')
may not.
Required pragmas should be #ifdeffed so that compilation will abort if
no pragma is selected.
Two compilers may use a given pragma in two very different ways.
For instance, one compiler may use ``haggis'' to signal an optimization.
Another might use it to indicate that a given statement,
if reached, should terminate the program.
Thus, when pragmas are used,
they must always be enclosed in machine-dependent #ifdefs.
Pragmas must always be #ifdefed out for non-ANSI compilers.
Be sure to indent the `#' character on the #pragma,
as older preprocessors will halt on it otherwise.
#if defined(__STDC__) && defined(USE_HAGGIS_PRAGMA)
    #pragma (HAGGIS)
#endif
``The `#pragma' command is specified in the ANSI standard to have an arbitrary implementation-defined effect. In the GNU C preprocessor, `#pragma' first attempts to run the game `rogue'; if that fails, it tries to run the game `hack'; if that fails, it tries to run GNU Emacs displaying the Tower of Hanoi; if that fails, it reports a fatal error. In any case, preprocessing does not continue.''
- Manual for the GNU C preprocessor for GNU CC 1.34.
This section contains some miscellaneous do's and don'ts.
Using a float for a loop counter is a great way to shoot yourself in the foot.
Always test floating-point numbers as <= or >=,
never use an exact comparison (== or !=).
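A sketch of both points, with the hypothetical names accumulate and nearly_equal: the loop counts with an int, and the result is compared against a tolerance chosen by the caller rather than with ==.

```c
#include <assert.h>

/* Add `step' to a running total `steps' times, counting with an int. */
double
accumulate (double step, int steps)
{
    double total = 0.0;
    int i;

    assert (steps >= 0);
    for (i = 0; i < steps; i++)
        total += step;
    return (total);
}

/* Returns 1 if `a' and `b' differ by no more than `tol', else 0. */
int
nearly_equal (double a, double b, double tol)
{
    double diff = a - b;

    assert (tol >= 0.0);
    if (diff < 0.0)
        diff = -diff;
    return (diff <= tol);
}
```

Ten additions of 0.1 need not produce exactly 1.0, which is why the comparison takes a tolerance; a suitable value depends on the computation and is left to the caller.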
Accidental omission of the second ``='' of the logical compare is a problem.
Use explicit tests.
Avoid assignment with implicit test.
abool = bbool;
if (abool) { ...
while ((abool = bbool) != FALSE) { ...
while (abool = bbool) { ... /* VALUSED */
while (abool = bbool, abool) { ...
register keyword.
One very useful tool is make [7]. During development, make recompiles only those modules that have been changed since the last time make was used. It can be used to automate other tasks, as well. Some common conventions include:
Individual projects may wish to establish additional standards beyond those given here. The following issues are some of those that should be addressed by each project program administration group.
A set of standards has been presented for C programming style. Among the most important points are:
As with any standard, it must be followed if it is to be useful. If you have trouble following any of these standards don't just ignore them. Talk with an experienced programmer.