``C combines the power of assembler with
the portability of assembler.''
- Anonymous, alluding to Bill Thacker.
The advantages of portable code are well known.
This section gives some guidelines for writing portable code.
Here, ``portable'' means that a source file
can be compiled and executed on different machines
with the only change being the inclusion of possibly
different header files and the use of different compiler flags.
The header files will contain #defines and typedefs that may vary from
machine to machine.
In general, a new ``machine'' is different hardware,
a different operating system, a different compiler,
or any combination of these.
Reference [1] contains useful information on both style and portability.
The following is a list of pitfalls to be avoided and recommendations
to be considered when designing portable code:
Write portable code first;
worry about detailed optimizations only on machines where they
prove necessary.
Optimized code is often obscure.
Optimizations for one machine may produce worse code on another.
Document performance hacks and localize them as much as possible.
Documentation should explain how it works and why
it was needed (e.g., ``loop executes 6 zillion times'').
Recognize that some things are inherently non-portable.
Examples are code to deal with particular hardware registers such as
the program status word,
and code that is designed to support a particular piece of hardware,
such as an assembler or I/O driver.
Even in these cases there are many routines and data organizations
that can be made machine independent.
Organize source files so that the machine-independent
code and the machine-dependent code are in separate files.
Then if the program is to be moved to a new machine,
it is a much easier task to determine what needs to be changed.
Comment the machine dependence in the headers of the appropriate
files.
Any behavior that is described as ``implementation defined''
should be treated as a machine (compiler) dependency.
Assume that the compiler or hardware does it some completely screwy
way.
Pay attention to word sizes.
Objects may be non-intuitive sizes.
Pointers are not always the same size as ints,
the same size as each other,
or freely interconvertible.
The following table shows bit sizes for basic types in C for various
machines and compilers.
type       pdp11    VAX/11   68000    Cray-2   Unisys   Harris   80386-Pentium
           series            family            1100     H800
char       8        8        8        8        9        8        8
short      16       16       8/16     64(32)   18       24       8/16
int        16       32       16/32    64(32)   36       24       16/32
long       32       32       32       64       36       48       32
char*      16       32       32       64       72       24       16/32/48
int*       16       32       32       64(24)   72       24       16/32/48
int(*)()   16       32       32       64       576      24       16/32/48
Some machines have more than one possible size for a given type.
The size you get can depend both on the compiler
and on various compile-time flags.
The following table shows ``safe'' type sizes on the majority of
systems.
Unsigned numbers are the same bit size as signed numbers.
Type       Minimum   No Smaller
           # Bits    Than
char       8
short      16        char
int        16        short
long       32        int
float      24
double     38        float
any *      14
char *     15        any *
void *     15        any *
The
void*
type
is guaranteed to have enough bits
of precision to hold a pointer to any data object.
The
void(*)()
type is guaranteed to be able to hold a pointer to any function.
Use these types when you need a generic pointer.
Be sure to cast pointers back to the correct type before using them.
Even when, say, an
int*
and a
char*
are the same size, they may have different formats.
For example, the following will fail on some machines that have
sizeof(int*)
equal to
sizeof(char*).
The code fails because
free
expects a
char*
and gets passed an
int*.
    int *p = (int *) malloc(sizeof(int));
    free(p);
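As a minimal sketch of the generic-pointer advice above, a caller's context can travel as a void* and be cast back to its real type only at the point of use. The names visit_fn, for_each_int, and add_to_long are made up for illustration and do not come from any library.

```c
#include <stddef.h>

/* Hypothetical callback type: the context travels as a generic void *. */
typedef void (*visit_fn)(int value, void *context);

/* Calls fn once per element, handing the caller's context back unchanged. */
void for_each_int(const int *items, size_t count, visit_fn fn, void *context)
{
    size_t i;

    for (i = 0; i < count; i++)
        fn(items[i], context);
}

/* The callback knows the real type and casts back before dereferencing. */
void add_to_long(int value, void *context)
{
    long *sum = (long *) context;

    *sum += value;
}
```

A caller would pass the address of a long as the last argument and read the total from it afterwards.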
Note that
the size of an object does not guarantee the precision of
that object.
The Cray-2 may use 64 bits to store an
int,
but a long cast into an
int
and back to a
long
may be truncated to 32 bits.
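Because storage size says nothing about precision, a range check against the limits in limits.h is a safer test than sizeof. The following is a sketch only; the function name fits_in_int is hypothetical.

```c
#include <limits.h>

/* Returns 1 if v can be stored in an int without loss, 0 otherwise.
 * The test uses the documented range of int, not its storage size. */
int fits_in_int(long v)
{
    return v >= INT_MIN && v <= INT_MAX;
}
```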
The integer
constant
zero may be cast to any pointer type.
The resulting pointer is called a
null pointer
for that type, and is different from any other pointer of that type.
A null pointer always compares equal to the constant zero.
A null pointer might not compare equal with a variable
that has the value zero.
Null pointers are not always stored with all bits zero.
Null pointers for two different types are sometimes different.
A null pointer of one type cast to a pointer of another
type becomes the null pointer for that second type.
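One consequence of these rules, sketched below with a hypothetical struct node, is that pointer members should be set to NULL by assignment and tested against the constant. Clearing a structure with memset() or calloc() yields all-bits-zero, which need not be a null pointer on every machine.

```c
#include <stddef.h>

struct node {               /* hypothetical list node */
    int value;
    struct node *next;
};

/* Assign the null pointer explicitly instead of zero-filling the struct. */
void node_init(struct node *n, int value)
{
    n->value = value;
    n->next = NULL;
}

/* Compare against the constant, never against an integer variable. */
int node_is_last(const struct node *n)
{
    return n->next == NULL;
}
```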
On ANSI compilers, when two pointers of the same type access
the same storage, they will compare as equal.
When non-zero integer constants are cast to pointer types,
they may become identical to other pointers.
On non-ANSI compilers, pointers that
access the same storage may compare as different.
The following two pointers, for instance,
may or may not compare equal,
and they may or may not access the same storage.
The code may also fail to compile, fault on pointer creation,
fault on pointer comparison, or fault on pointer dereferences.
    ((int *) 2)
    ((int *) 3)
If you need `magic' pointers other than NULL,
either allocate some storage or treat the pointer as
a machine dependence.
    extern int x_int_dummy;         /* in x.c */
    #define X_FAIL  (NULL)
    #define X_BUSY  (&x_int_dummy)

    #define X_FAIL  (NULL)
    #define X_BUSY  MD_PTR1         /* MD_PTR1 from "machdep.h" */
Floating-point numbers have both a precision and a range.
These are independent of the size of the object.
Thus, overflow (underflow) for a 32-bit floating-point number will
happen at different values on different machines.
Also,
4.9
times
5.1
will yield
two different numbers on two different machines.
Differences in rounding and truncation can give surprisingly
different answers.
On some machines,
a
double
may have less range or precision than a
float.
On some machines the first half of a
double
may be a
float
with similar value.
Do not depend on this.
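Since results can differ in the last few bits from machine to machine, exact equality tests on computed floating-point values are fragile. One common alternative, sketched here with an illustrative function name and a caller-chosen tolerance, is to compare within a relative error.

```c
/* Absolute value without relying on the math library. */
static double magnitude(double x)
{
    return x < 0.0 ? -x : x;
}

/* Returns 1 when a and b agree to within rel_tol times the larger
 * magnitude of the two, 0 otherwise. */
int nearly_equal(double a, double b, double rel_tol)
{
    double diff = magnitude(a - b);
    double scale = magnitude(a) > magnitude(b) ? magnitude(a) : magnitude(b);

    return diff <= rel_tol * scale;
}
```

The right tolerance depends on the computation; the header float.h describes the precision and range actually available on the current machine.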
Watch out for signed characters.
On Intel architectures and some VAXes, for instance,
characters are sign extended when used in expressions,
which is not the case on many other machines.
Code that assumes signed/unsigned is unportable.
For example,
array[c]
won't work if
c
is supposed to be positive and is instead signed and negative.
If you must assume signed or unsigned characters, comment them as
SIGNED
or
UNSIGNED.
Unsigned behavior can be guaranteed with
unsigned char.
Be particularly careful if your program will deal with character values
above 127 (e.g., in strings containing Greek characters).
Also note that some compilers (e.g. Watcom C 9.1) produce wrong results
when strings containing above-127 characters are passed to the
strcmp
function.
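The array-subscript hazard can be avoided by converting to unsigned char at the point of use. The sketch below assumes a hypothetical helper, count_chars, that tallies character values in a string.

```c
#include <limits.h>
#include <stddef.h>

/* Counts how often each character value occurs in s.
 * counts must have room for UCHAR_MAX + 1 entries. */
void count_chars(const char *s, unsigned long counts[])
{
    size_t i;

    for (i = 0; i <= UCHAR_MAX; i++)
        counts[i] = 0;
    for (; *s != '\0'; s++)
        counts[(unsigned char) *s]++;   /* never a negative subscript */
}
```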
Avoid assuming ASCII or a particular character set (ELOT, 437).
(Use
"<ctype.h>"
where possible, but beware that the behavior of its functions varies considerably
between C implementations.
For instance, if c is not an upper-case letter,
tolower(c) may return c or garbage.)
If Greek strings will be processed, the system-provided ctype.h may not
work correctly.
If you must assume, document and localize.
Remember that characters may hold (much) more than 8 bits.
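A defensive wrapper guards against both problems mentioned above: it converts only characters known to be upper case, and it hands the ctype.h functions a value in the unsigned char range. The name safe_tolower is hypothetical; this is a sketch, not a replacement for locale-aware processing.

```c
#include <ctype.h>

/* Lower-cases c only when isupper() says it is upper case.
 * Any other value is returned unchanged. */
int safe_tolower(int c)
{
    unsigned char uc = (unsigned char) c;

    return isupper(uc) ? tolower(uc) : c;
}
```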
Code that takes advantage of the two's complement representation of
numbers on most machines should not be used.
Optimizations that replace arithmetic operations with equivalent
shifting operations are particularly suspect.
If absolutely necessary, machine-dependent code should be #ifdeffed
or operations should be performed by #ifdeffed macros.
You should weigh the time savings against the potential for obscure
and difficult bugs when your code is moved.
In general, if the word size or value range is important,
typedef ``sized'' types.
Large programs should have a central header file which supplies
typedefs for commonly-used width-sensitive types, to make
it easier to change them and to aid in finding width-sensitive code.
Unsigned types other than
"unsigned int"
are highly compiler-dependent.
If a simple loop counter is being used where either 16 or 32 bits will
do, then use
int,
since it will get the most efficient (natural)
unit for the current machine.
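A central header for width-sensitive types might look like the following sketch. The file name sizes.h and the md_ prefix are assumptions chosen for illustration; the base types follow the minimum sizes in the ``safe'' table above.

```c
/* sizes.h: hypothetical central header for width-sensitive types.
 * Change the typedefs here when moving to a new machine. */
#include <limits.h>

typedef signed char   md_int8;      /* at least 8 bits  */
typedef short         md_int16;     /* at least 16 bits */
typedef long          md_int32;     /* at least 32 bits */
typedef unsigned long md_uint32;    /* at least 32 bits */

/* Refuse to compile if an assumption is false on this machine. */
#if SHRT_MAX < 32767 || LONG_MAX < 2147483647
#error "sizes.h: basic types are narrower than this program assumes"
#endif
```

Searching the source for the md_ names then finds the width-sensitive code.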
Data alignment is also important.
For instance,
on various machines a 4-byte integer may start at any address,
start only at an even address, or start only at a multiple-of-four
address.
Thus, a particular structure may have its elements
at different offsets on different machines,
even when given elements are the same size on all machines.
Indeed, a structure of a 32-bit pointer and an 8-bit character may be
3 sizes on 3 different machines.
As a corollary, pointers to objects may not be interchanged freely;
saving an integer through a pointer
to 4 bytes starting at an odd address
will sometimes work,
sometimes cause a core dump,
and sometimes fail silently (clobbering other data in the process).
Pointer-to-character is a particular trouble spot on machines which
do not address to the byte.
Alignment considerations and loader peculiarities make it very rash
to assume that two consecutively-declared variables are together
in memory, or that a variable of one type is aligned appropriately
to be used as another type.
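When a value must be read from or written to a position that may not be suitably aligned, copying its bytes with memcpy() avoids the misaligned access. This is a minimal sketch; load_long and store_long are illustrative names.

```c
#include <string.h>

/* Copies a long out of a buffer position that may not be aligned.
 * Dereferencing (long *) p directly could fault or misbehave. */
long load_long(const unsigned char *p)
{
    long value;

    memcpy(&value, p, sizeof value);
    return value;
}

/* Copies a long into a buffer position that may not be aligned. */
void store_long(unsigned char *p, long value)
{
    memcpy(p, &value, sizeof value);
}
```

The stored bytes still use the machine's own byte order and size, so this solves alignment only, not data exchange between machines.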
The bytes of a word are of increasing significance with increasing
address on machines such as the Intel x86 and the VAX (little-endian)
and of decreasing significance with increasing address on other
machines such as the 68000 (big-endian).
The order of bytes in a word and of words in larger
objects (say, a double word) might not be the same.
Hence any code that depends on the left-right orientation of bits
in an object deserves special scrutiny.
Bit fields within structure members will only be portable so long as
two separate fields are never concatenated and treated as a unit. [1,3]
Actually, it is nonportable to concatenate any two variables.
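Code that works on values with shifts and masks, rather than on the bytes as they lie in memory, does not depend on the machine's byte order. The sketch below stores and fetches a 32-bit quantity most significant byte first; put_be32 and get_be32 are illustrative names.

```c
/* Stores the low 32 bits of v, most significant byte first. */
void put_be32(unsigned char out[4], unsigned long v)
{
    out[0] = (unsigned char) ((v >> 24) & 0xFF);
    out[1] = (unsigned char) ((v >> 16) & 0xFF);
    out[2] = (unsigned char) ((v >> 8) & 0xFF);
    out[3] = (unsigned char) (v & 0xFF);
}

/* Rebuilds the value from four bytes, each assumed to be at most 0xFF. */
unsigned long get_be32(const unsigned char in[4])
{
    return ((unsigned long) in[0] << 24)
         | ((unsigned long) in[1] << 16)
         | ((unsigned long) in[2] << 8)
         |  (unsigned long) in[3];
}
```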
There may be unused holes in structures.
Suspect unions used for type cheating.
Specifically, a value should not be stored as one type and retrieved as
another.
An explicit tag field for unions may be useful.
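A tag field can be sketched as follows. The types and function names are hypothetical; the point is that each member is read only when the tag says it was the one most recently stored.

```c
enum value_kind { VALUE_INT, VALUE_DOUBLE };

struct value {
    enum value_kind kind;   /* says which member of u is valid */
    union {
        long   i;
        double d;
    } u;
};

void value_set_int(struct value *v, long i)
{
    v->kind = VALUE_INT;
    v->u.i = i;
}

void value_set_double(struct value *v, double d)
{
    v->kind = VALUE_DOUBLE;
    v->u.d = d;
}

/* Converts by value; never reads a member that was not stored. */
double value_as_double(const struct value *v)
{
    return v->kind == VALUE_INT ? (double) v->u.i : v->u.d;
}
```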
Different compilers use different conventions for returning
structures.
This causes a problem when libraries return structure values
to code compiled with a different compiler.
Structure pointers are not a problem unless you have specified
a non-default structure packing option to your compiler.
Do not make assumptions about the parameter passing mechanism,
especially pointer sizes and parameter evaluation order, size, etc.
The following code, for instance, is very nonportable.
    c = foo(getchar(), getchar());

    char foo(char c1, char c2, char c3)
    {
        char bar = *(&c1 + 1);
        return (bar);           /* often won't return c2 */
    }
This example has lots of problems.
The stack may grow up or down
(indeed, there need not even be a stack!).
Parameters may be widened when they are passed,
so a
char
might be passed as an
int,
for instance.
Arguments may be pushed left-to-right, right-to-left,
in arbitrary order, or passed in registers (not pushed at all).
The order of evaluation may differ from the order in which
they are pushed.
One compiler may use several (incompatible) calling conventions.
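When arguments have side effects, the order can be fixed by evaluating each one in its own statement before the call. In this sketch take_next() is a made-up stand-in for getchar(), and subtract() is a made-up callee.

```c
/* Hypothetical source of values with a side effect. */
static int next_value = 0;

int take_next(void)
{
    return next_value++;
}

int subtract(int a, int b)
{
    return a - b;
}

/* Sequences the two calls explicitly. Writing
 * subtract(take_next(), take_next()) would leave the order,
 * and so the result, up to the compiler. */
int first_minus_second(void)
{
    int first = take_next();
    int second = take_next();

    return subtract(first, second);
}
```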
On some machines, the null character pointer
"((char*)0)"
is treated the same way as a pointer to a null string.
Do not depend on this.
Do not modify string constants.
Some libraries attempt to modify and then restore read-only
string variables.
Programs sometimes won't port because of these broken libraries.
The libraries are getting better.
One particularly notorious (bad) example is
    s = "/dev/tty??";
    strcpy(&s[8], ttychars);
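A portable alternative is to build the name in writable storage owned by the caller. The following is a sketch only; make_tty_name and its parameters are hypothetical.

```c
#include <string.h>

/* Builds "/dev/tty" followed by suffix in out, which holds out_size bytes.
 * Returns 0 on success, or -1 if the result would not fit. */
int make_tty_name(char *out, size_t out_size, const char *suffix)
{
    static const char prefix[] = "/dev/tty";
    size_t need = strlen(prefix) + strlen(suffix) + 1;

    if (need > out_size)
        return -1;
    strcpy(out, prefix);
    strcat(out, suffix);
    return 0;
}
```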
The address space may have holes.
Simply computing the address
of an unallocated element in an array
(before or after the actual storage of the array)
may crash the program.
If the address is used in a comparison,
sometimes the program will run but clobber data, give wrong answers,
or loop forever.
In ANSI C, a pointer into an array of objects may legally point to
the first element after the end of the array; this is usually safe
in older implementations.
This ``outside'' pointer may not be dereferenced.
Only the
==
and
!=
comparisons are defined for all pointers of a given type.
It is only portable to use
<,
<=,
>,
or
>=
to compare pointers when they both point into
(or to the first element after) the same array.
It is likewise only portable to use arithmetic operators on pointers
that both point into the same array or the first element afterwards.
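The one-past-the-end rule is enough for the usual loop idiom. In this sketch (sum_ints is an illustrative name) the end pointer is computed and compared but never dereferenced, and no pointer is formed before the start of the array.

```c
#include <stddef.h>

/* Adds up count ints starting at items. */
long sum_ints(const int *items, size_t count)
{
    const int *p;
    const int *end = items + count;     /* one past the last element */
    long sum = 0;

    for (p = items; p != end; p++)      /* end is only compared */
        sum += *p;
    return sum;
}
```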
Word size also affects shifts and masks.
The following code will clear only the three rightmost bits of an
int on some 68000s.
On other machines it will also clear the upper two bytes.
    x &= 0177770
Use instead
    x &= ~07
which works properly on all machines.
Bitfields do not have these problems.
Side effects within expressions can result in code
whose semantics are compiler-dependent, since C's order of evaluation
is explicitly undefined in most places.
Notorious examples include the following.
    a[i] = b[i++];
In the above example, we know only that
the subscript into
b
has not been incremented.
The index into
a
could be the value of
i
either before or after the increment.
    bar->next = bar = tmp;
In the second example, the address of
``bar->next''
may be computed before the value is assigned to
``bar''.
    bar = bar->next = tmp;
In the third example,
bar
can be assigned before
bar->next.
Although this appears to violate the rule that
``assignment proceeds right-to-left'', it is a legal interpretation.
Consider the following example:
    long i;
    short a[N];
    i = old;
    i = a[i] = new;
The value that
``i''
is assigned must be a value that is typed as if assignment
proceeded right-to-left.
However,
``i''
may be assigned the value
``(long)(short)new''
before
``a[i]''
is assigned to.
Compilers do differ.
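The ambiguity disappears when each side effect gets its own statement. As a minimal sketch of the first example rewritten that way (copy_ints is an illustrative name):

```c
#include <stddef.h>

/* Copies count elements from src to dst. The increment is a separate
 * statement, so both subscripts always use the same value of i. */
void copy_ints(int *dst, const int *src, size_t count)
{
    size_t i = 0;

    while (i < count) {
        dst[i] = src[i];
        i++;
    }
}
```

Chained assignments can be split the same way, one assignment per statement, in the order intended.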
Be suspicious of numeric values appearing in the code (``magic
numbers'').
Become familiar with existing library functions and defines.
(But not too familiar.
The internal details of library facilities, as opposed to their
external interfaces, are subject to change without warning.
They are also often quite unportable.)
You should not be writing your own string compare routine,
terminal control routines, or making
your own defines for system structures.
``Rolling your own'' wastes your time and
makes your code less readable, because another reader has to
figure out whether you're doing something special in that reimplemented
stuff to justify its existence.
It also prevents your program
from taking advantage of any microcode assists or other
means of improving performance of system routines.
Furthermore, it's a fruitful source of bugs.
If possible, be aware of the differences between the common
libraries (such as ANSI, POSIX, and so on).
Always run your compiler with the switches set for maximum warnings.
As an example run the Microsoft compilers with
-W3.
On Unix systems use lint.
Suspect labels inside blocks with the
associated
switch
or
goto
outside the block.
Use explicit casts when doing arithmetic
that mixes signed and unsigned values.
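A typical case is a range check of a signed subscript against an unsigned length. Without care, a negative value is converted to a large unsigned one before the comparison. The sketch below, with a hypothetical function name, handles the negative case first and then makes the conversion explicit.

```c
/* Returns 1 if index is a valid subscript for length elements. */
int index_in_range(int index, unsigned int length)
{
    if (index < 0)
        return 0;       /* decide this while the value is still signed */
    return (unsigned int) index < length;
}
```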
The inter-procedural goto,
longjmp,
should be used with caution.
Many implementations ``forget'' to restore values in registers.
Declare critical values as
volatile
if you can or comment them as
VOLATILE.
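The following sketch shows the volatile advice in a small retry loop. All names are made up for illustration. The counter is modified between setjmp() and longjmp(), so it is declared volatile to keep its latest value after the jump.

```c
#include <setjmp.h>

static jmp_buf retry_point;

/* Jumps back to the retry point until enough attempts have been made. */
static void fail_until(int attempts, int needed)
{
    if (attempts < needed)
        longjmp(retry_point, 1);
}

/* Returns the number of attempts made before fail_until() returned. */
int attempts_needed(int needed)
{
    volatile int attempts = 0;  /* changed between setjmp and longjmp */

    (void) setjmp(retry_point); /* execution resumes here after longjmp */
    attempts++;
    fail_until(attempts, needed);
    return attempts;
}
```

Without volatile, an implementation is free to keep the counter in a register and restore a stale value after the jump.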
Some linkers convert names to lower-case
and
some only recognize the first six letters as unique.
Programs may break quietly on these systems.
Beware of compiler extensions.
If used, document and
consider them as machine dependencies.
A program cannot generally execute code in the data
segment or write into the code segment.
Even when it can, there is no guarantee that it can do so reliably.