Portability

``C combines the power of assembler with the portability of assembler.''
- Anonymous, alluding to Bill Thacker.

The advantages of portable code are well known. This section gives some guidelines for writing portable code. Here, ``portable'' means that a source file can be compiled and executed on different machines with the only change being the inclusion of possibly different header files and the use of different compiler flags. The header files will contain #defines and typedefs that may vary from machine to machine. In general, a new ``machine'' is different hardware, a different operating system, a different compiler, or any combination of these. Reference [1] contains useful information on both style and portability. The following is a list of pitfalls to be avoided and recommendations to be considered when designing portable code:

Write portable code first; worry about detail optimizations only on machines where they prove necessary. Optimized code is often obscure. Optimizations for one machine may produce worse code on another. Document performance hacks and localize them as much as possible. Documentation should explain how the hack works and why it was needed (e.g., ``loop executes 6 zillion times'').
Recognize that some things are inherently non-portable. Examples are code to deal with particular hardware registers such as the program status word, and code that is designed to support a particular piece of hardware, such as an assembler or I/O driver. Even in these cases there are many routines and data organizations that can be made machine independent.
Organize source files so that the machine-independent code and the machine-dependent code are in separate files. Then if the program is to be moved to a new machine, it is a much easier task to determine what needs to be changed. Comment the machine dependence in the headers of the appropriate files.
Any behavior that is described as ``implementation defined'' should be treated as a machine (compiler) dependency. Assume that the compiler or hardware does it some completely screwy way.

Pay attention to word sizes. Objects may have non-intuitive sizes. Pointers are not always the same size as ints, the same size as each other, or freely interconvertible. The following table shows bit sizes for basic types in C for various machines and compilers.

type	pdp11	VAX/11	68000	Cray-2	Unisys	Harris	80386-Pentium
	series		family		1100	H800
char	8	8	8	8	9	8	8
short	16	16	8/16	64(32)	18	24	8/16
int	16	32	16/32	64(32)	36	24	16/32
long	32	32	32	64	36	48	32
char*	16	32	32	64	72	24	16/32/48
int*	16	32	32	64(24)	72	24	16/32/48
int(*)()	16	32	32	64	576	24	16/32/48

Some machines have more than one possible size for a given type. The size you get can depend both on the compiler and on various compile-time flags. The following table shows ``safe'' type sizes on the majority of systems. Unsigned numbers are the same bit size as signed numbers.

Type	Minimum	No Smaller
	# Bits	Than
char	8
short	16	char
int	16	short
long	32	int
float	24
double	38	float
any *	14
char *	15	any *
void *	15	any *
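The minimums in the table can be checked rather than trusted. The following is a minimal sketch, assuming an ANSI <limits.h>; the BITS_OF macro and the function name are made up for illustration. It measures storage size only, which may be larger than the usable precision of the type.

```c
#include <limits.h>

/* Hypothetical helper: storage width of a type in bits (not its precision). */
#define BITS_OF(type)	(sizeof(type) * CHAR_BIT)

/* Returns 1 if this compiler meets the ``safe'' minimums, 0 otherwise. */
int
sizes_meet_minimums (void)
{
	return CHAR_BIT >= 8
	    && BITS_OF(short) >= 16 && sizeof(short) >= sizeof(char)
	    && BITS_OF(int)   >= 16 && sizeof(int)   >= sizeof(short)
	    && BITS_OF(long)  >= 32 && sizeof(long)  >= sizeof(int);
}
```

A program might call this once at startup and refuse to run if it returns 0.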

The void* type is guaranteed to have enough bits of precision to hold a pointer to any data object. The void(*)() type is guaranteed to be able to hold a pointer to any function. Use these types when you need a generic pointer. Be sure to cast pointers back to the correct type before using them.
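As a sketch of the generic-pointer advice, the following stores an int* in a void* and casts it back before use. The struct and function names are hypothetical and exist only for this illustration.

```c
/* Hypothetical container that stores a generic data pointer. */
struct slot {
	void *data;
};

void
slot_put_int (struct slot *s, int *ip)
{
	s->data = ip;			/* int* converts to void* */
}

int
slot_get_int (const struct slot *s)
{
	int *ip = (int *) s->data;	/* cast back to the correct type */
	return (*ip);
}
```

The cast back must name the same type that was stored; retrieving the pointer as some other type is exactly the kind of type cheating this section warns about.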
Even when, say, an int* and a char* are the same size, they may have different formats. For example, the following will fail on some machines that have sizeof(int*) equal to sizeof(char*). The code fails because free expects a char* and gets passed an int*.
```
int *p = (int *) malloc (sizeof(int));
free (p);
```
Note that the size of an object does not guarantee the precision of that object. The Cray-2 may use 64 bits to store an int, but a long cast into an int and back to a long may be truncated to 32 bits.
The integer constant zero may be cast to any pointer type. The resulting pointer is called a null pointer for that type, and is different from any other pointer of that type. A null pointer always compares equal to the constant zero. A null pointer might not compare equal with a variable that has the value zero. Null pointers are not always stored with all bits zero. Null pointers for two different types are sometimes different. A null pointer of one type cast into a pointer of another type becomes the null pointer for that second type.
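One practical consequence is that pointer members should be set to null by assignment, not by filling the structure with zero bytes. A minimal sketch, with a hypothetical struct node:

```c
#include <stddef.h>

struct node {
	struct node *next;
	int *payload;
};

/* Assign null pointers explicitly; zero-filling the bytes (memset,
 * calloc) is not guaranteed to produce null pointers everywhere. */
void
node_init (struct node *n)
{
	n->next = NULL;
	n->payload = NULL;
}

int
node_is_last (const struct node *n)
{
	return (n->next == NULL);	/* compare against the constant */
}
```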
On ANSI compilers, when two pointers of the same type access the same storage, they will compare as equal. When non-zero integer constants are cast to pointer types, they may become identical to other pointers. On non-ANSI compilers, pointers that access the same storage may compare as different. The following two pointers, for instance, may or may not compare equal, and they may or may not access the same storage. The code may also fail to compile, fault on pointer creation, fault on pointer comparison, or fault on pointer dereferences.
```
((int *) 2 )
((int *) 3 )
```
If you need `magic' pointers other than NULL, either allocate some storage or treat the pointer as a machine dependence.
```
extern int x_int_dummy;		/* in x.c */
#define X_FAIL	(NULL)
#define X_BUSY	(&x_int_dummy)
```
```
#define X_FAIL	(NULL)
#define X_BUSY	MD_PTR1		/* MD_PTR1 from "machdep.h" */
```
Floating-point numbers have both a precision and a range. These are independent of the size of the object. Thus, overflow (underflow) for a 32-bit floating-point number will happen at different values on different machines. Also, 4.9 times 5.1 will yield two different numbers on two different machines. Differences in rounding and truncation can give surprisingly different answers.
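Because the low-order bits of a result can differ between machines, tests for exact floating-point equality tend not to port. One common alternative, sketched here, compares within a relative tolerance. The function name is hypothetical and the right tolerance depends on the application.

```c
/* Hypothetical helper: nonzero if a and b agree to within rel_tol,
 * measured relative to the larger of the two magnitudes. */
int
nearly_equal (double a, double b, double rel_tol)
{
	double diff  = (a > b) ? a - b : b - a;
	double abs_a = (a < 0.0) ? -a : a;
	double abs_b = (b < 0.0) ? -b : b;
	double scale = (abs_a > abs_b) ? abs_a : abs_b;

	return (diff <= rel_tol * scale);
}
```

With this, nearly_equal (4.9 * 5.1, 24.99, 1e-9) can be expected to hold even where the two values differ in their last bits.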
On some machines, a double may have less range or precision than a float.
On some machines the first half of a double may be a float with similar value. Do not depend on this.
Watch out for signed characters. On Intel architectures and some VAXes, for instance, characters are sign extended when used in expressions, which is not the case on many other machines. Code that assumes signed/unsigned is unportable. For example, array[c] won't work if c is supposed to be positive and is instead signed and negative. If you must assume signed or unsigned characters, comment them as SIGNED or UNSIGNED. Unsigned behavior can be guaranteed with unsigned char. Be particularly careful if your program will deal with character values above 127 (e.g., strings containing Greek characters). Also note that some compilers (e.g., Watcom C 9.1) produce wrong results when strings containing characters above 127 are passed to the strcmp function.
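The array[c] problem can be avoided by converting to unsigned char at the point of indexing. A minimal sketch, with a hypothetical function name; the caller is assumed to supply a zeroed table of UCHAR_MAX + 1 counters.

```c
#include <limits.h>

/* counts must have UCHAR_MAX + 1 elements, all initially zero. */
void
count_bytes (const char *s, unsigned long *counts)
{
	for (; *s != '\0'; s++)
		counts[(unsigned char) *s]++;	/* never a negative index */
}
```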
Avoid assuming ASCII or a particular character set (ELOT, 437). (Use "<ctype.h>" where possible, but beware that the behavior of its functions varies considerably between C implementations. For instance, if c is not an upper-case letter, tolower(c) may return c or garbage.) If Greek strings will be processed, the system-provided ctype.h may not work correctly. If you must assume, document and localize. Remember that characters may hold (much) more than 8 bits.
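A defensive way to call tolower is to test with isupper first, so the result does not depend on what a given library does with other characters. This is a sketch only; the wrapper name is hypothetical, and it still relies on the system ctype tables for the current locale.

```c
#include <ctype.h>

int
lower_if_upper (int c)
{
	unsigned char uc = (unsigned char) c;	/* ctype wants non-negative values */

	return (isupper(uc) ? tolower(uc) : c);
}
```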
Code that takes advantage of the two's complement representation of numbers on most machines should not be used. Optimizations that replace arithmetic operations with equivalent shifting operations are particularly suspect. If absolutely necessary, machine-dependent code should be #ifdeffed or operations should be performed by #ifdeffed macros. You should weigh the time savings with the potential for obscure and difficult bugs when your code is moved.
In general, if the word size or value range is important, typedef ``sized'' types. Large programs should have a central header file which supplies typedefs for commonly-used width-sensitive types, to make it easier to change them and to aid in finding width-sensitive code. Unsigned types other than "unsigned int" are highly compiler-dependent. If a simple loop counter is being used where either 16 or 32 bits will do, then use int, since it will get the most efficient (natural) unit for the current machine.
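Such a central header might look like the following sketch. The x_ type names are invented for this example, and the types are ``at least this wide'', not exact widths. Where a C99 compiler is available, <stdint.h> supplies standard equivalents.

```c
#include <limits.h>

/* Pick the smallest basic type that is at least 32 bits wide. */
#if INT_MAX >= 2147483647
typedef int		x_int32;
typedef unsigned int	x_uint32;
#else
typedef long		x_int32;
typedef unsigned long	x_uint32;
#endif
typedef short		x_int16;	/* short is at least 16 bits */

/* Compile-time check: the array size is -1 (an error) if too narrow. */
typedef char x_int32_wide_enough[(sizeof(x_int32) * CHAR_BIT >= 32) ? 1 : -1];
```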
Data alignment is also important. For instance, on various machines a 4-byte integer may start at any address, start only at an even address, or start only at a multiple-of-four address. Thus, a particular structure may have its elements at different offsets on different machines, even when given elements are the same size on all machines. Indeed, a structure of a 32-bit pointer and an 8-bit character may be 3 sizes on 3 different machines. As a corollary, pointers to objects may not be interchanged freely; saving an integer through a pointer to 4 bytes starting at an odd address will sometimes work, sometimes cause a core dump, and sometimes fail silently (clobbering other data in the process). Pointer-to-character is a particular trouble spot on machines which do not address to the byte. Alignment considerations and loader peculiarities make it very rash to assume that two consecutively-declared variables are together in memory, or that a variable of one type is aligned appropriately to be used as another type.
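When an integer must be read from an arbitrary byte address (a packed buffer, say), copying the bytes into a properly declared variable avoids the alignment problem. A minimal sketch with a hypothetical function name; the byte order of the stored value is still that of the machine that wrote it.

```c
#include <string.h>

unsigned int
load_uint (const unsigned char *bytes)
{
	unsigned int v;

	memcpy(&v, bytes, sizeof v);	/* byte copy: no alignment requirement */
	return (v);
}
```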
The bytes of a word are of increasing significance with increasing address on machines such as the Intel x86 and the VAX (little-endian) and of decreasing significance with increasing address on other machines such as the 68000 (big-endian). The order of bytes in a word and of words in larger objects (say, a double word) might not be the same. Hence any code that depends on the left-right orientation of bits in an object deserves special scrutiny. Bit fields within structure members will only be portable so long as two separate fields are never concatenated and treated as a unit. [1,3] Actually, it is nonportable to concatenate any two variables.
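Data that leaves the machine (files, network messages) can be written one byte at a time in a fixed order, so the result does not depend on the host byte order. The sketch below assumes a big-endian external format with 8-bit bytes; the function names are hypothetical.

```c
/* Store the low 32 bits of v, most significant byte first. */
void
put_be32 (unsigned char *out, unsigned long v)
{
	out[0] = (unsigned char) ((v >> 24) & 0xFF);
	out[1] = (unsigned char) ((v >> 16) & 0xFF);
	out[2] = (unsigned char) ((v >> 8) & 0xFF);
	out[3] = (unsigned char) (v & 0xFF);
}

/* Rebuild the value from four bytes written by put_be32. */
unsigned long
get_be32 (const unsigned char *in)
{
	return (((unsigned long) in[0] << 24) |
		((unsigned long) in[1] << 16) |
		((unsigned long) in[2] << 8) |
		 (unsigned long) in[3]);
}
```

The shifts operate on unsigned values only, so they do not depend on how negative numbers are represented.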
There may be unused holes in structures. Suspect unions used for type cheating. Specifically, a value should not be stored as one type and retrieved as another. An explicit tag field for unions may be useful.
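A tagged union might be sketched as follows. All names are hypothetical; the point is that the tag records which member was stored, and readers consult it before touching the union.

```c
enum value_kind { VALUE_INT, VALUE_DOUBLE };

struct value {
	enum value_kind kind;		/* explicit tag: which member is live */
	union {
		int i;
		double d;
	} u;
};

/* Read the value through the member that was actually stored. */
double
value_as_double (const struct value *v)
{
	return ((v->kind == VALUE_INT) ? (double) v->u.i : v->u.d);
}
```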
Different compilers use different conventions for returning structures. This causes a problem when libraries return structure values to code compiled with a different compiler. Structure pointers are not a problem unless you have specified a non-default structure packing option to your compiler.
Do not make assumptions about the parameter passing mechanism, especially pointer sizes and parameter evaluation order, size, etc. The following code, for instance, is very nonportable.
```
	c = foo (getchar(), getchar());

char
foo (char c1, char c2, char c3)
{
	char bar = *(&c1 + 1);
	return (bar);			/* often won't return c2 */
}
```
This example has lots of problems. The stack may grow up or down (indeed, there need not even be a stack!). Parameters may be widened when they are passed, so a char might be passed as an int, for instance. Arguments may be pushed left-to-right, right-to-left, in arbitrary order, or passed in registers (not pushed at all). The order of evaluation may differ from the order in which they are pushed. One compiler may use several (incompatible) calling conventions.
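The evaluation-order part of the problem goes away when each value is read in its own statement and then passed by name. The sketch below uses a hypothetical string reader in place of getchar so that it is self-contained; the same pattern applies to any function with side effects.

```c
/* Hypothetical input source: reads characters from a string. */
struct reader {
	const char *pos;
};

int
next_char (struct reader *r)
{
	return ((*r->pos != '\0') ? (unsigned char) *r->pos++ : -1);
}

/* Each read is a separate statement, so c1 is always read first. */
int
diff_first_two (struct reader *r)
{
	int c1 = next_char(r);
	int c2 = next_char(r);

	return (c1 - c2);
}
```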
On some machines, the null character pointer "((char*)0)" is treated the same way as a pointer to a null string. Do not depend on this.
Do not modify string constants. Some libraries attempt to modify and then restore read-only string variables. Programs sometimes won't port because of these broken libraries. The libraries are getting better. One particularly notorious (bad) example is
```
s = "/dev/tty??";
strcpy (&s[8], ttychars);
```
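A portable alternative builds the name in a writable array supplied by the caller. This is a sketch under assumptions: the function name and the error convention (-1 when the buffer is too small) are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Write "/dev/tty" followed by ttychars into dst; 0 on success. */
int
make_tty_name (char *dst, size_t dst_size, const char *ttychars)
{
	static const char prefix[] = "/dev/tty";
	size_t need = (sizeof prefix - 1) + strlen(ttychars) + 1;

	if (need > dst_size)
		return (-1);		/* would not fit */
	strcpy(dst, prefix);		/* dst is writable storage */
	strcat(dst, ttychars);
	return (0);
}
```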
The address space may have holes. Simply computing the address of an unallocated element in an array (before or after the actual storage of the array) may crash the program. If the address is used in a comparison, sometimes the program will run but clobber data, give wrong answers, or loop forever. In ANSI C, a pointer into an array of objects may legally point to the first element after the end of the array; this is usually safe in older implementations. This ``outside'' pointer may not be dereferenced.
Only the == and != comparisons are defined for all pointers of a given type. It is only portable to use <, <=, >, or >= to compare pointers when they both point into (or to the first element after) the same array. It is likewise only portable to use arithmetic operators on pointers that both point into the same array or the first element afterwards.
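A loop that stays within these rules computes the ``one past the end'' pointer, compares against it with !=, and never dereferences it. A minimal sketch with a hypothetical function name:

```c
#include <stddef.h>

long
sum_ints (const int *a, size_t n)
{
	const int *end = a + n;		/* first element after the array */
	const int *p;
	long total = 0;

	for (p = a; p != end; p++)	/* end is compared, never dereferenced */
		total += *p;
	return (total);
}
```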
Word size also affects shifts and masks. The following code will clear only the three rightmost bits of an int on some 68000s. On other machines it will also clear the upper two bytes.
```
x &= 0177770
```
Use instead
```
x &= ~07
```
which works properly on all machines. Bitfields do not have these problems.
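When the operand is an unsigned type wider than int, the constant needs a matching suffix; otherwise a mask such as ~7u is computed in unsigned int and zero-extended, which clears the upper bits again. A sketch for unsigned long, with a hypothetical function name:

```c
unsigned long
clear_low3 (unsigned long x)
{
	return (x & ~7UL);	/* mask is as wide as x on every machine */
}
```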
Side effects within expressions can result in code whose semantics are compiler-dependent, since C's order of evaluation is explicitly undefined in most places. Notorious examples include the following.
```
a[i] = b[i++];
```
In the above example, we know only that the subscript into b has not been incremented. The index into a could be the value of i either before or after the increment.
```
struct bar_t { struct bar_t *next; } *bar, *tmp;
bar->next = bar = tmp;
```
In the second example, the address of ``bar->next'' may be computed before the value is assigned to ``bar''.
```
bar = bar->next = tmp;
```
In the third example, bar can be assigned before bar->next. Although this appears to violate the rule that ``assignment proceeds right-to-left'', it is a legal interpretation. Consider the following example:
```
long i;
short a[N];
i = old;
i = a[i] = new;
```
The value that ``i'' is assigned must be a value that is typed as if assignment proceeded right-to-left. However, ``i'' may be assigned the value ``(long)(short)new'' before ``a[i]'' is assigned to. Compilers do differ.
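The usual cure is to give each side effect its own statement, so the order is stated in the source and not left to the compiler. The sketch below rewrites the first two examples; the function names are hypothetical, and the second assumes the intent was to link through the old value of bar.

```c
struct bar_t {
	struct bar_t *next;
};

/* a[i] = b[i++], with the increment made a separate statement. */
void
copy_and_advance (int *a, const int *b, int *i)
{
	a[*i] = b[*i];
	(*i)++;
}

/* bar->next = bar = tmp, split into two ordered steps. */
struct bar_t *
link_and_advance (struct bar_t *bar, struct bar_t *tmp)
{
	bar->next = tmp;	/* link through the old value of bar */
	return (tmp);		/* caller assigns this to bar */
}
```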
Be suspicious of numeric values appearing in the code (``magic numbers'').
Become familiar with existing library functions and defines. (But not too familiar. The internal details of library facilities, as opposed to their external interfaces, are subject to change without warning. They are also often quite unportable.) You should not be writing your own string compare routine, terminal control routines, or making your own defines for system structures. ``Rolling your own'' wastes your time and makes your code less readable, because another reader has to figure out whether you're doing something special in that reimplemented stuff to justify its existence. It also prevents your program from taking advantage of any microcode assists or other means of improving performance of system routines. Furthermore, it's a fruitful source of bugs. If possible, be aware of the differences between the common libraries (such as ANSI, POSIX, and so on).
Always run your compiler with the switches set for maximum warnings. For example, run the Microsoft compilers with -W3. On Unix systems use lint.
Suspect labels inside blocks with the associated switch or goto outside the block.
Use explicit casts when doing arithmetic that mixes signed and unsigned values.
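For instance, a range check that mixes an int index with an unsigned length can test the sign first and then cast, so the conversion is visible to the reader and not left implicit. A minimal sketch with a hypothetical function name:

```c
int
index_in_range (int i, unsigned int len)
{
	/* i is known to be non-negative before it is converted */
	return (i >= 0 && (unsigned int) i < len);
}
```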
The inter-procedural goto, longjmp, should be used with caution. Many implementations ``forget'' to restore values in registers. Declare critical values as volatile if you can or comment them as VOLATILE.
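The following sketch shows a local that is modified between setjmp and longjmp and read afterwards, which is the case where volatile matters. It assumes an ANSI <setjmp.h>; the function and variable names are hypothetical.

```c
#include <setjmp.h>

static jmp_buf recover_point;

static void
fail_midway (void)
{
	longjmp(recover_point, 1);	/* jump back into count_attempts */
}

int
count_attempts (void)
{
	volatile int attempts = 0;	/* volatile: value survives the longjmp */

	if (setjmp(recover_point) != 0)
		return (attempts);	/* reached via longjmp */

	attempts++;
	fail_midway();
	return (-1);			/* not reached */
}
```

Without the volatile qualifier, the value of attempts after the jump is not something the program can rely on.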
Some linkers convert names to lower-case and some only recognize the first six letters as unique. Programs may break quietly on these systems.
Beware of compiler extensions. If used, document and consider them as machine dependencies.
A program cannot generally execute code in the data segment or write into the code segment. Even when it can, there is no guarantee that it can do so reliably.