The advantages of portable code are well known.
This section gives some guidelines for writing portable code.
Here, "portable" means that a source file
can be compiled and executed on different machines
with the only change being the inclusion of possibly
different header files and the use of different compiler flags.
The header files will contain #defines and typedefs that may vary from
machine to machine.
In general, a new "machine" is different hardware,
a different operating system, a different compiler,
or any combination of these.
Reference [1] contains useful information on both style and portability.
The following is a list of pitfalls to be avoided and recommendations
to be considered when designing portable code:
-
Write portable code first,
worry about detail optimizations only on machines where they
prove necessary.
Optimized code is often obscure.
Optimizations for one machine may produce worse code on another.
Document performance hacks and localize them as much as possible.
Documentation should explain how each hack works and why
it was needed (e.g., "loop executes 6 zillion times").
-
Recognize that some things are inherently non-portable.
Examples are code to deal with particular hardware registers such as
the program status word,
and code that is designed to support a particular piece of hardware,
such as an assembler or I/O driver.
Even in these cases there are many routines and data organizations
that can be made machine independent.
-
Organize source files so that the machine-independent
code and the machine-dependent code are in separate files.
Then if the program is to be moved to a new machine,
it is a much easier task to determine what needs to be changed.
Comment the machine dependence in the headers of the appropriate
files.
-
Any behavior that is described as "implementation defined"
should be treated as a machine (compiler) dependency.
Assume that the compiler or hardware does it some completely screwy
way.
-
Pay attention to word sizes.
Objects may be non-intuitive sizes.
Pointers are not always the same size as ints,
the same size as each other,
or freely interconvertible.
The following table shows bit sizes for basic types in C for various
machines and compilers.
type      pdp11   VAX/11  68000   Cray-2  Unisys  Harris  80386
          series          family          1100    H800
_________________________________________________________________
char      8       8       8       8       9       8       8
short     16      16      8/16    64(32)  18      24      8/16
int       16      32      16/32   64(32)  36      24      16/32
long      32      32      32      64      36      48      32
char*     16      32      32      64      72      24      16/32/48
int*      16      32      32      64(24)  72      24      16/32/48
int(*)()  16      32      32      64      576     24      16/32/48
Some machines have more than one possible size for a given type.
The size you get can depend both on the compiler
and on various compile-time flags.
The following table shows "safe" type sizes on the majority of
systems.
Unsigned numbers are the same bit size as signed numbers.
Type    Minimum  No Smaller
        # Bits   Than
_____________________________
char    8
short   16       char
int     16       short
long    32       int
float   24
double  38       float
any *   14
char *  15       any *
void *  15       any *
-
The void* type is guaranteed to have enough bits
of precision to hold a pointer to any data object.
The void(*)() type is guaranteed to be able to hold a pointer to any
function.
Use these types when you need a generic pointer.
(Use char* and char(*)(), respectively, in older compilers.)
Be sure to cast pointers back to the correct type before using them.
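As a minimal sketch of the generic-pointer advice above, assuming an ANSI compiler, a callback can receive its context through a void* and convert it back to the real type before use. The names visit_fn, apply, and add_to_total are hypothetical, chosen only for illustration.

```c
/* Callback type: receives one element and an untyped context pointer. */
typedef void (*visit_fn)(int value, void *context);

/* Call fn once per element, passing the context through untouched. */
void apply(const int *items, int count, visit_fn fn, void *context)
{
    int i;

    for (i = 0; i < count; i++)
        fn(items[i], context);
}

/* The callback knows the real type and converts back before use. */
void add_to_total(int value, void *context)
{
    long *total = (long *) context;

    *total += value;
}
```

A caller would pass the address of a long, for example apply(v, 3, add_to_total, &total). The generic pointer is never dereferenced while it is still a void*.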
-
Even when, say, an int* and a char* are the same size, they may have
different formats.
For example, the following will fail on some machines that have
sizeof(int*) equal to sizeof(char*).
The code fails because free expects a char* and gets passed an int*.
int *p = (int *) malloc (sizeof(int));
free (p);
-
Note that the size of an object does not guarantee the precision of
that object.
The Cray-2 may use 64 bits to store an int, but a long cast into an
int and back to a long may be truncated to 32 bits.
-
The integer constant zero may be cast to any pointer type.
The resulting pointer is called a null pointer for that type,
and is different from any other pointer of that type.
A null pointer always compares equal to the constant zero.
A null pointer might not compare equal with a variable
that has the value zero.
Null pointers are not always stored with all bits zero.
Null pointers for two different types are sometimes different.
A null pointer of one type cast into a pointer of another
type will be cast into the null pointer for that second type.
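One way to respect these rules is sketched below, assuming an ANSI compiler: pointers are set to null by assignment rather than by clearing memory, and tested against the constant zero. The struct node type and both function names are hypothetical.

```c
#include <stddef.h>

struct node {
    struct node *next;
    int          value;
};

/* Set the pointer with an assignment so the compiler stores whatever
   bit pattern this machine uses for a null pointer.  Clearing the
   structure with memset() would store all-bits-zero instead. */
void node_init(struct node *n, int value)
{
    n->next  = NULL;
    n->value = value;
}

/* Comparing against the constant 0 (or NULL) is always a valid test. */
int node_is_last(const struct node *n)
{
    return n->next == 0;
}
```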
-
On ANSI compilers, when two pointers of the same type access
the same storage, they will compare as equal.
When non-zero integer constants are cast to pointer types,
they may become identical to other pointers.
On non-ANSI compilers, pointers that
access the same storage may compare as different.
The following two pointers, for instance,
may or may not compare equal,
and they may or may not access the same storage (6).
((int *) 2 )
((int *) 3 )
If you need 'magic' pointers other than NULL,
either allocate some storage or treat the pointer as
a machine dependence.
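/* Either: allocate some storage for the magic pointer. */
extern int x_int_dummy; /* in x.c */
#define X_FAIL (NULL)
#define X_BUSY (&x_int_dummy)
/* Or: treat the magic pointer as a machine dependence. */
#define X_FAIL (NULL)
#define X_BUSY MD_PTR1 /* MD_PTR1 from "machdep.h" */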
-
Floating-point numbers have both a precision and a range.
These are independent of the size of the object.
Thus, overflow (underflow) for a 32-bit floating-point number will
happen at different values on different machines.
Also, 4.9 times 5.1 will yield
two different numbers on two different machines.
Differences in rounding and truncation can give surprisingly
different answers.
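Because of these differences, code that compares floating-point results for exact equality tends not to travel well. A minimal sketch of a tolerance-based comparison follows; the function name nearly_equal and the choice of a relative tolerance are assumptions for illustration, and a suitable tolerance depends on the application.

```c
/* Return 1 when a and b agree to within a relative tolerance. */
int nearly_equal(double a, double b, double rel_tol)
{
    double diff  = a > b ? a - b : b - a;
    double mag_a = a < 0 ? -a : a;
    double mag_b = b < 0 ? -b : b;
    double scale = mag_a > mag_b ? mag_a : mag_b;

    return diff <= rel_tol * scale;
}
```

For example, nearly_equal(4.9 * 5.1, 24.99, 1e-9) can be expected to hold even where the two sides differ in their last bits.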
-
On some machines, a double may have less range or precision than a
float.
-
On some machines the first half of a double may be a float
with similar value.
Do not depend on this.
-
Watch out for signed characters.
On some VAXes, for instance,
characters are sign extended when used in expressions,
which is not the case on many other machines.
Code that assumes signed/unsigned is unportable.
For example, array[c] won't work if c is supposed to be positive
and is instead signed and negative.
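The sketch below shows one way to keep a character-indexed table safe whether plain char is signed or unsigned: the character is converted to unsigned char before it is used as a subscript. The function name count_chars is hypothetical.

```c
#include <limits.h>

/* Tally how often each character value occurs in s.
   counts must have UCHAR_MAX + 1 elements. */
void count_chars(const char *s, unsigned long counts[])
{
    int i;

    for (i = 0; i <= UCHAR_MAX; i++)
        counts[i] = 0;
    for (; *s != '\0'; s++)
        counts[(unsigned char) *s]++;   /* never a negative index */
}
```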
If you must assume signed or unsigned characters, comment them as
SIGNED or UNSIGNED.
Unsigned behavior can be guaranteed with "unsigned char".
-
Avoid assuming ASCII.
If you must assume, document and localize.
Remember that characters may hold (much) more than 8 bits.
-
Code that takes advantage of the two's complement representation of
numbers on most machines should not be used.
Optimizations that replace arithmetic operations with equivalent
shifting operations are particularly suspect.
If absolutely necessary, machine-dependent code should be #ifdeffed
or operations should be performed by #ifdeffed macros.
You should weigh the time savings against the potential for obscure
and difficult bugs when your code is moved.
-
In general, if the word size or value range is important,
typedef "sized" types.
Large programs should have a central header file which supplies
typedefs for commonly-used width-sensitive types, to make
it easier to change them and to aid in finding width-sensitive code.
Unsigned types other than "unsigned int" are highly compiler-dependent.
If a simple loop counter is being used where either 16 or 32 bits will
do, then use int, since it will get the most efficient (natural)
unit for the current machine.
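A central header of "sized" types might look like the following sketch. The type names int16 and int32 are hypothetical, and the selection relies only on the ANSI limits.h macros; an older compiler without limits.h would need the choice made by hand for each machine.

```c
#include <limits.h>

/* Hypothetical central header of width-sensitive types. */

#if INT_MAX >= 2147483647
typedef int   int32;    /* int holds at least 32 bits on this machine */
#else
typedef long  int32;    /* otherwise fall back to long */
#endif

typedef short int16;    /* short holds at least 16 bits */
```

Code that needs a guaranteed range then declares, say, int32 total; and only this header changes when the program moves to a new machine.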
-
Data alignment is also important.
For instance,
on various machines a 4-byte integer may start at any address,
start only at an even address, or start only at a multiple-of-four
address.
Thus, a particular structure may have its elements
at different offsets on different machines,
even when given elements are the same size on all machines.
Indeed, a structure of a 32-bit pointer and an 8-bit character may be
three different sizes on three different machines.
As a corollary, pointers to objects may not be interchanged freely;
saving an integer through a pointer
to 4 bytes starting at an odd address
will sometimes work,
sometimes cause a core dump,
and sometimes fail silently (clobbering other data in the process).
Pointer-to-character is a particular trouble spot on machines which
do not address to the byte.
Alignment considerations and loader peculiarities make it very rash
to assume that two consecutively-declared variables are together
in memory, or that a variable of one type is aligned appropriately
to be used as another type.
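When a value has to be fetched from or stored at an address whose alignment is unknown, copying it byte by byte avoids the alignment trap. The sketch below uses memcpy for that purpose; the function names load_long and store_long are hypothetical.

```c
#include <string.h>

/* Fetch a long from any byte address, aligned or not. */
long load_long(const unsigned char *src)
{
    long value;

    memcpy(&value, src, sizeof value);
    return value;
}

/* Store a long at any byte address, aligned or not. */
void store_long(unsigned char *dst, long value)
{
    memcpy(dst, &value, sizeof value);
}
```

This sidesteps alignment only. The bytes are still in the machine's own order and size, so the buffer is not portable between machines.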
-
The bytes of a word are of increasing significance with increasing
address on machines such as the VAX (little-endian)
and of decreasing significance with increasing address on other
machines such as the 68000 (big-endian).
The order of bytes in a word and of words in larger
objects (say, a double word) might not be the same.
Hence any code that depends on the left-right orientation of bits
in an object deserves special scrutiny.
Bit fields within structure members will only be portable so long as
two separate fields are never concatenated and treated as a unit [1],[3].
Actually, it is nonportable to concatenate any two variables.
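Code that must read or write data in a fixed byte order can build the value arithmetically instead of relying on how the machine lays out a word. The following sketch handles a 32-bit quantity stored most-significant byte first; the names get_be32 and put_be32 are hypothetical, and each buffer element is assumed to carry one 8-bit quantity.

```c
/* Decode a 32-bit unsigned value stored most-significant byte first. */
unsigned long get_be32(const unsigned char *buf)
{
    return ((unsigned long) buf[0] << 24)
         | ((unsigned long) buf[1] << 16)
         | ((unsigned long) buf[2] << 8)
         |  (unsigned long) buf[3];
}

/* Encode the low 32 bits of value, most-significant byte first. */
void put_be32(unsigned char *buf, unsigned long value)
{
    buf[0] = (unsigned char) ((value >> 24) & 0xFF);
    buf[1] = (unsigned char) ((value >> 16) & 0xFF);
    buf[2] = (unsigned char) ((value >> 8) & 0xFF);
    buf[3] = (unsigned char) (value & 0xFF);
}
```

Because the shifts operate on values rather than on storage, the same source gives the same external format on little-endian and big-endian machines.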
-
There may be unused holes in structures.
Suspect unions used for type cheating.
Specifically, a value should not be stored as one type and retrieved as
another.
An explicit tag field for unions may be useful.
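A tagged union might be sketched as follows. All names here are hypothetical; the point is only that the tag records which member was stored, so the value is always retrieved through that same member.

```c
enum value_kind { VALUE_LONG, VALUE_DOUBLE };

struct value {
    enum value_kind kind;   /* records which member was stored last */
    union {
        long   l;
        double d;
    } u;
};

void value_set_long(struct value *v, long l)
{
    v->kind = VALUE_LONG;
    v->u.l  = l;
}

void value_set_double(struct value *v, double d)
{
    v->kind = VALUE_DOUBLE;
    v->u.d  = d;
}

/* Read the value back through the member it was stored in. */
double value_as_double(const struct value *v)
{
    return v->kind == VALUE_LONG ? (double) v->u.l : v->u.d;
}
```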
-
Different compilers use different conventions for returning
structures.
This causes a problem when libraries return structure values
to code compiled with a different compiler.
Structure pointers are not a problem.
-
Do not make assumptions about the parameter passing mechanism,
especially pointer sizes and parameter evaluation order, size, etc.
The following code, for instance, is very nonportable.
c = foo (getchar(), getchar());

char
foo (c1, c2, c3)
char c1, c2, c3;
{
	char bar = *(&c1 + 1);
	return (bar);	/* often won't return c2 */
}
This example has lots of problems.
The stack may grow up or down
(indeed, there need not even be a stack!).
Parameters may be widened when they are passed,
so a char might be passed as an int, for instance.
Arguments may be pushed left-to-right, right-to-left,
in arbitrary order, or passed in registers (not pushed at all).
The order of evaluation may differ from the order in which
they are pushed.
One compiler may use several (incompatible) calling conventions.
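The evaluation-order part of the problem can be avoided by performing the calls in separate statements and passing the results. In the sketch below, next_value is a hypothetical stand-in for getchar(), used so that the effect of ordering is visible; subtract and ordered_subtract are likewise hypothetical.

```c
static int calls = 0;

/* Stand-in for getchar(): each call has a side effect. */
int next_value(void)
{
    return ++calls;
}

int subtract(int first, int second)
{
    return first - second;
}

/* The two calls are made in separate statements, so their order is fixed. */
int ordered_subtract(void)
{
    int first  = next_value();
    int second = next_value();

    return subtract(first, second);
}
```

Writing subtract(next_value(), next_value()) instead would leave it to the compiler which call happens first.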
-
On some machines, the null character pointer
((char *)0)
is treated the same way as a pointer to a null string.
Do not depend on this.
-
Do not modify string constants (7).
One particularly notorious (bad) example is
s = "/dev/tty??";
strcpy (&s[8], ttychars);
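A safer version copies the template into writable storage before changing it. In this sketch the function name make_tty_name and its buffer convention are assumptions for illustration.

```c
#include <string.h>

/* Build "/dev/ttyXY" in a caller-supplied buffer of at least 11 bytes.
   suffix must point to at least two characters. */
void make_tty_name(char *out, const char *suffix)
{
    strcpy(out, "/dev/tty??");  /* copy the template into writable storage */
    out[8] = suffix[0];
    out[9] = suffix[1];
}
```

The string constant itself is only read, so it does not matter whether the compiler places it in read-only memory or shares it with other identical constants.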
-
The address space may have holes.
Simply computing the address
of an unallocated element in an array
(before or after the actual storage of the array)
may crash the program.
If the address is used in a comparison,
sometimes the program will run but clobber data, give wrong answers,
or loop forever.
In ANSI C, a pointer into an array of objects may legally point to
the first element after the end of the array; this is usually safe
in older implementations.
This "outside" pointer may not be dereferenced.
-
Only the == and != comparisons are defined for all pointers of a
given type.
It is only portable to use <, <=, >, or >= to compare pointers
when they both point into
(or to the first element after) the same array.
It is likewise only portable to use arithmetic operators on pointers
that both point into the same array or the first element afterwards.
-
Word size also affects shifts and masks.
The following code will clear only the three rightmost bits of an
int on some 68000s.
On other machines it will also clear the upper two bytes.
x &= 0177770;
Use instead
x &= ~07;
which works properly on all machines.
Bitfields do not have these problems.
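The same idea is sketched here as a function on unsigned long. The name clear_low3 is hypothetical. The mask is written with an unsigned suffix so that the complement is taken at the full width of the operand rather than at the width of int.

```c
/* Clear the three low-order bits, whatever the width of unsigned long. */
unsigned long clear_low3(unsigned long x)
{
    return x & ~07UL;
}
```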
-
Side effects within expressions can result in code
whose semantics are compiler-dependent, since C's order of evaluation
is explicitly undefined in most places.
Notorious examples include the following.
a[i] = b[i++];
In the above example, we know only that the subscript into b
has not been incremented.
The index into a could be the value of i
either before or after the increment.
struct bar_t { struct bar_t *next; } *bar;
bar->next = bar = tmp;
In the second example, the address of "bar->next"
may be computed before the value is assigned to "bar".
bar = bar->next = tmp;
In the third example, bar can be assigned before bar->next.
Although this appears to violate the rule that
"assignment proceeds right-to-left", it is a legal interpretation.
Consider the following example:
long i;
short a[N];
i = old;
i = a[i] = new;
The value that "i" is assigned must be a value that is typed as if
assignment proceeded right-to-left.
However, "i" may be assigned the value "(long)(short)new"
before "a[i]" is assigned to.
Compilers do differ.
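The usual cure is to give each side effect its own statement, so that no expression both modifies a variable and uses it elsewhere. A sketch of the first example rewritten this way follows; the function name copy_and_advance is hypothetical, and it assumes the intended meaning was "copy element i, then advance".

```c
/* Copy b[*ip] to a[*ip], then advance the index. */
void copy_and_advance(int a[], const int b[], int *ip)
{
    int i = *ip;

    a[i] = b[i];    /* both subscripts use the old value of the index */
    *ip = i + 1;    /* the increment happens in its own statement */
}
```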
-
Be suspicious of numeric values appearing in the code ("magic
numbers").
-
Avoid preprocessor tricks.
Tricks such as using /**/ for token pasting
and macros that rely on argument string expansion will break reliably.
#define FOO(string) (printf("string = %s",(string)))
...
FOO(filename);
This will only sometimes be expanded to
(printf("filename = %s",(filename)))
Be aware, however, that tricky preprocessors may cause macros to break
accidentally on some machines.
Consider the following two versions of a macro.
#define LOOKUP(chr) (a['c'+(chr)]) /* Works as intended. */
#define LOOKUP(c) (a['c'+(c)]) /* Sometimes breaks. */
The second version of LOOKUP can be expanded in two different ways
and will cause code to break mysteriously.
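Where an ANSI preprocessor can be assumed, the # operator gives a defined way to turn a macro argument into a string, which avoids relying on expansion inside string literals. The macro names SHOW and NAME_OF below are hypothetical; this sketch does not help with pre-ANSI preprocessors.

```c
#include <stdio.h>

/* ANSI C: #arg turns the macro argument into a string literal, and
   adjacent string literals are concatenated by the compiler. */
#define SHOW(arg) (printf(#arg " = %s\n", (arg)))

/* The same mechanism, yielding just the spelling of the argument. */
#define NAME_OF(arg) #arg
```

With these definitions, SHOW(filename) expands to (printf("filename" " = %s\n", (filename))).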
-
Become familiar with existing library functions and defines.
(But not too familiar.
The internal details of library facilities, as opposed to their
external interfaces, are subject to change without warning.
They are also often quite unportable.)
You should not be writing your own string compare routine,
terminal control routines, or making
your own defines for system structures.
"Rolling your own" wastes your time and
makes your code less readable, because another reader has to
figure out whether you're doing something special in that reimplemented
stuff to justify its existence.
It also prevents your program
from taking advantage of any microcode assists or other
means of improving performance of system routines.
Furthermore, it's a fruitful source of bugs.
If possible, be aware of the differences between the common
libraries (such as ANSI, POSIX, and so on).
-
Use lint when it is available.
It is a valuable tool for finding machine-dependent constructs as well
as other inconsistencies or program bugs that pass the compiler.
If your compiler has switches to turn on warnings, use them.
-
Suspect labels inside blocks with the associated switch or goto
outside the block.
-
Wherever the type is in doubt,
parameters should be cast to the appropriate type.
Always cast NULL when it appears in non-prototyped function calls.
Do not use function calls as a place to do type cheating.
C has confusing promotion rules, so be careful.
For example, if a function expects a 32-bit long
and it is passed a 16-bit int,
the stack can get misaligned, the value can get promoted wrong, etc.
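Variable argument lists are a common case in which the compiler has no prototype information for the trailing arguments, so a terminating null pointer has to be cast by the caller. The following sketch assumes an ANSI compiler with stdarg.h; the function name count_strings is hypothetical.

```c
#include <stdarg.h>
#include <stddef.h>

/* Count string arguments up to a terminating null character pointer. */
int count_strings(const char *first, ...)
{
    va_list ap;
    const char *s;
    int n = 0;

    va_start(ap, first);
    for (s = first; s != NULL; s = va_arg(ap, char *))
        n++;
    va_end(ap);
    return n;
}
```

A call should be written count_strings("a", "b", (char *) 0). A bare 0 or NULL may be passed as an int, which need not have the size or representation of a pointer.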
-
Use explicit casts when doing arithmetic
that mixes signed and unsigned values.
-
The inter-procedural goto, longjmp, should be used with caution.
Many implementations "forget" to restore values in registers.
Declare critical values as volatile if you can or comment them as
VOLATILE.
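A minimal sketch of the volatile advice follows, assuming an ANSI compiler. The function do_work is a hypothetical operation that fails, by calling longjmp, until its third attempt; the counter is declared volatile because it is modified between the setjmp and the longjmp and read afterwards.

```c
#include <setjmp.h>

static jmp_buf retry_point;

/* Hypothetical operation that fails until the third attempt. */
static void do_work(int attempt)
{
    if (attempt < 3)
        longjmp(retry_point, 1);
}

/* Return the number of attempts that were needed. */
int attempts_needed(void)
{
    volatile int attempts = 0;   /* changed between setjmp and longjmp */

    (void) setjmp(retry_point);  /* control resumes here after a failure */
    attempts++;
    do_work(attempts);
    return attempts;
}
```

Without the volatile qualifier, the value of attempts after the longjmp would be indeterminate, and a compiler that kept it in a register could keep returning to a stale count.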
-
Some linkers convert names to lower-case
and some only recognize the first six letters as unique.
Programs may break quietly on these systems.
-
Beware of compiler extensions.
If used, document and
consider them as machine dependencies.
-
A program cannot generally execute code in the data
segment or write into the code segment.
Even when it can, there is no guarantee that it can do so reliably.