
Common Sense C - Advice & Warnings for C and C++ Programmers

(Publisher: 29th Street Press)


Author(s): Paul Conte
ISBN: 1882419006
Publication Date: 10/01/92

Preface
About the Author

Chapter 1—Introduction
What's the Problem?
"Real Programmers" And C
A Better C
Conquering C

Chapter 2—Common Mistakes and How to Avoid Them


Lazy Logic
Precedence Without Precedent
No Such Number, Address Unknown
It Hurts So Good
Sidebar 1 — C Coding Suggestions

Chapter 3—Foolproof Statement and Comment Syntax


Brace Yourself
Follow This Advice, or Else
Give Me a Break
One Last Comment
From C to Shining C
(Sidebar 1) — C Coding Suggestions

Chapter 4—Hassle-free Arrays and Strings


String Symphony
Sidebar 1 — C Coding Suggestions
Chapter 5—Simplified Variable Declarations
Chapter 6—Practical Pointers
Finger Pointing
C’s a Real Nowhere, Man
You Can’t Get There from Here
Amnesia
One Blankety-Blank Trap After Another
Letting the Cat Out of the Bag
Sidebar 1 — Pulling a “Fast” One
Sidebar 2 — C Coding Suggestions

Chapter 7—Macros and Miscellaneous Pitfalls


Chapter 8—Working with C++
Starting on the Right Foot
Your Constant Companion
The Calm Before the Storm
New and Improved
Merrily Down the Streams
Non-Plused
OOP, Not Oooops!
Weighing the Pluses and Minuses
C Coding Suggestions

Chapter 9—Managing C and C++ Development


Discipline Has Its Rewards
How Big Is the World?
Getting Started With Standards
The Evolution of Standards
No Train, No Gain
The Right Tool For the Job
Debugging Is a Waste of Time
Order Out of Chaos
Reuse It Or Lose It
Principles Of Reuse

Bibliography
Appendix
Index

COMMON-SENSE C -- ADVICE AND WARNINGS FOR C AND C++ PROGRAMMERS

C is a powerful programming language, but not without risks.
Without help, even experienced C programmers can find
themselves in trouble, despite "careful" programming, lint filters
and good debuggers. And managers of programming projects
can discover too late that using C carelessly can lead to delayed
and defect-ridden software. This book helps avoid problems by
illuminating the dangers of C and describing specific
programming techniques to make C programming both faster
and safer.

Paul Conte draws on more than 15 years of software
development, including writing commercial products using C, to
warn you of C and C++ features that trip up even the best C
programmers. This book is unique in that it takes a critical look
at C's deficiencies, but offers tried-and-proven techniques to
minimize the chances that common C coding mistakes will lead
to serious or hard-to-find software defects. Managers will find
Paul's descriptions of C pitfalls and hard-hitting assessment of
the language invaluable in deciding when -- or whether -- to use
C for programming projects. No other book on C programming
combines the depth of specific technical information and the
strategic assessment of C's capabilities and risks that you'll find
in Common Sense C.

About the Author

Paul Conte is a senior technical editor for NEWS 3X/400 and
president of Picante Software of Eugene, Oregon, which develops
workstation-based applications development tools for S/36 and
AS/400 programmers. Paul has published numerous articles on
the AS/400, programming languages, software engineering, and
database design. His interest in programming languages led to
the development of RPG/free, the widely used free-format
version of RPG. During his career, Paul has developed
applications on a variety of platforms, including the S/38,
AS/400, S/370, DEC, and PCs. His language expertise covers a
wide range: C/C++, COBOL, RPG, Pascal, FORTRAN, Awk,
and SNOBOL, to name a few.

Paul has a B.A. in psychology from Georgia State University
and an M.S. in computer science from the University of Oregon.
He served on the University of Oregon faculty for eight years
and has run his own consulting firm, prior to starting Picante
Software, Inc. Paul has received several awards for his writing,
including a Society for Technical Communication's International
Award of Excellence for an article about C pitfalls.

Acknowledgments
Several people played a key role in creating this book. Jennifer
Hamilton pressed the case for C and C++ and stimulated my
analysis of where C's problems lie. Arguing with her over C
facilities and programming style helped me refine my own side
of the debate. Mike Otey provided invaluable technical review.
Trish Faubion helped turn the original rough style into one that
retained its bite, but was much more polished. Katie McCormick
Tipton, Barb Gibbens, and Kathy Blomstrom all helped refine
my writing. And Dave Bernard and Sharon Hamm wielded just
the right mix of encouragement and threat to make the book
actually happen. My sincere thanks to all.

Dedication

To my parents, Theodore and Sybil Conte, who've always been
my example of lives well-lived

Chapter 1
Introduction

C and C++ are widely promoted as ideal portable, fast, and (in
the case of C++) "object-oriented" languages. This
characterization is deserved when C is considered for systems-
level programs such as compilers, or for mass-market products
such as word processing or spreadsheet programs. C was
designed as a reasonably transportable replacement for assembly
language that would add some high-level language constructs,
but would retain almost all the low-level procedural capabilities
found at the machine instruction level. C++ follows in that
tradition, adding object-oriented capabilities (encapsulation and
inheritance) to improve productivity while retaining C's original
features and its philosophy of "bare metal" performance.

But C is increasingly being considered as the best replacement
for outdated commercial languages such as COBOL, RPG, and
Basic. And many proponents also recommend C and C++ as
superior alternatives to the Pascal family of languages (including
Modula-2 and other successors to Pascal); to object-oriented
languages such as Smalltalk, Eiffel, and Actor; and to the
general-purpose language, Ada. C has its place, but in many
cases — especially business programming — C can be a poor
choice.

What's the Problem?

The fundamental problem with C is that it doesn't hide enough
machine-level details. A good example is the central role that
pointer variables play in C programs. C pointers were designed
to provide machine-independent address arithmetic; and, for the
most part, pointers do make it easier to write system programs
that transport across machines. (Even this advantage is qualified,
however, because pointers don't always transport easily between
machines with flat addresses — e.g., Vax — and machines with
segmented addresses — e.g., Intel 808x.)

But at an application level, C pointers are a burden and a danger.
They're burdensome because the programmer has to attend to
details that a compiler can readily handle. For example, in C, to
use a function (procedure) parameter as an output parameter
(i.e., one that changes a value in the calling function), you have to
pass the address of the variable that is to receive the value. This
mechanism requires special attention at every call: you code the
argument as arg when the function defines the corresponding
parameter as the same type as arg, but as &arg when the function
defines the corresponding parameter as a pointer. In the called function,
normal parameters are referenced as arg, whereas the value of
parameters declared as pointers must be referenced as *arg. In
all of these cases, a simple miscoding that incorrectly omits or
adds a * or & can be fatal during program execution. By
contrast, in languages like Pascal and Ada, you simply specify
whether a parameter is passed by value (input only) or reference
(allowing output) and all references are simple variable names,
such as arg.
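
The & and * bookkeeping described above can be seen in a small sketch. The names divide, rem_out, and remainder_of are hypothetical, chosen only for illustration:

```c
/* Hypothetical helper: returns the quotient by value and writes the
   remainder through a pointer, the C idiom for an output parameter. */
int divide(int dividend, int divisor, int *rem_out)
{
    *rem_out = dividend % divisor;   /* the * reaches the caller's variable */
    return dividend / divisor;
}

/* Caller side: rem must be passed as &rem because the corresponding
   parameter is declared as a pointer. */
int remainder_of(int dividend, int divisor)
{
    int rem = 0;
    (void) divide(dividend, divisor, &rem);
    return rem;
}
```

Omitting the * inside divide or the & at the call changes the meaning of the code, and a compiler is not always strict enough to stop you.
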
It's true that C++ adds references as a simpler way to implement
output parameters. But C++ still retains the error-prone use of
pointer parameters. And, as a good example of the damage that
can be done by conventional C/C++ advice, Bjarne Stroustrup,
the author of C++, goes so far as to discourage the use of
references as parameters and suggests pointer parameters
instead!

Pointers are often viewed as essential building blocks for
dynamic data structures, such as sets and lists, and C proponents
point to COBOL's (and other older languages') lack of pointers
as a good reason to switch to C. But there are two ways to
implement pointers: as addresses (as C does) or as "handles" (as
Pascal does). The two implementations serve two distinctly
different purposes. Address pointers let you directly manipulate
a pointer variable to create a new pointer value (i.e., a new
address). This ability is essential in many systems-level
programs where access of specific memory locations (or even
registers) is required. The downside of address pointers is that
there's no guarantee that a computed pointer value will be the
intended — or even a valid — address. As a result, a common
experience in C programming is to have a program write over
memory that contains the wrong data — the program's own
instructions, or even the operating system's code — all due to an
incorrect pointer value.

Handle pointers contain system-defined values (which may even
be addresses) that cannot be directly manipulated by arithmetic
operations, and which the system can check for validity before
using to reference storage. Thus, handle pointers provide support
for dynamic data structures, but protect the programmer from the
dangers of machine-level address manipulations. A similar
argument applies when comparing C's approach to storage
allocation (e.g., with the malloc() function) in explicit bytes
versus other languages' built-in new and delete operations to
allocate memory based on variable declarations, leaving the
storage size allocations to the compiler.
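
If you must use malloc(), you can at least let the compiler compute the byte count. The following is a minimal sketch, with struct node and new_node as hypothetical names:

```c
#include <stdlib.h>

struct node {
    int value;
    struct node *next;
};

/* Allocate and initialize one node. Writing sizeof *n (rather than a
   hand-computed byte count) leaves the size calculation to the compiler. */
struct node *new_node(int value)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL) {
        return NULL;     /* allocation failed; the caller must check */
    }
    n->value = value;
    n->next = NULL;
    return n;
}
```

The caller still has to check for NULL and eventually call free(), so this narrows the gap with a built-in new operation without closing it.
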

This discussion of pointers introduces a theme that is repeated
throughout the book: C was designed and is well-suited as a
replacement for assembly language. But most software
developers today agree that assembly language — even a great
version of assembly language — isn't the right tool for most non-
systems programming. Programmers who don't understand that
programming with C pointers (and many other C features) is
very close to assembly language programming are in trouble
from the beginning. Unfortunately, most C programmers don't
seem to get it.

"Real Programmers" And C

The problems with C itself would be more manageable if the
culture and practices that have grown up around C weren't also
rooted in machine-level, systems programming. Consider
something as simple as adding a new element to an array. Two
favorite C idioms for this operation are:

array[++top] = item;
array[next++] = item;

The first example increments top, then adds item to array[top].
The second example adds item to array[next], then increments
next. In C, array subscripting is defined in terms of pointer arithmetic; and this
coding style follows an assembly language practice of
combining an address increment with a memory reference to the
address. But in a high-level language, there's no reason to code
these operations in a single statement. (You might, of course,
want to create a procedure, such as add_item(array, item) so that
a single, meaningful statement can be used to add an item. But
that's not the point here, since both the increment and
assignment operations are coded explicitly in the example.)

The most important problem with this condensed coding style is
that, when you're reading volumes of code, it's easy to overlook
statements where the ++ increment has been placed on the
wrong end of the index identifier. The following alternatives use
the code's visual layout to show the critical sequence of
operations:

++top;
array[top] = item;

array[next] = item;
++next;

These alternatives also eliminate the need to use post increment
(and post decrement) operations, removing one more piece of
syntactic clutter and a potential source of coding errors from the
program. Note also that the two-statement alternatives are just as
easy to write and, with most optimizing C compilers, will
execute as fast as the one-statement approach.
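
Here is one way the two-statement style might look inside a small helper. This is only a sketch: struct item_list, MAX_ITEMS, and this particular add_item signature are assumptions made for the example.

```c
#define MAX_ITEMS 100    /* assumed capacity for this sketch */

struct item_list {
    int items[MAX_ITEMS];
    int next;            /* index of the next free slot */
};

/* Returns 1 if the item was stored, 0 if the list is already full. */
int add_item(struct item_list *list, int item)
{
    if (list->next >= MAX_ITEMS) {
        return 0;
    }
    list->items[list->next] = item;   /* first store the item...  */
    ++list->next;                     /* ...then advance the index */
    return 1;
}
```

The order of the two operations is visible at a glance, and the bounds check has an obvious place to live.
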

To most people, even non-C programmers, the difference in
clarity in these isolated one- or two-line examples is small.
However, in large programs or more complex statements, the
differences mount up. As the examples in the rest of this book
point out, conventional C style — much of it based on assembly
language programming techniques — can also lead to subtle, but
fatal, program errors.

A Better C

Many claims have been made for C++, but one thing seems
certain: C++ is a "better" (if more complex) version of C. C++
adds some important language features missing in C; for
example, reference parameters, inline functions, and templates to
define generic functions and classes. These features aid clearer
programming and can reduce — but not eliminate — the need
for macros in C++ programs.

What C++ doesn't do is eliminate any of C's traps. C++ was
intentionally designed to be an almost complete superset of C;
that is, almost any ANSI C program — even one using
dangerous C techniques that have better alternatives in C++ —
will compile as a C++ program. Thus, you can still be burned by
typing = instead of == in a C++ program (I discuss this in
Chapter 2). C++ also continues the heavy use of special
characters, rather than keywords, in its syntax. The problems
that arise from C's use of * for "pointer" or "contents of" and &
for "address of" are compounded by new C++ notations, such as
a trailing & for "reference."

C++ also introduces facilities for object-oriented programming
(OOP). The primary new C++ concept is the "class," which is a
facility to package functions and variable declarations together
so that new data types can be defined and used in C++ programs.
C++ also provides for "inheritance," a facility for deriving a new
class definition from an existing class. The OOP capabilities of
C++ are quite powerful, and when you work with a well-designed
C++ class library, many implementation details can safely be
ignored. But creating new classes is a different matter; and, if
you write many non-trivial programs in C++, eventually you'll
have to construct some of your own classes. As Chapter 8 points
out, there are some very slippery slopes to climb as you write
C++ classes.

The question frequently arises of whether a programmer who
doesn't know either C or C++ should learn C or C++ first. That's
a hard call, and the best answer may be to learn one of the object-
oriented extensions of Pascal or Smalltalk first, the idea being to
learn the OOP concepts with a language not so laden with
assembly language baggage, then learn how to do it in C++. In
any case, you can't completely skip over learning about problem
areas in C because most of these still exist in C++. As a result,
much of this book is directed at problem areas common to both
languages.

Conquering C

To pick the right projects for C or C++, and then use the
language effectively, you have to ignore a lot of conventional
attitudes towards C and C programming practices. Many of these
attitudes and practices are rooted in a time and place 15 years
ago when C was a major step forward for systems programmers.
Today there are good alternatives to C for many applications,
and programming practices have changed considerably. One of
the most important differences between 15 years ago and today
is that businesses are placing much more emphasis on
controlling software development costs than on modest
improvements in performance. Thus, developers trying to
control costs want to avoid language features such as address
pointers and coding practices such as folding a sequence of
distinct operations into a single statement.

If you do find yourself (or your staff) programming in C, the
attitude with which you approach the task has a lot to do with
whether you conquer C or it conquers you. To successfully
program in C, you can't just memorize more C rules, code more
carefully, and keep the debugger close at hand. You have to start
with an awareness of what types of languages C and C++ are,
and plan your strategy for preventing accidents. With some
forewarning, and the right attitude, it's not terribly difficult to do,
although compared to other languages, C can remain a
frustratingly primitive — and C++ an agitatingly complex —
way to write software.

There are some bright spots in the world of C programming,
however. If you don't succumb to the "this is the way all C
programmers do it" method of programming, you can enjoy the
benefits of an enormous collection of C and C++ source and
executable libraries, and a large set of C-related tools, such as
"C-aware" editors and programmer workbenches. And there's no
question that the fierce competition among C compiler vendors,
especially on the PC, has produced excellent and affordable C
compilers. The performance of well-designed C programs
compiled with one of the good optimizing compilers is usually
excellent, too.

So don't fear that the only result of programming in C is
spending large amounts of time chasing wild pointers. With the
right amount of respect for the language and not too much
respect for C "traditions," you can enjoy the advantages of the
broad C compiler and tools market. All it takes is going into it
with your eyes wide open and programming with a little
"common sense."

Chapter 2
Common Mistakes and How to Avoid Them

Deck: Get a clear look at some classic surprises you'll want to
avoid in C programs
by Paul Conte

Why do some programmers think C is such a hot language? It
must be because it has burned them so many times. Unless
you're from the "no flame, no gain" school of programming, you
need to watch out when you start using C. In this book, I point
out some of the "hot" spots you really want to avoid.

Let's start by firing up an example.

if (x = y)
    printf("Equal values");

Simple enough. If y is not zero, print "Equal values". And, by
the way, replace the value of x with the value of y. Isn't it nice
that C lets you do assignment within an if statement expression?

But maybe you thought this code really meant: If x is equal to y,
print "Equal values"? No, the code for that is

if (x == y)
    printf("Equal values");

If this example tripped you up, don't worry. Typing =
(assignment) instead of == (equality) occasionally gets the best
C programmers, too. The problem isn't in comprehending the
different meanings of = and ==. The problem is that it's easy to
mistype = when you mean ==, especially because = is the
standard mathematical symbol for equality, and = represents
equality in many other widely used programming languages
(e.g., PL/I, COBOL, and Pascal). Unfortunately, C treats this
easy-to-make typo as an intentional assignment operation. The
resulting code will execute, and the error may be hard to
diagnose.
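
To see how quietly the typo changes a program's behavior, consider this sketch (equal_check and typo_check are hypothetical names). Both functions compile, although some compilers will warn about the second one if you ask them to:

```c
/* The intended test: are x and y equal? */
int equal_check(int x, int y)
{
    if (x == y) {
        return 1;
    }
    return 0;
}

/* The same code with the typo. The = assigns y to x, and the if then
   tests the assigned value, so the result depends only on y. */
int typo_check(int x, int y)
{
    if (x = y) {          /* deliberate mistake, for illustration only */
        return 1;
    }
    return 0;
}
```

With x = 3 and y = 5, equal_check reports "not equal" while typo_check reports "equal"; with y = 0, typo_check reports "not equal" no matter what x holds.
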

Hard-core C programmers may try to convince you it's your
inexperience, not C's syntax, that causes this type of coding
error. But there's a booming market in C source-code checkers
(known as "lint" filters) to help experienced C programmers
protect themselves from just these kinds of sneaky problems. If
C's pitfalls weren't so pervasive, lint utility vendors would be out
of business.

All programmers are not created "equal equal," so if you want to
be an A++ C programmer (why be just a C++ programmer?), the
first rule is don't use assignment in an if statement expression,
unless it is absolutely necessary. In addition, use a compiler
warning level or a lint utility that will catch = in if statement
expressions. Be forewarned, however, that you may never be
acknowledged as a "real" C programmer unless you're willing to
take some risks to speed up your code by a few nanoseconds.
Another good technique -- if you can handle accusations of
"wimp" programmer -- is to define the macro

#define EQ ==

and never use == at all. Instead, you can write logical
expressions, such as

if (x EQ y)
    printf("Equal values");

In addition to = and ==, C also has & (bitwise AND), &&
(logical AND), | (bitwise OR), and || (logical OR). The bitwise
and logical operators yield the same result when their operands are 0
or 1 (although only && and || skip the right operand when the left
one already decides the result). In other cases, however, the results
are different. For example,

2 && 4

is 1, which is considered "true" in an if statement, whereas

2 & 4

is 0, which is "false." Because, in many cases, & and | produce
the same effect as && and || in if statement expressions (i.e.,
zero or non-zero), incorrect use of the bitwise operators can
cause infrequent and hard-to-diagnose errors. If you'd rather rely
on something more than luck for correct programs, you may
want to define the following four macros and use them instead of
&, |, &&, and ||.

#define and( a, b ) ( ( a ) & ( b ) )
#define or( a, b ) ( ( a ) | ( b ) )

#define AND &&
#define OR ||
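
The difference between the two families of operators is easy to check for yourself. In this small sketch, logical_and and bitwise_and are hypothetical wrapper functions:

```c
/* Logical AND: the result is 1 when both operands are non-zero. */
int logical_and(int a, int b)
{
    return a && b;
}

/* Bitwise AND: the result keeps only the bits set in both operands. */
int bitwise_and(int a, int b)
{
    return a & b;
}
```

For the operands 2 and 4, logical_and returns 1 but bitwise_and returns 0, because binary 010 and 100 share no bits. For operands limited to 0 and 1, the two functions agree, which is exactly why the mistake can hide for so long.
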

Lazy Logic

Yes, C is a devilishly clever little language. It's quick to write,
too. Suppose you've written a function, get_customer, to return
either an integer customer ID or zero if no customer is input.
Why waste time with "verbose" code like

custid = get_customer();
if (custid > 0) {
    /* Process the customer */
}

when you can simply write:

if (custid = get_customer()) {
    /* Process the customer */
}

With the original definition of get_customer, this code works. In
C, an if statement evaluates the expression within parentheses,
and, if the expression's value is non-zero, the subordinate code is
executed. In this example, the variable custid is set to the return
value of get_customer. Because the value of a C assignment
operation is the same as the value assigned to the target variable,
when custid is assigned a non-zero value, the subordinate code
to process the customer is executed.

You'll see "simplified" if statement expressions like this all over
C programs. But suppose you and your fellow programmers
have been using get_customer for a while; say you have a dozen
or so programs that call it. Then one day you get an I/O error
that zaps one of your programs, and you decide you had better
add to get_customer a return value of -1 for an I/O error.
Problem solved? No, problems are created. Every

if (custid = get_customer())

statement will still execute the subordinate code when there's an
error because the value of the if statement expression is non-
zero. On the other hand, if you follow the first rule and keep the
assignment operation separate, your code will work properly
with the new error return value.
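
The following sketch makes the difference concrete. The get_customer here is only a stand-in that returns whatever is in stub_result, and process_lazy and process_explicit are hypothetical names; each returns 1 if the "process the customer" branch would run.

```c
/* Stand-in for get_customer(): returns a canned value so each case can be
   exercised. Assumed meanings: > 0 is a customer ID, 0 means no customer
   was input, and -1 means an I/O error. */
static int stub_result = 0;

int get_customer(void)
{
    return stub_result;
}

/* "Lazy" style: any non-zero value, including -1, selects the branch. */
int process_lazy(void)
{
    int custid;

    if (custid = get_customer()) {   /* assignment inside the test */
        return 1;
    }
    return 0;
}

/* Separate assignment and explicit comparison: -1 is rejected. */
int process_explicit(void)
{
    int custid;

    custid = get_customer();
    if (custid > 0) {
        return 1;
    }
    return 0;
}
```

When stub_result is -1, process_lazy happily "processes" a customer that doesn't exist, while process_explicit does not.
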

C is a "truth-or-consequences" language. You'll experience less
of the latter if you use only logical expressions (ones that
evaluate to 0 or 1) in if statements. You can define the following
simple macros to implement Boolean variables and functions
that return a Boolean value.

#define BOOL int
#define TRUE 1
#define FALSE 0

You should also use only Boolean variables and functions with
the logical operators && and ||. Following this practice
eliminates problems caused by accidentally using the bitwise
operators & and | in logical expressions.
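
As a rough illustration of this practice, here is a sketch that uses the Boolean macros. The predicates is_valid_id and both_valid are hypothetical:

```c
#define BOOL  int
#define TRUE  1
#define FALSE 0

/* Hypothetical predicate: a valid customer ID is strictly positive. */
BOOL is_valid_id(int custid)
{
    if (custid > 0) {
        return TRUE;
    }
    return FALSE;
}

/* Only Boolean-valued operands appear on either side of &&. */
BOOL both_valid(int id1, int id2)
{
    return is_valid_id(id1) && is_valid_id(id2);
}
```

Because is_valid_id can return only 0 or 1, even an accidental & in both_valid would give the same answer.
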

Precedence Without Precedent

In our get_customer example, you might think the following
alternative would be safe and still have a nice "C-food" flavor.

if (custid = get_customer() > 0) {
    /* Process the customer */
}

Now the code guards against negative, as well as zero, return
values. Or does it? Something's fishy here. This code simply
assigns 0 or 1 to custid because the > comparison operator has
higher precedence (i.e., binds more tightly) than the assignment
operator. This code is equivalent to

if (custid = (get_customer() > 0)) {
    /* Process the customer */
}

What you really need is:

if ((custid = get_customer()) > 0) {
    /* Process the customer */
}

C has 15 levels of operator precedence (so much for C being a
"simple" language). Two other easy rules will keep you from
floundering at C. Do all assignments as separate statements, not
as a part of a more complex expression. And use parentheses
liberally to explicitly define the order of evaluation.
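
A short sketch shows what ends up in custid in each case. Here get_customer_stub is a hypothetical stand-in that simply echoes the value it is given, and without_parens and with_parens are illustrative names:

```c
/* Hypothetical stand-in for get_customer(): echoes a canned result. */
int get_customer_stub(int canned)
{
    return canned;
}

/* No parentheses: > binds more tightly than =, so custid receives the
   result of the comparison (0 or 1), not the customer ID. */
int without_parens(int canned)
{
    int custid;

    custid = get_customer_stub(canned) > 0;
    return custid;
}

/* Parenthesized assignment: custid receives the ID, and the comparison
   is then applied to the assigned value. */
int with_parens(int canned)
{
    int custid;

    if ((custid = get_customer_stub(canned)) > 0) {
        /* Process the customer */
    }
    return custid;
}
```

For a customer ID of 1234, without_parens leaves 1 in custid, whereas with_parens leaves 1234. A separate assignment statement avoids the question altogether.
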

No Such Number, Address Unknown

Understand one thing about C, and all its mysteries are revealed.
C was -- and is -- a language meant as a portable replacement for
machine-dependent assembly languages. Keep this in mind
when you consider the following example.

Suppose you code an array of part numbers and their names and
a few lines to display a list of parts, as shown in Figure 2.1. If
you remember C is for machine-level programming, you won't
be surprised to find there's no part number 11. In C, 011 is not 11;
it's 9! Integer constants that begin with 0 are octal. (Get it? The 0
looks like O for Octal.)

Figure 2.1 Sample C Code

struct part {
    int part_number;
    char description[30];
}

main() {
    int i;

    /* Array of part numbers and descriptions */
    struct part part_table[100] =
    {
        {011, "Wrench" },
        {067, "Screwdriver" },
        {137, "Hammer" },
        {260, "Pliers" },
        /* etc. */
        {0, "sentinel" }
    };

    for (i=0; i<100; i++); /* Print the list of parts */
    {
        if (part_table[i].part_number == 0)
            break;
        printf("%i %s", part_table[i].part_number,
               part_table[i].description);
    }
}
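
If you doubt the octal rule, a few lines are enough to check it. The function names in this sketch are hypothetical:

```c
/* A leading 0 makes the constant octal: 011 is 1*8 + 1. */
int wrench_part_number(void)
{
    return 011;
}

/* Likewise, 067 is 6*8 + 7. */
int screwdriver_part_number(void)
{
    return 067;
}

/* Without the leading 0, the constant is decimal. */
int decimal_eleven(void)
{
    return 11;
}
```

The first two functions return 9 and 55, not 11 and 67.
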

There are even more subtle ways octal constants can sneak up on
you. Suppose you want to read an integer part from the standard
input and then output it, using

int part_number;

scanf("%i", part_number);
printf("Number %i", part_number);

If you enter 011, the program outputs 9. Unsurprisingly, the
format %i specifies an integer input and output field.
Surprisingly, if the input is 011, the value of part_number is 9.
You see, on input, %i means "decimal, hexadecimal, or octal
integer," whereas on output, %i means simply "decimal integer."
Unless this is an application for PDP-11 system programmers,
the code should be

int part_number;

scanf("%d", part_number);
printf("Number %d", part_number);

The format %d means "decimal integer" for both input and
output. Unless your magic number is 8, you should use a lint
utility or your editor to ferret out all %i format specifications and
numbers that begin with 0.

The previous two examples actually have a much bigger
problem than octal numbers. I miscoded the scanf function
argument as part_number instead of &part_number. So instead
of supplying scanf with the address where I want the input
stored (i.e., the address of part_number), I supplied the
uninitialized value of part_number. C is powerful, so powerful in
fact, that scanf will trash some location in memory pointed to by
whatever garbage is in part_number. If you're lucky, the trashed
memory will be part of the debugger or operating system code,
and you'll earn a C programming purple heart. To avoid winning
too many battle ribbons, however, always double-check that
you've supplied valid addresses for arguments to scanf and
similar functions.
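
Both points, the %i surprise and the need for an address, can be tried safely with sscanf, which reads from a string instead of standard input. This is a sketch under stated assumptions: parse_with_i and parse_with_d are hypothetical helpers, and returning -1 for a failed conversion is a convention chosen only for this example.

```c
#include <stdio.h>

/* Parse with %i: a leading 0 selects octal, a leading 0x hexadecimal. */
int parse_with_i(const char *text)
{
    int part_number = 0;

    if (sscanf(text, "%i", &part_number) != 1) {   /* note the & */
        return -1;                                 /* nothing converted */
    }
    return part_number;
}

/* Parse with %d: the digits are always read as decimal. */
int parse_with_d(const char *text)
{
    int part_number = 0;

    if (sscanf(text, "%d", &part_number) != 1) {   /* note the & */
        return -1;
    }
    return part_number;
}
```

Given the text "011", parse_with_i returns 9 and parse_with_d returns 11. Checking the return value of sscanf also tells you whether anything was converted at all.
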

It Hurts So Good

If you're new to C, you may think I'm blowing its problems out
of proportion. You may wonder whether C's flaws significantly
hamper the work-a-day C programmer. The answer is yes, most
C programmers do suffer from C's flaws; but like some
mainframe COBOL programmers and some midrange RPG
programmers, C programmers sometimes take pride in their
ability to overcome the language's deficiencies. And after
enough years chasing errant pointers, many C programmers
become numb to the pain of using a language that can crash the
debugger and freeze their PC.

Conversations I have had with technical staff of several large
microcomputer software companies illustrate what, I think, is the
prevalent viewpoint in the C programming culture. I asked
experienced C programmers whether they regularly encountered
the kinds of problems I've described so far, and they all said, in
effect, "Of course, it's just part of programming in C." Then I
asked them how they handled a couple of the most common
problems, and with one exception, they said they relied on
"careful programming," lint filters, and good debuggers. The one
exception said he built his own layer of abstract data types to
insulate himself from C. (In following chapters, I offer
techniques along these lines.)

The most insightful reflection about C I've heard is from the
computer scientist Bertrand Meyer, who designed the Eiffel
programming language. He said, "How could I even try to teach
systematic algorithm construction when I knew the bulk of the C
students' time was spent fighting tricky pointer arithmetic,
chasing memory allocation bugs, trying to figure out whether an
argument was a value or a pointer, making sure the number of
asterisks was right, and so on. I'm afraid it will be hard to
recover from the damage caused by C to an entire generation of
programmers."

The same assessment was put more briefly by the programmer
who said, "C's a double-edged sword—without any handle."

C can do you harm, and not just if you're inexperienced. In this
book, I will try to give you a handle on C so you can wield it as
safely as possible. In future chapters, I will describe other ways
that C can trip you up and suggest programming practices to
avoid common problems. If you're considering C, either for
workstation or AS/400 development, you'll gain a better
understanding of some of the risks you face. If you're already
using C, these tips will help you minimize your risks.

*****

To experienced C programmers only: Did you catch the other
errors I made in coding Figure 2.1? I'll point them out in Chapter
3.

Sidebar 1 — C Coding Suggestions

* Don't use = in an if statement expression, unless it is absolutely
necessary.
* Define a macro EQ for ==, and never use ==.
* Define macros for &, |, &&, and ||.
* Define macros for BOOL, TRUE, and FALSE.
* Use only Boolean-valued expressions in if statements.
* Use only Boolean variables with the logical operators && and ||.
* Do all assignments as separate statements, not as part of a more
complex expression.
* Use parentheses in expressions to explicitly define order of
evaluation.
* Don't use %i format specifications or numbers that begin with 0.
* Be sure to code addresses for arguments to scanf and similar
functions.

Chapter 3
Foolproof Statement and Comment Syntax

Deck: As you learn the language, learn its pitfalls as well
by Paul Conte

C is not really a bad language; it's just too often misused. As a
language for writing low-level device drivers or operating
system kernels, C is superb. It's also a great language for
torturing student programmers. But for business applications or
other software above the operating system level, C is a
minefield: "Explosive" results await the unwary C programmer's
misstep. Here's an example that requires you to pick your way
carefully:

if (xcnt < 2)
return
date = x[0];
time = x[1];

This code appears to guard references to array x by checking the
count of its elements first. But a semicolon is missing after the
return, so the code really means:

if (xcnt < 2) {
return date = x[0];
}
time = x[1];

C's "flexibility" lets you freely combine most expressions and
statements, such as this assignment expression within a return
statement. Unfortunately, this flexibility also means C compilers
can't detect many errors caused by simple typos.

Brace Yourself

The problem occurs because of a missing semicolon. Many "old
hand" C programmers would say the solution is simply to add
the semicolon (after a few hours of debugging!). But does the
following correction give us a safe program?

if (xcnt < 2)
return;
date = x[0];
time = x[1];

What if we decide to add an error message?

if (xcnt < 2)
printf("Timestamp array is too small\n");
return;
date = x[0];
time = x[1];

Indeed, this is the ultimate "safe" program: for valid arrays, it
never does anything but return, terminating program execution!
The code's execution is identical to

if (xcnt < 2) {
printf("Timestamp array is too small\n");
}
return;
date = x[0];
time = x[1];

For quick coding, C lets you omit the { } around a conditional
statement, a shortcut most published C programs take advantage
of. You will be tempted to take this shortcut, too. Don't! Ever!
Always enclose conditional code in braces. The errors
introduced by incorrectly matched conditions and subordinate
statements are very hard to ferret out. Our original example is
better coded like this:

if (xcnt < 2) {
return;
}
date = x[0];
time = x[1];

Note that using braces also lets the compiler catch a missing
semicolon, so you get lots of protection by following this simple
rule.

Unwinding this example also suggests a rule I mentioned in
Chapter 2: Do all your assignments as separate statements, not as
part of a more complex expression. Another helpful rule is: Use
parentheses around expressions in return statements. For
example, if you really did want to return the date after assigning
it to a global variable, you might code

if (xcnt < 2) {
date = x[0];
return (date);
}

This doesn't solve the original problem we looked at, but it does
show how to code return statements so you're less likely to be
tripped up by other problems with complex expressions.

Follow This Advice, or Else

Another problem related to if statements is the improper
matching of else clauses. (This problem is not unique to C;
COBOL programmers have been bitten by the same type of
"bugs.") Suppose we change our previous example so that array
x must either have at least two elements to be assigned to date
and time or be empty, in which case the program should do
nothing. Other conditions should cause a return. The following
fragment seems to do what we require:

if (xcnt < 2)
if (xcnt != 0) return;
else {
date = x[0];
time = x[1];
}

But C associates an else with the closest unmatched if inside the
same pair of braces. The compiler executes the above code the
same as the following:

if (xcnt < 2) {
if (xcnt != 0) {
return;
}
else {
date = x[0];
time = x[1];
}
}

In other words, nothing at all happens when xcnt is 2 or greater.
Again, using braces for all conditional statements comes to the
rescue:

if (xcnt < 2) {
if (xcnt != 0) {
return;
}
}
else {
date = x[0];
time = x[1];
}

Although full use of braces increases the number of lines of
source code, braces make future program modifications much
easier and less error-prone. With braces delimiting segments of
conditional code, adding and deleting subordinate statements
requires less careful checking of how else clauses and
subordinate statements match up with the if statements.

Another, elegant, solution is to define the following macros:

#define IF { if (
#define THEN ) {
#define ELSE } else {
#define ELSEIF } else if (
#define ENDIF } }

We could then code our previous example as:

IF xcnt < 2
THEN IF xcnt != 0
THEN return;
ENDIF
ELSE date = x[0];
time = x[1];
ENDIF

Once the macros are replaced with their corresponding
definitions, the code is executed the same as the previous
example. This style is guaranteed to cause traditional C
programmers apoplexy, but they'll forget about it when they
chase down their next bug. Meanwhile, your code can be
readable and reliable.
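
As a minimal sketch of how these macros behave once expanded, the fragment below wraps the timestamp example in a function. The function name load_timestamp, its parameters, and its return codes are assumptions made for illustration; only the five macros come from the text above.

```c
#define IF     { if (
#define THEN   ) {
#define ELSE   } else {
#define ELSEIF } else if (
#define ENDIF  } }

/* Hypothetical wrapper around the example in the text.
   Returns  1 if date and time were loaded from x,
            0 if x is empty (nothing to do),
           -1 if x holds exactly one element (invalid). */
int load_timestamp(const int x[], int xcnt, int *date, int *time_of_day)
{
    int loaded = 0;

    IF xcnt < 2
    THEN IF xcnt != 0
         THEN return -1;
         ENDIF
    ELSE *date = x[0];
         *time_of_day = x[1];
         loaded = 1;
    ENDIF
    return loaded;
}
```

Because ENDIF always supplies the closing braces, the ELSE can only pair with the outer IF, which is the matching the unbraced version failed to get.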

Give Me a Break

As W. A. Wulf put it, "More computing sins are committed in
the name of efficiency (without necessarily achieving it) than for
any other single reason — including blind stupidity." C's switch
statement could be the all-time award winner in the "stupid
efficiency" category. The error in the following code fragment
may be obvious outside the context of a larger program; but in
real programs, such errors are easy to make and hard to find.

switch (color) {
case 1: printf("red\n");
case 2: printf("blue\n");
}

Given this code, when color is 1, both "red" and "blue" are
printed. The proper code is

switch (color) {
case 1: printf("red\n");
break;
case 2: printf("blue\n");
}

Of course, when you add another color, you'd better add another
break after the second case. C's switch is not what is generally
recognized as a "case" multiway conditional control structure;
it's nothing more than a jump table. The compiler evaluates the
switch expression simply to determine the target of a jump (i.e.,
go to) operation into the code that follows. Unless you code a
break, execution will continue sequentially through the code for
cases that follow the case that matches the switch expression
value.
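
The fall-through behavior can be checked with a small sketch like the one below. The function names and the idea of counting printed lines are assumptions added for illustration. The "fall through" comment does not change behavior; it only marks the spot for readers and for compilers that warn about fall-through.

```c
#include <stdio.h>

/* Hypothetical helper: same switch as in the text, with no break.
   Returns the number of color names printed. */
int print_without_break(int color)
{
    int printed = 0;

    switch (color) {
    case 1: printf("red\n");
            ++printed;
            /* fall through */
    case 2: printf("blue\n");
            ++printed;
    }
    return printed;
}

/* Hypothetical helper: the same switch with a break after every case. */
int print_with_break(int color)
{
    int printed = 0;

    switch (color) {
    case 1: printf("red\n");
            ++printed;
            break;
    case 2: printf("blue\n");
            ++printed;
            break;
    }
    return printed;
}
```

Calling print_without_break(1) prints two names, while print_with_break(1) prints one.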

The switch statement may be a honey of an operation for writing
small, fast device drivers, but it can be a sticky mess for typical
business programming. Code that intentionally jumps into the
middle of a sequence of operations is extremely tricky to
maintain, and you should avoid it except in low-level systems
programs. Fortunately, there's a very simple rule for business
(and most other) programmers to follow: Never use the C switch
statement.

This stricture is not at all burdensome. As I indicated, most
instances of C's switch should have a break after every case. For
such multiway conditions, you can simply use an "else if"
structure instead:

if (color == 1) {
printf("red\n");
}
else if (color == 2) {
printf("blue\n");
}
else {
printf("Invalid color\n");
}

Better yet, you can use the EQ macro described in Chapter 2
and the macros presented above (IF, THEN,
ELSEIF, ELSE, and ENDIF) to code the tests as in Figure 3.1.
This solution has a compact, table-oriented layout and avoids the
hazards of raw C.

Figure 3.1 - Coding an ELSEIF in C with Macros

IF color EQ 1 THEN printf("red\n");
ELSEIF color EQ 2 THEN printf("blue\n");
ELSE printf("Invalid color\n");
ENDIF

One Last Comment

Most programmers know that "comments lie," which is why
high-level languages should let you directly express what your
program does rather than force you to comment unclear code. C
programmers will find that comments can also make their code
lie! Read the fragment in Figure 3.2 carefully. A comment warns
anybody reading the code about an important condition that
changes at this point in the program flow. The comment tells us
that here is where the variable prv_opcode changes from the
previous opcode to the current opcode. A look at the C code
seems to verify that the comment doesn't lie. But the C code (or
what looks like C code) itself lies. The statement

strcpy(prv_opcode, opcode);

doesn't copy opcode to prv_opcode. It doesn't do anything;
it's part of a multiline comment, not executable code. The
comment ends with the */ on the last line in the figure, making
all of Figure 3.2 one long comment.

Figure 3.2 - Sample C Code


/* IMPORTANT NOTE: prv_opcode is set here,
after handling vendor-specific translations and
blank opcodes. After this section, opcode may be
modified. You should _not_ test prv_opcode after
this point because it now holds the current opcode.

strcpy(prv_opcode, opcode);

setnull(stk_opcode);
setnull(op_symbol);
setnull(op_suffix);

op_is_ctlop = FALSE;

/* Control opcodes are ones that cause
indentation:
BEGSR, IFxx, DO, DOUxx, and DOWxx.*/


C uses /* and */ to delimit comments. C also implicitly
continues open comments across multiple lines until the ending
*/ is encountered. This makes it easy to have "runaway"
comments that encompass what's intended as executable code.
Unintentionally commented-out code, especially if it's
initialization code, can cause mysterious program behavior. You
see the program fail, you look at the code, and it "can't do that!"
Only when, on your tenth look, you finally catch that the
comment a page up has no closing */ do you unfold the mystery.

No foolproof way exists to avoid runaway C comments. (Newer
languages such as Ada let you prevent this problem by using --
to start comments that end at the end of the line.) Two rules can
help: Place the opening /* and closing */ for comments on lines
by themselves, and use a vertical bar to begin each line of
comment text. For example,

/*
|Comment lines
|are here
*/

This practice avoids the most common cause of a missing */:
editing the last line of a comment and accidentally deleting the
*/ at the end of the line. It's also easier to check visually for
matching comment delimiters when they appear at the same
indentation level in the source. In addition, some C "lint"
utilities can catch occurrences of /* inside a comment, which
usually indicates a missing */.

From C to Shining C

If C's pitfalls somewhat tarnish its image, remember that your C
programs can still shine if you polish your programming
techniques. The most important thing you can do to improve C
programs is take "C-riously" the dangers of writing C code in the
traditional (some would say, "C-eat of the pants") manner. Don't
try to make your C code "do a lot with just a few statements,"
and don't hesitate to use source macros to lift yourself to a
higher, safer language level than C primitives. Sure, your code
will look foreign to old-style C programmers, but your running
programs will look a lot better to your end users.

*****

Answer to Chapter 2's puzzle: Figure 3.3 shows corrections to
the three errors in the original code. The structure definition was
missing the final semicolon (A), causing the compiler to treat the
definition as the return type of the main function. The for loop
had an extra semicolon immediately after the parentheses (B), so
the body of the for loop was the null statement instead of the
code within the braces. And the printf format string was missing
(C) a newline character (\n), causing the results to be printed in a
continuous stream, rather than one item per line.

Figure 3.3 - Corrected C from Chapter 2

struct part {
    int part_number;
    char description[30];
};                                    /* (A) */
main() {
    int i;
    /* Array of part numbers and descriptions */
    struct part part_table[100] =
    {
        {011, "Wrench" },
        {067, "Screwdriver" },
        {137, "Hammer" },
        {260, "Pliers" },
        /* etc. */
        {0, "sentinel" }
    };
    for (i=0; i<100; i++)             /* (B) Print the list of parts */
    {
        if (part_table[i].part_number == 0)
            break;
        printf("%i %s\n",             /* (C) */
               part_table[i].part_number,
               part_table[i].description);
    }
}

(Sidebar 1) — C Coding Suggestions

* Always enclose conditional code in braces.
* Do all assignments as separate statements, not as part of a more
complex expression.
* Use parentheses around expressions on return statements.
* Never use the C switch statement.
* Place the opening /* and closing */ for comments on lines by
themselves, and use a | to begin each line of comment text.

Chapter 4
Hassle-free Arrays and Strings

Deck: C Arrays can cause disarray, and C Strings can tie you up
in knots
by Paul Conte

When you learn that, in C, the first element of an array is
referenced as x[0], you appreciate C's reputation for being both
efficient and hard to read. The natural way to number the first
element in a series, of course, is with 1. But, as a replacement for
assembler, C was designed to start arrays with 0 to improve
performance. (Arrays that use 0 instead of 1 for the first element
permit simpler and faster calculation of memory addresses for
subscripted references.) The rules for using C arrays are simple,
but don't let that lull you into thinking you won't encounter
problems.

Suppose you need to store the number of orders for each month
(1 to 12). A "good" C programmer might declare an array and
reference an array element as in the following code:

int month;
int orders[12];
...
++orders[month - 1];

This works fine, so long as you never forget to subtract 1 from
month when you use month as a subscript. You also need to be
careful when you code for loops:

for (month = 0; month < 12; month++) {
printf("Total for month %d is %d\n",
(month + 1), orders[month]);
}

This example suggests I should clarify my previous caution to
remember to subtract 1. When you use month as a loop variable
that runs across the array's range (0 to 11), you shouldn't make
the adjustment in subscripted array references, but rather in
printing the loop variable. Also remember that, to cover the
array's range, the for loop must start at 0 and run to 11, not 12,
so use < instead of <= in the limit test.

There, see how simple arrays that start at 0 are! There are at least
a dozen other ways to code this example, but not one of them
overcomes the conflict between natural counting systems, which
begin with 1, and C's arrays, which begin with 0. This
"impedance mismatch" between the natural world and C
increases the likelihood of "off by 1" array and loop errors.

An easy solution exists, however: Declare C arrays with one
extra element, and don't use the element with subscript 0. Look
how much simpler the code becomes:

int month;
int orders[12 + 1];
...
++orders[month];
...
for (month = 1; month <= 12; month++) {
printf("Total for month %d is %d\n",
month, orders[month]);
}

Now you don't have to selectively adjust subscripts, and for
loops can have their range expressed clearly. You might wonder
at my profligate waste of memory for the unused array element;
but in many cases, simplified subscripts require less machine
code, so you get smaller programs.

If you want to refine this approach further by expressing your
array declarations using the highest valid subscript (i.e., the
upper bound), rather than one greater, you can call on our old
friends, source macros. Figure 4.1 shows one way to simplify C
array declarations by using a source macro. (In this and other
macros in this chapter, I use the term "table" to emphasize the
distinction between 0-based and 1-based arrays.) You can build
similar macros for tables of two or more dimensions.

Figure 4.1 Table Definition Macros

#define TABLE( tname, ttype, ttop ) \
    int tname##_upper_bound = ( ttop ); \
    ttype tname[ ( ttop ) + 1 ]

#define upper_bound( tname ) tname##_upper_bound

Note: ## is the macro concatenation (or "token-pasting")
operator. For example, if you use the
macro upper_bound( orders ), the macro
preprocessor will paste orders to _upper_bound to
generate orders_upper_bound.

Now we can write our first example as in Figure 4.2. And
because we often want to do for loops across the entire range of
a table, the macros in Figure 4.3 are handy. Using these macros,
we can simplify printing the monthly counts to the code in
Figure 4.4.

Figure 4.2 Using Table Macros

int month;
TABLE( orders, int, 12 );
.
.
.
++ orders[ month ];

for ( month = 1; month <= upper_bound( orders ); month ++ ) {
    printf( "Total for month %d is %d\n",
            month, orders[ month ] );
}

Figure 4.3 Table Loop Macros

#define OVER_TABLE( tname, idx ) \
    { int idx; \
      for ( idx = 1; idx <= upper_bound( tname ); idx ++ ) {

#define ENDOVER }}

Figure 4.4 Using Table Loop Macros

OVER_TABLE( orders, month )
    printf( "Total for month %d is %d\n",
            month, orders[ month ] );
ENDOVER

By now, you might reasonably ask, "Why bother creating all
these macros to make C look like some other language; why not
just use another language?" Good question, and if you have a
good alternative, such as Pascal or Modula-2, you should use it
instead of C. But if you're stuck with C, well-designed macros
can add substantial safety and clarity to your programs. And, I
should add, well-written macros don't hurt runtime performance,
because they are translated into ordinary C code before
compilation.
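
To see the table macros working together, here is a minimal sketch that repeats the definitions from Figures 4.1 and 4.3 inside one function. The function total_orders, its parameters, and the range check are assumptions added for illustration; the initializer "= { 0 }" works because the TABLE macro ends with the array declaration.

```c
#include <stdio.h>

#define TABLE( tname, ttype, ttop ) \
    int tname##_upper_bound = ( ttop ); \
    ttype tname[ ( ttop ) + 1 ]

#define upper_bound( tname ) tname##_upper_bound

#define OVER_TABLE( tname, idx ) \
    { int idx; \
      for ( idx = 1; idx <= upper_bound( tname ); idx ++ ) {

#define ENDOVER }}

/* Hypothetical helper: months[] is an ordinary C array of n month
   numbers. Counts one order per valid month (1 to 12), prints the
   monthly totals, and returns the number of orders counted. */
int total_orders(const int months[], int n)
{
    int i;
    int total = 0;
    TABLE( orders, int, 12 ) = { 0 };   /* element 0 is left unused */

    for (i = 0; i < n; i++) {
        if (months[i] >= 1 && months[i] <= upper_bound( orders )) {
            ++ orders[ months[i] ];
        }
    }

    OVER_TABLE( orders, month )
        printf( "Total for month %d is %d\n", month, orders[ month ] );
        total += orders[ month ];
    ENDOVER

    return total;
}
```

Month numbers are used directly as subscripts, so no subscript in the function needs a "- 1" adjustment.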

String Symphony

C doesn't have built-in support for variable-length strings;
instead, C "fakes" strings by using static character arrays,
character pointers, and a library of string functions that take
pointers as arguments. Because strings inherently can be any
length, jamming strings into C's fixed-length arrays leads to
especially perverse pitfalls. Take a simple assignment of one
string variable to another:

strcpy(b, a);

This innocent statement has probably reduced the average life
expectancy of C programmers by five years — stress is not good
for a programmer's health. What's wrong with this statement? I
don't know ... maybe nothing. Maybe when this statement
executes, the string in a will be no longer than b can hold. If so,
everything is all right. If not, everything is all wrong. The strcpy
function is a primitive memory-to-memory copy that is not
limited by the target's declared size. In this example, if the string
in a is longer than the size of b, whatever is next to b in memory
will be trashed. On PCs, this might even be operating system
code, leaving your system frozen solid.

The typical way in C to avoid string operations that overwrite
memory is to "be careful." That works for experienced
programmers — most of the time. It's not a pretty sight,
however, when this strategy doesn't work. The only effective
strategy is: Always guard a string assignment against
overwriting the target variable.

Figure 4.5 shows one way to guard a string copy, using the
sizeof operator. If the source string (including the '\0' terminator)
fits in the target, the whole string is copied; otherwise, only as
much as will fit is copied, and a null terminator is added. Even
though this technique may truncate some strings, your program
will continue its proper execution flow, rather than take some
wild path caused by overwriting part of the program's
instructions.

Figure 4.5 Guarding a String Copy

if ( strlen( source ) < sizeof( target ) ) {
    strcpy( target, source );
}
else {
    strncpy( target, source, ( sizeof( target ) - 1 ) );
    target[ ( sizeof( target ) - 1 ) ] = '\0';
}

Of course, this technique is a prime candidate for a macro, using
source and target as parameters. You can use similar macros for
the other C library string assignment functions, such as strcat.
You can also add warning messages to your macros to make
error diagnosis even easier.
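
One possible shape for such a macro is sketched below. The names GUARDED_STRCPY and month_abbrev are hypothetical, and the macro is only an illustration of the Figure 4.5 logic, not a finished library routine. Note that it evaluates source more than once, so the argument should have no side effects.

```c
#include <string.h>

/* Hypothetical macro. target must be a char array declared in the
   current scope (not a pointer or a function parameter), because the
   guard relies on sizeof( target ). */
#define GUARDED_STRCPY( target, source ) \
    do { \
        if ( strlen( source ) < sizeof( target ) ) { \
            strcpy( ( target ), ( source ) ); \
        } \
        else { \
            strncpy( ( target ), ( source ), sizeof( target ) - 1 ); \
            ( target )[ sizeof( target ) - 1 ] = '\0'; \
        } \
    } while ( 0 )

/* Hypothetical usage: keep at most three characters of a month name.
   The result lives in a static buffer that the next call overwrites. */
const char *month_abbrev(const char *month_name)
{
    static char abbrev[4];      /* 3 characters plus the '\0' */

    GUARDED_STRCPY( abbrev, month_name );
    return abbrev;
}
```

A name that fits is copied whole; a longer one is truncated to what the array can hold, and memory next to the array is never touched.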

Unfortunately, macros using the sizeof operator won't work for
target strings that are function parameters, because the size of a
string parameter is not automatically passed to a function (C
passes just a pointer to the first character in the string). To
handle string parameters whose value you want to change (i.e.,
output or update parameters), you must explicitly pass the
string's declared size (or its maximum length, which is one less
than the string's declared size) to the function. Figure 4.6a shows
simple macros to implement "safe" strings. The STRING macro
declares a string and an associated variable to hold the string's
maximum length. STRING_TABLE declares a table (base-1
array) of strings. The cpystr macro simplifies a call to the
strcpymax function shown in Figure 4.6b.
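
The reason sizeof fails for string parameters can be seen in a short sketch. The function name size_seen_by_callee is hypothetical and exists only to show what the called function can observe.

```c
#include <stddef.h>

/* Hypothetical helper: reports what sizeof yields inside a function
   that receives a string. The parameter is only a pointer to the
   first character, so the caller's declared array size is lost. */
size_t size_seen_by_callee(char *target)
{
    return sizeof( target );    /* size of a pointer, not of the array */
}
```

In a caller that declares char print_line[81], sizeof( print_line ) is 81, but size_seen_by_callee( print_line ) returns sizeof( char * ). That gap is why the maximum length has to be passed explicitly.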

Figure 4.6a Macros for Safe Strings

#define STRING( sname, smaxlen ) \
    size_t sname##_maxlen = ( smaxlen ); \
    char sname[ ( smaxlen ) + 1 ]

#define STRING_TABLE( tname, ttop, smaxlen ) \
    int tname##_upper_bound = ( ttop ); \
    size_t tname##_maxlen = ( smaxlen ); \
    char tname[ ( ttop ) + 1 ][ ( smaxlen ) + 1 ]

#define strmaxlen( sname ) sname##_maxlen

#define cpystr( target, source ) \
    strcpymax( target, source, target##_maxlen )

Figure 4.6b Safe String Copy Function

char * strcpymax( char target[],
                  const char source[],
                  const size_t target_maxlen ) {
    if ( strlen( source ) <= target_maxlen ) {
        strcpy( target, source );
    }
    else {
        strncpy( target, source, target_maxlen );
        target[ target_maxlen ] = '\0';
    }
    return target;
}

Figure 4.6c shows how to use these macros in your C programs.
Note that the first element in the month_abv table's initialization
list is just a placeholder for the unused element. Also note that
when an element of a string table is the target of a string copy,
you must use the strcpymax function rather than the cpystr
macro because the cpystr macro can't generate the correct name
for the string maximum length variable. These macros and this
function work with strings passed as parameters, as well as with
those declared as local variables. There are still some limitations
with this technique, however. More complex aggregate data
types (e.g., structures containing strings) require additional
macros for declarations, or other techniques.

Figure 4.6c Using the Safe String Macros

int month;

STRING( print_line, 80 );

STRING_TABLE( month_abv, 12, 3 ) =
    { "", "Jan", "Feb", "Mar", "Apr", "May", "Jun",
      "Jul", "Aug", "Sep", "Oct", "Nov", "Dec" };
.
.
.
cpystr( print_line, month_abv[ month ] );
.
.
.
strcpymax( month_abv[ 9 ], "Spt", strmaxlen( month_abv ) );
.
.
.
OVER_TABLE( month_abv, month )
    printf( "%s is month number %d\n",
            month_abv[ month ], month );
ENDOVER

A full description of implementing safe, variable-length strings
in C is beyond the scope of this book, but you can do it using
structures that contain string lengths and pointers to one or more
memory blocks for the string contents. I've seen numerous
programs where the programmer has built such structures "on
the fly." Such programs are often fragile and flaky because they
combine some of C's most treacherous features: pointers and
dynamic memory management. A more structured approach can
reduce your risks.
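
To make the idea concrete, here is a deliberately small sketch of such a structure with two operations. The type name vstring and the functions vstring_assign and vstring_free are hypothetical, and a real library would need many more operations (concatenation, comparison, substring) built the same way.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable-length string: a length plus a pointer to a
   heap block holding the characters and a terminating '\0'. */
typedef struct {
    size_t length;      /* characters stored, not counting the '\0' */
    char  *text;        /* heap block of length + 1 bytes, or NULL  */
} vstring;

/* Replaces the contents of *s with a copy of source.
   Returns 0 on success, or -1 if memory is exhausted, in which
   case *s is left unchanged. */
int vstring_assign(vstring *s, const char *source)
{
    size_t n = strlen(source);
    char *block = malloc(n + 1);

    if (block == NULL) {
        return -1;
    }
    memcpy(block, source, n + 1);   /* copies the '\0' as well */
    free(s->text);                  /* free(NULL) is harmless  */
    s->text = block;
    s->length = n;
    return 0;
}

/* Releases the storage and returns *s to the empty state. */
void vstring_free(vstring *s)
{
    free(s->text);
    s->text = NULL;
    s->length = 0;
}
```

A vstring should start out as { 0, NULL }. Keeping every allocation and release inside functions like these is what separates the structured approach from building the structure "on the fly."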

If you embark on an advanced string implementation, be sure to
build a library of macros and functions that provide a safe,
high-level set of string operations, and use these instead of C's
primitive string functions. Your best bet is probably to use C++
and an existing C++ string class (e.g., ones available from the
Free Software Foundation) or switch to Awk, a language that has a
C-like syntax and includes full support for variable-length
strings.

Rough C Coming

Most of the C pitfalls I've covered in Chapters 2 through 4 are
fairly easily circumvented by avoiding certain language
constructs and using macros. Strings presented the first example
of inherent C constructs for which there is no simple, universal
solution (other than perhaps moving to C++). In Chapter 6, I'll
take up pointers, a feature of C even more difficult than strings
to handle safely.

*****

Sidebar 1 — C Coding Suggestions

* Declare C arrays with one extra element, and don't use the element
with subscript 0.
* Use macros to define tables and loops over them.
* Always guard a string assignment against overwriting the target
variable.
* Create macros and functions to define strings and provide "safe"
string operations.

Chapter 5
Simplified Variable Declarations

Deck: Follow these tips for unlimited visibility in your C
declarations

by Paul Conte

An object's scope is something you can't C very clearly in source
code. Or should I say you can't code scope clearly in C source?
If you can't quite C where I'm headed, it's because we're just
beginning a winding tour through the maze of C object visibility.
Follow closely, and by the time we exit the maze, you'll have a
simple map for the shortest route out.

C, like many languages, lets an identifier x refer to different
objects (e.g., storage locations for variables) at different places
in the source code. As a simple case, the two declarations of x
below refer to different variables.

int x;

main( void ) {
int x;
...
}

void func1( void ) {
...
}

Let's say the first x refers to a storage location we'll call S1, and
the second x refers to a storage location we'll call S2. The scope
of the S1 object (storage location) is simply the region (i.e.,
lines) of source code where references to x are references to S1;
likewise, the scope of the S2 object is the region of source code
where references to x are references to S2. In this example, the
scope of S1 (the first x) is everywhere outside the main function,
and the scope of S2 (the second x) is only within the main
function. Obviously, for the program to be clear, these two
regions of the program can't overlap; each reference to x must
refer to just one of the storage locations. Scope is also called
"visibility" because you can "see" an object (e.g., read or change
a storage location) only within its scope. In this chapter, I use
"visibility" for the general concept and "scope" to refer to C's
specific lexical scope attribute.
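
The two storage locations can be observed with a short sketch. The function names read_s1 and read_s2 and the initial values 1 and 2 are assumptions added for illustration; here the second x is declared in an ordinary function instead of main.

```c
int x = 1;                  /* first x: storage location S1 */

/* No x is declared in this function, so x refers to S1. */
int read_s1(void)
{
    return x;
}

/* The local declaration creates S2 and hides S1 inside this block. */
int read_s2(void)
{
    int x = 2;              /* second x: storage location S2 */

    return x;               /* refers to S2, never to S1 */
}
```

read_s1 returns 1 and read_s2 returns 2, and assigning to x inside read_s2 would leave S1 untouched.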

The C-nic Route

The concept of visibility is simple and useful. Among other uses,
distinct regions of object visibility let you use identifiers in
different parts of your code without worrying about whether the
same identifier, used for different purposes, unintentionally
refers to the same object. But why be merely simple and useful
when you can be clever and brave, too? Take your first left turn
into the C labyrinth.

C splits the single concept of visibility into scope and linkage,
with scope referring to the region of program text within which
an identifier's characteristics are understood. C's linkage term
refers to the connection between identifiers in independently
compiled translation units. Expressing visibility with two
attributes instead of one may help C compiler writers, but it
makes it more difficult for programmers to determine which
object an identifier references. I'll try to map out C's rules while
noting some language flaws that lead to C's problems. Then I'll
mark an easy path to declaring variables and functions with the
desired visibility.

C's rules for function visibility are simple: If you specify static
storage class, a function is visible throughout the source file in
which it's defined, but not in other source files. With extern or
no storage class specifier, a function is visible throughout the
program (i.e., across all files), and you can call it from
anywhere. These rules lead to my first suggestion: Declare
functions static if you intend to call them only from within the
same source file.
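
A sketch of this suggestion for a single source file follows. The function names and the days-per-month table are hypothetical, and leap years are ignored to keep the example short.

```c
/* static: callable only from within this source file. */
static int days_in_month(int month)
{
    /* 1-based table; element 0 is unused. Leap years are ignored. */
    static const int days[12 + 1] =
        { 0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };

    return (month >= 1 && month <= 12) ? days[month] : 0;
}

/* No storage class specifier: callable from any source file. */
int days_through_month(int month)
{
    int m;
    int total = 0;

    for (m = 1; m <= month; m++) {
        total += days_in_month(m);
    }
    return total;
}
```

Another source file can call days_through_month, but a call to days_in_month from that file will not link, so the helper can be changed or removed without checking the rest of the program.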

C's rules for variable visibility are far more complex than those
for function visibility. I've listed these rules in a table (Figure
5.1) that shows, for any variable declaration, where that variable
is visible. I've also listed C's scope and linkage attributes and
whether the declaration causes storage to be allocated (in C
terminology, whether it is a definition as well as a declaration).
You can use this guide to help you understand some of the
mysterious changes that can occur when a C variable has
unexpected visibility. You also may need it to follow my next
few examples, but later I'll show you a far more useful table for
C programming.

Figure 5.1 Visibility of C Variables

DECLARED OR REFERENCED INSIDE A BLOCK (function or nested block):

1. Declared in block; storage class (none), auto, register, or static
   Initial value:      Yes or No
   Storage allocation: Yes
   Scope/Linkage:      Block scope, no linkage
   Visibility:         Within same block, including all nested blocks,
                       except any nested block (and its nested blocks)
                       with an identical identifier declared without
                       extern

2. Declared in block; storage class extern
   Initial value:      No
   CASE A: Enclosing scope has identical, visible identifier: same
           scope, linkage, allocation, and visibility as the matching
           identifier
   CASE B: Otherwise, same as if declared extern outside function
           (see 7, below)

3. Declared in block; storage class extern
   Initial value:      Yes
   ILLEGAL declaration

4. Not declared but referenced in block
   CASE A: Enclosing scope has identical, visible identifier declared
           in the same source file prior to the reference: same as if
           declared extern in block (see 2, above)
   CASE B: Otherwise, ILLEGAL declaration

EXTERNAL DECLARATIONS (declared outside any function):

5. First declaration; storage class (none)
   Initial value:      Yes or No
   Storage allocation: Yes
   Scope/Linkage:      File scope, external linkage
   Visibility:         Rest of file, except any block (and its nested
                       blocks) with an identical identifier declared
                       without extern, and blocks it contains; and
                       other files with an identical external linkage
                       identifier declared in them. (No other file may
                       allocate (define) an identical external linkage
                       identifier.)

6. First declaration; storage class static
   Initial value:      Yes or No
   Storage allocation: Yes
   Scope/Linkage:      File scope, internal linkage
   Visibility:         Rest of file, except any block (and its nested
                       blocks) with an identical identifier declared
                       without extern, and blocks it contains

7. First declaration; storage class extern
   Initial value:      No
   Storage allocation: No
   Scope/Linkage:      File scope, external linkage
   Visibility:         Rest of file, except any block (and its nested
                       blocks) with an identical identifier declared
                       without extern, and blocks it contains; and
                       other files with an identical external linkage
                       identifier declared in them. (One other file
                       must have an identical external linkage
                       identifier allocated (defined) in it.)

8. First declaration; storage class extern
   Initial value:      Yes
   Storage allocation: Yes
   Same as if declared outside function without extern or static
   (see 5, above)

9. Second or later declaration
   Must have same type and linkage as first declaration
   Same scope, linkage, allocation, and visibility as first
   declaration

Feeling a Little C-sick?

C syntax flies in the face of the "say what you mean, mean what
you say" principle of programming. The keyword extern is
simply an abbreviation for "external" — a word you'd expect to
be related to visibility or scope and to mean something like
"outside the current context." And you'd expect the other
relevant keyword, static, to relate to how storage is managed.
Thus, you might read the declaration

extern int x;

to mean the variable x is visible outside the block or file in
which it's declared. Likewise, for a declaration without extern (C
has no intern keyword), such as

int x;

you'd expect the variable x not to be visible outside the block or
file in which it's declared. For variables declared within a
function or nested block, this sensible interpretation holds. The
problem arises for variables declared outside any function. (In C,
such declarations are called external declarations.) In most cases,
when the two previous examples appear as external declarations,
both the extern and non-extern declarations mean the variable is
visible outside the file in which it's declared. (If the extern
declaration isn't the first declaration of x, however, and the first
declaration of x isn't visible outside the file, the extern
declaration also specifies that x isn't visible outside the file — a
further complexity in C's approach to visibility.) Some
programmers wise in the ways of C might argue, "But both
declarations are external declarations, so it is consistent that they
both are visible `externally' to the file." Nice try. However, when
the declaration

static int x;

is outside any function, it also is an external declaration, yet it
defines a variable that is not visible outside the file.
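
As a minimal sketch of the three external declarations discussed above, the fragment below places them in one file. The names default_visible and file_private and the helper sum_both are hypothetical, chosen only for illustration; the visibility stated in each comment follows the standard C linkage rules rather than anything a single file can show at runtime.

```c
#include <assert.h>

int        default_visible = 1;  /* no specifier: visible to other files, storage allocated here */
extern int default_visible;      /* extern: repeats the declaration above, allocates nothing */
static int file_private = 2;     /* static: still an external declaration, but private to this file */

/* Hypothetical helper that reads both variables from inside the file. */
int sum_both( void ) {
   assert( file_private == 2 );
   return default_visible + file_private;
}
```

Another file could reach default_visible by declaring it extern, but no declaration in another file can reach file_private.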

Not only does C's syntax lack consistency, but it also confuses
things by using the static storage class keyword to specify
visibility. My theory is that Humpty Dumpty was on the original
C design team. As he told Alice, "When I use a word, it means
just what I choose it to mean — neither more nor less."

There are other confusing cases, as in the second declaration of x
below,

static int x;
int main( void ) {
   extern int x;
   ...
}

where the use of extern results in internal linkage. I'll spare you
a diversion down the cul-de-sac you enter when you have more
than one external declaration for the same variable (which C
allows). As we pass through the maze, however, notice that it
matters whether you initialize a variable in an external
declaration when you specify extern. (With an initializer, the
declaration allocates storage; without one, it doesn't.) Yet when
you specify static or no storage class, initialization doesn't affect
storage allocation.
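
The following sketch illustrates both points under standard C rules. The names limit, count, and read_x are hypothetical. An extern declaration inside a function binds to an earlier static declaration of the same name, and only the extern form depends on an initializer to allocate storage.

```c
#include <assert.h>

static int x = 7;       /* internal linkage: private to this file */

extern int limit;       /* extern without initializer: declares only, allocates nothing */
int        limit = 10;  /* the definition that allocates storage; writing
                           "extern int limit = 10;" would also allocate,
                           though many compilers warn about that form */

static int count;       /* static without initializer: storage is still allocated, zeroed */

int read_x( void ) {
   extern int x;        /* refers to the static x above; linkage stays internal */
   assert( count == 0 );
   return x + limit;
}
```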

Although I've wandered repeatedly around the C maze using
guides such as the SAA C reference manual and Kernighan and
Ritchie's description of C (see the Bibliography), I still can't
guarantee that Figure 5.1 is a perfect map of every C visibility
rule. But there are better ways to simplify C variable
declarations and make your programs more readable than
building a mental map of C's visibility labyrinth.

UnC-eemly Solutions

The fastest way out of C's maze is to decide where you want to
go — that is, what kinds of visibility you want to use. I've found
three "classes" of visibility adequate to define most variables and
functions:

* Local: visible only within a single function or nested block.
* Share: visible in all functions in a file, but not outside the file.
* Export: visible in all functions in a program (all files). Export
visibility implies share visibility. The file that "exports" an object
allocates its storage.

You pick an appropriate one of these visibility classes when you
define a variable. If a variable is local, you generally are done
with your declarations for that object. In C, you define most
local variables at the beginning of a function, and they are
visible throughout that function (but not outside it).

If a variable has share or export visibility, however, you need
one more concept — import declarations — to specify you
intend to reference the variable in a specific function. One use of
the "import" concept is to specify you want to use an export
variable in a file other than the one containing its definition.
Naturally, a file can't import a variable that isn't exported by
some other file, and a file doesn't need to import variables that it
exports — such variables already are available for shared use
within the file. Note that in C programs, only one file can export
a particular variable (this isn't true for all languages). Another
use of the import concept is to specify you want to use a
nonlocal variable in a function or block. After covering a couple
of general rules relating to visibility, I'll explain a simple way to
implement local, share, and export visibility, as well as import
declarations.

The most important rule for declarations in C, or any language,
is: Use the most restricted visibility possible for variables; avoid
shared variables. In C, this most often means defining variables
used in a function as local to that function by putting their
declarations at the beginning of the function and using auto or no
storage class specifier. The same rule and technique apply to
nested blocks within a function.

In the special case of references to variables within a nested
block that aren't local to the nested block, I usually do not
declare the variables in the nested block. This results in implicit
extern declarations for any variable you reference but don't
declare in the nested block. When you follow this and other
guidelines I present below, all references to variables not
declared in a nested block resolve to a local or import
declaration in an outer block. I rarely use nested blocks, except
with "loop" macros, such as those I presented in Chapter 3. And
in those cases, import declarations don't add any real protection
or improve clarity. However, if you prefer to explicitly declare
every variable in a nested block, you can either use the IMPORT
macro (which I'll introduce in a moment) or define a similar
macro to more specifically express your intent.

You should define share and export variables before the first
function definition in a file, although C permits the confusing
and dangerous practice of placing external declarations between
functions, as in the following example:

void func1( void ) {
   ...
}
int x;
void func2( void ) {
   ...
}
void func3( void ) {
   ...
}

This ordering makes x implicitly visible in func2 and func3, but
not visible in func1, unless x is explicitly declared extern within
func1. The intent of such coding usually is to declare x as a
variable shared only by some — not all — of the functions in a
file. If several functions are coupled by sharing data you don't
want other functions to access, put the coupled functions in a file
by themselves, along with the external declarations for their
shared variables.
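
To make the ordering effect concrete, here is a small sketch with the hypothetical function names early and late. It assumes nothing beyond standard C scoping: a function defined before x needs its own extern declaration, while a function defined after x sees it implicitly.

```c
#include <assert.h>

int early( void ) {
   extern int x;     /* required here: the definition of x appears later in the file */
   return x;
}

int x = 5;           /* external declaration placed between functions */

int late( void ) {
   assert( x == 5 );
   return x;         /* visible without any declaration in this function */
}
```

Moving the definition of x above early removes the asymmetry, which is the arrangement the text recommends.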

C-through Macros

The best way to declare share, export, and import variables is to
define macros for the required C syntax. Figure 5.2 shows three
suggested macro definitions and how to use them (along with
rules for declaring local variables). For variables you want
visible throughout the file, use the SHARE macro before the
file's first function definition to declare (and define) each
variable. For example,

SHARE int x;

This statement expands to

static int x;

which gives x the desired visibility. Note that you never use
SHARE inside a function because it would just define a local
variable.

Figure 5.2
How to Declare C Variables and Functions

"Visibility" Macros

#define SHARE  static
#define EXPORT
#define IMPORT extern

Declaring Variables

Visibility  Storage    Where referenced  Storage/visibility specifier(s)
----------  ---------  ----------------  -------------------------------
Local       Automatic  In block          Define at beginning of block
                                         with no specifier

Local       Static     In block          Define as static at beginning
                                         of block

Nonlocal    -          In nested block   (No declaration, use implicit
                                         extern)

File        Static     In function       Define as SHARE before first
                                         function in file

                                         Declare as IMPORT at beginning
                                         of each function where
                                         referenced

Program     Static     In function       Define as EXPORT before first
                                         function in file where
                                         variable is to be allocated

                                         Declare as IMPORT before first
                                         function in file(s) where
                                         variable is used

                                         Declare as IMPORT at beginning
                                         of each function where
                                         referenced

Declaring Functions

Callable from     Visibility specifier
----------------  --------------------
Same file (only)  SHARE
Any file          EXPORT

The SHARE macro isn't a silver bullet to slay all of C's visibility
monsters, nor does it boost C's visibility features into a class
with Modula 2's. But it makes a lot more sense to write SHARE
rather than static when you're trying to specify a variable's
visibility.

In functions or blocks where you want to use a SHARE variable,
declare the variable with the IMPORT macro:

void func1( void ) {
   IMPORT int x;
   ...
}

This declaration expands to

extern int x;

which, although unnecessary given C's default scoping rules,
makes explicit the function's intention to use x. Unfortunately, C
won't protect you against unintentional references to x in
functions that don't declare x. Reducing the number of nonlocal
variables is the best way to reduce accidental nonlocal
references.

You also should specify SHARE for functions you intend to be
callable only from within the same file. Use EXPORT for
functions you intend to be callable from anywhere. Functions
without a visibility specifier default to EXPORT, but I find
explicit visibility reminds me which functions are callable from
anywhere. (I often notice EXPORT functions that don't need
such broad visibility and that are better defined as SHARE.) In
addition, by grouping all SHARE functions after all EXPORT
functions, you can organize your functions better.

For export variables you want visible throughout all source files
(i.e., the entire program), use the EXPORT macro to declare
(and define) the variable before the first function definition in
the file "owning" the variable. In most cases, this is the file
containing the main function.

The following example shows how to define an export variable:

EXPORT int x = 0;

This statement expands to

int x = 0;

which gives x the desired visibility and allocates storage for x.
It's good programming practice to initialize variables in their
definitions, especially variables for which C's default zero
initialization is not within the variable's domain (valid values).
I'd also recommend you place all EXPORT variable declarations
before any IMPORT declarations. Note that, as with SHARE,
you never use EXPORT inside a function.

In functions where you want to use a nonlocal variable, declare
the variable with IMPORT at the beginning of the function:

void func1( void ) {
   IMPORT int x;
   ...
}

The IMPORT declaration adds the necessary extern specifier
and clearly shows that the function uses a nonlocal variable. A
variable declared as IMPORT in a function will resolve to one of
the file's SHARE or IMPORT external declarations.
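
Putting the three macros together, a single source file might look like the sketch below. The variable and function names (total_orders, file_calls, count_call, add_order, calls_so_far) are hypothetical and exist only to show each macro in use; the SHARE function is placed first here simply so that no prototype is needed.

```c
#include <assert.h>

#define SHARE  static
#define EXPORT
#define IMPORT extern

EXPORT int total_orders = 0;   /* program visibility; this file allocates it */
SHARE  int file_calls   = 0;   /* file visibility only */

/* Callable only from within this file. */
SHARE void count_call( void ) {
   IMPORT int file_calls;
   file_calls++;
}

/* Callable from any file. */
EXPORT int add_order( const int quantity ) {
   IMPORT int total_orders;
   assert( quantity >= 0 );
   count_call();
   total_orders += quantity;
   return total_orders;
}

/* Callable from any file; exposes the private counter by value only. */
EXPORT int calls_so_far( void ) {
   IMPORT int file_calls;
   return file_calls;
}
```

Each IMPORT line documents which nonlocal variable a function touches, even though the code would compile without it.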

In files where you want to use a variable exported from another
file, declare the variable with IMPORT once at the beginning of
the file and once in each function or block where you want to
use the variable. For example,

/*
 | Before first function
*/

IMPORT int x;
...
void func1( void ) {
   IMPORT int x;
   ...
}

The first IMPORT declaration adds the necessary extern
specifier and clearly indicates that x is defined in another file
and that x may be used in other files (as well as in the current
file and in the one that defines it). Like the previous example,
the second IMPORT identifies a nonlocal variable used in a
function. In this case, the two IMPORT declarations are clearly
warning that func1 uses a variable that functions in other files
also might use.

Finding Your C Legs

SHARE, EXPORT, and IMPORT let your declarations "say
what they mean and mean what they say." You may prefer
different macro names; almost any descriptive names produce
clearer code than C's standard syntax. I picked up the idea for
the macros from Steve Schustack's suggestion in Variations in C
for similar macros he calls GLOBAL, SEMIGLOBAL, and
IMPORT. I think EXPORT indicates better than GLOBAL the
role of a variable allocated in one file and made available to
other files, and SHARE seems more informative than
SEMIGLOBAL. But whatever names you choose, these simple
"visibility" macros and guidelines for declarations will let you,
and anyone who reads your programs, C clearly now.

C Coding Suggestions

* Declare functions static if you intend to call them only from within
the same source file.
* Use the most restricted visibility possible for variables; avoid
shared variables.
* Put all external declarations before the first function definition in a
file.
* Put functions that must share data and external declarations for
their shared variables in a file by themselves.
* Use EXPORT, SHARE, and IMPORT macros to clarify the
intended visibility of a variable.

Chapter 6
Practical Pointers

Learn some handy techniques for avoiding C pointer pitfalls
by Paul Conte

If William Tell had used a C compiler for a bow and C pointers
for arrows, he’d have skewered Bill, Jr. — not an apple — to the
tree. Although C pointers are simply memory addresses, they’re
notorious for contributing to tricky programs and hard-to-spot
program bugs. C pointers are easier to use than assembler
address operands, but C’s low-level and unprotected exposure of
addresses creates pitfalls for business application programs.

Perhaps the most common mistake when working with pointers
is forgetting the * (the dereferencing or “contents of”) operator.
The following code shows a fairly harmless but common
example of this mistake:

int x = 25;
int *y;

y = &x;
printf( "y is %d\n", y );

What is printed is the address stored in y, not the value (i.e., 25)
stored at that address. The correct printf statement is:

printf( "y is %d\n", *y );

Not all * omissions are so harmless and easy to detect. Consider
the fees function in Figure 6.1. This function uses age and
income to calculate registration and activity fees, which the fees
function returns via two pointers passed by the caller.

Figure 6.1 Typical Function Returning Two Values

void function fees( int * rfee,
                    int * afee,
                    const int age,
                    const int income ) {
   /*
    | Calculate fees as base plus adjustment based on age
    | and income
   */

   * rfee = 100;
   * afee = 50;

   rfee += ( age >= 60 ) ? ( income < 50000 ? 0 : 50 ) :
                           ( income < 50000 ? 100 : 200 );

   afee += ( age >= 60 ) ? ( income < 50000 ? 10 : 20 ) :
                           ( income < 50000 ? 30 : 40 );
}

The fees function will be compiled and executed without raising
an exception. But every call to fees will produce the same $100
registration fee and $50 activity fee, regardless of age or income.
In this example, the third and fourth assignment statements
increment the values of rfee and afee, which are addresses
(pointers), not the integer values stored at these two addresses.
The assignment statements’ targets should be *rfee and *afee.
The compiler, however, can’t tell the original version is wrong
because addition operations are legal on both pointer and integer
variables.
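
One additional safeguard, not part of the technique this chapter goes on to recommend, is to declare the pointer parameters themselves const. The sketch below assumes a compiler that enforces const. With these declarations an accidental rfee += ... no longer compiles, because the address held in rfee cannot be modified, while *rfee += ... remains legal.

```c
#include <assert.h>
#include <stddef.h>

void fees( int * const rfee,      /* const pointer: the address cannot change */
           int * const afee,
           const int   age,
           const int   income ) {

   assert( rfee != NULL && afee != NULL );

   *rfee = 100;
   *afee = 50;

   /* Writing "rfee += ..." here would now be rejected at compile time. */
   *rfee += ( age >= 60 ) ? ( income < 50000 ?   0 :  50 )
                          : ( income < 50000 ? 100 : 200 );

   *afee += ( age >= 60 ) ? ( income < 50000 ?  10 :  20 )
                          : ( income < 50000 ?  30 :  40 );
}
```

This catches only the omitted-asterisk mistake on the parameters themselves; it does nothing for pointers stored elsewhere.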

C’s lack of “output” parameters forces C programmers to
explicitly handle addresses and dereferencing (i.e., referencing
the storage pointed to by a pointer) to return more than one value
from a function. Combined with C’s overloading of arithmetic
operators for both integer and pointer arithmetic, dereferencing
can easily trip you up. A good high-level language (HLL) should
support output parameters so you don’t need pointers and
dereferencing to return multiple procedure values. (The C
development community recognizes this C deficiency and has
added references, which can be used for return parameters, to
C++. But no such facility is planned for C itself.)

HLLs suitable for business programming also should either
prohibit direct address modification (i.e., pointer arithmetic) or
provide distinct functions for modifying addresses so such
operations stand out in the code rather than appear as ordinary
arithmetic operations. As I’ve emphasized in previous chapters,
C was designed as a portable assembly language, and when
you’re programming at the machine level, it’s logical to treat
addresses as integers. At the business application level, however,
machine addresses shouldn’t be visible, much less easily
confused with ordinary numbers.

You won’t find a foolproof way to use dereferenced pointer
parameters. If you try to code operands such as *rfee and *afee
throughout a function, you’ll eventually slip up and omit the *.
Finding the mistake may not be easy. But a simple coding
practice will lead you around the pitfall: For non-array “output”
or “input/output” parameters, use local variables instead of
dereferenced parameters in function calculations.

Figure 6.2 shows the fees function rewritten to use two local
variables in the calculations. The function’s last two statements
assign the calculated values to the locations pointed to by the
pointer parameters. This technique isolates and simplifies
dereferencing and can significantly reduce errors. Figure 6.3
shows how to handle in/out parameters by initializing the local
variables to the dereferenced parameters.

Figure 6.2 Using Local Variables Instead of Dereferenced Parameters

void function fees( int * rfee,
                    int * afee,
                    const int age,
                    const int income ) {

   /*
    | Calculate fees as base plus adjustment based on age
    | and income
   */

   int reg_fee = 100;
   int act_fee = 50;

   reg_fee += ( age >= 60 ) ? ( income < 50000 ? 0 : 50 ) :
                              ( income < 50000 ? 100 : 200 );

   act_fee += ( age >= 60 ) ? ( income < 50000 ? 10 : 20 ) :
                              ( income < 50000 ? 30 : 40 );

   /*
    | Return values
   */

   * rfee = reg_fee;
   * afee = act_fee;
}

Figure 6.3 Using Local Variables with In/Out Parameters

void function fees( int * rfee,
                    int * afee,
                    const int age,
                    const int income ) {
   /*
    | Adjust fees based on age and income
   */

   int reg_fee = * rfee;
   int act_fee = * afee;

   reg_fee += ( age >= 60 ) ? ( income < 50000 ? 0 : 50 ) :
                              ( income < 50000 ? 100 : 200 );

   act_fee += ( age >= 60 ) ? ( income < 50000 ? 10 : 20 ) :
                              ( income < 50000 ? 30 : 40 );

   /*
    | Return values
   */

   * rfee = reg_fee;
   * afee = act_fee;
}

A companion to the previous rule is: Use array notation instead of
pointers and dereferencing when working with arrays. C’s array
notation is really just shorthand for pointer operations, and C lets
you use either in most contexts. For example, if a is declared as
an array, *(a+i), a[i], and i[a] mean exactly the same thing.
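
The equivalence is easy to confirm with a few assertions. In this sketch the function name same_element and the array contents are hypothetical.

```c
#include <assert.h>

/* Returns element i of a small array after checking that the three
   notations all select the same element. */
int same_element( const int i ) {
   int a[5] = { 10, 20, 30, 40, 50 };

   assert( i >= 0 && i < 5 );
   assert( a[i] == *( a + i ) );
   assert( a[i] == i[a] );

   return a[i];
}
```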

But when using an array variable, you should stick with array
notation such as a[i] to keep your code’s meaning obvious. An
added benefit in using such notation is that, in some contexts, the
C compiler can catch mistakes in expressions using array names
that it can’t catch with pointers (e.g., C lets you change an
address in a pointer variable, but you can’t change the address
referred to by an array name). And before you let some “old
hand at C” convince you that direct manipulation of pointers is
“so much faster” than subscripting arrays, read the sidebar
“Pulling a ‘Fast’ One.” In business applications and most utility
software, you can freely use array subscripts without
performance concerns.

I’ve read the viewpoint that since C array notation is really just
shorthand for pointer operations, you should use pointer notation
because it more “honestly” shows what’s going on. If you’re
trying to dissuade someone from using C, this argument has
merit. C pointer and dereferencing notation certainly looks
stranger than array notation to most programmers and warns
newcomers that C isn’t your ordinary HLL. But in the long run,
array notation expresses high-level data constructs much better
than pointer notation.

Finger Pointing

You can fall into another C pothole by forgetting that a pointer
isn’t the same as the thing it points to. The following code
appears to save a copy of the current string in a “previous string”
variable and then assign the current string a new value.

char * curstr;
char * prvstr;

curstr = (char *) malloc( 10 );
prvstr = (char *) malloc( 10 );

strcpy( curstr, "abc" );

prvstr = curstr;

strcpy( curstr, "xyz" );

But after these statements are executed, both curstr and prvstr
point to “xyz”. The assignment prvstr = curstr copies the address
stored in curstr to prvstr, not the contents of the memory location
curstr points to.
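
The difference between copying an address and copying the characters can be checked directly. The sketch below uses fixed-size arrays instead of malloc to stay short; alias_demo and its variable names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if the copied string kept its old value after the
   original was overwritten. */
int alias_demo( void ) {
   char  current[10] = "abc";
   char  saved[10]   = "";
   char *alias;

   alias = current;            /* copies the address only */
   strcpy( saved, current );   /* copies the characters */
   strcpy( current, "xyz" );

   assert( strcmp( alias, "xyz" ) == 0 );   /* the alias follows the change */
   return strcmp( saved, "abc" ) == 0;      /* the copy does not */
}
```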

Using *prvstr = *curstr won’t accomplish what we want either.
It just copies the single byte that curstr points to into the single
byte that prvstr points to. To do a simple “save a copy of this
string” operation, you require code like that in Figure 6.4. As is
often the case in C, high-level operations that should be simple
and safe are neither. I can offer only this caution: When working
with pointers in assignment statements, double-check that you’re
using the right level of indirection. Most C compilers warn you
about incompatible types or different levels of indirection in
assignments, but they give no warning when both sides of an
assignment are compatible types and levels but at the wrong
level (as in the previous example).

Figure 6.4 Saving a Copy of a String

prvstr = (char *) malloc( strlen( curstr ) + 1 );

if ( prvstr == NULL ) {
   printf( "No memory available\n" );
}
else {
   strcpy( prvstr, curstr );
}

C’s a Real Nowhere, Man

If you’ve ever watched Wile E. Coyote spinning his legs in thin
air above some canyon floor, you’ve seen what happens when
you try to use a C pointer that doesn’t point to anything. C’s
macro name for this ticket to nowhere is NULL. It isn’t hard to
accidentally create null pointers; in fact, you get one every time
you define a static pointer variable. Look at the following code:

int val = 25;
int *ptr;

*ptr = val;

This code will be compiled but will either blow up or corrupt
memory at runtime. Although ptr is defined as a pointer, its
value is initially NULL (or some undefined value). Thus, the
assignment statement’s target doesn’t have a valid address. The
two correct alternatives are:

ptr = &val;

which assigns the address of val to ptr, or

ptr = (int *) malloc( sizeof( val ) );
*ptr = val;

which (usually, as I explain in the next section) allocates
memory to store the value 25. If you’re counting on one of my
“magic macros” to avoid this pitfall, you may be disappointed
that I can offer only the shopworn C programming dictum: Be
careful! Unless you’re positive a pointer has been initialized,
check it for NULL, as shown below, before using it:

if ( ptr NE NULL ) {
   *ptr = val;
}
else {
   printf( "ptr is NULL\n" );
}

Some compilers let you generate checks for referencing
uninitialized variables or NULL pointers, and you may want to
use this defense both during development and for production
applications.

You Can’t Get There from Here

In C, you can also create “dangling” pointers: ones that point
somewhere, but not where you’d expect. Figure 6.5 shows a
function intended to return a month’s name, given the month’s
number. This function will be compiled, will run, and will return
a non-null character pointer. But the returned pointer will point
to memory allocated only temporarily to the names array. When
the month_name function returns, the names array will be
deallocated, and another function’s local (automatic) variables
may reuse its memory. Using the pointer value returned by
month_name may result in a month Pope Gregory never
contemplated (or worse, another frozen machine).

Functions patterned after Pascal’s built-in new pointer function offer some
help. To conveniently and safely allocate a new string and assign
it a value, first use C’s typedef feature to make a meaningful
name for C’s “string” type:

typedef char * string_t;

Here, I follow the common convention of ending typedef names
with “_t”. C typedef names are synonyms for explicit C type
specifications and help make your C programs more readable.
(In Chapter 4, I describe a more comprehensive way to deal with
strings in C. In this chapter, I cover only conventional C string-
handling. You can combine suggestions from both chapters to
build your own complete string facility.)

Figure 6.5 Creating a Dangling Pointer

char * month_name( const int month ) {

   /*
    | Return month name
   */

   char names[12][10] = {
      "January", ... "December" };

   return names[ month - 1 ];
}

Figure 6.6 uses string_t to declare parameters for the function
new_string. (A similar function, strdup, is available in Microsoft
C but not in ANSI C.) With this function, it’s easy to write a
correct version of month_name, as shown in Figure 6.7. Like
Figure 6.5’s incorrect version, the corrected version returns a
character pointer. However, the pointer returned by the corrected
version points to memory that remains allocated and that
contains the desired month name.

Figure 6.6 new_string Function

string_t function new_string( const string_t val ) {

   /*
    | Allocate and load storage for string val
    |
    | Return pointer or NULL, if error
   */

   string_t p;

   if ( val == NULL ) {
      printf( "Invalid NULL value pointer\n" );
      return NULL;
   }

   p = (string_t) malloc( strlen( val ) + 1 );

   if ( p == NULL ) {
      printf( "No memory for %s\n", val );
      return NULL;
   }
   else {
      strcpy( p, val );
      return p;
   }
}

Figure 6.7 Corrected month_name Function

char * month_name( const int month ) {

   /*
    | Return month name
   */

   char names[12][10] = {
      "January", ... "December" };

   return new_string( names[ month - 1 ] );
}
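
The corrected version hands ownership of the allocated string to the caller, who must eventually free it. Where that is inconvenient, a different technique (an assumption of this sketch, not the approach of Figure 6.7) is to keep the names in a static table, so the returned pointer stays valid without any allocation. The name month_name_static and the range check are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Returns a pointer into a static table, or NULL if month is out of range.
   The caller must not modify or free the returned string. */
const char * month_name_static( const int month ) {
   static const char * const names[] = {
      "January", "February", "March",     "April",   "May",      "June",
      "July",    "August",   "September", "October", "November", "December" };

   assert( sizeof names / sizeof names[0] == 12 );

   if ( month < 1 || month > 12 ) {
      return NULL;
   }
   return names[ month - 1 ];
}
```

The trade-off is that the caller receives a read-only string; when a modifiable copy is needed, new_string remains the better fit.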

Amnesia

In large or long-running programs that explicitly allocate
memory by calls to C’s malloc memory allocation function, you
may try to allocate more memory than is available. If malloc
can’t allocate the amount of memory you request, it returns
NULL. Always check for a NULL return value after calling
malloc. By doing this, you can avoid the problems that occur
when you use a NULL pointer. Most C programmers follow this
rule — most of the time. But that’s not good enough. Even if
you allocate only one byte as the very first statement in a trivial
main block (an operation you think can “never” fail), check what
malloc returns. The test takes 30 seconds to code and practically
no time to be executed, and you’ll never be unpleasantly
surprised when the “it couldn’t happen” does.
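
One way to make the check hard to forget is to route every allocation through a small wrapper. The sketch below is only an illustration: checked_alloc is a hypothetical name, and reporting the failure on stderr is an assumption about how a program might want to react. The caller still tests the result before using it.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Allocates size bytes; reports a failure and returns NULL if malloc fails. */
void * checked_alloc( const size_t size ) {
   void * p;

   assert( size > 0 );

   p = malloc( size );
   if ( p == NULL ) {
      fprintf( stderr, "No memory for %lu bytes\n", (unsigned long) size );
   }
   return p;
}
```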

If a program does much explicit memory allocation, you also
need to guard against C’s form of amnesia — unexpected
memory loss or memory “leakage.” This quaint term refers to
the situation where memory you’ve allocated isn’t available for
reuse when you’re done with it.

Figure 6.8 shows how leakage can occur. The first malloc
operation allocates a memory block and stores a pointer to it in
ptr. After using this memory to hold a character string, the code
reuses ptr to point to memory containing a different string. This
code will be executed fine, but the memory originally allocated
to hold the first string will remain marked “in use,” even though
it can’t be referenced or deallocated after the second malloc
operation (assuming the pointer value isn’t copied to another
pointer variable).

Figure 6.8 How Memory Leakage Occurs

string_t val1 = "abc";
string_t val2 = "xyz";
string_t ptr;

ptr = (string_t) malloc( strlen( val1 ) + 1 );

strcpy( ptr, val1 );
.
.
.
ptr = (string_t) malloc( strlen( val2 ) + 1 );

strcpy( ptr, val2 );
.
.
.

A simple solution is the allocate macro in Figure 6.9, which uses
malloc when the pointer is NULL and realloc when it’s not. (In a
subsequent chapter, I’ll explain why allocate uses so many
parentheses. Simply put, the parentheses prevent unintended
changes in the generated code’s evaluation order.) Figure 6.10
shows how to reuse allocated memory with the allocate macro.
Note that allocate requires the pointer to be NULL or a value
returned by one of the memory allocation functions. Another
rule, always initialize pointers in their definitions, partially
satisfies this requirement. Although C initializes static pointer
variables to NULL, it doesn’t initialize automatic pointer
variables. Coding an explicit NULL initializer covers all cases
and emphasizes pointer declarations.

Figure 6.9 allocate Macro

/*
 | ptr MUST be NULL or address of block allocated
 | by calloc, malloc, or realloc functions
 |
 | The value of allocate is either a valid pointer
 | of the specified ptr_type, or NULL if no memory
 | can be allocated.
*/

#define allocate( ptr, ptr_type, alloc_size )                \
                                                             \
   ( ( ptr ) = ( ptr_type ) ( ( ( ptr ) == NULL ) ?          \
                    malloc( ( alloc_size ) ) :               \
                    realloc( ( ptr ), ( alloc_size ) ) ) )

Figure 6.10 Avoiding Memory Leakage

string_t val1 = "abc";
string_t val2 = "xyz";
string_t ptr = NULL;

allocate( ptr, string_t, ( strlen( val1 ) + 1 ) );

strcpy( ptr, val1 );
.
.
.
allocate( ptr, string_t, ( strlen( val2 ) + 1 ) );

strcpy( ptr, val2 );
.
.
.
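
One caveat about the allocate macro: if realloc fails, the macro stores NULL in ptr, so the address of the original block is lost and that block can no longer be freed. A possible variation, sketched below under the assumption of an ANSI C library (where realloc with a NULL pointer behaves like malloc), keeps the old pointer until the new one is known to be valid. The function name resize_block and its return convention are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Resizes the block *ptr (which must be NULL or a previously allocated block).
   Returns 1 on success; returns 0 on failure and leaves *ptr unchanged. */
int resize_block( char ** const ptr, const size_t new_size ) {
   char * bigger;

   assert( ptr != NULL && new_size > 0 );

   bigger = (char *) realloc( *ptr, new_size );
   if ( bigger == NULL ) {
      return 0;               /* old block, if any, is still valid */
   }
   *ptr = bigger;
   return 1;
}
```

Because the old block survives a failed call, the caller can still use or free it while recovering.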

By using typedef, local variables instead of dereferenced
parameters, array notation, and various macros and functions,
you can reduce the number of places where you must code *,
whether in declarations or as the dereferencing operator.
Because * isn’t a very intuitive symbol for “contents of” (in
most 3X/400 programmers’ experience, * stands for
multiplication, a multicharacter wildcard, or the start of a S/38 or
AS/400 special value), you can improve your C programs’
readability by minimizing its use. To further improve
readability, you may even want to define the three macros in
Figure 6.11 — PTR, contents_of, and address_of.

Figure 6.11 Pointer-Related Macros

#define PTR *
#define contents_of( x ) ( * ( x ) )
#define address_of( x ) ( & ( x ) )

Figure 6.12 shows a revised version of the fees function, using
PTR and contents_of. (Instead of using PTR, you could define
an int_ptr_t typedef for integer pointers.) Figure 6.13 shows how
you can use the address_of macro to emphasize what is being
passed as arguments to fees.

Figure 6.12 Improved fees Function Using PTR and contents_of Macros

void function fees( int PTR rfee,
                    int PTR afee,
                    const int age,
                    const int income ) {

   int reg_fee = contents_of( rfee );
   int act_fee = contents_of( afee );

   reg_fee += age >= 60 ? ( income < 50000 ? 0 : 50 ) :
                          ( income < 50000 ? 100 : 200 );
   act_fee += age >= 60 ? ( income < 50000 ? 10 : 20 ) :
                          ( income < 50000 ? 30 : 40 );

   /*
    | Return values
   */

   contents_of( rfee ) = reg_fee;
   contents_of( afee ) = act_fee;
}

Figure 6.13 Calling fees Function Using address_of Macro

int reg_fee;
int act_fee;
int age;
int income;
.
.
.
reg_fee = 100;
act_fee = 50;
.
.
.
fees( address_of( reg_fee ), address_of( act_fee ), age, income );

These macros don’t provide a complete safety net for your C
programs, but they can make your programs clearer and easier to
check for pointer-related mistakes. If you’re not sure you need
such macros, or if you’re worried your C code will look
“nonstandard” with the macros and other techniques I’ve
suggested in this book, stop by your local bookstore and peruse
the code listings in the latest C Gazette. The examples in this C
programmer’s magazine will give you a wide sampling of how
impenetrable and varied typical C code is. After reading some of
the published code, you’ll appreciate both the necessity and the
benefits of efforts to improve C readability.

One Blankety-Blank Trap After Another

It helps to know about C’s traps and pitfalls, and it never hurts to
know a few arcane C rules to amaze your programming friends
and baffle your boss. The next exercise serves both purposes.
Read the following code slowly and carefully. Then before
reading the solution, write down what you think the value of x is
at the end of this sequence of statements.

float x=4.0;
float y=2.0;
float *z;

z=&x;
x=y++/*z;

Did you remember that the post-increment ++ adds 1 to y after
getting the value of y (2.0) to use in the expression? Did you
also notice that z points to the same memory location as x and
thus has the same value, 4.0? Did you come up with 2.0/4.0, or
0.5, as the result? Or did you find, deep in your C manual, that C
uses “greedy” lexical analysis (i.e., tries to make the next
identifier or operator in the input stream as long as possible) and
thus treats /* as the beginning of a comment, not as the division
operator followed by the dereferencing operator? The resulting
expression is the same as:

x=y++ /*z
... */

So the value of x is 2.0, and the compiler digests most of the rest
of the program as a comment. One absent blank (between / and
*) makes a world of difference. This may seem like a contrived
problem, but consider several seemingly simpler, and more
likely, assignment statements:

z = &y;
x = y + *z;
x = ++ *z;
x = *z ++;

The second statement adds the value of y (2.0) and the contents
of the location z points to (also 2.0) and puts the result in x. No
surprises here. The third statement increments the contents of the
location z points to (changing the value from 2.0 to 3.0) and
assigns the new value to x. No surprises here either. But the final
statement, which looks similar to the others, is actually quite
different. This statement assigns 3.0 (the new value of *z) to x
— so far so good — and then increments the address stored in z,
rather than the contents of the location z points to. Because
the postfix increment and decrement operators (++ and --) bind
more tightly than the dereferencing operator (*), you have to use
parentheses to apply a post-increment to a dereferenced pointer:

x = (*z) ++;

Of course, if we rewrite the previous expressions using the
contents_of macro,

x = y++/contents_of(z);
x = y + contents_of(z);
x = ++ contents_of(z);
x = contents_of(z) ++;

the lack of blanks and C’s tricky operator precedence aren’t a
problem, which is another bonus of using readability macros.
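The difference between the two post-increment forms can be checked with a small sketch. The contents_of definition below is an assumption made for illustration (the macro's actual definition appears earlier in the book and may differ), and both function names are hypothetical.

```c
#include <stddef.h>

/* Assumed definition, for illustration only. */
#define contents_of( p ) ( *( p ) )

/* After `x = *z++;` the pointer has moved; the data is untouched. */
ptrdiff_t advance_after_star_postinc( float *vals ) {
    float *z = vals;
    float x = *z++;     /* parsed as *(z++): read vals[0], then move z */
    (void) x;
    return z - vals;    /* number of elements z advanced */
}

/* After `x = contents_of( z )++;` the data has changed; z has not moved. */
float value_after_paren_postinc( float *vals ) {
    float *z = vals;
    float x = contents_of( z )++;   /* (*(z))++: read, then add 1 to vals[0] */
    (void) x;
    return vals[0];
}
```

With an array that starts as { 2.0, 7.0 }, the first function reports that z advanced by one element and leaves the array alone, while the second changes the first element to 3.0.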


Letting the Cat Out of the Bag

Although C has plenty of built-in traps, C’s design is probably
less to blame for pointer problems than C programmers’
fascination with clever coding. Consider the cat function in
Figure 6.14. This function places the concatenation of two null-
terminated strings (passed as parameters str1 and str2) in the
result string. I wrote cat using a typical C idiom. The “beauty” of
this style lies in how much work gets done in the two short while
expressions. Each execution of

while(*result++ = *str1++)

works like this:

1. Get the value of str1 (an address).
2. Increment the value of str1 (bump the address one byte).
3. Get the character stored at the address obtained in step 1 (before
incrementing the address).
4. Get the value of result (an address).
5. Increment the value of result (bump the address one byte).
6. Store the character obtained in step 3 at the address obtained in
step 4 (before incrementing the address).
7. Compare the binary value of the character stored in step 6. If it’s
zero, quit the while loop; otherwise, repeat steps 1 through 7 with the
incremented address values of str1 and result.
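The seven steps above can be sketched in long form, one operation per statement. This is only an illustration of what the compact while expression does; the function name copy_long_form is hypothetical.

```c
/* Long-form equivalent of: while(*result++ = *str1++); */
void copy_long_form( char *result, const char *str1 ) {
    char ch;

    do {
        ch = *str1;       /* steps 1 and 3: fetch the source character */
        ++str1;           /* step 2: advance the source address        */
        *result = ch;     /* steps 4 and 6: store the character        */
        ++result;         /* step 5: advance the target address        */
    } while ( ch != '\0' );   /* step 7: stop once the null is copied  */
}
```

The caller is still responsible for supplying a target large enough to hold the source string, including its terminating null.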

Figure 6.14 cat Function

void cat( char * result, const char * str1, const char * str2 ) {
    /* Concatenate str1 and str2 and return in result */

    while(*result++ = *str1++);
    --result;
    while(*result++ = *str2++);
}

When the first while loop is completed, result contains the
address one beyond the terminating null of the target string. The
--result statement shifts this address back one, so the second
while loop repeats the process just described, starting at the byte
just after the last non-null character in the target.

If you’re an inexperienced C programmer, this code probably
looks odd to you. You may not even be sure it works reliably.
But if you aspire to be a competent C programmer, much of the
advice you’ll receive will try to help you master C’s
concentrated syntax so that you will know how this code works
and that it does do what I said. Unfortunately, too many C
programmers concentrate so much on coding details (such as
when the post-increment operator takes effect and the relative
precedence of the * and ++ operators) that they deliver tight
while loops but miss larger problems. Such misplaced attention
can lead to functions such as cat that are fast, but explosive.

Figure 6.15a shows a sample call to cat that prints “Hello
world!”, just what you’d expect. Figure 6.15b shows another
call to cat that freezes your PC and requires a reboot. Since the
second example passes string b as both the target string (result
parameter) and the second source string (str2 parameter), the
second while loop in cat chases b to infinity. On every iteration,
we advance the pointer one more byte toward the end of str2
(which is b), but by adding a byte to the tail of result (which is
also b), we make the end of str2 just one byte farther away.

Figure 6.15a A Successful Call of cat

char a[20] = "Hello ";
char b[20] = "world!";
char x[20];

cat(x, a, b);
printf("%s\n", x); /* Prints: Hello world! */

Figure 6.15b An Unsuccessful Call of cat

char a[20] = "Hello ";
char b[20] = "world!";
char x[20];

cat(b, a, b); /* Infinite loop! */
printf("%s\n", b);

Similar problems can occur in RPG, COBOL, and any other
language that passes addresses for procedure arguments. But
because RPG and COBOL programmers don’t have to
concentrate so much on low-level coding details (or maybe
because RPG and COBOL programmers don’t so readily dare to
“boldly go where no one has gone before”), they don’t seem to
write as much self-destructing code as C programmers.

That brings me to my last suggestion in this chapter for avoiding
C’s pitfalls: avoid popular, but tricky, C idioms for business
application programming. Instead, use code that is readily
understood and easily checked for validity.
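As one possible illustration of code that is easier to check, here is a sketch of a more defensive concatenation function. The name safe_cat, the result_size parameter, and the 0/-1 return convention are all assumptions made for this example, not part of the original cat function. The sketch rejects only exact aliasing (the target being the same array as a source); partially overlapping arguments are still the caller's responsibility.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical replacement for cat; returns 0 on success, -1 on failure. */
int safe_cat( char result[], size_t result_size,
              const char str1[], const char str2[] ) {
    size_t len1;
    size_t len2;

    if ( result == NULL || str1 == NULL || str2 == NULL ) {
        return -1;
    }
    /* Refuse a target that is also a source instead of looping forever. */
    if ( result == str1 || result == str2 ) {
        return -1;
    }
    len1 = strlen( str1 );
    len2 = strlen( str2 );
    /* Both strings plus the terminating null must fit in the target. */
    if ( len1 >= result_size || len2 > result_size - len1 - 1 ) {
        return -1;
    }
    memcpy( result, str1, len1 );
    memcpy( result + len1, str2, len2 + 1 );   /* copies the null too */
    return 0;
}
```

With the arrays from Figure 6.15, a call such as safe_cat( x, sizeof x, a, b ) succeeds, while safe_cat( b, sizeof b, a, b ) returns -1 and leaves b untouched rather than hanging.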

Sidebar 1 — Pulling a “Fast” One

When you cut through all the ivory-tower software engineering
stuff, what you really get down to in any C programming
showdown is speed. “My loop is faster than your loop!” Who
cares if it freezes the system sometimes? So how are you going
to survive as a new C programmer if you let yourself be
hamstrung by such practices as using array subscripts instead of
pointers? Some C programmers would tell you that using array
subscripts is like showing up at the Bonneville Salt Flats in a
go-cart when you really should be red-lining a methanol-powered,
turbo-charged, pointer-driven speed demon. But before
accepting any widely held C rules of the road, you may want to
look at some actual data.

Figures 6.A, 6.B, and 6.C show three implementations of an
upper_case function that loops over a character string,
converting lowercase characters to uppercase. Figure 6.A’s
version is about as lean and mean as you can get in C. Figure
6.B’s function uses direct pointer manipulation, but in a less
tricky loop than A’s “speed-demon” version, and Figure 6.C’s
version uses array subscripting. All three versions produce
identical results most of the time (see the twist at the end of
this sidebar). I compiled all three versions with Microsoft C 6.0a,
using the large memory model and maximum (/Ox)
optimization. For each version, I executed 10,000 calls to
upper_case on a string with 100 letters in it. I used the C time
function to bracket the test loop and ran all three versions
repeatedly on an IBM PS/2 55SX (16 MHz 80386SX). The
average times, measured in milliseconds per call, were:

Figure 6.A (speed-demon)   1.2
Figure 6.B (pointers)      1.3
Figure 6.C (subscripts)    1.4

Figure 6.A “Speed-Demon” upper_case Function

void upper_case( char *str ) {
    while ( *str++ = (char) toupper( *str ) );
}

Figure 6.B Pointer Implementation of upper_case Function

void upper_case( char *str ) {

    while ( *str != '\0' ) {
        *str = (char) toupper( *str );
        ++ str;
    }
}

Figure 6.C Subscript Implementation of upper_case Function

void upper_case( char str[] ) {

    int i = 0;
    while ( str[ i ] != '\0' ) {
        str[ i ] = (char) toupper( str[ i ] );
        ++ i;
    }
}

Although the differences aren’t exactly dramatic, they’re no
doubt large enough to fuel a “true believer’s” insistence that the
code in Figure 6.A is best. If you find yourself on the losing end
of such a debate, here’s what you do. Bet your opponent you can
revise the upper_case function so it still doesn’t directly use
pointers but runs faster than the speed-demon version. Even
offer to spot your opponent a half-millisecond handicap, and bet
heavily. Then deliver the code in Figure 6.D. Using Microsoft’s
strupr library function, upper_case takes only 0.3 milliseconds
— a gain that dwarfs the best improvement possible by any hand-
coded C iteration over the string. As often happens, changing a
program’s approach yields greater performance improvements
than diddling with code. Armed with this strategy for writing C
programs, you’ll be able to pull off additional “fast ones.”

Figure 6.D Library Function Implementation of upper_case
Function

void upper_case( char *str ) {
    strupr( str );
}

A Final Twist

When I ran the original benchmarks for this sidebar, I used the
Microsoft C /qc (quick compile) option. Subsequently, I
repeated the tests without the /qc option, and the code in Figure
6.A no longer worked. A call to Microsoft revealed another C
pitfall: There’s no standard order for evaluating the left and right
sides of an assignment expression. With the /qc option, MS-C
follows the intuitive approach and evaluates the right side (i.e.,
(char) toupper( *str )) before the left side (*str++). This
approach produces the expected results. But without /qc, MS-C
evaluates the left side first, incrementing the address in str before
it’s used in the right-hand expression. This causes each
invocation of upper_case to chop the leading, non-null character
from the string, eventually wiping out the string altogether. This
discovery inspired a new guideline for avoiding another C
pitfall: Don’t use ++ or -- in assignments. This rule isn’t limited
to expressions involving pointers. The expression

x = (i) + (i++)

is also ambiguous because the compiler may evaluate the first i
before or after the post-increment of i occurs.
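One way to remove the ambiguity is to move the increment into its own statement, which forces you to say which meaning you intend. The sketch below shows both readings of the expression; the function names are hypothetical.

```c
/* Reading A: both operands use the old value of i; i is bumped afterward. */
int add_then_increment( int *i ) {
    int x = *i + *i;
    ++( *i );
    return x;
}

/* Reading B: the first operand sees the value of i after the increment. */
int increment_then_add( int *i ) {
    int old = *i;
    ++( *i );
    return *i + old;
}
```

Starting from i equal to 3, reading A yields 6 and reading B yields 7; in both cases i ends up as 4. Because each function states the order explicitly, the result no longer depends on the compiler.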

Sidebar 2 — C Coding Suggestions

• For non-array “output” or “input/output” parameters, use local
variables instead of dereferenced parameters in function calculations.
• Use array notation instead of pointers and dereferencing when
you’re working with arrays.
• When working with pointers in assignment statements, double-
check that you’re using the right level of indirection.
• Unless you’re positive a pointer has been initialized, check it for
NULL before using it.
• Use a new_string function to return new strings from functions.
• Always check for a NULL return value after calling malloc.
• Use the allocate macro to prevent memory “leakage.”
• Always initialize pointers in their definitions.
• Use typedef and PTR, contents_of, and address_of macros to
improve program readability.
• Avoid popular, but tricky, C idioms for business application
programming.
• Don’t use ++ or -- in assignments.


Chapter 7:
Macros and Miscellaneous Pitfalls

A bad macro can drive a good programmer mad. Imagine the
frustration when an unsuspecting programmer codes:

x = 3;
y = cube( x + 1 );
z = 5 * double( x );

thinking that cube( x + 1 ) will produce the value 64 (4 cubed), only to
find that y is set to 10; and thinking that 5 * double( x ) will
produce the value 30, only to find that z is set to 18. The mystery
becomes clear when the programmer examines the cube and
double macro definitions and finds:

#define cube( x ) x*x*x
#define double( x ) x+x

Given these definitions, the compiler expands cube( x + 1) as:

x + 1*x + 1*x + 1

which, because in C the * operator binds more tightly than the +
operator, is equivalent to:

x + ( 1 * x ) + ( 1 * x ) + 1

When x is 3, the value of this expression is:

3 + ( 1 * 3 ) + ( 1 * 3 ) + 1

or 10.

Similarly, the compiler expands 5 * double( x ) as:

5 * x+x

which is equivalent to:

( 5 * x ) + x

When x is 3, this expression evaluates to 18.

Macros aren’t “magic”: the compiler simply replaces a macro
reference with expanded text according to the macro’s
definition. This simple text expansion requires care that the
context of a macro expansion doesn’t cause the resulting
expression to have an unexpected meaning. You can avoid many
problems with macros by following a simple rule: Put
parentheses (or other explicit delimiters) around the macro text
and around each macro argument within the macro text.

Following this rule, the cube and double macros can be defined
as:

#define cube( x ) ( ( x ) * ( x ) * ( x ) )
#define double( x ) ( ( x ) + ( x ) )

The two assignment statements above will then expand to:


y = ( ( x + 1 ) * ( x + 1 ) * ( x + 1 ) )
z = 5 * ( ( x ) + ( x ) )

which will do what the programmer originally expected.
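The effect of the parentheses can be seen side by side in the following sketch. The macro names cube_bad, cube_ok, twice_bad, and twice_ok are hypothetical stand-ins for the cube and double macros in the text (double is also a C keyword, so the sketch avoids redefining it), and the four wrapper functions exist only so the expansions can be called and compared.

```c
/* Unparenthesized versions, as in the first pair of definitions. */
#define cube_bad( x )   x*x*x
#define twice_bad( x )  x+x

/* Fully parenthesized versions, following the rule in the text. */
#define cube_ok( x )    ( ( x ) * ( x ) * ( x ) )
#define twice_ok( x )   ( ( x ) + ( x ) )

int cube_next_bad( int x )    { return cube_bad( x + 1 ); }   /* x + 1*x + 1*x + 1 */
int cube_next_ok( int x )     { return cube_ok( x + 1 ); }    /* (x+1)*(x+1)*(x+1) */
int five_twice_bad( int x )   { return 5 * twice_bad( x ); }  /* (5*x) + x         */
int five_twice_ok( int x )    { return 5 * twice_ok( x ); }   /* 5 * (x+x)         */
```

With x equal to 3, the unparenthesized macros give 10 and 18, matching the surprising values described above, and the parenthesized macros give the intended 64 and 30.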

You may remember that some macros I presented in earlier
chapters don’t have so many parentheses. For example, in
Chapter 4 I defined the cpystr macro as:

#define cpystr( target, source )\
    strcpymax( target, source, target##_maxlen)

In this case, the expansion text is always delimited by the
strcpymax function name and closing parenthesis, so there’s no
need for parentheses around the entire text. The target and
source arguments are delimited by the commas that separate
function arguments. It still wouldn’t hurt to add additional
parentheses as a matter of good macro programming habits,
however.

Even with the protection of parentheses, a simple macro, such as
double, can cause unexpected results. The second statement
below is intended to increment x (to 4) and put double the new
value (8) in y.

x = 3;
y = double( ++x );

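What it actually does, with a typical compiler, is increase x to
5 and set y to 9. (Strictly speaking, the result is undefined
because x is modified twice within one expression.) This
results from the expanded code: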

y = ( ( ++x ) + ( ++x ) );

which evaluates the macro argument twice. In this case, the
argument ++x has the side effect of incrementing x, and the
expanded macro does this twice instead of once, as intended.
You can avoid such problems by following another rule: Never
pass an expression that has side effects as a macro argument.
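A minimal sketch of the rule in practice: perform the side effect in its own statement, then pass a plain variable to the macro. The macro name twice is a hypothetical stand-in for the parenthesized double macro, and the function name is also hypothetical.

```c
#define twice( x ) ( ( x ) + ( x ) )

int double_after_increment( int x ) {
    ++x;                 /* the side effect happens exactly once      */
    return twice( x );   /* the macro argument is now a plain variable */
}
```

Called with 3, this function increments its copy of x to 4 and returns 8, which is what the original statement was meant to do.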

This example also provides additional evidence that C’s ++ and
-- operators, which seem so simple and “innocent,” are often the
culprits in causing unintended side effects. You may recall that
in Chapter 6 I showed how ++ and -- can cause problems in
assignment statements. The unary increment and decrement
operators themselves are not really to blame; rather, it’s the
common C programming practice of embedding an increment or
decrement operation within a larger expression. C programmers
frequently code

next = ary[ ++i ];

instead of

++i;
next = ary[ i ];

Within simple array subscripts, using ++ or -- is a safe and
generally comprehensible technique. You must be careful,
however, to use the correct pre- or post-increment alternative. In
contrast, with separate statements to increment the index and
reference the array, you can always use pre-increment (e.g., ++i)
because the statement order makes clear whether you are
incrementing the index before or after referencing the array. In
general, I recommend the use of separate statements for
incrementing and decrementing array indexes because the code
layout more strongly expresses the sequence of operations. This
is not typical C style, but then much of what’s considered
“standard” C style stems more from habit and fashion than good
programming practices.

Most such problems I’ve seen in C programs stem from many C
programmers’ attitude that a simple ++i statement by itself is
somehow “wasteful” (of what, I’m not sure), and that a way must
be found to embed all increment and decrement operations into
adjacent statements. That’s a pity, because programmers who
follow one general guideline avoid the trouble: Place simple
increment and decrement operations in separate statements. This
guideline frees you from concerns about when ++ and -- side
effects can cause trouble and lets you use these otherwise nice
syntactic elements of C. (For systems programming, a careful
embedding of ++ or -- may provide better performance in some
cases. But in business programming, any potential advantages of
such techniques are inconsequential and should not influence the
way you use the ++ and -- operators.)


Once is Enough

When you create your own macros, you should try to avoid
evaluating a macro argument more than once, if possible. This
practice reduces the problem of unintended side effects. For
example, an obvious improvement to the double macro
definition is:

#define double( x ) ( 2 * ( x ) )

Not all macros can be defined to avoid multiple references to
their arguments (consider the problem with a max( x, y ) macro).
If you want to avoid any chance of problems caused by multiple
evaluation of arguments, use a function rather than a macro.
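The max problem can be sketched as follows. The names max_macro, max_int, next_value, and call_count are hypothetical; next_value is simply an argument with a visible side effect so that the number of evaluations can be counted.

```c
/* A max macro must mention each argument twice. */
#define max_macro( a, b ) ( ( a ) > ( b ) ? ( a ) : ( b ) )

/* The function version evaluates each argument exactly once. */
int max_int( const int a, const int b ) {
    return a > b ? a : b;
}

int call_count = 0;

/* Returns 10, 20, 30, ... on successive calls. */
int next_value( void ) {
    ++call_count;
    return call_count * 10;
}
```

With call_count reset to zero, max_macro( next_value(), 5 ) calls next_value twice and yields 20, the value from the second call, whereas max_int( next_value(), 5 ) calls it once and yields 10.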

Macros can contain almost any kind of source, including
complete statements. When defining a macro, be sure to consider
all the contexts in which the macro may be used. One difficult
area is when a macro includes conditional logic. Suppose you
have a macro to print messages only when a “trace” variable is
on:

#define ptrace( sts, str ) \
    if ( sts ) printf( "%s\n", str )

A reference to ptrace might be:

if ( x < 0 ) ptrace( traceon, "Negative input" );
else ptrace( traceon, "OK input" );

which, when expanded (and indented to show the logical
structure) is:

if ( x < 0 )
    if ( traceon ) printf( "%s\n", "Negative input" );
    else
        if ( traceon ) printf( "%s\n", "OK input" );

This code will not print a message when x is non-negative,
regardless of the setting of traceon. This unintended result stems
from the “dangling else” pitfall I described in Chapter 3. You
can avoid the problem by always using braces for conditional
statements, as I recommended. The following statements
evaluate properly:

if ( x < 0 ) {
ptrace( traceon, "Negative input" );
}
else {
ptrace( traceon, "OK input" );
}

But when you’re creating macros, you shouldn’t assume that the
person using the macro will follow similar guidelines.
Correcting this problem isn’t a simple matter of adding braces to
the macro definition because you would then have to not place a
semicolon after ptrace(…) when you used the macro — an
unacceptable exception to normal C syntax. Instead, drawing on
a suggestion by Andrew Koenig, you can restructure the macro
as an expression instead of a statement:

#define ptrace( sts, str ) \
    ( (void) ( ( ! ( sts ) ) || printf( "%s\n", str ) ) )

The “trick” to this macro is the C rule that logical
expressions are always evaluated using left-to-right, “short-
circuit” evaluation. Thus, ( ! ( sts ) ) is evaluated first, and if sts
is zero (false), the whole logical expression is true, and the
second part (the printf) is never evaluated. If sts is non-zero
(true), the printf is invoked as part of the expression evaluation.
The (void) provides a generic type cast so ptrace can be used in
expressions. When things get this complicated, however, it’s
probably a good time to switch to a function or use C’s
conditional compilation (#if…#endif) facilities.
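The following sketch shows the expression form of the macro used in an unbraced if/else. To make the behavior observable, it routes output through a hypothetical helper, trace_emit, which prints the message and counts the lines written; the helper, the trace_lines counter, and the check_input function are assumptions added for this example.

```c
#include <stdio.h>

int trace_lines = 0;   /* hypothetical counter of messages written */

/* Prints the message and returns non-zero so it can follow || */
int trace_emit( const char *str ) {
    ++trace_lines;
    printf( "%s\n", str );
    return 1;
}

/* Expression form: trace_emit runs only when sts is non-zero. */
#define ptrace( sts, str ) \
    ( (void) ( ( ! ( sts ) ) || trace_emit( str ) ) )

void check_input( int x, int traceon ) {
    if ( x < 0 ) ptrace( traceon, "Negative input" );
    else ptrace( traceon, "OK input" );
}
```

Because each ptrace reference expands to a single expression statement, the else pairs with the outer if, and a non-negative x with tracing on now produces the "OK input" message.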

Although you can encounter some “gotchas” using macros,
properly used they offer an essential means of insulating
yourself from many of C’s other danger zones. Don’t hesitate to
use macros, but don’t use them as a “lazy person’s” alternative
to typedef’s, enumerations and functions, when one of these
alternatives provides a better solution. Also, take care when you
define macros not to set traps for the unwary programmer (who
may be yourself) that uses your macros.

The “Impossible” Dream

Sometimes, it takes real character to program in C. For instance,
suppose you compiled and ran the following code:

unsigned char c;

c = '\xff';
if ( c != '\xff' ) printf( "Impossible!\n" );

would it seem impossible to print “Impossible!”? Not with some
C compilers. The C standard lets compiler writers decide
whether the default char type means signed char or unsigned
char. The default sign of the char type affects how char values
are converted in mixed-type expressions. If the default is signed,
the compiler will convert the character constant '\xff' to a signed
integer by extending the high-order bit. (Oddly enough, C
defines character constants as int type.) Thus, '\xff' would have a
16-bit integer value of 0xffff. To evaluate c != '\xff', the
compiler will convert the explicitly declared unsigned character
c to the integer value 0x00ff, thus making it unequal to the value
of the character constant '\xff'.

It might seem this problem could be fixed by casting the
character constant to an unsigned integer, as in

if ( c != (unsigned) '\xff' )

but this cast simply converts 0xffff to an unsigned, rather than
signed, int type. The immediate solution to this problem is to use
the following cast:

if ( c != (unsigned char) '\xff' )

The general rule is: Carefully cast any operation that involves a
char variable and any operand other than another char variable.
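A small sketch of the comparison with and without the cast. The function names are hypothetical, and the example assumes 8-bit characters. The helper plain_char_is_signed uses CHAR_MIN from <limits.h> to report which default the compiler uses.

```c
#include <limits.h>

/* Non-zero if plain char is signed with this compiler. */
int plain_char_is_signed( void ) {
    return CHAR_MIN < 0;
}

/* Result depends on the compiler's default sign for char. */
int matches_ff_naive( unsigned char c ) {
    return c == '\xff';
}

/* The cast makes both operands unsigned char values before comparison. */
int matches_ff_cast( unsigned char c ) {
    return c == (unsigned char) '\xff';
}
```

With a compiler whose plain char is signed, matches_ff_naive( 0xff ) reports no match, while matches_ff_cast( 0xff ) reports a match under either default.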

C attracts some odd “characters,” one of them being the manifest
constant EOF, which is not really a character (it's a negative
integer, typically -1) but which is returned by getchar and
other C functions. If you try the following loop with a compiler
that uses unsigned char as the default for char variables:

char c;
while ( ( c = getchar() ) != EOF ) ...

you’ll wait a long time before the loop ends. Because the value
of c will always be treated as an unsigned integer, it will never
equal -1. With a compiler that uses signed char as the default for
char variables, the loop may end before the last character is read,
since a character with a value that converts to an integer value of
-1 may be read from the input stream.

Why did the C library designers name a function “get character,”
when the function actually returns an integer, and may cause
your program to fail if you actually store the return value in a
character variable? Maybe they were making a veiled suggestion
that mastering this kind of C inconsistency was a good way for
wimp programmers to “get some character.” In any case, don’t
let something as “meaningless” as a function name trip you up.
Always use int (not char) variables to store return values from
fgetc, getc, getchar, putc, putchar, and ungetc functions.
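A minimal sketch of a read loop that follows this rule. The function name count_chars is hypothetical; it reads from any open stream with getc and keeps the return value in an int so that EOF can never be confused with a data byte.

```c
#include <stdio.h>

/* Counts the characters remaining on the stream. */
long count_chars( FILE *in ) {
    int c;              /* int, not char, so EOF stays distinguishable */
    long count = 0;

    while ( ( c = getc( in ) ) != EOF ) {
        ++count;
    }
    return count;
}
```

Because c is an int, a data byte with the value 0xff is returned as 255 and does not end the loop early, and the loop still terminates at end of file regardless of the compiler's default sign for char.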


All Functions Normal

Most C programmers have adopted the good programming
practice: Always declare a function prototype at the beginning of
any file in which you use the function.

This practice prevents accidentally treating a function’s return
value as int (the type the compiler assumes when no prototype is
declared) when it is some other type. It also lets the compiler
check that the proper types of arguments are specified when the
function is used. For standard C library functions this principle
implies the rule: Always include the header file for any standard
library function you use. Following this example for your own
functions, you should also define a header file with function
prototypes for every file that has global functions that may be
referenced in other files. Then you can include these user-
defined header files as a simple — and foolproof — way to
declare function prototypes for all shared functions.

Another practice that’s available with recent C compilers is to
declare formal parameters to functions as const, if they should
not be changed. C passes function arguments by value, so you
can never really change the variable that’s passed by the calling
program anyway. For example, in the following code, the value
of arg1 is not changed in main, even though the corresponding
parameter parm1 is changed in the function. A copy of arg1,
stored in a temporary location, is what’s changed by the
function.

main(...) {
    int arg1;
    f( arg1 );
    ...
}
void f( int parm1 ) {
    parm1 = 10;
    return;
}

So why bother with declaring function arguments as const, as in
the following example?

void f( const int parm1 );

The advantage of this type of declaration is that you’ll be warned
if you inadvertently try to modify the argument, thinking the
new value will be reflected in the calling function. This type of
error is easily made by programmers used to languages, such as
Pascal and COBOL, that let you pass arguments by reference
and modify the value in the calling program by changing a
parameter. Declaring a parameter as const also makes clear the
parameter is meant as an “input-only” parameter.

Don’t be lured into avoiding a const parameter specification
because of the common C practice of using input-only
parameters as if they were local variables. Although C’s
“pass-by-value” handling of arguments allows this technique, the
only potential advantages are saving a trivial amount of automatic
storage (for a local variable) and execution time (for automatic
storage allocation and an assignment). What you give up is the
added protection the compiler can provide against improper use
of the input-only parameter.
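As a sketch of the alternative, the function below declares its input-only parameter const and does its counting in a separate local variable. The function name sum_first and its purpose (summing the integers from 1 through n) are hypothetical, chosen only to give the loop something to do.

```c
/* n is input-only, so it is declared const. */
int sum_first( const int n ) {
    int remaining = n;   /* local working copy instead of modifying n */
    int total = 0;

    while ( remaining > 0 ) {
        total += remaining;
        --remaining;
    }
    return total;
}
```

If a later change to this function tried to assign to n, the compiler would flag it; the cost is one extra int of automatic storage.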

In Chapter 6, I suggested you use array notation, such as x[],
instead of pointer notation, such as *x, for clarity. With array
parameters, a function declaration like
int strlen( const char str[] ); specifies that no element of the
array argument can be changed through str. Be aware, however,
that C treats an array parameter exactly like a pointer parameter,
so this declaration means the same thing to the compiler as:

int strlen( const char * str );

Neither form prevents inadvertent changes to the function’s copy
of the pointer itself:

++ str;

To have the compiler treat both the array address and its contents
as read-only within the function, you must also declare the pointer
itself const, as in const char * const str. Array notation still
documents that the parameter refers to an array rather than to a
single object.

Do What I Mean, Not What I Say

Like a typical house cat, C programs sometimes seem to ignore
direct commands. As an example, the following code appears to
clearly say when it’s time to leave.

if ( x < 0 ) {
    printf( "Invalid value.\n" );
    exit;
}

But no matter how negative x is, this program continues. In C, a
function name without the argument list parentheses is simply
evaluated as the function’s address. It’s perfectly legal, yet the
function isn’t actually invoked. Be sure you’ve coded
parentheses after all function invocations.
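The same trap can be reproduced harmlessly with a function that merely counts its calls. Everything in this sketch (the calls counter and the three function names) is hypothetical. Many compilers accept the first form silently, although some will warn that the statement has no effect.

```c
int calls = 0;   /* hypothetical counter */

void note_call( void ) {
    ++calls;
}

void forgot_parentheses( void ) {
    note_call;      /* evaluates the function's address; nothing is called */
}

void with_parentheses( void ) {
    note_call();    /* actually invokes the function */
}
```

Calling forgot_parentheses leaves calls unchanged, while with_parentheses increments it.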

“Gently Down the Stream…”

C stream I/O is the model of simplicity, yet it has some tricky
areas, too. A defensive programmer might code

c = getchar();
if ( errno != 0 ) {
    /* handle error */
}

But this code may report false errors because most of C’s library
functions set the library-defined variable errno to a non-zero
value only when an error occurs. Otherwise, they leave errno
unchanged. Simply initializing errno before the call isn’t an
adequate solution, because a C library function may set errno,
even if no error exists! Thus, the only safe approach to using
errno is shown in the following example of using fopen:

errno = 0;
fileptr = fopen( ... );
if ( fileptr == NULL ) {
/* An error occurred in fopen()
| Now it's valid to examine errno
*/
if ( errno != 0 ) {
/* handle error */
}
}

The rule for using errno is: Set errno to 0 before a function call,
and use errno only after a function returns a value indicating the
function failed.
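The rule can be packaged in a small wrapper, sketched below. The function name open_for_read and its error_code output parameter are hypothetical. Note that whether fopen sets errno on failure depends on the library, so the reported code may still be zero.

```c
#include <errno.h>
#include <stdio.h>

/* Returns the open stream, or NULL with *error_code set from errno. */
FILE *open_for_read( const char path[], int *error_code ) {
    FILE *fileptr;

    errno = 0;                          /* clear before the call          */
    fileptr = fopen( path, "r" );
    if ( fileptr == NULL ) {
        *error_code = errno;            /* examined only after a failure  */
        return NULL;
    }
    *error_code = 0;                    /* ignore errno on success        */
    return fileptr;
}
```

The caller tests the returned pointer first and looks at the error code only when that pointer is NULL.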

New Dimensions

Programmers moving to C from some other languages can be
tripped up when they use multidimensional arrays. In many
languages, a subscripted reference to a two-dimensional array
has a form like x[ i, j ]. C is different, and a reference like the
second statement below

int x[10][10];
y = x[ ++i, ++j ];

does not indicate that two subscripts are used in the reference to
array x. Instead, ++i, ++j is a comma-separated sequence of
expressions, and the expression value is the value of the last
subexpression in the sequence (i.e., j, after j is incremented). C
doesn’t actually have true multidimensional arrays. Recall that
for the most part, C array notation is really a variation on pointer
notation, and that a[i] is equivalent to *(a+i). To get the effect of
a multidimensional array in C, you declare “arrays of arrays”
(i.e., two levels of pointer-based addressing). The notation a[i][j]
means *((*(a+i))+j). In the incorrect example above, the value of
x[ ++i, ++j ] is the same as *(x+(++j)), which is an address,
not an integer. In C, always use one pair of [] for each level of
array subscripting.
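A short sketch contrasting the two forms. The function names and the 3-column array shape are assumptions made for the example. Some compilers warn that the left operand of the comma operator has no effect, which is exactly the point.

```c
/* Correct: one pair of [] per subscript level. */
int element_at( int grid[][3], int i, int j ) {
    return grid[ i ][ j ];
}

/* The comma operator discards i and yields j. */
int comma_value( int i, int j ) {
    return ( i, j );
}
```

For a grid holding { {1,2,3}, {4,5,6} }, element_at( grid, 1, 2 ) selects 6, while comma_value( 1, 2 ) is simply 2, the single subscript that a reference like grid[ 1, 2 ] would actually use.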


Order in the Court

In Chapters 2 and 6, I pointed out some of the problems that
arise from C’s rules (or lack of rules) for operator precedence
and order of evaluation of expressions. I won’t go through all the
rules or unusual results that can occur, but observe that an
expression like

r = x * y * z;

may be evaluated as

tmp = x * y;
r = tmp * z;

or as

tmp = y * z;
r = x * tmp;

Note that with older (pre-ANSI) compilers, even parentheses will
not guarantee the ordering, and even (x * y) * z may be evaluated as

tmp = y * z;
r = x * tmp;

In many cases, it may not matter what the order of evaluation is,
but if it does, you should use separate statements to specify
order-dependent operations.
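Order matters most visibly when the operands have side effects. In the sketch below, take_id is a hypothetical function that hands out increasing numbers. Writing take_id() - take_id() in one expression would leave it to the compiler which call happens first, so the function uses separate statements instead.

```c
int next_id = 0;   /* hypothetical counter */

/* Returns 1, 2, 3, ... on successive calls. */
int take_id( void ) {
    ++next_id;
    return next_id;
}

/* Separate statements fix the order of the two calls. */
int difference_ordered( void ) {
    int first = take_id();
    int second = take_id();
    return first - second;
}
```

However the compiler chooses to evaluate expressions, difference_ordered always subtracts the later number from the earlier one and returns -1.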

The Name Game

As if C didn’t offer enough problems on its own, the C
programming culture sometimes seems to strive to create more
traps for the unwary. One example is the widely used
“Hungarian” naming convention, which uses partial
capitalization for identifiers. Because C is case-sensitive, a
variable hDlg is different from the variable hdlg. Woe to the
programmer who has identifiers that differ only in case. Not only
is there the obvious potential for elusive errors caused by typing
mistakes, but some link editors change all global symbols to
uppercase when linking multiple files, causing both hDlg and
hdlg to be treated as HDLG.

You won’t be able to avoid Hungarian notation when you work
with some vendor-supplied libraries, such as the Microsoft
Windows interface. But for your own code, especially global
variables: Avoid identifiers that differ only in the case (i.e.,
upper and lower) of some letters. I recommend the simple, less
error-prone, standard of using all lowercase identifiers, except
for manifest constants. You should also be careful with some
older link editors that may truncate global identifiers (the C
standard requires only that the first 6 characters of an external
identifier be used), causing the potential for additional collisions.

Although I’ve covered lots of C danger zones in the last 6


chapters, there are more waiting. Among the areas to watch
carefully are: casting pointers; using C signals (Koenig has an
enlightening — and alarming — discussion on using signals);
using floating-point variables to approximate decimal values
(such as currency); and portability problems, such as character
representations and byte ordering. The books listed in Appendix
A provide additional material on these topics. The principle that
underlies all these rules is: Tread carefully in C; stick to simple
and well-understood techniques; and avoid “clever”
programming. The truly clever C programmer is also an
extremely cautious one.

C Coding Suggestions

* Put parentheses (or other explicit delimiters) around the macro text
and around each macro argument within the macro text.
* Never pass an expression that has side effects as a macro argument.
* Place simple increment and decrement operations in separate
statements.
* Avoid evaluating an argument more than once, if possible.
* When defining a macro, be sure to consider all the contexts in
which the macro may be used.
* Carefully cast any operation that involves a char variable and any
operand other than another char variable.
* Always use int (not char) variables to store return values from
fgetc, getc, getchar, putc, putchar, and ungetc functions.
* Always declare a function prototype at the beginning of any file in
which you use the function.
* Define a header file with function prototypes for every file that has
global functions that may be referenced in other files.
* Declare formal parameters to functions as const, if they should not
be changed.
* Be sure you’ve coded parentheses after all function invocations.
* Set errno to 0 before a function call, and use errno only after a
function returns a value indicating the function failed.
* In C, always use one pair of [] for each level of array subscripting.
* Use separate statements to specify order-dependent operations.
* Avoid identifiers that differ only in the case (i.e., upper and lower)
of some letters.
* Tread carefully in C; stick to simple and well-understood
techniques; and avoid “clever” programming.

Chapter 8
Working with C++

Deck: Avoid “overloading” on object-oriented programming

Why is C’s successor called “C plus-plus”? One rationale is that
C++ is a “better C” (the first “plus”) and adds object-oriented
programming (OOP) features (the second “plus”). But C++
brings with these beneficial additions some devilish problems.
Maybe they should have called it “C plus-plus-plus” or “C plus-
and-minus.” Whatever you call it, C++ requires careful attention
to reap its advantages and avoid its problems.

Starting on the Right Foot

Right off the bat, C++ simplifies comments and avoids the
danger of the “runaway comments” I described in Chapter 3. If
you use // for comments everywhere but in macro definitions,
you won’t have to worry about where the comment ends — it’s
always at the end of the same source line. And look how clean
comments appear:

strcpy( title, name ); // Build title


Unfortunately, some C++ compilers’ preprocessors may not strip
comments from macro definitions, so the following sequence
can create problems:

#define MAX_FILES 10 // Limit to open files
...
if ( file_cnt < MAX_FILES ) {
...
}

The problem arises if the preprocessor stores the replacement
text for MAX_FILES as “10 // Limit to open files”. The
expanded if statement then becomes

if ( file_cnt < 10 // Limit to open files ) {
...
}

which won’t compile, because the closing parenthesis and opening
brace are now part of the comment.

Your Constant Companion

Another one of C++’s “pluses” helps here. In most cases, you
can—and should—use const variables instead of macros to
define mnemonics for constant values. In the example above, if
you use

const int MAX_FILES = 10; // Limit to open files

instead of the #define, the comment poses no problem. Another
advantage of const variables over macros is that the compiler
parses the variable name and places it in the program’s symbol
table, which allows the variable to be type-checked when
referenced and to be used by cross-reference and debugger tools.
Also, like other variables, const variables can have restricted
visibility, thus avoiding name clashes between different sections
of code.
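
A minimal sketch of these points, assuming a compiler that supports
namespaces (the names print_spool, tape_backup, and can_open_another
are hypothetical). Each constant has a type the compiler can check,
and the same name is reused in two scopes without a clash:

```cpp
namespace print_spool {
    const int MAX_FILES = 10;        // Limit to open files
}

namespace tape_backup {
    const long MAX_FILES = 2000L;    // Same name, different scope and type
}

// Returns true if another spool file may be opened.
bool can_open_another( int file_cnt )
{
    return file_cnt < print_spool::MAX_FILES;
}
```

Two #define directives for MAX_FILES in the same source file would
instead collide, and neither would carry a type.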

One place where you have to use a C++ “trick” instead of const
is when you want to declare a constant within the scope of a
class. The following syntax is illegal because you can’t assign an
initial value to a static class member:

class file_list {
    static const int MAX_FILES = 10; // Illegal!
    char * file_name[ MAX_FILES ];
};

A workaround can be used for integer constants by defining an
enumeration containing the symbol and its value:

class file_list {
    enum { MAX_FILES = 10 }; // Legal
    char * file_name[ MAX_FILES ];
};

As the above example shows, C++ has some helpful refinements
over C, but maintains the C tradition of complex usage rules. If
you aren’t convinced that C++ adds complexity, as well as
capability, consider the rest of the story on static class members.
Because a C++ class is a type, not a data object, and only one
copy of a static class member exists (unlike non-static class
members, which have one instance per object of the class), you
have to define and initialize static members outside the class
definition.

class classX {
    static int objX_cnt; // Can't initialize here!
};
...
int classX::objX_cnt = 0; // Initialize here.

And you can’t use static in the objX_cnt variable definition
because that would conflict with the use of static for global (not
member) objects. If these rules seem burdensome, prepare
yourself for the full force of C++, because this is just the
beginning.
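
Putting the two techniques together, the following sketch uses the
enumeration workaround for the array bound and defines a static
counter outside the class. The members list_cnt and capacity, and the
constructors, are hypothetical additions for illustration. Compilers
that follow the later ISO C++ standard also accept an initializer on a
static const integral member inside the class, so the enumeration is
needed mainly for older compilers.

```cpp
class file_list {
public:
    enum { MAX_FILES = 10 };            // Class-scope constant via enum

    file_list()
    {
        for ( int i = 0; i < MAX_FILES; ++i ) {
            file_name[ i ] = 0;         // No names stored yet
        }
        ++list_cnt;
    }
    file_list( const file_list & other )
    {
        for ( int i = 0; i < MAX_FILES; ++i ) {
            file_name[ i ] = other.file_name[ i ];
        }
        ++list_cnt;                     // Copies are counted too
    }
    ~file_list() { --list_cnt; }

    static int capacity() { return MAX_FILES; }
    static int list_cnt;                // Declared, not defined, here

private:
    const char * file_name[ MAX_FILES ];
};

// Defined and initialized once, outside the class, without the
// static keyword.
int file_list::list_cnt = 0;
```

Because list_cnt belongs to the class rather than to any one object,
it counts every file_list that currently exists.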

The Calm Before the Storm

Before examining the less pleasant side of C++, let’s consider
some of the other advantages it offers over C. You can reduce
the use of function-like macros, and thus avoid many of the
pitfalls I described in Chapter 7, by using in-line functions and
templates instead.

Functions defined inside a class or with the inline specifier can
be compiled into in-line code, rather than a normal function call
(the decision is left to the compiler). This technique lets you use
in-line functions instead of equivalent macros — avoiding macro
pitfalls but keeping their performance. Because you can control
whether a member function that’s defined outside its class
declaration is in-line or not, it’s good practice to define all
member functions outside their respective class declaration.
Following this rule also keeps your class definitions more
compact and readable. For example, use

class classX {
    inline int f( void );
};
inline int classX::f( void ) {
    ...
}

rather than

class classX {
    int f( void ) {
        ...
    }
};

In general, use in-line functions sparingly. The performance
savings from eliminating a function call can easily be lost as
code size expands.

C++ templates are another one of its true bright spots. They
provide a way to implement a generic piece of code that can
work on different data types. This eliminates many of the places
in C where a complex macro would be used instead of a
function, so that the code can work with more than one type.
Templates are also much simpler to use than some of the
advanced C++ techniques for writing functions that can handle
multiple types.
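
To make the contrast concrete, here is a small sketch (MAX_MACRO,
max_of, and the two helper functions are names invented for this
illustration). The macro evaluates its winning argument twice, so an
argument with a side effect is applied twice; the template function
evaluates each argument exactly once and still works for any type
that supports the > operator:

```cpp
#define MAX_MACRO( a, b ) ( (a) > (b) ? (a) : (b) )

template <class T>
inline T max_of( T a, T b )
{
    return a > b ? a : b;
}

// Returns the final value of i after using the macro.
int counter_after_macro()
{
    int i = 5;
    int m = MAX_MACRO( i++, 3 );  // ( (i++) > (3) ? (i++) : (3) )
    (void) m;
    return i;                     // i was incremented twice
}

// Returns the final value of i after using the template function.
int counter_after_template()
{
    int i = 5;
    int m = max_of( i++, 3 );     // i++ is evaluated once, before the call
    (void) m;
    return i;                     // i was incremented once
}
```

Starting from 5, the macro version leaves i at 7 and the template
version leaves it at 6.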

New and Improved

Macros have always been one of C’s most flexible tools, and I
pointed out in previous chapters how useful they are for paving
over some of C’s rough spots. The power of macros — and their
pitfalls — motivated some of the best new features of C++.
Where a better C++ alternative exists, use it instead of a macro.
On the other hand, you’ll still find some important uses of
macros in C++; for example, using PTR and contents_of macros
instead of *, as described in Chapter 6. You can benefit from
extending these C macros to cover new C++ features, such as
references. The following code uses a simple REF macro to
produce easily read code:

#define REF &
...
int REF benefit_age = spouse_age;

This sure beats the non-intuitive

int& benefit_age = spouse_age;

style of coding you’ll find in most C++ books.

C++ offers some other clear improvements over C. Use the C++
new and delete operators instead of the malloc() and free()
functions because new allocates memory based on the type of its
argument, rather than on an explicit number of bytes. You can
also create your own function for new operations on objects of
user-defined classes.
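
A brief sketch of the difference (the employee structure and both
functions are hypothetical examples). With malloc() the programmer
supplies the byte count and casts the result; with new the compiler
derives the size from the type:

```cpp
#include <cstdlib>

struct employee {
    char name[ 32 ];
    int  age;
};

// C style: the caller computes the byte count and casts the result.
int sum_ages_malloc( int count )
{
    employee * staff =
        (employee *) std::malloc( count * sizeof( employee ) );
    if ( staff == 0 ) {
        return -1;                      // Allocation failed
    }
    int total = 0;
    for ( int i = 0; i < count; ++i ) {
        staff[ i ].age = 20 + i;
        total += staff[ i ].age;
    }
    std::free( staff );
    return total;
}

// C++ style: new derives the size from the type employee.
int sum_ages_new( int count )
{
    employee * staff = new employee[ count ];
    int total = 0;
    for ( int i = 0; i < count; ++i ) {
        staff[ i ].age = 20 + i;
        total += staff[ i ].age;
    }
    delete [] staff;                    // Array form matches new []
    return total;
}
```

Both functions compute the same total; the new version simply has
fewer places where a wrong size or a wrong cast can slip in.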

Merrily Down the Streams

The new C++ “streams” I/O package provides a safer way to do
I/O because the compiler will automatically generate a valid
format based on the type of data being read or written. In most
cases, using streams is also simpler than calling the standard C
I/O functions. For both safety and convenience, where possible,
use stream I/O instead of the standard C library routines. The
following example shows how to write a label and value to the
standard output:

cout << "x + y = " << x + y << '\n';

If x is 2 and y is 5, the output is

x + y = 7

Unfortunately, the C++ designers apparently couldn’t stand the
thought of introducing a useful new feature without building in
at least one trap door. The following statement may look as
innocent as the former, but it’s not:

cout << "x & y = " << x & y << '\n';

Because & has lower precedence than <<, this expression is
equivalent to

(cout << "x & y = " << x) & (y << '\n');


The right way to code this statement is

cout << "x & y = " << ( x & y ) << '\n';

This reinforces a C suggestion from Chapter 2 that also applies
to C++: In a complex expression, use parentheses to explicitly
define how the expression is evaluated. Be thankful for one
thing, however. The C++ designers originally considered using <
and > as the put and get operators — as if there weren’t already
enough confusion caused by the = and == operators!
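
The parenthesized form can be checked with a small sketch. It writes
to a std::ostringstream (from the standard <sstream> header, which is
an assumption here, since the text uses only cout) so that the result
can be examined as a string; format_and is a hypothetical helper name:

```cpp
#include <sstream>
#include <string>

// Formats the bitwise AND of x and y with a label.
std::string format_and( int x, int y )
{
    std::ostringstream out;
    out << "x & y = " << ( x & y ) << '\n';  // Parentheses bind & first
    return out.str();
}
```

For x equal to 6 and y equal to 3, the returned string is
"x & y = 2" followed by a newline.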

Non-Plused

Now let’s turn to the darker side of C++. Suppose, in the spirit
of OOP, you decide to “advance” from standard C programming
techniques to C++ techniques. One of the first ways you might
try this new approach is by putting an object “wrapper” around
calls to system functions. The example I use (based on an
example in C++ Programming Style, by Tom Cargill) assumes
there are system functions to iterate over a list of output files in a
specified output queue. This example uses an OUTQ type that is
simply a system “handle” for an output queue control block
allocated by the system when open_outq is called. The example
also uses the following OUTQ_ENTRY structure, which
describes the layout of a static area filled by a call to the system-
supplied next_outq_entry function:

struct OUTQ_ENTRY {
char ofile_name[NAMESIZE];
// ... Rest of OUTQ_ENTRY fields
};

Figure 8.1 shows a fragment of the OutQ class definition and the
nextname member function. The following code shows how you
might declare two OutQ objects and then attempt to print the
first entry from both, side-by-side:

OutQ q1("PRINTER1");
OutQ q2("PRINTER2");
printf("%s\t%s\n", q1.nextname(), q2.nextname());

If the first entry in q1 was “LISTINGA,” and the first entry in q2
was “LISTINGB,” you might expect to print

LISTINGA LISTINGB

But what actually would print (if the compiler happens to call
q2.nextname() first) is

LISTINGA LISTINGA

The problem arises because the system’s next_outq_entry
function returns a pointer to a static area that is overwritten by
the most recent call. Another way to look at it is that, even
though this example uses C++’s class facility to declare separate
q1 and q2 objects, these objects implicitly share common storage
via the next_outq_entry function.

Figure 8.2 addresses this problem by providing an OutQ member
to hold the most recent output file name for each output queue.
Although the code in the previous example will now work, the
following code still fails to do what you might expect:

OutQ q1("PRINTER1");
printf("%s\t%s\n", q1.nextname(), q1.nextname());

Instead of printing the first two output files in q1, this code
prints the second output file twice. Although the HoldName
member avoids memory conflicts between separate objects, it
does not avoid memory conflicts between multiple invocations
of the same object’s state-changing member functions. The
solution to this problem is much more complex, requiring either
some form of dynamic memory management by the OutQ class,
or special coding techniques for multiple references to the same
OutQ object. As this example illustrates, object-oriented
programming in C++ is no simple panacea to all your
programming ills. It’s often the case that half-complete class
definitions introduce sneaky traps for the unwary. And creating
robust classes for what appear at first to be simple objects is
often much more difficult than you first imagine.

This example suggests a guideline that applies to both the
implementation and use of C++ objects: Understand the lifetime
of object data.
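
One form of the dynamic memory management mentioned above can be
sketched by returning each name as a std::string by value. This is a
swapped-in technique, not the book's own solution. The function
next_entry_name is a hypothetical stand-in for the system routine,
OutQNames is a hypothetical class name, and end-of-list handling is
omitted:

```cpp
#include <cstdio>
#include <string>

// Hypothetical stand-in for a system routine that, like
// next_outq_entry in the text, returns a pointer to a single static
// area that is overwritten by every call.
const char * next_entry_name( int queue_id, int entry_index )
{
    static char static_area[ 32 ];
    std::snprintf( static_area, sizeof static_area,
                   "Q%dFILE%d", queue_id, entry_index );
    return static_area;
}

class OutQNames {
    int queue_id;
    int next_index;
public:
    explicit OutQNames( int id ) : queue_id( id ), next_index( 0 ) {}

    // Returning std::string by value gives every caller a private
    // copy, so a later call cannot overwrite an earlier result.
    std::string nextname()
    {
        return std::string( next_entry_name( queue_id, next_index++ ) );
    }
};
```

Each returned string now has its own lifetime, independent of the
static area. It is still safer to store the results in separate
variables before printing, because the order in which a compiler
evaluates function arguments is not specified.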

OOP, Not Oooops!

It might seem that the problems we’ve just examined arose
because we tried to apply OOP techniques to a non-OOP system
interface. Would we avoid problems by starting from scratch?
Well, consider a textbook example of OOP—defining a “string”
object type. Figure 8.3 shows part of a String class definition. A
String object contains the string data in memory allocated by
new, and the current string length in an integer member. The
code shows one of the constructors, which allocates memory and
copies a normal C null-terminated string to the memory pointed
to by the object’s strdata member. The example also shows an
assignment operation, which overloads the = operator. Both the
constructor and assignment member functions use the
MakeString utility function to allocate memory and copy string
data.

One of the intended advantages of this type of string class is to
allow code such as the following:

String a("abc");
String b("");
...
b=a;

This code is more straightforward than using C’s standard string
functions, and the String class takes care of managing the
necessary memory to hold a string’s data, regardless of its size.
This “automated” memory management is done in the operator=
function by releasing memory assigned to the target of the
assignment (e.g., b, above), then allocating enough memory to
hold the result of the string expression on the right-hand side of
the assignment (e.g., a, above).

But what happens in the following code, where b is defined as a
reference to a?

String a("abc");
String & b = a; // a and b refer to
// the same string
b=a;

Because both a and b refer to the same object, the release of b’s
memory actually releases a’s memory as well—before the copy
takes place—and the assignment fails. Note that a similar
problem could occur in a plain C (or other language) function
that didn’t guard against modifying an output argument that
might also be an input argument. But common goals of creating
C++ classes are to simplify assignment and expressions for new
object types and to “hide” memory management. Thus, you’ll
encounter more occasions where you have to watch out for
unexpected ways that member functions, including overloaded
operators, can get you in trouble.

In this simplified example, the solution is to avoid releasing the
target string’s memory, if it points to the same location as the
source string. The revised member function is shown in Figure
8.4. To avoid problems like this, remember to implement
assignment functions so that the same object can appear on both
sides of the = operator. This rule can be quite challenging for
classes that overload other operators in addition to =. Which
leads to another way to minimize trouble: avoid overloading
operators other than =.
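
An alternative to an explicit same-object test, offered here as a
sketch rather than as the approach the book's figure takes, is to
build the new copy before releasing the old storage. SafeString and
its members are hypothetical names, and the class is cut down to the
parts needed to show the technique:

```cpp
#include <cstring>

class SafeString {
    char * strdata;

    // Allocates a private copy of cs, or returns 0 for a null input.
    static char * duplicate( const char * cs )
    {
        if ( cs == 0 ) {
            return 0;
        }
        char * tmpstr = new char[ std::strlen( cs ) + 1 ];
        std::strcpy( tmpstr, cs );
        return tmpstr;
    }

public:
    explicit SafeString( const char * cs )
        : strdata( duplicate( cs ) ) {}
    SafeString( const SafeString & other )
        : strdata( duplicate( other.strdata ) ) {}
    ~SafeString() { delete [] strdata; }

    SafeString & operator=( const SafeString & ss )
    {
        char * fresh = duplicate( ss.strdata );  // Copy the source first
        delete [] strdata;                       // Then release the target
        strdata = fresh;
        return *this;
    }

    const char * c_str() const { return strdata ? strdata : ""; }
};
```

Because the source is copied before anything is released, assigning an
object to itself leaves its contents intact, at the cost of one extra
allocation in that case.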

Weighing the Pluses and Minuses

The previous examples are by no means the only way you can
get yourself in trouble with C++’s OOP facilities. In fact, I chose
some of the simpler cases to fit within the scope of this book. In
Further Reading, I describe several excellent books that delve
into the topic more deeply. The main point I want to emphasize
is that C++ objects can either simplify or complicate
programming, depending on how thoughtfully their classes are
implemented. C++ objects are supposed to make working with
complex program structures as straightforward as working with
C’s built-in types. Keeping that in mind, I recommend that you
avoid using C++ classes unless they’ve been implemented in a
way that makes them safe and simple to use. A program is
weakened, not strengthened, by using classes that produce
surprising results under some conditions.

The other lesson to be learned from examining the complexities
of C++ is that great care and lots of experience are necessary
before you can implement complex classes that are safe and
simple to use. As a result, it’s better to enter this realm of C++
programming slowly and cautiously. And, even after developing
some mastery over OOP, it’s a good idea to resist the seduction
of C++ features, like operator overloading, that look clever, but
increase, rather than reduce, program complexity. Following
these guidelines can help the pluses of C++ outnumber the
minuses.

C Coding Suggestions

* Use // for comments everywhere but in macro definitions.
* Use const variables or enumerations, instead of macros, to define
mnemonics for constant values.
* Use in-line functions and templates instead of function-like macros.
* Define all member functions outside their respective class
declaration.
* Use in-line functions sparingly.
* Use REF and other macros to define & and other C++ symbols.
* Use the C++ new and delete operators instead of the malloc() and
free() functions.
* Where possible, use C++ stream I/O instead of the standard C
library routines.
* In a complex expression, use parentheses to explicitly define how
the expression is evaluated.
* Understand the lifetime of object data.
* Implement assignment functions so that the same object can appear
on both sides of the = operator.
* Avoid overloading operators other than =.

Figure 8.1. Partial definition of an OutQ class

class OutQ {
    OUTQ * hOutQ; // Handle for output queue
public:
    OutQ(char * outqname) { // Constructor
        hOutQ = open_outq(outqname);
    }
    const char * nextname() {
        OUTQ_ENTRY * tmp_qentry = next_outq_entry(hOutQ);
        return (tmp_qentry ? tmp_qentry->ofile_name : NULL);
    }
};

Figure 8.2. Revised OutQ class

class OutQ {
    OUTQ * hOutQ; // Handle for output queue
    char HoldName[NAMESIZE];
public:
    OutQ(char * outqname) { // Constructor
        hOutQ = open_outq(outqname);
    }
    const char * nextname() {
        OUTQ_ENTRY * tmp_qentry = next_outq_entry(hOutQ);
        if (! tmp_qentry ) return NULL;
        strcpy(HoldName, tmp_qentry->ofile_name);
        return HoldName;
    }
};

Figure 8.3. Partial definition of a String class

class String {
    char * strdata;
    int strlength;
public:
    String(char * cs); // Constructor using standard
    String& operator=(const String& ss); // Assignment
};

char * MakeString(const char * cs) {
    char * tmpstr = new char[strlen(cs) + 1];
    strcpy(tmpstr, cs);
    return tmpstr;
}

String::String(char * cs) {
    strdata = cs ? MakeString(cs) : NULL;
    strlength = cs ? strlen(cs) : 0;
}

String& String::operator=(const String& ss) {
    delete [] strdata;
    strdata = ss.strdata ? MakeString(ss.strdata) : NULL;
    strlength = ss.strdata ? ss.strlength : 0;
    return * this;
}

Figure 8.4. Revised operator= member function

String& String::operator=(const String& ss) {
    if (this != &ss) { // Source is not same as target
        delete [] strdata;
        strdata = ss.strdata ? MakeString(ss.strdata) : NULL;
        strlength = ss.strdata ? ss.strlength : 0;
    }
    return * this;
}

Chapter 9
Managing C and C++ Development

DECK: Properly managing C/C++ development requires discipline

For a programmer working on a single program, a project leader
coordinating several people building a complex application, or
an MIS director responsible for a large staff and many projects,
the key to managing C and C++ development is simple —
discipline. Without discipline, a programmer easily loses control
of his or her code’s reliability and maintainability. At higher
levels, a lack of discipline reduces productivity and increases
maintenance costs. Although the need for discipline applies to
programming in all languages, using C and C++ amplifies its
importance.

As this book demonstrates, C has scores of traps that can ensnare
an unwary programmer. Without invoking discipline to stay
clear of C’s traps, sooner or later — usually sooner — a
programmer will get caught. And because C provides great
freedom in combining low-level operations, there often are
dozens of techniques for implementing a particular application
function. If, through discipline, a consistent standard isn’t
applied across programs and projects, maintenance becomes
difficult and dangerous because a programmer can’t depend on
his or her understanding of standard techniques. The potential
for slip-ups is high when each section of code has to be learned
from scratch.

Using C++ does not lessen the need for discipline. C++ retains
almost all of C’s low-level operations and adds several
additional layers of language features. Although classes,
inheritance, templates, and other C++ language features allow a
programmer to work at a higher level of abstraction, these
features can also increase the complexity of writing programs, if
they’re not used in a well-thought-out and highly consistent
manner.

Discipline Has Its Rewards

Discipline also is the key to achieving gains in productivity and
quality from code reuse — the major motivation behind the
creation of C++. No matter how powerful C++ facilities are for
object-oriented programming, a slapdash approach to creating
classes won’t result in substantial code reuse. Because it’s too
expensive and risky to reuse poorly organized, poorly
documented, or unreliable classes, programmers using C++ will
continue to write mostly from scratch, unless a class library
makes their jobs easier, rather than harder.

By now it may sound like the only way to work with C or C++ is
to keep a cane switch handy to whip programmers into
submission. That’s not the case. The “discipline” I’m speaking
of is more akin to the discipline good athletes demonstrate in the
preparation and execution of their sport. Like successful athletes,
top-flight programmers, project leaders, and managers have
learned that although discipline takes effort, it brings satisfaction
as well. Discipline can free developers from the crush of
problems caused by little, unavoidable mistakes and can make
possible the use of more powerful software development
techniques.

In programming, discipline applies to creating and following
programming standards and a well-defined development process.
Many standards and process steps should be formal, written
ones, but a programmer also must be disciplined enough to
follow unwritten principles of good programming practice. For
example, no simple written standard can say, for all
circumstances, when to reuse an existing class and when to
create a new one. In the absence of any written standard, an
undisciplined programmer might choose whichever is easiest at
the moment; a disciplined programmer will examine the
alternatives and carefully consider the balance between short-
and long-term costs and benefits. In general, discipline involves
thinking about more than just a single program, and thinking
beyond just today’s problems.

How Big Is the World?

As a C programmer, project leader, or manager, you can’t
control the whole world of C programming. So what’s a
reasonable domain to concentrate on? Single programs, and even
projects, are too narrow in scope. If a programmer has to learn
new styles, techniques, and tools when he or she moves among
programs or projects, the benefits of knowledge and code reuse
are small. On the other hand, for most important C and C++
programming issues, there are no industry-wide standards. The
various standards for lexical style (e.g., “K&R”) are only a
minor aspect of C/C++ programming standards and, in many
cases, are simply not very good.

The target to shoot for is company- or site-wide standards. Based
on the size and geographical distribution of the programming
staff, try to cover a domain that meets two criteria:

* A person covered by the standards has a reasonable opportunity to
participate in the development and revision of the standards.
* A person who continues to work for the organization is likely to
remain within the domain covered by the standards (i.e., in most
cases, if they change assignments, they’ll still follow the same
standards).

As you develop standards, keep in mind that nobody can include
in written standards a rule to cover every situation a programmer
will encounter. There’s no way to turn C or C++ programming
into a mechanical task. Standards should record a sensible set of
guidelines so programmers can use consistent styles and
techniques — standards aren’t laws.

Here’s an example that involves indenting. On PCs, it’s
convenient to set eight-character tab-stop intervals (i.e., 9, 17,
25, etc.) because the MS-DOS TYPE and PRINT commands,
and many tools, use this default. Consequently, adopting the
same interval for indenting in C programs is a reasonable
standard. In some cases, however, a section of a program may
have enough levels of nesting that statements begin very far to
the right when displayed by an editor or when printed. As a
result, statements may either need to be “chopped” up and
continued across multiple lines, or they will be inconvenient to
display or print.

The best solution to this occasional problem is neither chopping
up lines nor changing the overall indenting standard to a smaller
interval. Instead, a programmer can simply use a different indent
interval, say four characters, for just the deeply nested section of
code. This may involve a little more editing effort for those
statements indented to a level not an even multiple of eight
characters (although with good editors, this is quite easy to do),
but the code remains readable when displayed or printed and the
convenience of the eight-character interval (in the MS-DOS
environment) is retained.

This example illustrates how “master” programmers approach
standards. They focus on the underlying principles, of which
consistency is a very important one. But consistency should
apply to the principles, not just simplistic rules. Here the
principles underlying the choice of an eight-character
indentation interval were to produce a readable layout and to
allow convenient manipulation (with the editor and other tools).
There’s nothing sacred about an eight-character interval,
however, and there may be circumstances where using some
other size better serves the underlying principles.

Getting Started With Standards


Some of the areas that might be covered by C programming
standards include:

* Lexical style (e.g., naming conventions, indentation, placement of
{}, comment style, etc.)
* Standard macros that extend or alter the “base” language (e.g., use
of an EQ macro for ==)
* Program and function documentation, including the content and
layout
* Structure of header files
* Main program organization
* Error signaling and handling
* Message format and specific standard messages

Facing a list of standards like this may be discouraging. How
can you ever expect to cover all the areas, and how will you deal
with all the existing code that doesn’t follow the standards? The
answer: one step at a time. If you haven’t already developed a
set of standards, start by creating a document, “C Programming
Standards,” with sections following the list above. Preferably,
use a word processing package that lets you mark index entries
and automatically produce a document index. The index will be
an invaluable part of the printed standards as they grow.

Then start adding guidelines and examples. The following shows
a sample entry:

Follow each ( and [ with a space, and precede each ) and ] with a
space.

Example: x = myfunc( item_table[ i ] );

One of the major barriers to instituting programming standards is
disagreement over what the “best” standard is. A group of
programmers might agree on some broad principle, such as
indenting conditional code to show its structure, but on specific
guidelines such as the size of each level of indentation there
often are many defensible points of view. To avoid “analysis
paralysis,” discuss the merits of each alternative and select one
guideline as “good enough” to follow “most of the time.” This
may sound like a wishy-washy way to embark on standards, but
it’ll work fine if the programmers who develop and use the
standards understand that, in most cases, following consistent
guidelines across projects is more important than determining
the absolute “best” rule. It’s also important that programmers
faithfully follow the standards, and that they don’t view “most of
the time” as an escape clause that lets them go their own way
whenever they have a personal preference that’s not consistent
with the standards.

Sometimes, differing opinions about alternative guidelines may
run so deep that even a “good enough” standard is hard to agree
on. Consider a suggestion I made in Chapter 2, that macros EQ,
LE, etc., be used instead of ==, <=, and the other relational
symbols. The principle underlying this suggestion is to avoid the
slippery errors caused by mistakenly using = (assignment)
instead of == (equality). Unfortunately, this guideline conflicts
with the principle that you should express program constructs in
ways that are consistent with ways you express similar
constructs in other contexts. The < and > symbols are widely
used in other programming languages and general math
discourse. The <=, and >= symbols are also widely used in other
programming languages and are very similar to the ≤ and ≥ math
symbols. So using LT, GT, LE, and GE instead of these symbols
could be considered a step backward. Note that it’s this same
principle that C’s use of == violates, since = is the commonly
used symbol for equality in other programming languages and in
math. Here we have a conflict that originates in a C flaw, and
whose solution may introduce a different, if lesser, problem.

Reasonable arguments might be marshaled for any of three
guidelines:

1. Use all of the standard C symbols
2. Use EQ instead of == (and possibly NE instead of !=), but use the
standard C symbols for the rest of the operators
3. Use EQ, LE, etc., and don’t use any of the standard C symbols

The first approach is risky and would only make sense if it were
combined with some other guidelines, such as always checking a
program with a compiler or “lint” utility that could detect the use
of = where == was expected.

The second approach minimizes the amount of non-typical C
code (it’s safe to say most C programmers today use the standard
C symbols) and eliminates the dangerous ==, but it introduces an
inconsistency among the various relational operators. The third
approach is easy to follow and is consistent, but it is not typical
for either C programs or common math expressions.
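
The difference between the second and third guidelines is easiest to see side by side. The following is a minimal sketch, not a prescribed header: the macro names are the ones mentioned above, while the two function names are hypothetical and exist only for the comparison.

```c
/* Relational macros as named in the guidelines above. */
#define EQ ==
#define NE !=
#define LT <
#define LE <=
#define GT >
#define GE >=

/* Guideline 2 style: EQ replaces ==, the other operators stay standard.
   Hypothetical function: is value equal to low, or within (low, high]? */
int in_range_rule2(int value, int low, int high)
{
    if (value EQ low) {
        return 1;
    }
    return (value > low) && (value <= high);
}

/* Guideline 3 style: every relational operator is written as a macro. */
int in_range_rule3(int value, int low, int high)
{
    if (value EQ low) {
        return 1;
    }
    return (value GT low) && (value LE high);
}
```

Both functions compile to the same comparisons; the only difference is how consistent the notation looks to a reader.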

If possible, one of the latter two guidelines should be adopted as
“good enough.” And this rule can be followed all of the time
because there are no potential conflicts with other lexical rules.
If passions run deep, however, you might have to settle for a
guideline that says: In any program use either Rule 2 or Rule 3
throughout the whole program. This lets a programmer use a
reasonable alternative that he or she is comfortable with and
maintains consistency at least at the program level. After a year
or so of experience, you may be able to determine a strong
preference for one guideline or the other and be able to settle on
a single “good enough” standard.

Permitting alternative coding styles costs a significant amount in
consistency and reduces a programmer’s ability to understand
immediately the notation of a new program. Consequently, it’s a
tactic to use sparingly. But for some guidelines, it’s the best way
to move forward and at least reduce the number of variations in
coding style.

The Evolution of Standards

Standards shouldn’t be frozen in time. A “standard” that changes
weekly isn’t much use, but over time good standards evolve as
you discover more effective approaches. Some newly discovered
approaches may be improvements in lexical style, but the main
improvements will be in the reuse of macro, function, and class
libraries. As these libraries grow and improve, old methods may
be replaced by better ones and the standards should be modified
to reflect the changes.

Evolution of the standards is easier than evolution of existing
code. This is the place for good editors and translation tools.
Most programming editors today provide both a search-and-
replace feature that supports advanced “regular expression”
pattern matching, as well as an “extension” (or “macro”)
language for writing editing functions or scripts. Together these
facilities allow you to make file-wide revisions more easily. For
quick standalone translations, a language like Awk is an easy-to-
use, powerful tool. Awk provides regular expression pattern
matching, dynamic table handling and simplified I/O, and it uses
a syntax similar to C, so it’s an ideal “workbench” tool for C
programming.

Changing code to satisfy evolving standards is only one reason
to use the best programming editor you can find for C
programming. Among the capabilities you can expect from a
good editor: multiple files and windows, mouse-driven text
selection and file positioning, auto-indenting of C source code,
block indent and unindent, finding matching delimiters, one- and
two-keystroke “skeleton” code expansion, compiler execution
from within the editor, pop-up display of compiler errors, and
point-and-shoot navigation to the source of errors.

With a good editor, it’s so easy to manipulate text that it’s not a
burden to pay careful attention to programming standards,
especially ones that deal with lexical style. By contrast, “dumb
terminal” and line-oriented editors, or editors without mouse and
window support or an extension language, are poor choices to
encourage programmers to “polish” code.

No Train, No Gain

C and C++ aren’t for amateurs. They are powerful and can be
dangerous. Programmers should have a solid background in
programming methods and some experience writing “throw-
away” programs in C or C++ before attempting their first
mission-critical applications. Don’t learn C by writing a
production program!

As I mentioned in Chapter 1, a common question is whether to
learn C or C++ first. I suggested the best answer may be
“neither” — learn Pascal or another “cleaner” language first.
This advice is especially important for programmers who have
only worked in COBOL, RPG, or other older business
languages. C introduces many new concepts, such as data types,
scoped variables, static and automatic storage, functions, and
dynamic data structures. Unfortunately, in C these concepts are
intermixed with machine-level constructs so they are harder to
learn and more error-prone than with some other languages.
Learning the concepts first in a language like Pascal provides a
much better foundation with which to tackle C. In addition, it
sets the right frame of mind for avoiding some of the common C
idioms that are poor programming practices.

Another language that provides a more rational introduction to C
is Awk. Awk has a C-like syntax (it was derived from C), but
eliminates some troublesome C elements — most importantly,
pointers — and adds easy-to-use, powerful, built-in facilities for
string and table handling. One nice aspect of learning Awk, in
addition to making the transition to C, is that even when C is
used for production programs, Awk provides an excellent
“workbench” tool.

Similar advice applies to learning C++ object-oriented
programming (OOP) facilities. Learning a language like
Smalltalk or Actor, which was designed from the start to support
OOP, provides a better introduction than simultaneously trying
to learn OOP concepts and C++’s often arcane implementation.

To begin C programming, it’s probably advisable to start with a
C++ compiler and treat it as “a better C.” C++ is rapidly
replacing C, and it makes sense to learn the newer C++ facilities
(e.g., new instead of malloc(), and the Streams class library
instead of the standard C I/O library). On the other hand, a
programmer new to C should write several production programs
before trying to write any non-trivial C++ classes.

Before a programmer can write complex classes well, he or she
needs a deep understanding of the issues that arise when classes
are used to define new variables or as super-classes from which
member data and functions are inherited. And it requires mastery
of both OOP concepts and C++ details before a programmer can
write classes that will be shared by other programmers and
projects. Certainly any programmer wanting to write C++
classes should have read and fully comprehended the principles
in C++ Programming Style by Tom Cargill.

The Right Tool For the Job

A variety of other tools in addition to good editors can make a
big difference in C and C++ programming. Some of these tools
help with “programming in the small” (working on a single
program); others help with “programming in the large” (working
on systems that comprise many programs).

One set of tools documents or analyzes C or C++ source code. A
variety of “pretty printers” can format code and produce cross-
reference listings of variables, functions, and classes; flowcharts;
and module structure charts. These are especially useful tools for
dealing with legacy code that may be poorly laid out and
documented. Unfortunately, most formatters do not provide
much flexibility in specifying the output’s layout. For new code,
you’re better off using a good editor that lets you apply human
judgment as you work with the code, so you get easy-to-read
displays and listings without needing a separate formatting tool.

Code analyzers range from fairly simple “lint” utilities to tools
that check code compliance against standards and calculate
various measures of code “complexity.” Lint utilities examine C
source for constructs that may be acceptable to the compiler but
that are likely to be unintentional and incorrect. For example,
most lint utilities will report the use of = within conditional
expressions.
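
To make the lint example concrete, here is a small sketch of the construct in question. The function names are hypothetical; the second function is deliberately wrong and is exactly the kind of code a lint utility (or a compiler with warnings enabled) is expected to flag.

```c
/* Intended test: is status equal to 0? */
int is_zero_correct(int status)
{
    if (status == 0) {
        return 1;
    }
    return 0;
}

/* The mistake: = assigns 0 to status, and the condition then tests the
   assigned value (0), so the branch is never taken for any argument. */
int is_zero_buggy(int status)
{
    if (status = 0) {   /* legal C, but almost never what was meant */
        return 1;
    }
    return 0;
}
```

The compiler accepts both functions. Only the first behaves as its name suggests, which is why a check for = inside a condition belongs in any guideline that keeps the standard C symbols.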

As an example of a standards-checking code analyzer, the
CodeCheck product from Abraxas Software will check C source
code for both lexical style and specific usage rules. The rules are
specified external to the analyzer program and can be modified
or extended to meet specific organizational standards.

Debugging Is a Waste of Time

The C and C++ world is awash with slick debuggers. But before
you place a debugger high on your list of C/C++ tools, consider
this fact: Every minute spent using the debugger is a waste of
time — nothing is being produced. The place to control errors in
C programs is during production, not by debugging.

This is not to say you shouldn’t have a good C debugger; most
compiler products come with one anyway. But don’t plan to use
the debugger as a tool to improve your C productivity. That’s a
self-defeating strategy.

Order Out of Chaos

With C and C++ development, some form of source and binary
code version management is essential. Both languages
encourage building applications by combining many small
components, either source “header” files, independently
compiled modules, or executable libraries. Without some means
of control, C development can rapidly get out of hand.

The broad area of software configuration management (SCM)
covers a variety of tools and practices for tracking changes to
software, especially version management. The most important
practice is rigorous “check-out/check-in” of source code that’s
being modified. A number of tools (e.g., the Unix Source Code
Control System, SCCS, and Intersolv’s PVCS
Version Manager product) let you place all source code in a
shared “archive” and control the check-out of source files. When
a source file is checked out to be modified, a “lock” is recorded
in the archive so that no one else can simultaneously modify the
same file. (Most tools provide a facility for “branching” into two
parallel revision tracks, if necessary.) Subsequently, a revised
version of the source file can be checked in, possibly after
undergoing additional testing or other quality assurance review.
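
The check-out/check-in discipline can be pictured as a lock recorded against each file in the archive. The sketch below is only an illustration of that idea, not the way any particular tool stores its locks; the struct, the field names, and both functions are hypothetical.

```c
#include <string.h>

#define OWNER_MAX 32

/* Hypothetical archive entry: one source file and who, if anyone,
   currently holds its lock. An empty locked_by means "not checked out". */
struct archive_entry {
    const char *file_name;
    char locked_by[OWNER_MAX];
};

/* Records the lock and returns 1 if the file was free;
   returns 0 if someone already has it checked out. */
int check_out(struct archive_entry *entry, const char *user)
{
    if (entry->locked_by[0] != '\0') {
        return 0;
    }
    strncpy(entry->locked_by, user, OWNER_MAX - 1);
    entry->locked_by[OWNER_MAX - 1] = '\0';
    return 1;
}

/* Releases the lock and returns 1 only if user is the current holder. */
int check_in(struct archive_entry *entry, const char *user)
{
    if (entry->locked_by[0] == '\0') {
        return 0;
    }
    if (strncmp(entry->locked_by, user, OWNER_MAX - 1) != 0) {
        return 0;
    }
    entry->locked_by[0] = '\0';
    return 1;
}
```

A second programmer who calls check_out on a locked entry is simply refused, which is the behavior that prevents two people from modifying the same file at once.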

Controlling source is only part of what’s required when code is
shared among people and projects. A facility to control versions
of binary code (object or executable) is important so that
programmers have a stable base from which to reuse shared
code. With proper tools and procedures, it’s relatively
straightforward to revise the implementation of shared binary
code, as long as the interfaces don’t change or are upwardly
compatible. Coordinating incompatible changes to interfaces is
much more difficult, however.

When an implementation changes but the interface remains the
same, the most that’s usually required is to recompile programs
that depend on the revised code. Automated “make” utilities can
simplify this process. With a typical make utility, object
dependencies are stored in a file. When the make utility reads a
file of dependencies, it checks the time stamp of each object and
re-creates those objects that have an earlier time stamp than any
object upon which they are dependent. “Make-make” utilities
automate the rebuild process further by scanning C source code
to derive the dependencies and then building the dependency file
used by the make utility.
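
The time-stamp rule a make utility applies is simple enough to sketch in a few lines. This is an illustration under stated assumptions, not the logic of any real make program: the function name is hypothetical, the time stamps are assumed to have been read from the file system already, and the case of a target that does not yet exist is left out.

```c
#include <stddef.h>
#include <time.h>

/* Returns 1 if the target has an earlier time stamp than any of the
   objects it depends on, 0 otherwise. */
int needs_rebuild(time_t target_time, const time_t dep_times[],
                  size_t dep_count)
{
    size_t i;

    for (i = 0; i < dep_count; i++) {
        /* difftime(a, b) is a minus b in seconds; positive means the
           dependency is newer than the target. */
        if (difftime(dep_times[i], target_time) > 0.0) {
            return 1;
        }
    }
    return 0;
}
```

For each target listed in the dependency file, a make utility would apply a test of this kind and run the associated rebuild command whenever it reports that the target is out of date.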

When an interface changes, modules using the interface often
have to be revised, too. “Impact analysis” tools (a fancy word for
a type of cross-reference tool) help by listing all modules that
use a particular interface. It’s then up to the programmer to make
the necessary revisions. The final step is often the most difficult:
synchronizing the introduction of a new interface with the
implementation of revised older programs that use the new
interface. Make utilities can help simplify the production of new
versions, but scheduling tests and upgrades of production
systems remains largely a manual management task.

SCM tools are generally available on most platforms that have C
and C++ compilers. Often, these tools are packaged with the
compiler products themselves. These tools shouldn’t be viewed
as optional C programming “goodies”; they’re essential for any
production C/C++ development. The time to acquire and start
using these tools is the day you get the C or C++ compiler, if not
sooner. Don’t wait until you have hundreds of source files before
you start managing them.

Reuse It Or Lose It

Although there’s no “silver bullet” for software development,
code reuse is about as close as you can come to a technique that
dependably reduces development time and increases software
reliability. The reason is simple — you will save time, if you can
supply a component of an application by reusing an existing one
rather than building a new one from scratch. And if the reused
component has proven its reliability in previous use, its
performance in a new application is likely to be better than a
newly created component that undergoes its first production tests
in your application.

Most programmers and managers know the value of reusing
software components; the problem is they usually don’t have a
rich set of suitable components available when they start a new
project, so they’re forced to build what they need. More often
than not, deadline pressures and limited resources keep newly
built components from being implemented so they can be reused
on future projects, and the problem propagates through
another project cycle.

The fundamental barrier to reuse is economic: Reuse pays off
only across projects and over time, but requires investment in the
short-term. It’s expensive to build and catalog reusable
components, so unless an organization has some financial
practices that distribute the component supplier’s costs to the
consumers of the components, nobody will build for reuse.

The economics of reuse suggest an easy solution: Buy reusable
components, if suitable ones are available. One of the undeniable
advantages of developing applications in C/C++ is the large
number of source and executable libraries available as
commercial products. For standard system functions, such as file
systems, user interfaces, and communications and connectivity,
commercially available products are often of higher quality and
lower cost than if you developed them from scratch. These
packages are so inexpensive for MS-DOS and Windows
environments that a reasonable strategy is to buy multiple
products that cover the same areas, try them out in your own
program development, and select the product (or parts of each)
that satisfies your requirements best.

Although developing your own reusable components is more
expensive, it is sometimes necessary for application areas not
covered by commercial products. As I mentioned earlier, it’s
essential that the organization recognize the cost and value of
building reusable components. This may take the form of an
explicit method of project accounting, or it may be implicit; for
example, by assigning some staff the primary responsibility of
“component builder.”

Principles Of Reuse

A programmer won’t reuse a component unless three conditions
are met:

* The programmer can find a suitable component
* The component can be used without too much work
* The component is reliable and performs adequately

The first condition requires a good “catalog” of components.
Components should be indexed by various functional
characteristics. For example, C++ “container” classes for various
types would be listed in an index by container type (set, bag,
ordered list, etc.) and by element type (integer, string, etc.). In
addition, the catalog should contain both an abbreviated and a
complete interface description for each component. (The need
for explicit interface specifications may shock some
programmers, but then they probably aren’t the best
programmers to write reusable components anyway.)

Unlike functions or classes that are used only in one program,
reusable components have to be carefully designed so they are
general enough to be used in a variety of applications (otherwise
there may not be a large enough base of reuse to recover
development costs), but specific enough to be easily used. In
practice, this is a challenging goal, often requiring mastery of the
art of programming to do well.

The last condition for reuse is that a component be
“trustworthy.” No programmer wants to reuse a component if
that increases the chance of an application failure or slows down
the application too severely. This means that reusable
components must be carefully designed and implemented, and
thoroughly tested before they become “public.”

Certain practices can increase the chances for successful reuse.
First, be sure somebody plays the role of “librarian” for all
purchased and homegrown components. The librarian’s
responsibility is to keep the catalog of components up-to-date
and easily accessible, either on-line or in hard copy. The
librarian is also a good person to receive requests from other
staff for various types of components that may not already be
available.

You should maintain the components in the “production quality”
library; treat them as you would the code in any production
application. Don’t put “almost complete” components in the
“production quality” library. If you choose to make them
available, isolate them in “as is” libraries with no “warranty.”

In general, don’t assign novices to write reusable components
until they’ve had substantial experience writing application code
and reusing components in their code. This is especially true of
C++ classes, which can be very tricky to get right the first time.

Finally, reward programmers who build and use reusable
components — these are the people who keep you out of the
maintenance graveyard. Successful application development
groups work cohesively, with attention to the overall results of
the whole group, not just to individual achievements. “Team
players” have always been more valuable than “lone cowboys”;
but with the increasing importance of code reuse, a “team
player” approach to development becomes especially critical.

You Can C Clearly Now

Just as C programming doesn’t have to be a mysterious practice,
neither does managing C development. Principles that apply to
all software development are doubly important when working
with C and C++ languages — careful attention to standards and
processes, and using adequate tools to overcome weak areas of
the languages and maintain control of application objects.

And most of all, it’s a matter of attitude. Doing it the same old
way won’t work for programming in C or managing C
development. A successful C programmer will take extra steps
to guard against language traps or dangerous programming
practices. Likewise, a successful manager will be sure that his or
her programmers follow safe, standardized C programming
practices. With the right attitude and a good set of programming
guidelines, you can have “calm Cs ahead.”

BIBLIOGRAPHY

Books

Aho, Alfred V., Brian W. Kernighan, and Peter J. Weinberger.
The AWK Programming Language. Reading, MA: Addison-
Wesley Publishing Co., 1988.

Examining the Awk programming language gives
you an idea of what C "done right" might be —
Awk is easy and safe to use, yet has more power
than C for many of the common types of text and
table processing tools C is often used for. This
book is short and easy to follow. Combine it with
one of the Awk interpreters or compilers available
for the PC, and you'll have a useful tool, as well as
a source of ideas for useful C functions and macros.

Cargill, Tom. C++ Programming Style. Reading,
MA: Addison-Wesley Publishing Co., 1992.

Cargill has produced a C++ book in the tradition
of The Elements of Programming Style by
Kernighan and Plauger. He focuses on
programming technique, rather than lexical style,
and does a brilliant job. Like Kernighan and
Plauger, Cargill starts with previously published
program examples and shows their flaws as an
introduction to writing correct and comprehensible
C++ programs. If you encounter a C++ "expert"
who "pooh-poohs" the difficulty of writing robust,
reusable C++ classes, push this book in their face
— Cargill knows better and demonstrates it with
examples. Following the book's 58 rules for C++
programming should be a minimum requirement
for any C++ programmer.

Coplien, James O. Advanced C++ Programming Styles and
Idioms. Reading, MA: Addison-Wesley Publishing Co., 1992.

Don't consider yourself a "master" C++
programmer until you can rightfully claim to
understand every chapter in this book. That may
take a while because Coplien has covered a broad
range of truly advanced C++ applications. This
book is definitely not tutorial, but I still suggest it
as an important book for aspiring C++
programmers (and their managers). It's the deepest
treatment I've found on applying fundamental
software engineering principles to C++
programming. It should provide a healthy "reality
check" before anybody goes merrily off to create
their own reusable classes. Until a programmer
fully understands the first seven chapters of this
book (which deal with various object-oriented
programming concepts in C++), he or she isn't
ready to write production-level, reusable C++
classes to be shared with other programmers.
Another thing to be gained from reading Advanced
C++ Programming Styles and Idioms is an
appreciation for other OOP languages, such as
Smalltalk. Coplien describes how to implement the
necessary dynamic typing and memory
management to give C++ some of the power of
languages like Smalltalk; and although Coplien's
explanations are clear, the primary lesson is: "Why
bother, when other languages already do this
gracefully?"

Hamilton, Jennifer. C For RPG Programmers. Loveland, CO:
Duke Press, 1992.

Crossing the waters between RPG and C may not
be easy, unless you've got a guide who knows the
territory well. Hamilton has worked on IBM's
RPG/400 compiler and uses her experience with
RPG to provide helpful comparisons between the
two languages. This unique book is a tremendous
resource for RPG programmers who want to learn
C, or 3X/400 managers who are evaluating how C
might fit in their development toolset. Hamilton's
favorable assessment of C provides valuable
counterpoints to my more critical assessment of the
language.

Horton, Mark R. Portable C Software. Englewood Cliffs, NJ:
Prentice-Hall, Inc., 1990.

C is very portable, but transporting C programs
opens up a whole new set of traps, many of which
are extraordinarily subtle. If you use C to
implement software that runs on more than one
machine, you should have this book.

Kelley, Al, and Ira Pohl. C by Dissection, second edition.
Redwood City, CA: The Benjamin/Cummings Publishing Co.,
1992.

C by Dissection might handily win a poll as the
"best C tutorial." It earns this wide respect because
it teaches C by "dissecting" sample programs so
you can learn by example. Each chapter also has
short sections on programming style and common
programming errors. The second edition covers
ANSI C and has a short chapter on C++, which is
useful but not really adequate for learning C++.
There's also an excellent short introduction to the
"make" utility. I only wish the authors would do a
larger "make by Dissection" book, too. This is a
"must have" for anyone learning C.

Kernighan, Brian W., and Dennis M. Ritchie. The C
Programming Language, second edition. Englewood Cliffs, NJ:
Prentice Hall, 1988.

A revision of the original C reference and guide
(now reflecting the ANSI C standard), this is my
favorite source for understanding the formal
definition of C. The authors' explanations, while
precise, are much more comprehensible than the
ANSI or SAA references.

Koenig, Andrew. C Traps and Pitfalls. Reading, MA:
Addison-Wesley Publishing Co., 1989.

You must have this book if you program in C. Read
it, believe it, and then think about whether you
really want to use C.

Ladd, Scott Robert. C++ Techniques & Applications. Redwood
City, CA: M&T Books, 1990.

This is a good source of ideas for C++ string
implementations. Even if you don't use C++, you
can use the concepts to build a standard C function
and macro library.

Lafore, Robert. The Waite Group's Microsoft C Programming
for the PC, second edition. Indianapolis, IN: Howard W. Sams &
Co., 1989.

This is one of the best introductory, tutorial books
on C. It is useful even if you use a compiler other
than Microsoft C.

Miller, Webb. A Software Tools Sampler. Englewood Cliffs, NJ:
Prentice-Hall, Inc., 1987.

This is one of the better advanced C programming
texts. The style follows that of Kernighan and
Plauger's Software Tools. In his book, Miller also
uses the important concept of abstract data types,
although his code could be improved. The tools
used as examples are handy additions to a C
library, and the source code is available on diskette.

Oualline, Steve. C Elements of Style. San Mateo, CA: M&T
Publishing, 1992.

If you want a huge head start on establishing C
programming standards, begin with the 113 rules
from this book. It's patterned somewhat after the
wonderful classic The Elements of Programming
Style by Kernighan and Plauger but concentrates
more on C lexical style than programming
practices. I disagree with a few of the suggestions.
For example, Oualline suggests: "When an if
affects more than one line, enclose the target in
braces." A much easier and safer practice is to
always use braces in if statements. But the
organization of the book makes it very easy to
revise and adapt Oualline's suggestions. Part I of
the book discusses the principles underlying the
suggested rules. Part II is a concise "Style Manual"
that can be quickly accessed for a particular topic.
Appendix A is an annotated set of examples and
Appendix B is a list of rules by topic. The book also
includes short chapters on C++ and on the "make"
utility and application directory organization. This
is certainly a book every programming manager
should have as a starting point for setting up C
programming standards.

Ranade, Jay, and Alan Nash. The Elements of C Programming
Style. New York, NY: McGraw-Hill, Inc., 1993.

Ranade and Nash copied the title of The Elements
of Programming Style by Kernighan and Plauger
most closely of any of the C/C++ "style" books, but
theirs is least in the spirit or form of the original.
Rather than being a discussion of programming
principles derived from published examples, this
book is a list of several hundred tips accompanied
by short "do and don't" examples. Many of their
suggestions are sound, some are matters of
preference, and some are downright wrong. For
example, in one chapter they recommend: "Do not
replace C keywords or idioms with macros." But
their justification that all C programmers
recognize a common set of idioms is unconvincing,
especially as C usage expands beyond a small
clique of Unix systems programmers. C
programmers in the same company should use a
standard set of macros (and "helper" functions)
across projects and tricky macros should be
avoided. But limiting yourself to arcane, primitive
forms of C coding — whether or not they are
"idioms" — when simple macros can make coding
easier and safer is a foolish restriction. In a later
chapter Ranade and Nash even suggest: "Minimize
the use of C idioms...replace [them] with a function
or macro." Their rationale in this section is "C
idioms...are difficult to understand" — a much
more sensible assessment than their earlier
comments. Although I don't think Ranade and Nash
have nearly the insight into C programming that
some of the other authors listed here do, this is still
a good book to scan quickly for C programming
tips and additional items to add to your own "C
Programming Standards" document. Just don't take
their advice as "Gospel."

Schustack, Steve. Variations in C. Redmond, WA: Microsoft
Press, 1989.

This isn't the first book to read on C, but it has a
few good ideas, including the idea for "visibility"
macros. The book also offers a few helpful C style
guidelines, including putting a blank after every
( and [ and before every ) and ]. You'd be surprised
how much this little tip can improve C readability.

Spuler, David. C++ and C Efficiency. Sydney, Australia:
Prentice-Hall of Australia, Ltd., 1992.

As I described in Chapter 6, you can get off course
by tricky C programming that's intended to shave a
few milliseconds off a program. That doesn't mean
you shouldn't know something about performance
of C programs, however, and Spuler provides a
reasonable guide. He covers both low-level
techniques, such as replacing multiplication with
addition (a technique which would only be
worthwhile inside a loop executed many times), as
well as high-level tactics aimed at algorithm
improvement. Although the book is a valuable
resource, my caution is to draw on its low-level
techniques sparingly.

Stroustrup, Bjarne. The C++ Programming Language, second
edition. Reading, MA: Addison-Wesley Publishing Co., 1992.

Because Stroustrup knows C++ like nobody else (he was the
primary designer), this is an excellent
reference for both the history and the philosophy of
C++, as well as the technical details of the
language. I find Stroustrup's writing style slow-
going at times so I wouldn't recommend this as a
tutorial. He does a good job of describing which
C++ features are appropriate for certain types of
problems, which can help someone new to C++
pick the right techniques.

Microsoft QuickC, Version 2.0: C for Yourself. Redmond,
WA: Microsoft Corp., 1988.

Chapter 10, "Programming Pitfalls," includes
more than two dozen pitfalls, a number of which
are not covered in Koenig's book (C has enough
pitfalls for a couple of books).

IBM Manuals

Systems Application Architecture, Common Programming
Interface C Reference — Level 2 (SC09-1308).

Having read other SAA manuals, I was surprised that the SAA C
reference actually shed some light on several C pitfalls. But
don't look to it as your primary source for learning C, or even as
the first place to turn for an "official" reference. In too many
cases, the information is scattered and inconsistent. For example,
"linkage" is discussed sometimes as an attribute of a variable
(which it is) and at other times as a relationship between two
identifiers (which, although it makes more sense for the term
"linkage," isn't the way C defines it).

APPENDIX A — C CODING SUGGESTIONS

Chapter 2

* Don't use = in an if statement expression, unless it is absolutely
necessary.
* Define a macro EQ for ==, and never use ==.
* Define macros for &, |, &&, and ||.
* Define macros for BOOL, TRUE, and FALSE.
* Use only Boolean-valued expressions in if statements.
* Use only Boolean variables with the logical operators && and ||.
* Do all assignments as separate statements, not as part of a more
complex expression.
* Use parentheses in expressions to explicitly define order of
evaluation.
* Don't use %i format specifications or numbers that begin with 0.
* Be sure to code addresses for arguments to scanf and similar
functions.
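
Two of the Chapter 2 suggestions are easier to appreciate with a short example. The sketch below is an assumption about how such definitions might look, not the book's own header: the AND and OR macro names and both function names are hypothetical.

```c
/* Hypothetical Boolean definitions and logical-operator macros. */
#define BOOL  int
#define TRUE  1
#define FALSE 0
#define AND   &&
#define OR    ||

/* A number that begins with 0 is an octal constant in C,
   so 010 is eight, not ten. */
int octal_ten(void)
{
    return 010;
}

/* Only Boolean-valued expressions are combined with the logical
   operators, and each assignment is a separate statement. */
BOOL is_valid_month(int month)
{
    BOOL at_least_one = (month >= 1);
    BOOL at_most_twelve = (month <= 12);

    if (at_least_one AND at_most_twelve) {
        return TRUE;
    }
    return FALSE;
}
```

The octal function shows why a stray leading zero is dangerous: the code compiles cleanly and silently yields a different value than the one the programmer typed.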

Chapter 3

* Always enclose conditional code in braces.
* Do all assignments as separate statements, not as part of a more
complex expression.
* Use parentheses around expressions on return statements.
* Never use the C switch statement.
* Place the opening /* and closing */ for comments on lines by
themselves, and use a | to begin each line of comment text.
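
As a rough illustration of the Chapter 3 suggestions, the function
below replaces a switch statement with an if / else-if chain, braces
every branch, parenthesizes the return expression, and uses the
suggested comment layout. The function name delivery_days and its
service codes are invented for this sketch.

```c
/*
| Maps a one-letter service code to a delivery time in days.
| Returns -1 for a code that is not recognized.
*/
static int delivery_days(char service)
{
    int days;

    if (service == 'O') {
        days = 1;       /* overnight */
    } else if (service == 'P') {
        days = 3;       /* priority */
    } else if (service == 'G') {
        days = 7;       /* ground */
    } else {
        days = -1;      /* unknown service code */
    }
    return (days);
}
```

Because no branch can fall through into the next one, there is no
break statement to forget.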

Chapter 4

* Declare C arrays with one extra element, and don't use the element
with subscript 0.
* Use macros to define tables and loops over them.
* Always guard a string assignment against overwriting the target
variable.
* Create macros and functions to define strings and provide "safe"
string operations.
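
The Chapter 4 suggestions could be sketched as follows. The macro
names TABLE and OVER_TABLE appear in the book, but the definitions
shown here are assumptions, as are the helper functions year_total and
copy_string.

```c
#include <string.h>

/* Assumed definition: reserves count + 1 elements so that valid
   subscripts run from 1 to count and element 0 is never used. */
#define TABLE(type, name, count)  type name[(count) + 1]

/* Assumed definition: visits subscripts 1..count in order. */
#define OVER_TABLE(index, count) \
    for ((index) = 1; (index) <= (count); (index)++)

#define MONTHS 12

/* Sums a 1-based table of monthly totals. */
static long year_total(const long monthly[], int count)
{
    long total = 0;
    int  m;

    OVER_TABLE(m, count) {
        total = total + monthly[m];
    }
    return (total);
}

/* Copies source into target without writing past target_size bytes.
   The result is truncated if necessary and always ends in '\0'. */
static void copy_string(char *target, size_t target_size, const char *source)
{
    if (target_size > 0) {
        strncpy(target, source, target_size - 1);
        target[target_size - 1] = '\0';
    }
}
```

A caller would declare a table with TABLE(long, sales, MONTHS); and
pass sizeof on the target array to copy_string, so the array's real
size travels with every copy.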

Chapter 5

* Declare functions static if you intend to call them only from within
the same source file.
* Use the most restricted visibility possible for variables; avoid
shared variables.
* Put all external declarations before the first function definition in a
file.
* Put functions that must share data and external declarations for
their shared variables in a file by themselves.
* Use EXPORT, SHARE, and IMPORT macros to clarify the
intended visibility of a variable.
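
One way the visibility macros might be defined is shown below. These
definitions are a guess at the intent (EXPORT as an ordinary external
definition, SHARE as static, IMPORT as extern) and are not taken from
the chapter; the identifiers next_id, ids_issued, take_id, and
issue_id are invented.

```c
/* Assumed definitions of the visibility macros. */
#define EXPORT           /* defined here, visible to other source files */
#define SHARE   static   /* visible only within this source file */
#define IMPORT  extern   /* defined in some other source file */

/* All external declarations come before the first function definition. */
SHARE  int next_id    = 1;   /* shared by the two functions below */
EXPORT int ids_issued = 0;   /* another file would declare:
                                IMPORT int ids_issued; */

/* Called only from this file, so it is declared static via SHARE. */
SHARE int take_id(void)
{
    int id;

    id = next_id;
    next_id = next_id + 1;
    return (id);
}

/* The one function other source files are meant to call. */
EXPORT int issue_id(void)
{
    int id;

    id = take_id();
    ids_issued = ids_issued + 1;
    return (id);
}
```

The functions that share next_id sit in one file with it, so nothing
outside that file can change the counter.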

Chapter 6

* For non-array "output" or "input/output" parameters, use local
variables instead of dereferenced parameters in function calculations.
* Use array notation instead of pointers and dereferencing when
you're working with arrays.
* When working with pointers in assignment statements, double-
check that you're using the right level of indirection.
* Unless you're positive a pointer has been initialized, check it for
NULL before using it.
* Use a new_string function to return new strings from functions.
* Always check for a NULL return value after calling malloc.
* Use the allocate macro to prevent memory "leakage."
* Always initialize pointers in their definitions.
* Use typedef and PTR, contents_of, and address_of macros to
improve program readability.
* Avoid popular, but tricky, C idioms for business application
programming.
* Don't use ++ or -- in assignments.
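
A few of the Chapter 6 suggestions can be sketched together. The names
PTR, contents_of, address_of, and new_string come from the book, but
every definition below is an assumption written for illustration, and
the divide function is invented. The allocate macro is not shown.

```c
#include <stdlib.h>
#include <string.h>

/* Assumed definitions of the readability macros. */
#define PTR             *
#define contents_of(p)  (*(p))
#define address_of(x)   (&(x))

/* Returns a newly allocated copy of source, or NULL if source is NULL
   or no memory is available. The caller must free() the result. */
static char PTR new_string(const char PTR source)
{
    char PTR copy = NULL;    /* pointers are initialized where defined */
    size_t   size = 0;

    if (source != NULL) {
        size = strlen(source) + 1;
        copy = malloc(size);
        if (copy != NULL) {  /* always check what malloc returned */
            memcpy(copy, source, size);
        }
    }
    return (copy);
}

/* Divides using local variables, then stores each output once.
   Returns 1 on success, 0 for a zero divisor or a NULL output pointer. */
static int divide(int dividend, int divisor,
                  int PTR quotient, int PTR remainder)
{
    int ok = 0;
    int q  = 0;
    int r  = 0;

    if ((divisor != 0) && (quotient != NULL) && (remainder != NULL)) {
        q = dividend / divisor;
        r = dividend % divisor;
        contents_of(quotient)  = q;
        contents_of(remainder) = r;
        ok = 1;
    }
    return (ok);
}
```

A call then reads divide(17, 5, address_of(q), address_of(r)), which
keeps the level of indirection visible at both ends of the call.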

INDEX

#if...#endif facilities, 61
* dereferencing operator, 39
omissions of, 39-40
reducing use of, 47
++ and -- operators, 50
array subscripts and, 59
in assignments, 56
side effects of, 59
Ada language, 1, 2
comments and, 18
Addresses, 39-40
argument, 10-11
changing, in pointer variable, 41-42
allocate macro, 46
code for, 47
AND operator, 8
Arguments. See specific types of arguments
Array notation, 41-42, 63
compared to pointer/dereferencing notation, 42
multidimensional arrays and, 65
Arrays, 21
declaring, 22
multidimensional, 65
rules for using, 21-23
starting, 21
See also Array notation
Array subscripts, 54
++/-- operations within, 59
using [] and, 65
Assignments, 7
++/-- operators in, 56
functions, implementing, 76
in if statement, 8
levels of indirection in, 43
of one string variable to another, 24
pointers and, 43
as separate statements, 10, 14
within return statement, 13
Awk language, 84

Braces, 15-16
conditional statements and, 14, 15, 60
in macro definitions, 60-61

C++ language, 1, 4
advantages, 71-73
classes, 73-76, 77
coding suggestions, 78
comments, 69-70
compiler, 85
const variables and, 70
defined, 69
development, 79-88
disadvantages, 73-77
explained, 4
in-line functions, 71
macros and, 72
objects, 74, 77
output parameters and, 2
parentheses in, 73
standards, 80-84
"streams" I/O package, 72
strings and, 25, 27
templates, 71-72
training in, 84
working with, 69-78
See also C language
C++ Programming Style, 73
Case-sensitivity, 66
cat function, 51-52
code for, 51
defined, 51
successful call of, 52
unsuccessful call of, 52
See also Functions
char variable, 61
coding suggestions, 67
default, type, 61
rule for using, 62
signed, 61
unsigned, 61
See also Variable declarations
C language
alternative to other languages, 1
assembly language and, 2-3
basic problem with, 1-3
condensed coding style and, 3
conquering, 4-5
development, 79-88
machine-level programming and, 10
programming attitude of, 5
programming with, 3
sequence of operations, 3
source/executable libraries, 5
speed and, 54-56
standards, 80-84
training in, 84
uses for, 1
See also C++ language
Classes, 73-76, 77
creating, 74, 80
definitions of, 74
OutQ, 73-75
reusing, 80
string, 75-76
C library functions, 62
errno variable, 64-65
See also Functions
Code analyzers, 85
CodeCheck, 85
Code reuse, 87
Coding
alternative, styles, 83
changing standards and, 83-84
discipline, 80
loops, 21
quick, 14
Coding suggestions, 12
array and string, 27
C++, 78
char variable, 67
macro, 67
pointers, 53
syntax, 19
variable declaration, 37
See also Coding
Comments
C++ and, 69-70
delimiting, 18
macro definitions and, 69-70
runaway, 18
sample code for, 18
syntax for, 17-20
Components, reusable, 87
principles of, 87-88
Constants
const variables and, 70
integer, 70
octal, 10
const variables, 70
See also Variable declarations
C pointers, 1
at application level, 2
See also Pointers

D
Dangling pointers, 44
creating, 44
See also Pointers
Debuggers, 85
Declarations, 29
examples of, 29-30
external, 31-32
between functions, 33-34
making, 35
primary rule for, 33
See also Variable declarations; Visibility
Dereferenced pointer parameters, 40
local variables and, 41
notation, 42
post-increments and, 50
when to use, 40-41

Eiffel language, 12
else clauses, 15-16
errno variable, 64
using, 65
See also Variable declarations
Errors, typo, 13
Export variables, 33
defining, 33
See also Variable declarations
extern keyword, 30, 32

Function arguments, 58
declaring, as const, 63
passing, 62-63
Function parameters, 2
Function prototypes, 62
for C library functions, 62
header files and, 62
Functions
addresses for, 10-11
assignment, implementing, 76
cat, 51-52
C library, 62
errno variable, 64-65
coupled, 34
declaring, 35
static, 30
EXPORT, 35
free(), 72
in-line, 71
malloc(), 72
member, 71
revised, 77
month_name, 46
new_string, 45
outq_entry, 73, 74
parentheses in, 64
returning two values, 40
scanf, address for, 10-11
SHARE, 35
strcpymax, 58
strupr, 54, 55
system, 73
upper_case, 54-55
visibility classes and, 33

Header files, 62
High-level language (HLL), 40

Identifiers, 66
global, 66
if statements, 7
assignment in, 8
else clause matching and, 15-16
expressions within parentheses, 9
logical expressions in, 9
non-zero value of, 9
In/out parameters, 41
local variables with, 42
Integers, 61-62
constants, 70
decimal, 10
unsigned, 62
int variable, 62
See also Variable declarations

Linkage, 30
internal, 32
Lint filters, 7-8
Local variables, 33
dereferenced parameters and, 41
with in/out parameters, 42
using, 33
when to use, 40-41
See also Variable declarations
Logical expressions, 9
Loops, 21
errors and, 22
for, 22
while, 51-52

Macro arguments, 59
evaluating, 60
Macros
address_of, 48
using, 49
allocate, 46, 47
and/AND, 8
C++ and, 72
conditional logic and, 60
contents_of, 48
C++ and, 72
using, 49, 50
cpystr, 25
cube, 57-58
defining, 60
definitions of, 57-58
braces and, 60-61
double, 57-58
improving, 60
ELSE, 16
ELSEIF, 16
coding with, 17
ENDIF, 16
EQ, 8
EXPORT, 35-36
GLOBAL, 37
IF, 16
IMPORT, 34, 36
or/OR, 8
OVER_TABLE, 23
parentheses in, 58
pitfalls of, 57-61
avoiding, 58
pointer-related, 48
PTR, 48
C++ and, 72
using, 49
ptrace, 61
REF, 72
for safe strings, 25
using, 26
SEMIGLOBAL, 37
SHARE, 34, 35
sizeof operator and, 24-25
standard, 81
STRING, 25
STRING_TABLE, 25
strmaxlen, 25
TABLE, 22
table definition, 22-23
table loop, 23
THEN, 16
malloc operation, 46
Memory allocation, 46
Memory leakage, 46
avoiding, 48
reasons for, 47
Memory management, automated, 75-76
Mistakes, common, 7-12
if statements and, 7-8
month_name function, 46

Nested blocks, 33
new_string function, 45
NULL pointers, 43-44
allocate and, 46
alternatives to, 43-44
checking for, 44
reallocate with, 46
See also Pointers

Object data, 74
Object-oriented programming (OOP), 4
applying, techniques, 75
concepts, 4
features, 69
learning, facilities, 84
Operations
char variable and, 62
decrement, 59
embedding, 59
increment, 59
order-dependent, 65
post increment, 3
sequence of, 3, 65-66
Operators
(-), 76
bitwise, 8
delete, 72
logical, 8, 9
new, 72
OR, 8
overloading, 76
precedence levels, 10
sizeof, 24
OutQ class, 73-74
definition of, 74
revised, 75
See also Classes

Parentheses
after function invocations, 64
C++ and, 73
if statements and, 9
in macros, 58
in order-dependent operations, 65-66
return statement expressions and, 14
using, 10
Pascal language, 1, 2
as foundation to C, 84
Pointers, 1, 39-56
* operator and, 39
address, 2
at application level, 2
array notation and, 41
in assignment statements, 43
changing addresses in, 41-42
dangling, 44
dereferenced, parameters, 40
handle, 2
initializing, in their definitions, 46-47
macros related to, 48
notation, 42, 63
null, 43-44
ways of implementing, 2
Pointer variables
automatic, 47
static, 47
See also Pointers; Variable declarations
"Pretty printers," 85
Programming editors, 83-84, 85
PVCS Version Manager, 86

R
References, 40, 63
C++, 72
return statements
coding, 15
parentheses and, 14
Return values
after malloc, 46
negative/zero, 9-10
Reuse, coding, 87
principles of, 87-88

Sample code
array of part numbers, 11
for comments, 18
See also Coding; Coding suggestions
Scope, 30
defined, 30
Semicolon, missing, 13-14
Share variables, 33
defining, 33
using, 33
See also Variable declarations
Signals, 66
Software configuration management (SCM), 86
tools, 86
Source-code checkers, 7-8
Standards, programming, 80-84
approaching, 81
company, 80-81
discipline in, 80
evolution of, 83-84
getting started with, 81-83
indenting example, 81
instituting, 82-83
for lexical style, 80
suggested list of, 81-82
symbol use and, 82-83
Statements
conditional
braces and, 14, 15, 60
curly brackets around, 14
subordinate, adding/deleting, 15-16
syntax for, 13-20
See also specific statements
static keyword, 30, 32
initialization and, 32
Streams, 72
Strings, 24
assignments, 24
classes, 75-76
copy function, 26
guarding, copy, 24
macros for, 25
using, 26
objects, 76
safe, 25
type, 45
variable-length, 24
switch statement, 16-17
rule for using, 17

Table definition macros, 22
for two or more dimensions, 22
using, 23
See also Macros
Table loop macros, 23
using, 23
See also Macros
Templates, 71-72
typedef feature, 45
using, 47

Unix Source Version Control System (SVCS), 86
upper_case function, 54-55
library function implementation of, 55
pointer implementation of, 55
"speed-demon," 54
subscript implementation of, 55
See also Functions

Values, uninitialized, 11
See also Return values
Variable declarations, 29-37
case-sensitivity and, 66
EXPORT, 36, 37
IMPORT, 36, 37
making, 35
nested blocks and, 33
SHARE, 36, 37
visibility classes and, 33
See also Declarations; Visibility
Visibility, 30
attributes of, 30
classes of, 33
concept of, 30
of C variables, 31
export, 33
function rules, 30
local, 33
share, 33
uses of, 30
variable, 30
See also Declarations
