c – Why are these constructs using pre and post-increment undefined behavior?

Your question was probably not, “Why are these constructs undefined behavior in C?” Your question was probably, “Why did this code (using ++) not give me the value I expected?”, and someone marked your question as a duplicate, and sent you here.

This answer tries to address that question: why did your code not give you the answer you expected, and how can you learn to recognize (and avoid) expressions that will not work as expected?

I assume you’ve heard the basic definition of C’s ++ and -- operators by now, and how the prefix form ++x differs from the postfix form x++. But these operators are hard to think about, so to make sure you understood, perhaps you wrote a tiny little test program involving something like

int x = 5;
printf("%d %d %d\n", x, ++x, x++);

But, to your surprise, this program did not help you understand — it printed some strange, inexplicable output, suggesting that maybe ++ does something completely different, not at all what you thought it did.

Or, perhaps you’re looking at a hard-to-understand expression like

int x = 5;
x = x++ + ++x;
printf("%d\n", x);

Perhaps someone gave you that code as a puzzle. This code also makes no sense, especially if you run it — and if you compile and run it under two different compilers, you’re likely to get two different answers! What’s up with that? Which answer is correct? (And the answer is that both of them are, or neither of them are.)

As you’ve heard by now, these expressions are undefined, which means that the C language makes no guarantee about what they’ll do. This is a strange and unsettling result, because you probably thought that any program you could write, as long as it compiled and ran, would generate a unique, well-defined output. But in the case of undefined behavior, that’s not so.

What makes an expression undefined? Are expressions involving ++ and -- always undefined? Of course not: these are useful operators, and if you use them properly, they’re perfectly well-defined.

The expressions we’re talking about become undefined when there’s too much going on at once: when we can’t tell what order things will happen in, yet the order matters to the result we’ll get.

Let’s go back to the two examples I’ve used in this answer. When I wrote

printf("%d %d %d\n", x, ++x, x++);

the question is, before actually calling printf, does the compiler compute the value of x first, or x++, or maybe ++x? But it turns out we don’t know. There’s no rule in C which says that the arguments to a function get evaluated left-to-right, or right-to-left, or in some other order. So we can’t say whether the compiler will do x first, then ++x, then x++, or x++ then ++x then x, or some other order. But the order clearly matters, because depending on which order the compiler uses, we’ll get a different series of numbers printed out.

What about this crazy expression?

x = x++ + ++x;

The problem with this expression is that it contains three different attempts to modify the value of x: (1) the x++ part tries to take x’s value, add 1, store the new value in x, and return the old value; (2) the ++x part tries to take x’s value, add 1, store the new value in x, and return the new value; and (3) the x = part tries to assign the sum of the other two back to x. Which of those three attempted assignments will “win”? Which of the three values will actually determine the final value of x? Again, and perhaps surprisingly, there’s no rule in C to tell us.

You might imagine that precedence or associativity or left-to-right evaluation tells you what order things happen in, but they do not. You may not believe me, but please take my word for it, and I’ll say it again: precedence and associativity do not determine every aspect of the evaluation order of an expression in C. In particular, if within one expression there are multiple different spots where we try to assign a new value to something like x, precedence and associativity do not tell us which of those attempts happen first, or last, or anything.

So with all that background and introduction out of the way, if you want to make sure that all your programs are well-defined, which expressions can you write, and which ones can you not write?

These expressions are all fine:

y = x++;
z = x++ + y++;
x = x + 1;
x = a[i++];
x = a[i++] + b[j++];
x[i++] = a[j++] + b[k++];
x = *p++;
x = *p++ + *q++;

These expressions are all undefined:

x = x++;
x = x++ + ++x;
y = x + x++;
a[i] = i++;
a[i++] = i;
printf("%d %d %d\n", x, ++x, x++);

And the last question is, how can you tell which expressions are well-defined, and which expressions are undefined?

As I said earlier, the undefined expressions are the ones where there’s too much going on at once, where you can’t be sure what order things happen in, and where the order matters:

  1. If there’s one variable that’s getting modified (assigned to) in two or more different places, how do you know which modification happens first?
  2. If there’s a variable that’s getting modified in one place, and having its value used in another place, how do you know whether it uses the old value or the new value?

As an example of #1, in the expression

x = x++ + ++x;

there are three attempts to modify x.

As an example of #2, in the expression

y = x + x++;

we both use the value of x and modify it.

So that’s the answer: make sure that in any expression you write, each variable is modified at most once, and if a variable is modified, you don’t also attempt to use the value of that variable somewhere else.

One more thing. You might be wondering how to “fix” the undefined expressions I started this answer by presenting.

In the case of printf("%d %d %d\n", x, ++x, x++); it’s easy. Just write it as three separate printf calls:

printf("%d ", x);
printf("%d ", ++x);
printf("%d\n", x++);

Now the behavior is perfectly well defined, and you’ll get sensible results.

In the case of x = x++ + ++x, on the other hand, there’s no way to fix it. There’s no way to write it so that it has guaranteed behavior matching your expectations — but that’s okay, because you would never write an expression like x = x++ + ++x in a real program anyway.
