## Can you tell a Computer Scientist from the way they write loops? Part 2

May 8th, 2014 | Categories: C/C++, Guest posts, programming | Tags:

This is a guest article written by friend and colleague, Ian Cottam. For part 1, see https://www.walkingrandomly.com/?p=5435

So why do computer scientists use (i != N) rather than the more common (i < N)?

When I said the former identifies “computer scientists” from others, I meant programmers who have been trained in the use of non-operational formal reasoning about their programs. It’s hard to describe that in a sentence or two, but it is the use of formal logic to construct-by-design and argue the correctness of programs or fragments of programs. It is non-operational because the meaning of a program fragment is derived (only) from the logical meaning of the statements of the programming language. Formal predicate logic is extended by extra rules that say what assignments, while loops, etc., mean in terms of logical proof rules for them.

A simple, and far from complete, example is what role the guard in a while/for loop condition in C takes.

for (i= 0; i != N; ++i) {
/* do stuff with a[i] */
}

without further thought (i.e. I just use the formal rule that says on loop termination the negation of the loop guard holds), I can now write:

for (i= 0; i != N; ++i) {
/* do stuff with a[i] */
}
/* Here: i == N */

which may well be key to reasoning that all N elements of the array have been processed (and no more). (As I said, lots of further formal details omitted.)

Consider the common form:

for (i= 0; i < N; ++i) {
/* do stuff with a[i] */
}

without further thought, I can now (only) assert:

for (i= 0; i < N; ++i) {
/* do stuff with a[i] */
}
/* Here: i >= N */

That is, I have to examine the loop to conclude the condition I really need in my reasoning: i==N.

Anyway, enough of logic! Let’s get operational again. Some programmers argue that i<N is more “robust” – in some, to me, strange sense – against errors. This belief is a nonsense and yet is widely held.

Let’s make a slip up in our code (for an example where the constant N is 9) in our initialisation of the loop variable i.

for (i= 10; i != N; ++i) {
/* do stuff with a[i] */
}

Clearly the body of my loop is entered, executed many many times and will quite likely crash the program. (In C we can’t really say what will happen as “undefined behaviour” means exactly that, but you get the picture.)

My program fragment breaks as close as possible to where I made the slip, greatly aiding me in finding it and making the fix required.

Now. . .the popular:

for (i= 10; i<N; ++i) {
/* do stuff with a[i]
}

Here, my slip up in starting i at 10 instead of 0 goes (locally) undetected, as the loop body is never executed. Millions of further statements might be executed before something goes wrong with the program. Worse, it may even not crash later but produce an answer you believe to be correct.

I find it fascinating that if you search the web for articles about this the i<N form is often strongly argued for on the grounds that it avoids a crash or undefined behaviour. Perhaps, like much of probability theory, this is one of those bits of programming theory that is far from intuitive.

Giants of programming theory, such as David Gries and Edsger Dijkstra, wrote all this up long ago. The most readable account (to me) came from Gries, building on Dijkstra’s work. I remember paying a lot of money for his book – The Science of Programming – back in 1981. It is now freely available online. See page 181 for his wording of the explanation above. The Science of Programming is an amazing book. It contains both challenging formal logic and also such pragmatic advice as “in languages that use the equal sign for assignment, use asymmetric layout to make such standout. In C we would write

var= expr;

rather than

var = expr; /* as most people again still do */

The visible signal I get from writing var= expr has stopped me from ever making the = for == mistake in C-like languages.

1. This perfectly illustrates the difference between a computer scientist and a computer engineer. In either case you have to examine the loop to conclude the condition of i. If the loop somehow has a spurious ‘++i’, you’re going to have more problems with (i == N) than the programmer who wrote the end of loop condition defensively (i < N).

The pragmatic programmer assumes nothing. If you want i to be equal to N, then it's a quick check afterwards. If you're really worried about it, the correct thing to do is throw in an assert to trap the case that i isn't equal to N.

2. I think “Here: i == N” assumption is dangerous. What if we escape the loop prematurely by “break”?

Imho, if programmer rely on such hidden and easily breakable assumption, he will probably encounter problems with code extensibility and support in the future.

Typography-perfectionists who like the proper spacing might use “expr == var” to avoid “==” mistake. E.g. “NULL == ptr”.

3. Sorry, but the choice of the termination rule is entirely dependent on what type of set you are iterating and if you need i afterwards. Real-life algorithms tend to be much more complex than these classroom examples.
When iterating over window in a grid of cells for example, 0 iterations is exactly what I want if x1>= x0. Not crash or hidden memory corruption, thank you.
I only use == and != when iterating down/up to zero and when iterating over non-integer sets/sequences. Also I use “yoda conditionals” mentioned above as a more robust way to prevent unwanted assignments .

4. Beware!

for (double x=0.0; x!=1.0; x+=0.1) …

will not work as expected.

5. I completely agree. If i < n does prevent a crash, it will do so by also hiding a potentially wrong result. A silent wrong result is always worse than a crash. If the program crashes, at least I have proof that something's wrong.

I like to write programs in a way that'll bring out problems rather than hide them.

6. This is nonsense. Loops on doubles, or within dynamic ranges, or with internal manipulation of the index are all perfectly valid cases where > works and != does not.

What’s more, your assumption that i == N is obviously false as soon as you have a break in the loop. Which you do, if, for example, your loop is about searching an item. Which render the belief that i == N incorrect, on top of being useless. (Except when the break index is exactly what is being looked for in a loop, I have yet to see a use for said index after the loop. In most proper cases, the index is scoped inside the loop).