[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: Please don't do this (code fragment)



Reply-To: 
In-Reply-To: <[🔎] 20020114064909.GA17907@crdic.ath.cx>

On Sun, Jan 13, 2002 at 10:49:09PM -0800, Craig Dickson wrote:
> elf@florence.buici.com wrote:
> 
> > On Mon, Jan 14, 2002 at 07:01:17AM +0100, Samuel Tardieu wrote:
> > > On 13/01, elf@florence.buici.com wrote:
> > > 
> > > |   int i;
> > > |   for (i = 0; i > -1; i += 1) {
> > > |     // ...
> > > |     if (terminal_condition)
> > > |       break;	
> > > |     // ...
> > > |   }
> > > [...]
> > > | Moreover, i is never used.  The loop could be reduced to
> > > | 
> > > |   while ((file = fts_read (dir)) != NULL) {
> > > |     // ...
> > > |   }
> > > 
> > > Those are not equivalent: the first loop, while ugly, has a guard against
> > > endless looping if fts_read always returns non-NULL for any reason (not
> > > knowing what fts_read contains, it is hard to tell whether there is a
> > > reason for this or not).
> > 
> > A poor reason to write that code for several reasons.  
> 
> Samuel wasn't defending the original code; he was just observing that
> your suggested replacement was not functionally equivalent. Which it
> wasn't.

That is the crux of my argument.  In a 64 bit environment, this code
as written *is* equivalent to my rewrite.

> >   1) It bounds the loop to the size of the integer /2 without
> >      explicitly indicating so.
>
> It's pretty clear if you understand how twos-complement arithmetic
> works. Which is not to say it's the right thing to do, 

Of course that is true.  Yet, such practices define poor style.  The
original author isn't writing what he means.  The fact that 2's
complement arithmetic wraps is a side-effect.  At least the use of
MAX_INT is an honest expression of the loop intent.

> but I've seen code that is much harder than this to understand.
 
And your point is what?  I'm saying nothing about the difficulty of
reading the code.  There is little point in having that discussion.

> >   3) It is non-deterministic.  Should the loop fail to terminate
> >      properly, the program will run for a relatively long time.  15
> >      seconds on the PIII 700MHz I'm using.
> 
> The original code is deterministic (the loop will definitely exit sooner
> or later); your suggested replacement is not, unless it is guaranteed
> that fts_read will eventually return NULL.

This is the kind of code that tends to break as systems evolve.  When
we moved from 16 bit to 32 bit, dependencies on machine organization
broke many programs that depended on them.  As I wrote in my reply, a
64 bit machine would have the same observable behavior as one without
the weak loop-termination guarantee.  Anything that requires more than
a month to terminate when one expects it to terminate in a few of
minutes has effectively hung.

> > While I don't agree that this is an appropriate method for
> > guaranteeing termination, the same result with clearer expression
> > would be
> > 
> >   int loop_max = MAX_INT;
> >   while ((file = fts_read (dir)) != NULL && --loop_max) {
> >   }
> 
> That's clearer, yes. It still sucks, unless there's a really good reason
> to use MAX_INT as the limit. It's hard to say without knowing more about
> the context of this tiny fragment of code, but I would guess that since
> the loop condition is only used to guarantee eventual termination, the
> limit could reasonably be set to a constant that is consistent across
> platforms (whatever that constant might be -- 10^3, 10^6, 10^9,
> whatever), rather than one that will vary by multiple orders of
> magnitude depending on whether the target system uses 16-, 32-, or
> 64-bit ints.

Still a poor choice.  You are hiding a limitation in the code that is
neither appropriate nor desirable.  Limiting the number of files, no
matter how you do it, will always change the semantics of the program.
Can you imaging people's suprise when the generate a search database
that appears to be missing a file that they know exists?  Setting a
time limit on the program run may be a more reliable standard; but
this, too, will be difficult to bound.  A process that runs without
bound will, at least, make itself known by loading the system.

> 
> > I'd rather see the program fail to terminate so that I knew something
> > was awry instead of imposing an unforseen limitation on the program.
> 
> I'd rather see the program send a useful diagnostic message to stderr or
> a log file, and either exit or properly recover from the error.

Of couse.  I agree that the code would be better if there was a
reasonable bound on the extent of the loop *AND* it reports that
something went awry.  But what should be the criteria?

The point is that the original code fragment fails to perform the
desired task deterministicly.  Within a few years, there will be a
significant number of machines running 64 bit CPUs.  Programs that use
this poor expression will be no better than those that don't.  I
believe this is the sort of failure-protection that Linus Torvalds
complains about.  Silently tolerating an error is worse than
discovering it when it executes without bound.

Cheers.



Reply to: