Re: Howto implement device detection without thrashing



Randolph Chung writes:
> it's a good idea.... for that matter, since we know the ordering of
> tests, even writing the last test that was conducted successfully might
> be enough info to recover, right? what am i missing?

I wrote:
> You need to save the test results, or start over with test 0 after a
> crash.  But if you start over with test 0 after a crash, what if more
> than one test crashed?

Torsten Landschoff writes:
> We know the order of the tests we are doing so we can just write out the
> test we are currently about to do and have a list of tests which
> succeeded.  Probably most of the tests will fail and we do not need to
> record that.

Yes, that will work.  However, Randolph suggested saving only the ID of the
upcoming test, implying that crash recovery would involve redoing all
tests.  That is what I was responding to:

Start testing.

0: fail
1: fail
2: fail
3: crash
  
Restart, skipping 3 since 3 is in the checkpoint file:

0: fail
1: fail
2: fail
4: fail
5: fail
6: crash

Restart, skipping 6 since 6 is now the only ID in the checkpoint file (it
overwrote 3):

0: fail
1: fail
2: fail
3: crash

We have an infinite loop.
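
The loop can be reproduced with a small Python simulation.  This is only a
sketch: the names CRASHERS, NUM_TESTS, run_once and simulate are made up for
illustration, a crash is modelled as an early return rather than a real
hang, and the checkpoint is a single variable standing in for the file.

```python
CRASHERS = {3, 6}       # hypothetical: the tests that hang the machine
NUM_TESTS = 8           # hypothetical: total number of tests


def run_once(skip):
    """Run all tests from 0, skipping `skip`.

    Returns the number of the test that crashed (which is what the
    checkpoint holds when the machine dies), or None if every test ran.
    """
    for n in range(NUM_TESTS):
        if n == skip:
            continue
        if n in CRASHERS:
            return n
    return None


def simulate(max_boots=6):
    """Boot repeatedly, keeping only one test ID between boots."""
    checkpoint = None           # nothing saved before the first boot
    history = []
    for _ in range(max_boots):
        checkpoint = run_once(checkpoint)
        history.append(checkpoint)
        if checkpoint is None:  # a clean run ends the simulation
            break
    return history
```

With two crashing tests simulate() never reaches a clean run: the saved ID
alternates between 3 and 6 until max_boots is exhausted.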

A straightforward algorithm:

After each test append a fixed-size record containing the test number and
the result to the checkpoint file.  After a crash you know you crashed
because the file exists, so read the last test number, append a "crashed"
record for the test that follows it, skip that test, and continue.  (Without
the "crashed" record two crashing tests in a row would send you back to the
same test forever.)

This has the advantage of requiring only appends and saving results through
a crash, but the disadvantage of making the file size proportional to the
number of tests.
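
The append-only scheme can be sketched in Python as follows.  This is an
illustration under assumptions, not a tested implementation: the record
layout, the result codes, and the names Crash, run and demo are all
hypothetical, a crash is simulated by an exception, and records are assumed
to reach the disk whole.  During recovery it appends a "crashed" record for
the skipped test, so that two crashing tests in a row do not send it back to
the same test forever.

```python
import os
import struct
import tempfile

REC = struct.Struct("<iB")          # assumed layout: test number, result code
FAIL, SUCCESS, CRASHED = 0, 1, 2    # assumed result codes


class Crash(Exception):
    """Stand-in for a machine hang; a real crash raises nothing."""


def run(path, probes):
    """Run the probes in order, resuming after a crash if `path` exists."""
    start = 0
    if os.path.exists(path):        # the file exists, so we crashed
        size = os.path.getsize(path)
        last = -1                   # empty file: test 0 itself crashed
        if size >= REC.size:
            with open(path, "rb") as f:
                f.seek(size - REC.size)
                last, _ = REC.unpack(f.read(REC.size))
        with open(path, "ab") as f:
            f.write(REC.pack(last + 1, CRASHED))    # note the culprit
        start = last + 2            # skip the test that crashed
    with open(path, "ab") as f:     # creates the file on the first boot
        for n in range(start, len(probes)):
            result = SUCCESS if probes[n]() else FAIL
            f.write(REC.pack(n, result))
            f.flush()
            os.fsync(f.fileno())    # make sure the record survives a crash


def demo():
    """Simulate reboots: tests 3 and 4 crash, test 5 finds a device."""
    def make(n):
        def probe():
            if n in (3, 4):
                raise Crash
            return n == 5
        return probe
    probes = [make(n) for n in range(7)]
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "checkpoint")
        boots = 1
        while True:
            try:
                run(path, probes)
                break
            except Crash:
                boots += 1          # "reboot" and try again
        with open(path, "rb") as f:
            return boots, list(REC.iter_unpack(f.read()))
```

The sketch leaves the file in place after a complete run, so the caller would
have to delete it (or mark it finished) before the next detection pass.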

A minimalist algorithm:

Before each test check the list of crashing tests and skip this test if it
is there.  After each test write the test number to the file, overwriting
the last record in the file.  After a crash you know you crashed because the
file exists.  Read it in and increment the last test number, stepping past
any tests already on the list (they were skipped, not run).  The result is
the test that crashed, and with it you have your list of crashing tests.
Fix the file to match it, append a dummy record, and start testing again
(with test 0, since we saved no results).

This has the advantage of keeping the file very small but the disadvantage
of requiring seeks and not saving any test results through a crash.  It
does give you a list of crashing tests.
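
A Python sketch of the minimalist scheme, under stated assumptions: the file
holds the crash list followed by one scratch record, the dummy record is -1,
and the names Crash, load_crashers, run and demo are hypothetical.  A crash
is simulated by an exception.  When working out which test crashed, the
sketch steps past tests already on the crash list, since those were skipped
rather than run.

```python
import os
import struct
import tempfile

REC = struct.Struct("<i")       # assumed layout: one 32-bit test number


class Crash(Exception):
    """Stand-in for a machine hang; a real crash raises nothing."""


def load_crashers(path):
    """Return the list of crashing tests, fixing up the file after a crash."""
    crashers = []
    if os.path.exists(path):    # the file exists, so we crashed
        with open(path, "rb") as f:
            nums = [n for (n,) in REC.iter_unpack(f.read())] or [-1]
        crashers, last = nums[:-1], nums[-1]
        culprit = last + 1
        while culprit in crashers:  # listed tests were skipped, not run
            culprit += 1
        crashers.append(culprit)
    # Rewrite the file: the crash list, then a dummy record to overwrite.
    with open(path, "wb") as f:
        f.write(b"".join(REC.pack(n) for n in crashers + [-1]))
    return crashers


def run(path, probes):
    """Run every probe from test 0, skipping known crashers."""
    crashers = load_crashers(path)
    results = {}
    with open(path, "r+b") as f:
        for n, probe in enumerate(probes):
            if n in crashers:
                continue
            results[n] = probe()
            f.seek(len(crashers) * REC.size)    # offset of the last record
            f.write(REC.pack(n))
            f.flush()
            os.fsync(f.fileno())
    return results, crashers


def demo():
    """Simulate reboots: tests 3 and 4 crash, test 5 finds a device."""
    def make(n):
        def probe():
            if n in (3, 4):
                raise Crash
            return n == 5
        return probe
    probes = [make(n) for n in range(7)]
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "checkpoint")
        boots = 1
        while True:
            try:
                results, crashers = run(path, probes)
                return boots, results, crashers
            except Crash:
                boots += 1      # "reboot" and start over from test 0
```

Results live only in memory here, which matches the trade-off described
above: every boot repeats the earlier tests, and only the crash list
survives.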

My understanding of what Torsten means:

Before each test overwrite the last record in the file with the number of
the upcoming test if the previous one failed, or append it if the previous
one succeeded.  After a crash read in that last number (it is the number of
the test that crashed), increment it, write it back out, and start testing
again.  

This saves a list of successes through a crash but loses data about
crashing tests and requires seeking.  It keeps the file small, since most
tests will fail.
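
A Python sketch of this reading of the scheme.  The record layout and the
names Crash, run, successes and demo are hypothetical, and a crash is
simulated by an exception.  One detail is an addition of the sketch, not part
of the description: after the final test it writes one more "upcoming" record
(the number of tests), so that a failed last test is not left in the file
looking like a success.

```python
import os
import struct
import tempfile

REC = struct.Struct("<i")       # assumed layout: one 32-bit test number


class Crash(Exception):
    """Stand-in for a machine hang; a real crash raises nothing."""


def run(path, probes):
    """Run the probes, resuming just past the test that crashed, if any."""
    start, pos = 0, 0           # first test to run, offset of the last record
    if os.path.exists(path) and os.path.getsize(path) >= REC.size:
        pos = os.path.getsize(path) - REC.size
        with open(path, "rb") as f:
            f.seek(pos)
            (crashed,) = REC.unpack(f.read(REC.size))
        start = crashed + 1     # increment; written back by the loop below
    else:
        open(path, "wb").close()
    with open(path, "r+b") as f:
        prev_ok = False
        # The extra pass (n == len(probes)) writes the terminating record.
        for n in range(start, len(probes) + 1):
            if prev_ok:
                pos += REC.size         # keep the success, append after it
            f.seek(pos)
            f.write(REC.pack(n))        # number of the upcoming test
            f.flush()
            os.fsync(f.fileno())
            if n < len(probes):
                prev_ok = probes[n]()


def successes(path):
    """Every record except the last one names a test that succeeded."""
    with open(path, "rb") as f:
        return [n for (n,) in REC.iter_unpack(f.read())][:-1]


def demo():
    """Simulate reboots: tests 3 and 6 crash, tests 1 and 4 succeed."""
    def make(n):
        def probe():
            if n in (3, 6):
                raise Crash
            return n in (1, 4)
        return probe
    probes = [make(n) for n in range(8)]
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "checkpoint")
        boots = 1
        while True:
            try:
                run(path, probes)
                return boots, successes(path)
            except Crash:
                boots += 1      # "reboot" and try again
```

After the third boot the file holds 1, 4 and the terminator 8; the crashes at
3 and 6 leave no trace, which is the data loss noted above.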

A variation:

Before each test overwrite the last record in the file with one containing
the number of the upcoming test and a "crashed" flag if the previous one
failed, or append the record if the previous test succeeded.  If this test
succeeds change the flag to "success".  Loop.  After a crash read in the
last record, increment the number, append a new record as above, and
continue.

This saves all data through a crash and keeps the file small.  It still
requires seeks.
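
A Python sketch of the variation, again under assumptions: the record layout,
the flag values and the names Crash, write, run and demo are hypothetical,
and a crash is simulated by an exception.  The final truncate is an addition
of the sketch, so that a failed last test does not remain in the file flagged
as a crash.

```python
import os
import struct
import tempfile

REC = struct.Struct("<iB")      # assumed layout: test number, flag
CRASHED, SUCCESS = 0, 1         # assumed flag values


class Crash(Exception):
    """Stand-in for a machine hang; a real crash raises nothing."""


def write(f, pos, n, flag):
    """Write one record at offset `pos` and force it to disk."""
    f.seek(pos)
    f.write(REC.pack(n, flag))
    f.flush()
    os.fsync(f.fileno())


def run(path, probes):
    """Run the probes, keeping success and crash records across reboots."""
    start, pos = 0, 0           # pos = offset where the next record goes
    if os.path.exists(path) and os.path.getsize(path) >= REC.size:
        size = os.path.getsize(path)
        with open(path, "rb") as f:
            f.seek(size - REC.size)
            last, _ = REC.unpack(f.read(REC.size))
        start, pos = last + 1, size     # keep the old record, append after it
    else:
        open(path, "wb").close()
    with open(path, "r+b") as f:
        for n in range(start, len(probes)):
            write(f, pos, n, CRASHED)   # provisional: assume the worst
            if probes[n]():
                write(f, pos, n, SUCCESS)
                pos += REC.size         # success: the next record is appended
            # failure: pos is unchanged, so the next record overwrites this one
        f.truncate(pos)                 # drop the record of a failed last test


def demo():
    """Simulate reboots: tests 3 and 6 crash, tests 1 and 4 succeed."""
    def make(n):
        def probe():
            if n in (3, 6):
                raise Crash
            return n in (1, 4)
        return probe
    probes = [make(n) for n in range(8)]
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "checkpoint")
        boots = 1
        while True:
            try:
                run(path, probes)
                break
            except Crash:
                boots += 1      # "reboot" and try again
        with open(path, "rb") as f:
            return boots, list(REC.iter_unpack(f.read()))
```

In the demo the file ends up with four records: successes for tests 1 and 4
and crash records for tests 3 and 6, with the failed tests leaving nothing
behind.
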
-- 
John Hasler
john@dhh.gt.org (John Hasler)
Dancing Horse Hill
Elmwood, WI

