
Re: reiserfs/md1/failure/threads

While ext3 is a very stable file system in the amd64 Debian distro, for disk-intensive applications like databases and those that stream data, XFS is a better choice due to its inherent performance capabilities and its mature 64-bit legacy in the SGI OS.

Peter Yorke
Sr. Linux Server Engineer
Thumb typed from a tiny keyboard.

----- Original Message -----
From: Francesco Pietra <frapietra@alice.it>
To: Lennart Sorensen <lsorense@csclub.uwaterloo.ca>
Cc: debian-amd64@lists.debian.org <debian-amd64@lists.debian.org>
Sent: Wed Jul 19 01:14:31 2006
Subject: Re: reiserfs/md1/failure/threads

Thank you for the very detailed instructions. On balance, I decided to
carry out a fresh install of amd64 with ext3 as the file system. Your (and
the general) strong advice to change to ext3 cannot be ignored.

I am now downloading the freshly built daily amd64 netinst CD, so that I
can also help with the preparation of the beta3 release of the net install.

This does not mean that I can be sure my hardware is sound, but your
examination of the messages I have posted suggests that it is. I have
postponed examining the hardware because the disks are OK and the memory
modules could be replaced should they prove faulty. This seems the opposite
of what one normally does (hardware before software), but I am not sure I
could arrive at a conclusive test of my hardware.

I do not have much to install besides the base OS and a few tasks: the jwd
window manager, sensors, compilers if needed, your compilation of mpqc, and
my recompilation of the molecular mechanics code (which I must do in any
case because of improvements to the code). That's about all.

I can already say that mpqc 2.3.1 is proving great.

Thanks again


On Tuesday 18 July 2006 19:44, Lennart Sorensen wrote:
> On Tue, Jul 18, 2006 at 05:33:51PM +0200, Francesco Pietra wrote:
> > Not to insist any further on the relative merits of the various
> > filesystems, but in the general interest of maintaining amd64 (and
> > therefore of examining parameters one at a time, without mixing
> > problems), did you notice my e-mail of today emphasizing that after the
> > crash my data are intact? I wonder whether your suspicion about memory or
> > cpu may be the point. How can I carry out a thorough memory test and
> > identify which slot is defective, if any? Although the memory is Kingston
> > ECC, one of the eight modules (1GB each) might be defective.
> Well, I have certainly seen a number of messages from people with
> Opterons having memory problems over the last few months.  The Opteron
> seems to be very picky about memory quality, which makes some sense
> given how efficiently it uses it.  It drives the memory quite hard.
> The simplest way I know of to test memory and cpu is to run a lot of large
> kernel compiles.  Often a memory problem will cause that to segfault.
> Anything that uses lots of cpu and lots of memory is usually a good
> test, at least if it fails spectacularly on an error, like gcc tends to
> do.
> To test the memory, remove half of it, and try the test.  If it fails,
> replace one stick of memory with one of the other ones, until you can
> run the test without a problem.  You could probably even run the test
> with 1 or 2 sticks of memory.  A number of people have managed to find
> faulty memory on an Opteron this way.  Some people have come back saying
> "I found a faulty stick of memory" after swearing that memtest86 had
> said all their RAM was fine and they were sure their name-brand RAM
> wasn't faulty. :)  memtest86 doesn't catch all errors.  Of course, with
> ECC memory I would have expected to see a machine check exception (MCE)
> if there were any single-bit errors in the memory.  I am still most
> inclined to blame reiserfs or perhaps the cpu.  Since there were
> multiple errors all coming from reiserfs, with apparently nothing else
> seeing a problem, I really think it may simply be a reiserfs bug.  I was
> using XFS before on early 2.6 kernels on i386, and eventually had to
> give up and move to ext3 since it just wasn't reliable on top of LVM on
> top of MD raid.  The filesystem had some bad interaction with LVM
> and MD raid that made it not work.  It has probably been fixed since, but
> I needed something that worked then, and ext3 worked.
> > What about checking the cpu? I can simply say that I monitored the
> > temperature during the long calculation, with the machine in a strongly
> > ventilated area. Starting from 36C, the temperature rose to 44C at most.
> > I don't know how this corresponds to the real temperature (from the
> > sensors command), but the difference should tell. For my dual Opteron
> > 265s, AMD indicates a case temperature of 49-67C (is what I measured just
> > the case temperature?). AMD also indicates temperature limits of 10-35C,
> > but I guess these should be ambient temperatures.
> That temperature is fine as far as I can tell.
> > Also, how can I check the disks thoroughly?
> Well, there is badblocks, which allows disk testing.  In my experience,
> though, modern disks tend to either work or fail.  They very rarely have
> small problems.
> --
> Len Sorensen
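The memory isolation procedure described above (install only part of the RAM, run a heavy compile repeatedly, narrow down the suspect sticks) can be sketched in Python as follows. This is a minimal illustration, not a tool from this thread: stress_passes and find_faulty_stick are hypothetical names, and the search is a plain halving (binary search) variant rather than the stick-by-stick swapping Lennart describes. It assumes exactly one faulty stick, that any failing compile signals bad memory, and that the machine boots with any subset of sticks; real boards may impose DIMM population rules (for example, installing modules in pairs), and an intermittent fault can let a bad half pass once, so each half should be tested over many rounds.

```python
import subprocess
from typing import Callable, List, Sequence


def stress_passes(cmd: Sequence[str], rounds: int) -> bool:
    """Run cmd `rounds` times; any non-zero exit counts as a failure.

    Assumption: a compiler that segfaults on bad RAM makes the build
    command exit non-zero, so every failing round is a possible memory error.
    """
    for _ in range(rounds):
        result = subprocess.run(list(cmd), stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL)
        if result.returncode != 0:
            return False
    return True


def find_faulty_stick(sticks: List[str],
                      test: Callable[[List[str]], bool]) -> str:
    """Halve the set of candidate sticks until one suspect remains.

    Assumes exactly one stick is faulty. test(subset) stands for
    "install only these sticks, reboot, run the stress test" and
    returns True on a clean run.
    """
    candidates = list(sticks)
    while len(candidates) > 1:
        mid = len(candidates) // 2
        if test(candidates[:mid]):
            # The first half ran clean, so the fault is in the second half.
            candidates = candidates[mid:]
        else:
            # The first half failed, so the fault is somewhere in it.
            candidates = candidates[:mid]
    return candidates[0]


# Hypothetical real-world test, run after physically swapping sticks:
#   stress_passes(["make", "-j4", "bzImage"], rounds=10)
# Simulated example: pretend the stick labelled "DIMM5" is bad.
sticks = [f"DIMM{i}" for i in range(8)]
print(find_faulty_stick(sticks, lambda subset: "DIMM5" not in subset))  # DIMM5
```

With eight 1GB sticks the halving needs only three rounds of testing, whereas swapping one stick at a time can take up to seven; the trade-off is that each round must be long enough to trust a clean result.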
