--- Begin Message ---
- To: Debian Bugs <submit@bugs.debian.org>
- Cc: Sergio Visinoni <piffio@piffio.org>
- Subject: libc6: Occasional EPERM during first fread() following popen()
- From: Adam C Powell IV <hazelsct@debian.org>
- Date: Thu, 24 Mar 2005 05:00:27 +0900
- Message-id: <1111608027.3564.51.camel@p4-117-2.mit.edu>
Package: libc6
Version: 2.3.2.ds1-20
Greetings,
I have an MPI program which does a popen and fread, something like:
    if (snprintf (filename, 999, "gunzip -c < %s.cpu%.4d.data",
                  basename, rank) >= 999)   /* output was truncated */
      return 1;
    if (!(infile = popen (filename, "r")))
      return 1;
    if (ferror (infile))
      {
        printf ("[%d] Pipe open has error %d\n", rank, ferror (infile));
        fflush (stdout);
      }
    /* ... some stuff ... */
    nmemb = fread (globalarray, sizeof (PetscScalar), gridpoints * dof, infile);
    if (nmemb != gridpoints * dof)
      {
        printf ("[%d] ferror = %d\n", rank, ferror (infile));
        fflush (stdout);
      }
So there seems to be no error in the popen(), but on one cluster, anywhere
from one to five of the roughly 20 CPUs see the fread() fail with EPERM. On
the other cluster, the error is less frequent but still there. Both are
identically configured Debian Beowulf clusters using the diskless package
and mpich; the one with fewer errors is made of dual AthlonXP 1.53 GHz
boxes, and the one with more errors of dual Opteron 240 boxes running
Debian stock -k7-smp kernels and a 32-bit userland.
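A sketch of a more diagnostic version of the read above, under stated
assumptions: read_from_command() is a hypothetical helper, not code from
this program. One point worth checking is that ferror() only reports
whether the stream's error indicator is set (it returns nonzero, not an
errno code), so the "[%d] ferror = %d" line cannot by itself show EPERM.
The sketch clears errno before fread(), saves it on a short read, tells EOF
apart from a read error, and collects the child's exit status from
pclose(), since a gunzip that fails or exits early also produces a short
read without any fread() error.

```c
/* Sketch only: read_from_command() is a hypothetical helper, not part of
 * the program in this report. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>

/* Run `cmd` through popen(), read up to `nmemb` items of `size` bytes into
 * `buf`, and return the number of items read.  On return:
 *   *read_errno  = errno saved right after a failed fread() (0 if no error)
 *   *exit_status = the child's exit code, or -1 if it did not exit normally
 *                  or pclose() failed */
size_t read_from_command (const char *cmd, void *buf, size_t size,
                          size_t nmemb, int *read_errno, int *exit_status)
{
  assert (cmd != NULL && buf != NULL);
  assert (read_errno != NULL && exit_status != NULL);
  *read_errno = 0;
  *exit_status = -1;

  FILE *in = popen (cmd, "r");
  if (in == NULL)
    {
      *read_errno = errno;
      return 0;
    }

  errno = 0;                    /* don't mistake a stale errno for this read's */
  size_t got = fread (buf, size, nmemb, in);
  if (got != nmemb)
    {
      if (ferror (in))
        {
          /* ferror() only says that an error happened; errno says which. */
          *read_errno = errno;
          fprintf (stderr, "fread from '%s' failed after %zu items: %s\n",
                   cmd, got, strerror (*read_errno));
        }
      else
        fprintf (stderr, "'%s' produced only %zu of %zu items\n",
                 cmd, got, nmemb);
    }

  int status = pclose (in);
  if (status != -1 && WIFEXITED (status))
    *exit_status = WEXITSTATUS (status);
  return got;
}
```

Called with the filename built above, for instance as read_from_command
(filename, globalarray, sizeof (PetscScalar), gridpoints * dof, &err,
&status), it would show whether a short read is a genuine I/O error or
gunzip exiting early. errno is saved before any other call because
fprintf() or pclose() may overwrite it.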
On the other hand, the same program earlier fopen()s a file whose path
and name are identical to those of the file redirected into gunzip by the
popen() command, except for the extension, and those reads work flawlessly.
machines.LINUX on the starting node (say node2) in both cases looks
something like:
node2
node2
node3
node3
etc.
Authentication is via NIS, whose master server (and NFS server for the
files in question) is outside of the "subnet" of these clusters,
something like:
    node1 node2 node3        node1 node2 node3
        \   |   /                \   |   /
        -SWITCH--                -SWITCH--
            |                        |
        headnode1                headnode2         NIS master/NFS server
            |                        |                       |
    -------------------SWITCH-------------------------------------------Internet
Sergio Visinoni has since corroborated this problem: he occasionally sees
the error when calling fread() after popen() in a multithreaded program.
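Along the lines of the multithreaded case above, a hypothetical stress
test might run popen() plus fread() in several threads at once and count
reads that come back short. The command, thread count, and iteration count
are placeholders, not values from either program; swapping in the real
gunzip command and expected byte count would bring it closer to the
programs described here. It is a sketch for reproduction, not a diagnosis.

```c
/* Hypothetical stress test: the command, thread count and iteration count
 * are placeholders, not values from the original programs. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define THREADS    8
#define ITERATIONS 25
#define CMD        "printf 0123456789"   /* writes exactly EXPECTED bytes */
#define EXPECTED   10

struct result
{
  int short_reads;   /* fread() returned fewer than EXPECTED bytes */
  int read_errors;   /* popen() failed, or a short read had ferror() set */
  int last_errno;    /* errno saved from the most recent such failure */
};

static void *worker (void *arg)
{
  struct result *r = arg;
  char buf[EXPECTED];

  for (int i = 0; i < ITERATIONS; i++)
    {
      FILE *in = popen (CMD, "r");
      if (in == NULL)
        {
          r->read_errors++;
          r->last_errno = errno;
          continue;
        }
      errno = 0;
      size_t got = fread (buf, 1, sizeof buf, in);
      if (got != sizeof buf)
        {
          r->short_reads++;
          if (ferror (in))
            {
              r->read_errors++;
              r->last_errno = errno;
            }
        }
      pclose (in);
    }
  return NULL;
}

/* Run THREADS workers concurrently and sum their counters into *total.
 * Returns 0 if every thread was started, -1 otherwise. */
int run_stress (struct result *total)
{
  pthread_t tid[THREADS];
  struct result res[THREADS];
  int created = 0;

  assert (total != NULL);
  memset (res, 0, sizeof res);
  memset (total, 0, sizeof *total);

  while (created < THREADS
         && pthread_create (&tid[created], NULL, worker, &res[created]) == 0)
    created++;

  for (int t = 0; t < created; t++)
    {
      pthread_join (tid[t], NULL);
      total->short_reads += res[t].short_reads;
      total->read_errors += res[t].read_errors;
      if (res[t].last_errno != 0)
        total->last_errno = res[t].last_errno;
    }
  if (total->read_errors > 0)
    fprintf (stderr, "%d failed reads, last errno: %s\n",
             total->read_errors, strerror (total->last_errno));
  return created == THREADS ? 0 : -1;
}
```

Each worker saves errno itself right after the failing call: errno is
per-thread, and any later library call in that thread may overwrite it.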
Any ideas on what could be going wrong, or even how to debug this, would
be appreciated.
Thanks,
-Adam
--
GPG fingerprint: D54D 1AEE B11C CE9B A02B C5DD 526F 01E8 564E E4B6
Welcome to the best software in the world today cafe!
http://www.take6.com/albums/greatesthits.html
--- End Message ---
--- Begin Message ---
- To: GOTO Masanori <gotom@debian.or.jp>
- Cc: Adam C Powell IV <hazelsct@debian.org>, 301153-done@bugs.debian.org, Sergio Visinoni <piffio@piffio.org>
- Subject: Re: Bug#301153: libc6: Occasional EPERM during first fread() following popen()
- From: Aurelien Jarno <aurelien@aurel32.net>
- Date: Mon, 22 May 2006 16:46:30 +0200
- Message-id: <20060522144630.GA7231@henry.aurel32.net>
- In-reply-to: <81ll7i52d6.wl@omega.webmasters.gr.jp>
- References: <1111608027.3564.51.camel@p4-117-2.mit.edu> <81vf7g72dw.wl@omega.webmasters.gr.jp> <81ll7i52d6.wl@omega.webmasters.gr.jp>
On Sat, Apr 16, 2005 at 09:32:05PM +0900, GOTO Masanori wrote:
> At Fri, 25 Mar 2005 13:52:59 +0900,
> GOTO Masanori wrote:
> > I think this problem should be examined separately from MPI and the
> > clusters. This kind of random behavior is usually caused by an invalid
> > memory access. I recommend that you first check your program with
> > valgrind, then isolate the problem from MPI.
>
> Does this problem still occur with your program? If we get no
> reply to this bug, I'll close it...
>
No reply in one year, so closing it with this mail.
--
  .''`.  Aurelien Jarno             | GPG: 1024D/F1BCDB73
 : :' :  Debian GNU/Linux developer | Electrical Engineer
 `. `'   aurel32@debian.org         | aurelien@aurel32.net
   `-    people.debian.org/~aurel32 | www.aurel32.net
--- End Message ---