
Bug#47289: nscd short write in cache_addhst, broken pipe error followed by system being totally hosed



severity 58367 critical
stop

I'm promoting this bug to critical, because it hoses the entire box when it
occurs. Recovery is difficult or impossible without a hard reboot. This bug
must be fixed, or nscd must be removed before potato is released.

I had this problem again today. When my Internet connection was down, I
noticed several messages like the following in my log:

Mar 19 22:26:30 osiris nscd: 1053: short write in cache_addhst: Broken pipe
Mar 19 22:28:26 osiris nscd: 1054: short write in cache_addhst: Broken pipe
Mar 20 12:19:35 osiris nscd: 1056: short write in cache_addhst: Broken pipe
Mar 20 17:29:34 osiris last message repeated 11 times
Mar 20 23:37:16 osiris last message repeated 11 times
Mar 21 07:42:23 osiris last message repeated 11 times
Mar 21 16:49:24 osiris nscd: 1052: short write in cache_addhst: Broken pipe
Mar 21 16:49:24 osiris nscd: 1054: short write in cache_addhst: Broken pipe
Mar 21 16:50:45 osiris nscd: 1056: short write in cache_addhst: Broken pipe
Mar 21 16:50:45 osiris nscd: 1056: cannot write result: Broken pipe
Mar 21 16:50:45 osiris nscd: 1050: short write in cache_addhst: Broken pipe
Mar 21 16:50:45 osiris nscd: 1050: cannot write result: Broken pipe
Mar 21 16:50:45 osiris nscd: 1056: cannot write result: Broken pipe
Mar 21 16:50:45 osiris last message repeated 3 times
Mar 21 16:53:27 osiris nscd: 1055: cannot write result: Broken pipe
Mar 21 16:54:48 osiris nscd: 1053: short write in cache_addhst: Broken pipe
Mar 21 16:54:48 osiris nscd: 1053: cannot write result: Broken pipe
Mar 21 16:57:31 osiris nscd: 1052: short write in cache_addhst: Broken pipe
Mar 21 16:57:31 osiris nscd: 1054: cannot write result: Broken pipe
Mar 21 16:57:31 osiris nscd: 1052: cannot write result: Broken pipe
Mar 21 16:57:31 osiris nscd: 1054: cannot write result: Broken pipe
Mar 21 16:57:31 osiris nscd: 1052: cannot write result: Broken pipe
Mar 21 16:57:31 osiris last message repeated 2 times
Mar 21 17:00:13 osiris nscd: 1050: short write in cache_addhst: Broken pipe
Mar 21 17:00:13 osiris nscd: 1050: cannot write result: Broken pipe
Mar 21 17:00:13 osiris nscd: 1056: cannot write result: Broken pipe
Mar 21 17:00:13 osiris nscd: 1050: cannot write result: Broken pipe
Mar 21 17:00:13 osiris nscd: 1056: cannot write result: Broken pipe
Mar 21 17:00:13 osiris last message repeated 5 times
Mar 21 17:02:55 osiris nscd: 1055: short write in cache_addhst: Broken pipe
Mar 21 17:02:55 osiris nscd: 1055: cannot write result: Broken pipe
Mar 21 17:02:55 osiris nscd: 1055: cannot write result: Broken pipe
Mar 21 17:02:55 osiris nscd: 1053: cannot write result: Broken pipe
Mar 21 17:02:55 osiris nscd: 1055: cannot write result: Broken pipe
Mar 21 17:02:55 osiris nscd: 1053: cannot write result: Broken pipe
Mar 21 17:02:55 osiris nscd: 1055: cannot write result: Broken pipe
Mar 21 17:02:55 osiris nscd: 1053: cannot write result: Broken pipe
Mar 21 17:02:55 osiris nscd: 1055: cannot write result: Broken pipe
Mar 21 17:02:55 osiris nscd: 1053: cannot write result: Broken pipe
Mar 21 17:02:55 osiris nscd: 1055: cannot write result: Broken pipe
Mar 21 17:02:55 osiris last message repeated 3 times
Mar 21 17:02:55 osiris nscd: 1053: cannot write result: Broken pipe
Mar 21 17:02:55 osiris last message repeated 9 times
Mar 21 17:05:37 osiris nscd: 1054: short write in cache_addhst: Broken pipe
Mar 21 17:05:37 osiris nscd: 1054: cannot write result: Broken pipe
Mar 21 17:05:37 osiris nscd: 1052: cannot write result: Broken pipe
Mar 21 17:05:37 osiris nscd: 1054: cannot write result: Broken pipe
Mar 21 17:05:37 osiris nscd: 1052: cannot write result: Broken pipe
Mar 21 17:05:37 osiris nscd: 1054: cannot write result: Broken pipe
Mar 21 17:05:37 osiris nscd: 1052: cannot write result: Broken pipe
Mar 21 17:05:37 osiris nscd: 1052: cannot write result: Broken pipe
Mar 21 17:05:37 osiris nscd: 1054: cannot write result: Broken pipe
Mar 21 17:05:37 osiris nscd: 1052: cannot write result: Broken pipe
Mar 21 17:05:37 osiris last message repeated 22 times
Mar 21 17:05:37 osiris nscd: 1054: cannot write result: Broken pipe
Mar 21 17:06:58 osiris nscd: 1050: short write in cache_addhst: Broken pipe
Mar 21 17:06:58 osiris nscd: 1056: short write in cache_addhst: Broken pipe
Mar 21 17:20:31 osiris nscd: 1053: cannot write result: Broken pipe
Mar 21 17:20:31 osiris nscd: 1056: cannot write result: Broken pipe
Mar 21 17:20:31 osiris nscd: 1055: short write in cache_addhst: Broken pipe
Mar 21 17:21:52 osiris nscd: 1056: short write in cache_addhst: Broken pipe
Mar 21 17:21:52 osiris nscd: 1056: cannot write result: Broken pipe
Mar 21 17:27:16 osiris nscd: 1053: short write in cache_addhst: Broken pipe
Mar 21 17:27:16 osiris nscd: 1053: cannot write result: Broken pipe
Mar 21 17:28:37 osiris nscd: 1056: short write in cache_addhst: Broken pipe
Mar 21 17:28:37 osiris nscd: 1056: cannot write result: Broken pipe
Mar 21 17:28:37 osiris nscd: 1056: cannot write result: Broken pipe
Mar 21 17:28:37 osiris nscd: 1050: cannot write result: Broken pipe
Mar 21 17:28:37 osiris nscd: 1052: cannot write result: Broken pipe
Mar 21 17:28:37 osiris last message repeated 4 times
Mar 21 17:28:37 osiris nscd: 1055: cannot write result: Broken pipe
Mar 21 17:28:37 osiris nscd: 1052: short write in cache_addhst: Broken pipe
Mar 21 17:28:37 osiris nscd: 1055: cannot write result: Broken pipe
Mar 21 17:28:37 osiris last message repeated 7 times
Mar 21 17:28:37 osiris nscd: 1052: cannot write result: Broken pipe
Mar 21 17:28:37 osiris nscd: 1053: short write in cache_addhst: Broken pipe
Mar 21 17:29:59 osiris nscd: 1054: short write in cache_addhst: Broken pipe
Mar 21 17:31:20 osiris nscd: 1056: short write in cache_addhst: Broken pipe
Mar 21 17:31:20 osiris nscd: 1056: cannot write result: Broken pipe
Mar 21 17:31:20 osiris last message repeated 31 times
Mar 21 17:32:41 osiris nscd: 1050: short write in cache_addhst: Broken pipe
Mar 21 17:34:02 osiris nscd: 1052: cannot write result: Broken pipe
Mar 21 17:34:02 osiris nscd: 1052: cannot write result: Broken pipe
Mar 21 17:34:02 osiris nscd: 1053: cannot write result: Broken pipe
Mar 21 17:34:02 osiris last message repeated 3 times
Mar 21 17:34:02 osiris nscd: 1052: cannot write result: Broken pipe
Mar 21 17:34:02 osiris nscd: 1054: cannot write result: Broken pipe
Mar 21 17:34:02 osiris nscd: 1052: cannot write result: Broken pipe
Mar 21 17:34:02 osiris last message repeated 20 times
Mar 21 17:44:49 osiris last message repeated 23 times

and my box became totally unusable. I suspect too many DNS requests blocking
in nscd is causing it to hose up. You can't log in as any user, nor can you
use su to become root. ls, ps, w, top, and anything else that must look up a
hostname or a uid/gid -> user/group name mapping blocks forever while trying
to read from the nscd socket. The end result is just about every process
blocking forever. On a system where new jobs are started automatically, the
process and/or fd tables eventually fill up, further compounding the
problem.
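
As a rough illustration of the failure mode described above, here is a
minimal Python sketch of a client that bounds its wait on a Unix stream
socket instead of blocking forever. It is not how libc talks to nscd. The
names probe and start_silent_server are hypothetical, and the "server" is
only a stand-in for a wedged daemon that accepts connections but never
answers.

```python
import os
import socket
import tempfile
import threading


def probe(sock_path, request, timeout=2.0):
    """Send request to a Unix stream socket.

    Returns the reply bytes, or None if nothing arrives within timeout.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)      # applies to connect, send and recv
        s.connect(sock_path)
        s.sendall(request)
        try:
            return s.recv(4096)
        except socket.timeout:     # the read would otherwise block forever
            return None


def start_silent_server(sock_path):
    """Hypothetical stand-in for a hung daemon: accepts, never replies."""
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    srv.bind(sock_path)
    srv.listen(1)
    held = []  # keep connections open so the client sees silence, not EOF

    def serve():
        while True:
            try:
                conn, _ = srv.accept()
            except OSError:        # listening socket was closed
                return
            held.append(conn)

    threading.Thread(target=serve, daemon=True).start()
    return srv, held


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "fake_nscd.sock")
    server, _held = start_silent_server(path)
    reply = probe(path, b"ping", timeout=0.5)
    print("no reply within timeout" if reply is None else reply)
```

Without the settimeout() call, the recv() in probe would wait indefinitely,
which is the behaviour every affected process shows here.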

The system remained totally unusable until I killed nscd from a root console
I happened to have open. If I had not had a root console open, a reboot
would have been required to return the box to a usable state.

Here's an excerpt from the output of 'strace ps' while nscd was hosed:

socket(PF_UNIX, SOCK_STREAM, 0)         = 7
connect(7, {sin_family=AF_UNIX, path="/var/run/.nscd_socket"}, 110) = 0
write(7, "\2\0\0\0\1\0\0\0\2\0\0\0", 12) = 12
write(7, "0\0", 2)                      = 2
read(7,  <unfinished ...>
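
For readers who want to see what was sent before the read blocked, the
following Python sketch decodes the two write() payloads from the excerpt.
It assumes the 12-byte header is three little-endian 32-bit integers
(version, request type, key length), which is my understanding of the nscd
request header in glibc but is not confirmed by anything in this report. The
field names and the decode_request helper are hypothetical.

```python
import struct

# Bytes copied from the two write() calls in the strace excerpt.
HEADER = b"\x02\x00\x00\x00\x01\x00\x00\x00\x02\x00\x00\x00"
KEY = b"0\x00"


def decode_request(header, key):
    """Split the request into its assumed fields.

    Assumption: header = three little-endian int32 values
    (version, request type, key length); key = NUL-terminated string.
    """
    version, req_type, key_len = struct.unpack("<iii", header)
    if key_len != len(key):
        raise ValueError("key length does not match header")
    return {
        "version": version,
        "type": req_type,
        "key": key.rstrip(b"\x00").decode("ascii"),
    }


if __name__ == "__main__":
    print(decode_request(HEADER, KEY))
```

Under that assumption the request is version 2, type 1, with the key "0".
If the type numbering follows glibc's nscd-client.h, type 1 would be a
passwd lookup by uid, which fits ps trying to map uid 0 to a user name, but
treat that reading as unverified.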

This is the same problem I had in bug #47289. These reports should be
combined as soon as they have similar states. 

-- 
Brian Ristuccia
brian@ristuccia.com
bristucc@nortelnetworks.com
bristucc@cs.uml.edu

