
Bug#628600: cdrom_id freezing problems seem like a race condition



Severity: normal
Version: 2.6.39-2

Hi folks,

I've also been suffering from broken suspend on my Thinkpad, due to
cdrom_id refusing to freeze. This is on a Thinkpad X201, using an
Ultrabase X200 docking station. The problem occurred for me with both
udev 171-2 and 172-1. Most of my debugging was done using 172-1.

I've done some extra debugging. The cdrom_id process that hangs
is the one started by this udev line from
/lib/udev/rules.d/60-cdrom_id.rules:

	# import device and media properties and lock tray to
	# enable the receiving of media eject button events
	IMPORT{program}="cdrom_id --lock-media $tempnode"

(Note that in udev 171, cdrom_id is run without --lock-media)

This line is run when I press the "undock" button on my docking station.
I see that there is a udev event with ACTION=change. This cdrom_id
execution then never terminates, which prevents suspend from working.
Note that the process is already hung after pressing the undock button;
the suspend only reveals the problem, it does not cause it AFAICS. When
cdrom_id is in this hung state, it seems impossible to attach strace
or gdb to it (or perhaps strace attaches successfully but nothing
happens; either way, both strace and gdb need a SIGKILL to terminate,
^C is not enough).
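
For anyone trying to reproduce this, a process that ignores debuggers
and cannot be frozen is often sitting in uninterruptible sleep, although
this report does not confirm that this is the case here. A minimal
sketch for checking, assuming a Linux /proc filesystem: it reads the
state letter from /proc/<pid>/stat ("D" means uninterruptible sleep) and,
where available, the kernel wait channel from /proc/<pid>/wchan. The
helper names parse_state and describe are made up for this example.

```python
def parse_state(stat_line):
    """Return the one-letter state field from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')' characters, so split after the last ')'.
    rest = stat_line[stat_line.rindex(")") + 1:]
    return rest.split()[0]


def describe(pid):
    """Summarise the state and wait channel of a process (Linux only)."""
    with open("/proc/%d/stat" % pid) as f:
        state = parse_state(f.read())
    try:
        with open("/proc/%d/wchan" % pid) as f:
            wchan = f.read().strip() or "?"
    except OSError:
        # wchan may be missing or unreadable, depending on the kernel.
        wchan = "?"
    return "pid %d: state=%s wchan=%s" % (pid, state, wchan)


# Example (hypothetical pid of the hung cdrom_id):
# print(describe(1234))
```

If the state turns out to be "D", the wait channel may hint at where in
the kernel the process is stuck.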

While trying to debug this problem by running cdrom_id with --debug
and/or running it under strace, I found that the problem went away (and
I found an "unable to open '/dev/sr0'" message on stderr). However, there
has been at least one execution where nothing appeared on stderr and
cdrom_id still hung, so I suspected that there might be a race condition:
cdrom_id opens /dev/sr0 just before the device disappears, somehow making
cdrom_id hang after that.

This is confirmed by making a small change to 60-cdrom_id.rules:

	IMPORT{program}="/bin/sh -c 'sleep 1; /lib/udev/cdrom_id --lock-media $tempnode'"

Using this line, the problem seems to go away entirely (I haven't tested
this for a longer period of time, but it hasn't occurred yet, while it
was pretty reproducible before).

I also captured stderr with this change, which again shows the "unable
to open '/dev/sr0'" message, confirming my suspicions.
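
To get a rough idea of how wide the window is, one could measure how
long the device node stays around after pressing the undock button. The
following is only a sketch under that assumption, not something taken
from cdrom_id: the function name wait_for_removal is made up, and it
deliberately uses a stat-based existence check rather than opening the
device, since opening it is what seems to trigger the hang.

```python
import os
import time


def wait_for_removal(path, timeout=5.0, interval=0.01):
    """Poll until `path` no longer exists.

    Returns the elapsed time in seconds when the node is gone, or None
    if it is still present after `timeout` seconds.
    """
    start = time.monotonic()
    while True:
        elapsed = time.monotonic() - start
        # os.path.exists() only stats the node; it does not open it.
        if not os.path.exists(path):
            return elapsed
        if elapsed >= timeout:
            return None
        time.sleep(interval)


# Example: start this, then press the undock button.
# print(wait_for_removal("/dev/sr0"))
```

A result well under one second would be consistent with the "sleep 1"
workaround above always arriving after the node has gone.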


I'm not exactly sure how cdrom_id should work internally or what the
real cause or solution of the problem is, but perhaps this helps direct
the search.

If more testing is needed, I'm happy to help out. If someone provides
some pointers for things to try and/or where to look in the source code
(I'm fairly well-versed in C), I'll see if I can find out more.

Gr.

Matthijs

-- System Information:
Debian Release: 6.0
  APT prefers stable
  APT policy: (990, 'stable'), (500, 'proposed-updates'), (500, 'unstable'), (1, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.39-2-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_US.utf8, LC_CTYPE=en_US.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash


