[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Request for help: AMD microcode update testing



Debian users with AMD processors, hello!

I am the Debian Developer responsible for the amd64-microcode package, which
deploys microcode updates to AMD x86-64 processors, fixing several processor
bugs and on a few processors, enhancing "perf" functionality.

Unfortunately, there is a _kernel_ bug that is causing some annoyances,
which I'd like to track down as soon as possible.  However, I cannot track
it down without help because I don't have access to any box with an AMD
processor.

So, I need help from someone with an AMD microprocessor that both:

1. Is willing to compile/install and test a few kernel versions;

2. Has a BIOS+processor combination which was updated by the amd64-microcode
   update package in Debian stable.

The testing requires updating the processor microcode, so it cannot be done
on a VM, it has to be done on bare-metal.


The kernel issue seems to be:
  Two [sucessful] AMD processor microcode updates in a row cause the kernel
  to hang the process that initiated the microcode update.

  (the hung process can be killed normally with ^C or kill -9).


If you're willing to test, below is a list of the steps required.  Do note
you're testing a kernel bug, so please don't do it unless you're confortable
with a chance of things going wrong and requiring a hard reset (that
should't happen, but better safe than sorry).

All steps need to be run as root.  I will need to know the exact steps where
the hangs happen.

Note: different versions of the amd64-microcode package can be downloaded
from http://snapshot.debian.org/package/amd64-microcode/


I'd appreciate if you test the Debian kernels in Stable and wheezy-backports
(the latest 3.2 and 3.10 kernels).  It would also be very helpful if you can
test kernel 3.5.2 from www.kernel.org (which is know to be broken).


Step 1:  Run with BIOS microcode:

  1.1. purge the amd64-microcode package
  1.2. reboot. 
  1.3. cat /proc/cpuinfo > /tmp/step1-log.txt

Step 2: Do the first microcode update:

  2.1. modprobe microcode

  2.2. install the amd64-microcode package from Debian stable
       (versions 1.20120910-2 or 2.20120910-1~bpo70+1)

  ** IF THE PACKAGE INSTALL LOOKS LIKE IT HANG, SEE "it hang" BELOW **

  2.3. for kernels 3.0 to 3.5 do this:

       echo -n 1 > /sys/devices/system/cpu/cpu0/microcode/reload

       for kernels 3.6 and later, do this:

       echo -n 1 > /sys/devices/system/cpu/microcode/reload

  ** IF THIS HANGS: try to interrupt with ^C, and if that doesn't work,
     use a second terminal to kill the hung process with kill -9.
  
  2.4. dmesg -c | grep microcode > /tmp/step2-log.txt

Step 3: Do the second microcode update in a row:

  3.1. install the amd64-microcode package from Debian testing/unstable
       (version 2.20131007.1+really20130710.1)

  ** IF THE PACKAGE INSTALL LOOKS LIKE IT HANG, SEE "it hang" BELOW **

  3.2. for kernels 3.0 to 3.5 do this:

       echo -n 1 > /sys/devices/system/cpu/cpu0/microcode/reload

       for kernels 3.6 and later, do this:

       echo -n 1 > /sys/devices/system/cpu/microcode/reload

  ** IF THIS HANGS: try to interrupt with ^C, and if that doesn't work,
     use a second terminal to kill the hung process with kill -9.
  
  3.3. dmesg -c | grep microcode > /tmp/step3-log.txt

Step 4: Cleanup

  If you got a hang on step 2, please purge the amd64-microcode package,
  and wait until the problem is fixed.  I will report back to this thread.
  Alternatively, you can switch to a kernel that doesn't have the problem.

  If you got a hang on step 3, you can use the "add exit 0 to postinst"
  workaround discribed in bug #716917:
  http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=716917


** IT HANG **

If you experience hangs, please note the exact step where it happened, and
if possible, please press Sysrq+t and Sysrq+w on the keyboard: that will log
a lot of information to /var/log/kern.log that will help track down the bug
inside the kernel.

After you did that, you can kill the hung process, it will be stuck on state
"D" like this:

ps ax | grep amd64-microcode
 4237 pts/2    S+     0:00 /bin/sh /var/lib/dpkg/info/amd64-microcode.postinst configure 2.20120910-1
 4243 pts/2    D+     0:00 /bin/sh /var/lib/dpkg/info/amd64-microcode.postinst configure 2.20120910-1

(the version at the end can be different).  Kill the process in D state.

You can force the package to finish installing itself by doing this:

   (echo '#!/bin/sh' ; echo 'exit 0' ) > /var/lib/dpkg/info/amd64-microcode.postinst
   chmod +x /var/lib/dpkg/info/amd64-microcode.postinst
   dpkg --pending --configure


** SENDING ME THE DATA **

If you got any hangs, I'd appreciate the logs.  Please send them to me
*directly* (and not to the mailing-list), and if possible compress them
first (with something like gzip, bzip2 or xz).  I also need to know the
exact steps where the kernel hang.

The logs are in /tmp/step*-log.txt, and /var/log/kern.log.

If you did NOT get any hangs, just reply to the email and tell me your
kernel version, and that you did not get any hangs on any of the steps.

Thank you!

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh


Reply to: