
Re: hdparm -t yields incorrect timings when Intel hyperthreading is enabled



I don't favor the interleaved response technique, so even if that technique is favored on this list, I'll just stay with keeping enough context so that previous messages don't need frequent reference.

Next, I don't agree that this hyperthreading problem reeks of a firmware issue. What it reeks of is a linux kernel issue. I'm not going to replace firmware just to hope that it will fix any issue, much less one with hyperthreading. I've already returned the motherboard once for the HDMI port failing to function, and as far as I can tell, the Intel fix was to downgrade the firmware to a version older than I had on the machine. So I'm especially not going to randomly upgrade away that fix, even more especially when such an upgrade is irreversible.

Next, both recommending a firmware update and asking for /proc/cpuinfo were red herrings reminiscent of a conversation with any modern corporate support department. I will state flatly that my goal was and is to improve linux. I've been using Microsoft stuff since MS-DOS in 1982. But now at 55, I'm basically jettisoning all that and starting anew with open source. Please give me at least token respect. It's just plain fact that Windows 7 gives one confidence that hyperthreading is functioning properly and linux doesn't.

Next, from just this one data point, my experience tells me that linux isn't exactly playing friendly with Intel hyperthreading. Given that Intel is not that interested in hyperthreading any more, that might be expected. Hyperthreading was in retrospect a mistake, just like Pentium 4, Itanium, letting AMD do x86-64 first, iAPX 432. Intel is a history of mistakes. But make no mistake, it's still a powerful company. I'd love for open source to have that kind of power. I'll work on it.

Next, I don't get what I'm supposed to bisect. Every kernel I've tried (3.2.0-4, 3.12-0, and 3.12.9) has obvious issues with hyperthreading. So it seems unlikely to me that any kernel would function properly. In order to bisect, the first step is to find a correct kernel. Perhaps someone could recommend one.

Next, I wanted to really verify that the 3.2.0-4 kernel also exhibited the hdparm issue, as I actually wasn't 100% certain. The reason that I updated the kernel in the first place from 3.2.0-4 to 3.12-0 was to obtain the native resolution on my 1920x1080 monitor and to allow the machine to successfully S3 suspend and resume. The 3.12 kernel did improve those issues, and the machine will now suspend/resume and it will do 1920x1080, albeit at 16 bits/pixel. But for some reason the card still doesn't steal enough memory (it's getting a paltry 4M) to do the 32 bits per pixel that it's perfectly capable of. That's why I've built my own 3.12.9 kernel, to debug the graphics subsystem issue. With all this stuff flying around, I thought that maybe I was confused and hdparm might actually show the correct disk bandwidth on the 3.2.0-4 kernel.

Next, it's not the case that I was confused. hdparm is still a reliable canary for hyperthreading problems on the dn2800mt motherboard. See attached data below, kernel 3.2.0-4 only. When hyperthreading is disabled in the BIOS, hdparm shows the expected disk bandwidth. With hyperthreading enabled, hdparm reports dramatically less than the expected disk bandwidth. The problem is not as "severe" on 3.2.0-4 as on the 3.12 series, but it's still obviously there.

Next, as a newbie, I don't want to waste valuable kernel developer time unless I'm kind of sure that I'm not missing something obvious. But since my hyperthreading issue has not been trivially resolved on this list, I'm sort of assured that I'm not missing that trivial something. If after a few more days I still don't have a trivial answer, I will file a kernel bug.

Last, I don't want anyone to feel insulted by what I have to say or the way that I say it. I'm just a straight shooter from way back. If I compare linux to MS Windows and it doesn't measure up in some fashion, it just means that it should measure up and I'll work to make that happen.

_____________________________ Data mentioned above starts here ___________________________________

/proc/version:

Linux version 3.2.0-4-686-pae (debian-kernel@lists.debian.org) (gcc version 4.6.3 (Debian 4.6.3-14) ) #1 SMP Debian 3.2.54-2

#Hyperthreading enabled

sudo hdparm -tT

/dev/sda:
 Timing cached reads:   1744 MB in  2.00 seconds = 871.70 MB/sec
 Timing buffered disk reads: 120 MB in  3.05 seconds =  39.30 MB/sec

/dev/sdb:
 Timing cached reads:   1648 MB in  2.00 seconds = 823.93 MB/sec
 Timing buffered disk reads: 264 MB in  3.01 seconds =  87.83 MB/sec

/dev/sdc:
 Timing cached reads:   1682 MB in  2.00 seconds = 840.98 MB/sec
 Timing buffered disk reads:  92 MB in  3.04 seconds =  30.31 MB/sec

# Hyperthreading disabled

/dev/sda:
 Timing cached reads:   1786 MB in  2.00 seconds = 893.04 MB/sec
 Timing buffered disk reads: 440 MB in  3.01 seconds = 146.34 MB/sec
paula@dn2800mt:/mnt/wd1g/home/paula$ sudo hdparm -tT /dev/sdb

/dev/sdb:
 Timing cached reads:   1816 MB in  2.00 seconds = 907.74 MB/sec
 Timing buffered disk reads: 752 MB in  3.00 seconds = 250.34 MB/sec

/dev/sdc:
 Timing cached reads:   1744 MB in  2.00 seconds = 872.13 MB/sec
 Timing buffered disk reads:  96 MB in  3.00 seconds =  31.97 MB/sec
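
To put a number on the difference, the buffered-read figures above can be reduced to one ratio per device. This is only a rough sketch: the rates are copied from the output above, and the helper name ht_ratio is made up for illustration.

```shell
#!/bin/sh
# Ratio of the buffered read rate with hyperthreading enabled to the rate
# with it disabled. ht_ratio is a hypothetical helper, not part of hdparm.
ht_ratio() {
    # $1 = device, $2 = MB/sec with HT enabled, $3 = MB/sec with HT disabled
    awk -v dev="$1" -v on="$2" -v off="$3" \
        'BEGIN { printf "%s: %.0f%% of the HT-disabled rate\n", dev, 100 * on / off }'
}

# Rates taken from the 3.2.0-4 hdparm runs above.
ht_ratio /dev/sda 39.30 146.34
ht_ratio /dev/sdb 87.83 250.34
ht_ratio /dev/sdc 30.31 31.97
```

On these numbers sda and sdb drop to roughly a quarter and a third of their HT-disabled rates, while sdc, the slowest device, is nearly unchanged.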

On 5/8/2014 6:44 AM, Henrique de Moraes Holschuh wrote:
On Mon, 05 May 2014, Paul Ausbeck wrote:
I've attached the contents of /proc/cpuinfo below, two copies, one
with hyperthreading disabled and one enabled.
As I told you, the *very first thing* you must do is to make sure you're
using the latest firmware for your motherboard (*especially* the BIOS/EFI).
If you're not, update it.  This bug reeks of a firmware issue.

cpuinfo looks normal for both cases, and the microcode is newer than
anything Intel ever published to the general public.
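
For anyone comparing the two cpuinfo copies, one quick check is whether the "siblings" count exceeds the "cpu cores" count. The sketch below assumes the usual x86 /proc/cpuinfo layout with those two fields; ht_state is a hypothetical helper name.

```shell
#!/bin/sh
# Read cpuinfo text on stdin and report whether it lists more logical
# siblings than physical cores. Assumes the x86 "siblings" and "cpu cores"
# fields are present; prints "unknown" if either is missing.
ht_state() {
    awk -F': *' '
        /^siblings/  { sib = $2 }
        /^cpu cores/ { cores = $2 }
        END {
            if (sib == "" || cores == "") print "unknown"
            else if (sib + 0 > cores + 0) print "enabled"
            else print "disabled"
        }'
}
```

It would be run as "ht_state < /proc/cpuinfo" once with hyperthreading on and once with it off, to confirm that the kernel sees the BIOS setting.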

I've also investigated things a bit further and now I'm thinking
that the hyperthreading state affects the system as a whole, not
just hdparm.
That's expected.

First, I've attached hdparm output from the same machine booting to
Windows 7. The reported disk speed is not affected by the
hyperthreading state.  I've also attached boot speed measurements
for the two states. Windows 7 boot time with hyperthreading enabled
is 2/3 that when disabled. This would be expected if hyperthreading
is actually worth anything.

Second, it turns out that the boot speed of linux is either
unaffected by the state of hyperthreading, 3.2 kernel, or adversely
affected by enabling hyperthreading, 3.12 kernel. I've attached
I believe you will need to take this to LKML, unfortunately.  One
thing that will help track down the issue is to try several kernel
versions in order to pinpoint better when things went bad.

LKML: linux-kernel mailing list.

I'm thinking that the hdparm scenario is a good canary for a more
fundamental problem with hyperthreading, at least on my dn2800mt
machine. Perhaps the backports 3.12 kernel hasn't been fully vetted
Yes.  It makes it trivial to "reproduce the bug", so it would help tracking
the issue down immensely.

But you'll still need to do it with help from the LKML people, unless you
can handle the git bisecting yourself.


About git bisect (guides):
https://www.kernel.org/pub/software/scm/git/docs/git-bisect-lk2009.html

You can do this:
git bisect start

git bisect good <tag for good kernel version>
git bisect bad <tag for bad kernel version>

repeat git bisect good/bad as required to enter all datapoints you already
have or manually tested.

You can move to any kernel version you want with "git reset --hard <tag>",
compile, test, and then mark it with "git bisect good" or "git bisect
bad".   git bisect will offer you a new test point when you do that.

Hint: when bisecting, for safety, first you should test and mark as "GOOD"
or "BAD" released/stable kernels, i.e. v3.12.8, v3.11.5, etc.  See above,
use "git reset --hard" to move to different kernel versions, recompile,
boot, test, "git bisect good"/"git bisect bad", rinse and repeat.  Try to
use a binary search pattern, to reduce the number of kernels you will have
to test.
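
As a rough illustration of the binary search pattern, the next release to test is simply the middle of whatever ordered range is still unresolved. The helper name next_candidate and the tag range below are placeholders for illustration only; no known-good kernel has been identified in this thread.

```shell
#!/bin/sh
# Print the middle entry of an ordered (oldest-first) list of release tags.
# next_candidate is a hypothetical helper; it only picks what to test next,
# the actual good/bad verdict still comes from booting and running hdparm.
next_candidate() {
    [ "$#" -gt 0 ] || return 1
    mid=$(( ($# + 1) / 2 ))
    shift $(( mid - 1 ))
    printf '%s\n' "$1"
}

# Illustrative range only: releases between two endpoints assumed to have
# been tested already.
next_candidate v3.3 v3.4 v3.5 v3.6 v3.7 v3.8 v3.9 v3.10 v3.11
```

If the picked release turns out bad, the next call gets only the tags older than it; if good, only the newer ones. Each test halves the range that way.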

Only after you get reasonably near the issue using the above should you let
"git bisect" choose the test point, because it will usually land you
somewhere deep into the release-candidate kernels (or even worse, inside the
merge window), and those can be quite broken.

Therefore, also for safety, when testing these kernels boot to single-user
mode, run the hdparm test, note down what happened on paper somewhere, and
reboot to a known release/stable kernel.  Only do any real work (such as the
git bisect stuff, compiling, etc) on a safe, known release/stable kernel.

Obviously, test single-user mode in your known release/stable kernel first,
just to make sure the bug doesn't disappear (or always appear) in
single-user mode :)
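
To keep the per-kernel verdict consistent, the hdparm result can be turned into a good/bad answer mechanically. The sketch below is one way to do that; the function name verdict and the 100 MB/sec cutoff are assumptions, and the cutoff should be set for the disk being tested.

```shell
#!/bin/sh
# Turn one `hdparm -t` run (on stdin) into a verdict for git bisect.
# THRESHOLD_MB is an assumed cutoff, not a value from this thread.
THRESHOLD_MB=${THRESHOLD_MB:-100}

verdict() {
    # Take the MB/sec figure from the first "Timing buffered disk reads" line.
    rate=$(awk '/Timing buffered disk reads/ { print $(NF-1); exit }')
    if [ -z "$rate" ]; then
        # No measurement found: report "skip" rather than guess.
        echo skip
        return
    fi
    awk -v r="$rate" -v t="$THRESHOLD_MB" \
        'BEGIN { if (r + 0 >= t + 0) print "good"; else print "bad" }'
}
```

On the test kernel this would be run as "sudo hdparm -t /dev/sda | verdict", and the printed word noted down. After rebooting to the safe kernel, the same word is what gets passed to "git bisect".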


