Hi all,
Hi Jiri,

Jiri Polach wrote:
On Ben's advice I am trying to locate the commit that causes the problem to appear more precisely using 'git bisect'. However, too many of the generated revisions are unbootable, so I have to use 'bisect skip' frequently.

Ok, so I've looked over the log at <http://bugs.debian.org/647095>, and this seems totally weird. Have I described the symptoms correctly below? (Warning: I am making some guesses, especially at step 5. In case of doubt, see the bug log just mentioned.)

1. Disable SMT in the BIOS.
2. Boot a bad kernel. /proc/cpuinfo (correctly) shows one entry per core.
3. "shutdown -h now". Enter BIOS. SMT is still disabled. Don't save.
4. Boot any kernel. /proc/cpuinfo shows two entries per core.
5. "shutdown -h now". Boot any kernel. /proc/cpuinfo still shows two entries per core.
6. "shutdown -h now". Enter BIOS. SMT is still disabled. Save. Now /proc/cpuinfo will (correctly) show one entry per core.

Reproducible for Jiri with v3.0.4.
Yes, this is exactly how it works. Something happens when the kernel shuts down, not when it reboots.
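The "one entry per core" versus "two entries per core" check in the steps above can be done by eye, but a small script makes it less error-prone across many test boots. The following is a minimal sketch, not anything taken from the bug log: it assumes the x86 /proc/cpuinfo layout with "processor", "physical id" and "core id" fields in blank-line-separated blocks, and the function names threads_per_core and smt_visible are made up for illustration.

```python
from collections import Counter


def threads_per_core(cpuinfo_text):
    """Count logical CPUs per (physical id, core id) pair.

    Assumes blank-line-separated blocks of "key : value" lines, as
    /proc/cpuinfo prints them on x86.
    """
    counts = Counter()
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "processor" not in fields:
            continue
        if "physical id" in fields and "core id" in fields:
            core = (fields["physical id"], fields["core id"])
        else:
            # Topology fields missing: treat each logical CPU as its own
            # core so that it is never mistaken for an SMT sibling.
            core = ("cpu", fields["processor"])
        counts[core] += 1
    return counts


def smt_visible(cpuinfo_text):
    """Return True if any core shows more than one logical CPU."""
    return any(n > 1 for n in threads_per_core(cpuinfo_text).values())
```

With SMT disabled in the BIOS, calling smt_visible(open("/proc/cpuinfo").read()) after each boot should return False on a healthy system; a True result after a "shutdown -h now" cycle would correspond to the symptom at steps 4 and 5.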
Result of bisecting: v2.6.38-rc1 exhibits the problem. v2.6.37 and many of the topic branches merged in the 2.6.38 merge window work ok. Some other topic branches do not boot at all.

Jiri: if you have gitk installed, then "git bisect visualize" can help get a sense of what's in the middle of the regression range. "gitk --bisect --first-parent v2.6.37..v2.6.38-rc1" might be a good way to find mainline commits to test before finding a topic branch to delve into.
I have been able to narrow the interval manually a little bit from the "top" (the bad side) and I will go on from the bottom now. However, there seems to be a large area where kernels are unbootable for me - they mostly stop when init is called and I do not know why.
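Since each candidate kernel needs a full boot, halt and boot-again cycle, it is easy to mark a revision wrongly. As a rough sketch of the decision being made at each bisection step (the helper name bisect_verdict and its two parameters are hypothetical, not part of git), the mapping from observations to the 'git bisect' subcommand could be written down as:

```python
def bisect_verdict(booted, smt_seen_after_shutdown):
    """Map one test cycle to a 'git bisect' subcommand name.

    booted: the candidate kernel got far enough to be tested at all.
    smt_seen_after_shutdown: with SMT disabled in the BIOS, the boot
        following "shutdown -h now" showed two entries per core.
    """
    if not booted:
        # Unbootable revisions say nothing about the regression.
        return "skip"
    return "bad" if smt_seen_after_shutdown else "good"


def bisect_command(booted, smt_seen_after_shutdown):
    """Build the command line to type for this revision."""
    return "git bisect " + bisect_verdict(booted, smt_seen_after_shutdown)
```

The point of keeping "skip" separate is that a kernel which stops at init gives no information either way, so it must not be recorded as good or bad.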
x86 people: do the symptoms seem familiar? Any hints for tracking it down?
Please! I have spent more than a month trying to resolve it. I cannot revert back to 2.6.37 kernels and I cannot live with SMT changing on every shutdown - I have too many servers to allow such unusual behavior ...
Thank you, Jiri Polach
Thanks and hope that helps, Jonathan