On Thu, 9 May 2013 00:33:00 +1000
Steven McDonald <steven@steven-mcdonald.id.au> wrote:

> I think this is ultimately not an efficient means of investigation to
> pursue, so I plan to drop it and go back to trial and error -- I'll
> add more and more of the responsible function definition until it
> breaks, and then we can point at a specific line of code.

I've made some amount of headway with this. The last two
semicolon-lines in scsi_ata_pass_16 seem to be responsible for the
hang; either one on its own is sufficient to hang the kernel, but
commenting out only these lines and leaving the rest of the function
intact results in normal behaviour.

I'm confused about this for two reasons:

- I can't see anything in either of these lines that would cause any
  problems at all (although this may be down to my lack of experience
  with kernel programming and C in general); and

- the commit which introduces this function definition does not appear
  to introduce any calls to the function, so the code shouldn't even be
  being used.

At the moment, I have no ideas on how to resolve the first of those
points. My initial thought was that perhaps memory was being allocated
improperly and thus writing to ata_cmd->control (the last field in a
struct ata_pass_16) was actually writing to some other kernel data
structure; however, the relevant type being passed into this function
(a struct containing cdb_t, defined in sys/cam/cam_ccb.h) seems to have
this data allocated as an array of IOCDBLEN u_int8_t's. IOCDBLEN is
defined to be CAM_MAX_CDBLEN, which in turn is defined as 16 -- just
big enough to fit an ata_pass_16.

To gather more information, I think it would be useful to know if and
where this function is being called from. I'm currently rebuilding the
kernel with the attached patch applied, which comments out the
problematic lines of code and causes a panic just before that.
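
The size argument above (a 16-byte CDB buffer holding a 16-byte
ata_pass_16 whose last field is control) can be checked outside the
kernel. What follows is only a minimal userspace sketch under stated
assumptions: the struct is a stand-in, not the real definition, and
every field name other than device, command and control is assumed for
illustration. The value 16 for CAM_MAX_CDBLEN is the one quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Value quoted in the message; IOCDBLEN is defined in terms of it. */
#define CAM_MAX_CDBLEN 16
#define IOCDBLEN CAM_MAX_CDBLEN

/*
 * Stand-in for struct ata_pass_16 (ASSUMED layout, not copied from the
 * tree): sixteen one-byte fields with control last. Only device,
 * command and control are named in the message.
 */
struct ata_pass_16_sketch {
	uint8_t opcode;
	uint8_t protocol;
	uint8_t flags;
	uint8_t features_ext;
	uint8_t features;
	uint8_t sector_count_ext;
	uint8_t sector_count;
	uint8_t lba_low_ext;
	uint8_t lba_low;
	uint8_t lba_mid_ext;
	uint8_t lba_mid;
	uint8_t lba_high_ext;
	uint8_t lba_high;
	uint8_t device;
	uint8_t command;
	uint8_t control;
};

/* Stand-in for the CDB storage: an array of IOCDBLEN bytes. */
typedef uint8_t cdb_bytes_sketch[IOCDBLEN];

/* Compile-time checks: the struct exactly fills the CDB buffer. */
_Static_assert(sizeof(struct ata_pass_16_sketch) == IOCDBLEN,
    "ata_pass_16 stand-in must be exactly IOCDBLEN bytes");
_Static_assert(offsetof(struct ata_pass_16_sketch, control) == IOCDBLEN - 1,
    "control must be the last byte of the CDB buffer");

/* Returns 1 if a one-byte store to control stays inside the buffer. */
int
control_write_in_bounds(void)
{
	size_t end = offsetof(struct ata_pass_16_sketch, control) +
	    sizeof(uint8_t);

	return (end <= sizeof(cdb_bytes_sketch));
}
```

If the real struct had padding or an extra field, the equivalent
assertions against the real headers would fail at compile time. With
this stand-in they pass, which matches the conclusion that a plain
overrun of the CDB array does not explain the hang.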

I don't know if the kernel is able to write anything to the console
this early in boot, but if so the panic should enable me to get a
backtrace which may shed some more light on the matter.

As a final note, I tried rebuilding r250204 with the problematic lines
commented out to see what would happen, and I'm getting a different
sort of boot failure -- the screen is littered with seemingly random
(though identical on every boot) ASCII characters of varying colour. It
looks like there is a second problem introduced after r248992 but
before r250204, but I'm hoping to fix the current problem before moving
on to the next one.

Any suggestions as to how to proceed would be welcome. =)

Thanks,
Steven.

--- a/sys/cam/scsi/scsi_all.c
+++ b/sys/cam/scsi/scsi_all.c
@@ -5885,6 +5885,8 @@
 	} else
 		ata_cmd->device |= (lba >> 24) & 0x0f;
 	ata_cmd->command = command;
+	panic("Help me Obi-Wan Kenobi, you're my only hope!");
+	/*
 	ata_cmd->control = control;
 
 	cam_fill_csio(csio,
@@ -5897,6 +5899,7 @@
 		      sense_len,
 		      sizeof(*ata_cmd),
 		      timeout);
+	*/
 }
 
 void