
Re: Early boot failures with kFreeBSD >= 10.0~svn248992



On Thu, 9 May 2013 00:33:00 +1000
Steven McDonald <steven@steven-mcdonald.id.au> wrote:

> I think this is ultimately not an efficient means of investigation to
> pursue, so I plan to drop it and go back to trial and error -- I'll
> add more and more of the responsible function definition until it
> breaks, and then we can point at a specific line of code.

I've made some headway with this. The last two statements in
scsi_ata_pass_16 (the assignment to ata_cmd->control and the
cam_fill_csio() call) seem to be responsible for the hang; either one on
its own is enough to hang the kernel, but commenting out only these two
statements and leaving the rest of the function intact results in normal
behaviour.

I'm confused about this for two reasons:

 - I can't see anything in either of these lines that would cause any
   problems at all (although this may be down to my lack of experience
   with kernel programming and C in general); and

 - the commit which introduces this function definition does not appear
   to introduce any calls to the function, so the code shouldn't even
   be executed.

At the moment, I have no ideas on how to resolve the first of those
points. My initial thought was that perhaps memory was being allocated
improperly and thus writing to ata_cmd->control (the last field in a
struct ata_pass_16) was actually writing to some other kernel data
structure; however, the relevant type being passed into this function
(a struct containing cdb_t, defined in sys/cam/cam_ccb.h) seems to have
this data allocated as an array of IOCDBLEN u_int8_t's. IOCDBLEN is
defined to be CAM_MAX_CDBLEN, which in turn is defined as 16 -- just
big enough to fit an ata_pass_16.
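
The size argument above can be checked mechanically. What follows is
only a userland sketch, not the kernel's code: the struct and typedef
are stand-ins I made up for illustration. Only device, command and
control are field names taken from the patch; the remaining field names
and the sixteen-single-byte layout are assumptions, and the real cdb_t
in sys/cam/cam_ccb.h may differ from the plain array used here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values as quoted above: IOCDBLEN is CAM_MAX_CDBLEN, which is 16. */
#define CAM_MAX_CDBLEN	16
#define IOCDBLEN	CAM_MAX_CDBLEN

/*
 * Hypothetical stand-in for struct ata_pass_16. Only device, command
 * and control appear in the patch; the other names are assumptions.
 */
struct ata_pass_16_sketch {
	uint8_t opcode;
	uint8_t protocol;
	uint8_t flags;
	uint8_t features_ext;
	uint8_t features;
	uint8_t sector_count_ext;
	uint8_t sector_count;
	uint8_t lba_low_ext;
	uint8_t lba_low;
	uint8_t lba_mid_ext;
	uint8_t lba_mid;
	uint8_t lba_high_ext;
	uint8_t lba_high;
	uint8_t device;
	uint8_t command;
	uint8_t control;
};

/* Hypothetical stand-in for the CDB storage: IOCDBLEN bytes. */
typedef struct {
	uint8_t cdb_bytes[IOCDBLEN];
} cdb_sketch_t;

/* Fail the build if the command block would overrun the CDB array. */
static_assert(sizeof(struct ata_pass_16_sketch) <= sizeof(cdb_sketch_t),
    "ata_pass_16 stand-in does not fit in the CDB storage");

/* Byte offset of control, the last field the function writes. */
size_t
control_offset(void)
{
	return offsetof(struct ata_pass_16_sketch, control);
}

/* Returns 1 if a write to control stays inside the CDB array. */
int
control_write_in_bounds(void)
{
	return control_offset() < sizeof(cdb_sketch_t);
}
```

If the real definitions match these assumptions, the write to
ata_cmd->control lands on the last byte of the array and stays in
bounds, which is consistent with my reading of the headers.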

To gather more information, I think it would be useful to know whether
this function is being called and, if so, from where. I'm currently
rebuilding the kernel with the attached patch applied, which comments
out the problematic lines and calls panic() immediately before them. I
don't know if the kernel is able to write anything to the console this
early in boot, but if it can, this should give me a backtrace which may
shed some more light on the matter.

As a final note, I tried rebuilding r250204 with the problematic lines
commented out to see what would happen, and I'm getting a different
sort of boot failure -- the screen is littered with seemingly random
(though identical on every boot) ASCII characters of varying colour. It
looks like there is a second problem introduced after r248992 but
before r250204, but I'm hoping to fix the current problem before moving
on to the next one.

Any suggestions as to how to proceed would be welcome. =)

Thanks,
Steven.
--- a/sys/cam/scsi/scsi_all.c
+++ b/sys/cam/scsi/scsi_all.c
@@ -5885,6 +5885,8 @@
 	} else
 		ata_cmd->device |= (lba >> 24) & 0x0f;
 	ata_cmd->command = command;
+	panic("Help me Obi-Wan Kenobi, you're my only hope!");
+	/*
 	ata_cmd->control = control;
 
 	cam_fill_csio(csio,
@@ -5897,6 +5899,7 @@
 		      sense_len,
 		      sizeof(*ata_cmd),
 		      timeout);
+	*/
 }
 
 void
