[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: esp_scsi, was Re: Modified Linux 4.1.20 (mac+scsi) on Quadra 660av



On Mon, 31 Oct 2016, Michael Schmitz wrote:

> Am 30.10.2016 um 18:52 schrieb Finn Thain:
> > 
> > On Sun, 30 Oct 2016, Michael Schmitz wrote:
> > 
> > > but what makes the driver work with most disks but fail with some? 
> > > Different SCSI command set supported by the disks?
> > 
> > Anything that would inhibit disconnection/reselection should avoid the 
> > bug. (BTW ISTR that my SCSI2SD never disconnects a command.)
> > 
> > I think the target has to signal SEL-with-ATN during reselection in 
> > order to trigger the bug. I think this is not part of SCSI-1.
> > 
> > The SEL-with-ATN process is discussed in the "Configuration Register 
> > 3" section, Bit 3, in 
> > http://www.ibiblio.org/pub/historic-linux/early-ports/Sparc/NCR/NCR53C9X.txt
> 
> What chip revision is this doc for?

It can only be FAS100A. That's the document which David cited when he 
posted his ESP_CONFIG2_SCSI2ENAB patch in 2014, which is why I was using 
it. I think the documents Kars cited may be more relevant in this case:
http://www.spinics.net/lists/linux-scsi/msg73856.html
(those documents are all available from bitsavers).

> As far as I can make out, it should be fas100a? The esp_scsi.h comments 
> suggest the tagged queuing bit is for revisions fas236. The chips used 
> on Macs were probably esp236 or fas236? Depending on the

The Quadras I've used have ESP236 chips, according to esp_scsi detection. 

> 
> Not sure whether select with attention is present in SCSI-1,

I think SCSI-1 defines SEL-with-ATN for the initiator but not the target. 
In this case (reselection) we have the target doing it.

> but tagged queueing (which requires extended messages) is a SCSI-2 
> feature.
> 
> So we will need bit 3 in config2 set if any target support SCSI-2. From 
> my reading of esp_get_rev(), it's not currently set there (cleared when 
> config2 readback test is prepared and not restored after it's clear we 
> have better than ESP100). I suggest config2 is set to 
> ESP_CONFIG2_SCSI2ENAB before the config3 readback test is done.
> 
> Now what I'm unclear on is the interaction between the config2 setting 
> and the value of bit 3 in config3:
> 
> > 	Bit 3 Queue Tag Enable
> > 
> > 	When this bit is set, the FSC can receive 3-byte messages during 
> > businitiated select with ATN.  This feature is also enabled by setting 
> > bit 3 in the Configuration 2 register.  The message bytes consist of a 
> > 1-byte Identify message and a 2-byte Queue Tag message. The middle 
> > byte is the tagged queue message itself and the last byte is the tag 
> > value (0 to 255).  When this bit is set, the second byte is cheked to 
> > see if it is a valid queue taggin message.  If the value of the byte 
> > is not 20h, 21h, or 22h, the sequence halts and an interrupt is 
> > generated.  When this bit is not set, the chip aborts the select with 
> > ATN sequence after it receives one Identify message byte, if ATN is 
> > still asserted.
> 
> The ESP driver explicitly clears this bit and only sets it for the 
> FAS236 variant (using it as fast clock bit there). The logic for setting 
> a target's config3 value refer to the stored config for target 0, and 
> the only place I can see that get initialized is again in esp_get_rev(). 
> First to 5, then to 0 once it's clear the register even exists. We could 
> set bit 3 here, but it may not matter for the chip revisions that we are 
> interested in... Better set bit 6 (ESP_CONFIG3_TBMS) in our 
> esp_slave_configure() hook?

That hook lets particular board-specific drivers disable particular 
features for particular targets. It would be better, I think, to fix this 
in the core driver, so that any target is free to use three byte messages 
regardless of the board-specific driver (except ESP100 I guess).

Besides, ESP_CONFIG3_EWIDE == ESP_CONFIG3_TBMS, so this bit will get wiped 
in esp_reset_cleanup() anyway. Probably this should be conditional on 
FASHME. I think there's a similar bug in esp_data_bytes_sent().

> 
> > 
> >>> I wonder why this problem didn't affect the old ESP driver? (Stan 
> >>> pointed out earlier in this thread that "Debian 3.1 (with a Linux 
> >>> 2.2.25 kernel) works on the Quadra 660av".)
> >>
> >> Yep, I wonder about the same thing. Changes to SCSI domain 
> >> validation, perhaps even not at the driver but mid level?
> > 
> > Quite possible. I recall you said that you were able to resolve the 
> > issue with a patch to domain validation in the SCSI ML which would 
> > inhibit tagged queueing.
> > 
> > But we really need David's patch, otherwise we probably end up with 
> > domain validation reporting a SCSI-2 device with the ESP left in 
> > SCSI-1 mode.
> 
> Yep, I agree. David's old patch doesn't apply anymore but it's quite 
> straightforward to reimplement.
> 
> > 
> >>
> >> I vaguely recall that the old Mac ESP driver explicitly disabled 
> >> features like sync negotiation and perhaps others, and the interface 
> >> to do that had changed or disappeared after the 2.6 series? (No idea 
> >> really when that happened - my last interaction with the ESP driver 
> >> would have been almost 18 years ago. Totally different code.)
> > 
> > The present mac_esp also disables sync transfers for PIO mode because 
> > I found that necessary. (Async PIO might be impossible.)
> > 
> >                 printk(KERN_INFO PFX "using PIO for controller %d\n", dev->id);
> >                 esp_write8(0, ESP_TCLOW);
> >                 esp_write8(0, ESP_TCMED);
> >                 esp->flags = ESP_FLAG_DISABLE_SYNC;
> >                 mac_esp_ops.send_dma_cmd = mac_esp_send_pio_cmd;
> > 
> > Tuomas reported that PIO instead of DMA did not affect the bug. But my 
> > recall is that the disk which would not work on my Quadra 660av (which 
> > uses PIO exclusively) did in fact work on a different kind of Quadra 
> > (which used PDMA, but also probably has different ESP silicon).
> > 
> > Anyway, is safe to do this?
> > 
> >         esp->ops->send_dma_cmd(esp, esp->command_block_dma,
> >                                2, 2, 1, ESP_CMD_DMA | ESP_CMD_TI);
> > 
> >         /* ACK the message.  */
> >         scsi_esp_cmd(esp, ESP_CMD_MOK);
> > 
> > For PIO and PDMA, the transfer will be finished before the MOK 
> > ("Message Accepted") command, but for DMA this seems to be racey...
> 
> It sure is racey. Is there a DMA complete interrupt that you can wait 
> for before proceeding with the ACK?

That code comes from esp_reconnect_with_tag(). The interrupt you asked 
about never shows up, hence, "Reconnect IRQ2 timeout".

Anyway, there must about 3 potential patches from this discussion. I say 
we do as Geert suggests and move the discussion to the kernel.org lists 
(with code, of course).

-- 

> 
> Cheers,
> 
> 	Michael
> 
> 


Reply to: