[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: [patch 3/2] m68k: Atari EtherNAT - add writew_be for data push



Hi,

> > > Looking at the SCSI driver in 2.6.26, it's no surprise - the error handling 
> > > update that I did for 2.6.18 is not included in Geert's current 2.6 patch 
> > > series. Looks like I botched the patch submission way back when. 
> > 
> > It's a pity. You put so much work in it but none of your patches
> > (EtherNEC, EtherNAT or SCSI) got it into a current kernel.
> > 
> > Is there a reason why?
> 
> When Michael thinks they're ready, he can submit them to netdev or linux-scsi
> ;-)

I'm happy for them to live in linux-m68k :-)

EtherNEC and EtherNAT are 'ready' in the sense they are working reliably. They 
are by no means 'ready in the sense that mainline kernel maintainers would be 
happy to take my patches. 

The reasons for this are:

EtherNEC needs to be rewritten based on the current ne.c code. I'm not sure if 
it can be merged into ne.c at this stage. There's a lot of code in the EtherNEC 
driver that can be removed now. Due to the lack of interrupts, there will still 
be the need of using a special interrupt handler guarding from interrupt 
processing before the ethernet device has been opened. 

I did recently rewrite the EtherNAT driver based on the current smc91x.c one. 
The remaining difference between the two are related to using a dedicated MFP 
timer interrupt for polling the card (the timer setup can be separated into a 
shared module for use by both EtherNEC and EtherNAT!). Again, we need to use a 
modified interrupt handler here, as long as I have not got real interrupts to 
work out. 

SCSI needs to be rewritten and merged with the Mac 5380 driver (and perhaps a 
few more). 
While trying to get it into a working state, I got rid of the 'scheduling in 
interrupt' bugs, but I now lose IDE interrupts after a while. Command abort 
processing is broken, apparently because a command times out while waiting to 
be queued. I can hack the SCSI timers again, but that was frowned upon in a 
major way two years ago. Can't really expect this to be accepted now.
The alternative would be to _never_ wait for the ST-DMA lock and let the the 
block device layer deal with it. IDE would get all the priority in that 
case. Given that the lock handling of IDE may be dodgy, that scares me a bit. 

Reasons enough for you? 

The network driver issues are mainly janitorial work (with the exception of 
enabling and using EtherNAT interrupts). I'll get to the EtherNEC cleanup 
eventually, but I'm not sure I'm up for the long haul to convince netdev people 
to apply my patches. (I'm not even convinced it's a good idea to have other 
developers mess with m68k specific stuff - look at the whole IDE lock imbalance 
debacle...)

SCSI is more tricky, and more difficult to test and get right (in both a 
technical and code maintainablilty sense). 
The rewrite of m68k NCR5380 SCSI drivers is something that has come up before, 
and I'm sure we'll get to that sooner rather than later. 

I'll focus on getting SCSI fixed, first and foremost. First stab at this is 
attached (not fudging with the timeouts, but failing the request instead if the 
lock cannot be obtained).

Stephen, can you try to apply that one to the current Debian installer kernel? 
If it fails, let me know and I'll prepare a 2.6.26 version. This should be good 
enough to allow for installation off SCSI CD, at least. 

	Michael

--- linux-2.6.28-geert/drivers/scsi/atari_scsi.c
+++ linux-2.6.28/drivers/scsi/atari_scsi.c
@@ -196,7 +196,7 @@
 static irqreturn_t scsi_tt_intr(int irq, void *dummy);
 static irqreturn_t scsi_falcon_intr(int irq, void *dummy);
 static void falcon_release_lock_if_possible(struct NCR5380_hostdata *hostdata);
-static void falcon_get_lock(void);
+static int falcon_get_lock(void);
 #ifdef CONFIG_ATARI_SCSI_RESET_BOOT
 static void atari_scsi_reset_boot(void);
 #endif
@@ -498,6 +498,16 @@
  * again (but others waiting longer more probably will win).
  */
 
+/*
+ * MSch 20061228: in 2.6.x, the fairness wait appears to open a race with
+ * concurrent use of the lock by the IDE driver. Once the lock is taken by
+ * IDE, the SCSI queuecmd function can no longer schedule because it is run
+ * from softirq context a lot. 
+ * Disable the fairness wait until I have had time to sort out the lock issues.
+ */
+#undef FALCON_FAIRNESS_WAIT
+#define FALCON_NEVER_SLEEP 1
+
 static void falcon_release_lock_if_possible(struct NCR5380_hostdata *hostdata)
 {
 	unsigned long flags;
@@ -519,7 +529,9 @@
 		}
 		falcon_got_lock = 0;
 		stdma_release();
+#if defined(FALCON_FAIRNESS_WAIT)
 		wake_up(&falcon_fairness_wait);
+#endif
 	}
 
 	local_irq_restore(flags);
@@ -540,21 +552,37 @@
  * Complicated, complicated.... Sigh...
  */
 
-static void falcon_get_lock(void)
+/* MSch 20061229: in order to avoid sleeping when the lock is not available, 
+ * return the current lock state here. atari_queue_command() will then return
+ * with error, causing the midlevel code to pause request submission.
+ */
+static int falcon_get_lock(void)
 {
 	unsigned long flags;
 
 	if (IS_A_TT())
-		return;
+		return 0;
 
 	local_irq_save(flags);
 
-	while (!in_irq() && falcon_got_lock && stdma_others_waiting())
+#if defined(FALCON_FAIRNESS_WAIT)
+	/* MSch: we may not sleep even while in softirq ! */
+	while (!in_interrupt() && falcon_got_lock && stdma_others_waiting())
 		sleep_on(&falcon_fairness_wait);
-
+#endif
 	while (!falcon_got_lock) {
 		if (in_irq())
 			panic("Falcon SCSI hasn't ST-DMA lock in interrupt");
+		/* we may not sleep in soft interrupts neither, so bail out */
+#if defined(FALCON_NEVER_SLEEP)
+		if (stdma_islocked()) {
+#else
+		if (in_softirq() && stdma_islocked()) {
+#endif
+			// printk(KERN_INFO "Falcon SCSI does not hold ST-DMA lock in softirq!\n" );
+			local_irq_restore(flags);
+			return 1;
+		}
 		if (!falcon_trying_lock) {
 			falcon_trying_lock = 1;
 			stdma_lock(scsi_falcon_intr, NULL);
@@ -569,6 +597,8 @@
 	local_irq_restore(flags);
 	if (!falcon_got_lock)
 		panic("Falcon SCSI: someone stole the lock :-(\n");
+
+	return 0;
 }
 
 
@@ -828,7 +858,7 @@
 	} else {
 		atari_turnon_irq(IRQ_MFP_FSCSI);
 	}
-	if ((rv & SCSI_RESET_ACTION) == SCSI_RESET_SUCCESS)
+	if (rv == SUCCESS)
 		falcon_release_lock_if_possible(hostdata);
 
 	return rv;
--- linux-2.6.28-geert/drivers/scsi/atari_scsi.h
+++ linux-2.6.28/drivers/scsi/atari_scsi.h
@@ -59,32 +59,6 @@
 #define	NCR5380_dma_xfer_len(i,cmd,phase) \
 	atari_dma_xfer_len(cmd->SCp.this_residual,cmd,((phase) & SR_IO) ? 0 : 1)
 
-/* former generic SCSI error handling stuff */
-
-#define SCSI_ABORT_SNOOZE 0
-#define SCSI_ABORT_SUCCESS 1
-#define SCSI_ABORT_PENDING 2
-#define SCSI_ABORT_BUSY 3
-#define SCSI_ABORT_NOT_RUNNING 4
-#define SCSI_ABORT_ERROR 5
-
-#define SCSI_RESET_SNOOZE 0
-#define SCSI_RESET_PUNT 1
-#define SCSI_RESET_SUCCESS 2
-#define SCSI_RESET_PENDING 3
-#define SCSI_RESET_WAKEUP 4
-#define SCSI_RESET_NOT_RUNNING 5
-#define SCSI_RESET_ERROR 6
-
-#define SCSI_RESET_SYNCHRONOUS		0x01
-#define SCSI_RESET_ASYNCHRONOUS		0x02
-#define SCSI_RESET_SUGGEST_BUS_RESET	0x04
-#define SCSI_RESET_SUGGEST_HOST_RESET	0x08
-
-#define SCSI_RESET_BUS_RESET 0x100
-#define SCSI_RESET_HOST_RESET 0x200
-#define SCSI_RESET_ACTION   0xff
-
 /* Debugging printk definitions:
  *
  *  ARB  -> arbitration
--- linux-2.6.28-geert/drivers/scsi/atari_NCR5380.c
+++ linux-2.6.28/drivers/scsi/atari_NCR5380.c
@@ -956,23 +956,6 @@
 		}
 #endif
 
-	/*
-	 * We use the host_scribble field as a pointer to the next command
-	 * in a queue
-	 */
-
-	SET_NEXT(cmd, NULL);
-	cmd->scsi_done = done;
-
-	cmd->result = 0;
-
-	/*
-	 * Insert the cmd into the issue queue. Note that REQUEST SENSE
-	 * commands are added to the head of the queue since any command will
-	 * clear the contingent allegiance condition that exists and the
-	 * sense data is only guaranteed to be valid while the condition exists.
-	 */
-
 	local_irq_save(flags);
 	/* ++guenther: now that the issue queue is being set up, we can lock ST-DMA.
 	 * Otherwise a running NCR5380_main may steal the lock.
@@ -987,10 +970,42 @@
 	 * alter queues and touch the lock.
 	 */
 	if (!IS_A_TT()) {
+		int rv;
+		/* MSch: since we may now get called from softirq context here, 
+		 * we cannot always sleep while waiting for the lock.
+		 * Use status from falcon_get_lock() to detect if the lock was
+		 * successfully taken, and return appropriate status to midlevel
+		 * otherwise.
+		 * We used to pause the midlevel SCSI timer here; races between 
+		 * command timeout and lowlevel error handling should not hurt
+		 * because the command isn't in any of the queues yet.
+		 */
 		/* perhaps stop command timer here */
-		falcon_get_lock();
+		rv = falcon_get_lock();
 		/* perhaps restart command timer here */
+		if (rv) {
+			local_irq_restore(flags);
+			return SCSI_MLQUEUE_HOST_BUSY;
 	}
+	}
+
+	/*
+	 * We use the host_scribble field as a pointer to the next command
+	 * in a queue
+	 */
+
+	SET_NEXT(cmd, NULL);
+	cmd->scsi_done = done;
+
+	cmd->result = 0;
+
+	/*
+	 * Insert the cmd into the issue queue. Note that REQUEST SENSE
+	 * commands are added to the head of the queue since any command will
+	 * clear the contingent allegiance condition that exists and the
+	 * sense data is only guaranteed to be valid while the condition exists.
+	 */
+
 	if (!(hostdata->issue_queue) || (cmd->cmnd[0] == REQUEST_SENSE)) {
 		LIST(cmd, hostdata->issue_queue);
 		SET_NEXT(cmd, hostdata->issue_queue);
@@ -1014,10 +1029,12 @@
 	 * If we're not in an interrupt, we can call NCR5380_main()
 	 * unconditionally, because it cannot be already running.
 	 */
-	if (in_interrupt() || ((flags >> 8) & 7) >= 6)
+	/* As of 2.6.19, the coroutine has to be put on the task queue instead
+	 * of being called directly. It might still be called directly if not 
+	 * in softirq, though. Need to test this.  
+	 */
 		queue_main();
-	else
-		NCR5380_main(NULL);
+
 	return 0;
 }
 
@@ -2631,7 +2648,7 @@
  *	host byte of the result field to, if zero DID_ABORTED is
  *	used.
  *
- * Returns : 0 - success, -1 on failure.
+ * Returns : SUCCESS - success, FAILED on failure.
  *
  * XXX - there is no way to abort the command that is currently
  *	 connected, you have to wait for it to complete.  If this is
@@ -2701,11 +2718,11 @@
 			local_irq_restore(flags);
 			cmd->scsi_done(cmd);
 			falcon_release_lock_if_possible(hostdata);
-			return SCSI_ABORT_SUCCESS;
+			return SUCCESS;
 		} else {
 /*			local_irq_restore(flags); */
 			printk("scsi%d: abort of connected command failed!\n", HOSTNO);
-			return SCSI_ABORT_ERROR;
+			return FAILED;
 		}
 	}
 #endif
@@ -2729,7 +2746,7 @@
 			 * yet... */
 			tmp->scsi_done(tmp);
 			falcon_release_lock_if_possible(hostdata);
-			return SCSI_ABORT_SUCCESS;
+			return SUCCESS;
 		}
 	}
 
@@ -2747,7 +2764,7 @@
 	if (hostdata->connected) {
 		local_irq_restore(flags);
 		ABRT_PRINTK("scsi%d: abort failed, command connected.\n", HOSTNO);
-		return SCSI_ABORT_SNOOZE;
+		return FAILED;
 	}
 
 	/*
@@ -2782,7 +2799,7 @@
 			ABRT_PRINTK("scsi%d: aborting disconnected command.\n", HOSTNO);
 
 			if (NCR5380_select(instance, cmd, (int)cmd->tag))
-				return SCSI_ABORT_BUSY;
+				return FAILED;
 
 			ABRT_PRINTK("scsi%d: nexus reestablished.\n", HOSTNO);
 
@@ -2809,7 +2826,7 @@
 					local_irq_restore(flags);
 					tmp->scsi_done(tmp);
 					falcon_release_lock_if_possible(hostdata);
-					return SCSI_ABORT_SUCCESS;
+					return SUCCESS;
 				}
 			}
 		}
@@ -2835,7 +2852,7 @@
 	 */
 	falcon_release_lock_if_possible(hostdata);
 
-	return SCSI_ABORT_NOT_RUNNING;
+	return SUCCESS;
 }
 
 
@@ -2844,7 +2861,7 @@
  *
  * Purpose : reset the SCSI bus.
  *
- * Returns : SCSI_RESET_WAKEUP
+ * Returns : SUCCESS
  *
  */
 
@@ -2934,7 +2951,7 @@
 	 * the midlevel code that the reset was SUCCESSFUL, and there is no
 	 * need to 'wake up' the commands by a request_sense
 	 */
-	return SCSI_RESET_SUCCESS | SCSI_RESET_BUS_RESET;
+	return SUCCESS;
 #else /* 1 */
 
 	/* MSch: new-style reset handling: let the mid-level do what it can */
@@ -2982,6 +2999,7 @@
 	local_irq_restore(flags);
 
 	/* we did no complete reset of all commands, so a wakeup is required */
-	return SCSI_RESET_WAKEUP | SCSI_RESET_BUS_RESET;
+	/* The new error handler code implicitly does this for us anyway */
+	return SUCCESS;
 #endif /* 1 */
 }

Reply to: