
Bug#458154: Processed: Re: Bug#458154: network-console: long time-out time during install



Colin Watson wrote:
On Mon, Jan 07, 2008 at 12:12:21AM +0100, Frans Pop wrote:
However, the fact remains that _we_ have so far not been able to reproduce the issue. As I've said earlier, I've had an SSH install sitting unused for over 4 hours without the connection being lost, with basically default SSH settings both on the SSH client machine and in the installer.

So the question still is _why_ ssh drops the connection in your case.

My specific network configuration at the time of the installation - both failed attempts and the successful one - was (and is):

   * target "slug" on internal LAN (typical RFC1918 class C, 192.168.0.*)
   * ssh client (i.e., where I ran the "ssh installer@myslug" command)
     is itself behind a Netscreen 5XP VPN device; so indeed, there is
     "something" interesting about my network topology
   * ssh client is running RHEL4 - don't ask why and I won't feel lamer
     than I already do ;-)


It's common enough for the network to be at fault here (depending on
your preferred definition of "fault"); for example, entries in NAT
tables can time out, which will cause the connection to die when you
next come back to it and try to send packets. Setting
ServerAliveInterval on the client side, as Del did, is probably the best
response.
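As a rough illustration (not part of the original thread), here is a minimal Python sketch that only builds an ssh command line enabling client-side keepalives with ssh's `-o` option. `ServerAliveInterval` and `ServerAliveCountMax` are real ssh_config options. The 60-second interval is an arbitrary example value, and the helper names are hypothetical. Because `ServerAliveCountMax` defaults to 3, an unresponsive server is noticed after roughly interval * count_max seconds.

```python
import shlex


# Hypothetical helper: builds (but does not run) an ssh invocation that
# enables client-side keepalives. The 60 s default is only an example value.
def ssh_keepalive_command(destination: str, interval_s: int = 60,
                          count_max: int = 3) -> list[str]:
    if interval_s <= 0:
        # ServerAliveInterval 0 means "disabled", which defeats the purpose here.
        raise ValueError("interval_s must be positive")
    return [
        "ssh",
        "-o", f"ServerAliveInterval={interval_s}",
        "-o", f"ServerAliveCountMax={count_max}",
        destination,
    ]


def approx_dead_peer_timeout(interval_s: int, count_max: int = 3) -> int:
    # ssh probes after interval_s seconds without data from the server and
    # disconnects once count_max probes go unanswered, so an unresponsive
    # server is detected after roughly interval_s * count_max seconds.
    return interval_s * count_max


if __name__ == "__main__":
    print(shlex.join(ssh_keepalive_command("installer@myslug")))
    print(approx_dead_peer_timeout(15))  # 45
```

For a persistent setting, the equivalent is a `ServerAliveInterval` line under a matching `Host` entry in ~/.ssh/config.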

It's worth noting that before I modified my .ssh/config, my ssh clients would indeed time out when connected to remote hosts. Since I made the change, my connections have remained solid, both to the slug and to other systems. So for me, this will be a "permanent" change, at least until it bites me for some other reason.

ServerAliveInterval is not enabled by default because it has negative
consequences for people with connections that are unreliable in a
different way. Bob Proulx recently put it like this on the
openssh-unix-dev mailing list:

8< snip 8<

Also, the solution you propose is on the _client_ side, so it is not something we can fix in the installer. The only thing we could do at this point is document it.

Setting ClientAliveInterval in the installer's sshd configuration would
have a similar effect, but suffers from the same trade-off mentioned
above. We'd simply get a different set of bugs of approximately the same
severity from a different set of people.
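To make the trade-off concrete, here is a deliberately crude Python model. It is a simplification of my own, not taken from this thread, from OpenSSH, or from any NAT device. Keepalives, whether from the client's ServerAliveInterval or sshd's ClientAliveInterval, keep an idle session alive behind an expiring NAT entry, but they turn a temporary outage into a dropped session. The 300-second NAT timeout and the 60-second interval are hypothetical values, and the class and function names are illustrative only.

```python
from dataclasses import dataclass

# Toy model, NOT how sshd or any NAT device is actually implemented:
# - a NAT/firewall forgets a connection after nat_idle_timeout_s seconds
#   without traffic;
# - with keepalives enabled, one side probes every interval_s seconds of
#   silence and gives up after count_max unanswered probes, i.e. after
#   roughly interval_s * count_max seconds.


@dataclass
class Keepalive:
    interval_s: int = 0  # 0 means disabled, the OpenSSH default
    count_max: int = 3   # OpenSSH default for Server/ClientAliveCountMax

    @property
    def enabled(self) -> bool:
        return self.interval_s > 0


def survives_idle(idle_s: int, nat_idle_timeout_s: int, ka: Keepalive) -> bool:
    """Does an idle session survive behind a NAT whose entries expire?"""
    if ka.enabled:
        # Each probe is real traffic, so the NAT entry is refreshed as long
        # as probes come more often than the NAT timeout.
        return ka.interval_s < nat_idle_timeout_s
    return idle_s < nat_idle_timeout_s


def survives_outage(outage_s: int, ka: Keepalive) -> bool:
    """Does an idle session survive a temporary loss of connectivity?"""
    if ka.enabled:
        # Probes go unanswered during the outage; after about
        # interval_s * count_max seconds the session is torn down.
        return outage_s < ka.interval_s * ka.count_max
    # In this model, nothing is sent while idle, so nothing notices the outage.
    return True


if __name__ == "__main__":
    nat_timeout = 300  # hypothetical 5-minute NAT idle timeout
    off, on = Keepalive(), Keepalive(interval_s=60)
    print(survives_idle(4 * 3600, nat_timeout, off))  # False: NAT entry expired
    print(survives_idle(4 * 3600, nat_timeout, on))   # True: probes refresh it
    print(survives_outage(600, off))                  # True: nothing was sent
    print(survives_outage(600, on))                   # False: 600 >= 60 * 3
```

Neither setting wins in both scenarios, which is why enabling keepalives by default, on either side, would trade one class of failure reports for another.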

I agree that documenting this is the best approach.

I am happy with a documentation "fix" as well. The wording that has been floated (and is being tweaked) elsewhere in this bug's thread looks fine to me and would likely have helped me avoid the originally reported issue. Thanks!

In response to Frans Pop's comments dated Mon, 7 Jan 2008 00:12:21 +0100:

   On Friday 04 January 2008, G. Del Merritt wrote:
    > Please note: I was not able to recover "gracefully" from the timeout.

   That is expected. As the installation process itself runs almost
   entirely in memory and also all "state" information is kept
   completely in memory, it is extremely hard to reliably support
   resuming installations from a random point.

   However, you _could_ certainly have resumed the installation from
   certain points after starting a new session:
   - it is always safe to resume by starting partitioning again
   - it is even possible to restart base installation, though the installer
     may warn that "the base system is dirty"

I tried to restart the base install, and the system did warn about possibly being "dirty"; unfortunately, I don't think I have any screen captures beyond those I have already provided. However, after a few moments of "thinking", the installer always returned me to a red screen saying it could not proceed. This is why I started from scratch.

-Del

p.s. - with regard to the reproducibility status of the issue, my apologies; I misread the email that noted the change.



