Bug#458154: Processed: Re: Bug#458154: network-console: long time-out time during install
Colin Watson wrote:
On Mon, Jan 07, 2008 at 12:12:21AM +0100, Frans Pop wrote:
However, the fact remains that _we_ have so far not been able to reproduce
the issue. As I've said earlier, I've had an SSH install sitting unused for
over 4 hours without the connection being lost, with basically default SSH
settings both on the SSH client machine and in the installer.
So the question still is _why_ ssh drops the connection in your case.
My specific network configuration at the time of the installation - both
failed attempts and the successful one - was (and is):
* target "slug" on internal LAN (typical RFC1918 class C, 192.168.0.*)
* ssh client (i.e., where I ran the "ssh installer@myslug" command)
is itself behind a Netscreen 5XP VPN device; so indeed, there is
"something" interesting about my network topology
* ssh client is running RHEL4 - don't ask why and I won't feel lamer
than I already do ;-)
It's common enough for the network to be at fault here (depending on
your preferred definition of "fault"); for example, entries in NAT
tables can time out, which will cause the connection to die when you
next come back to it and try to send packets. Setting
ServerAliveInterval on the client side, as Del did, is probably the best
response.
It's worth noting that before I modified my .ssh/config my ssh clients
would indeed time out when connected to remote hosts. Since I made the
change my connections remain solid, both to the slug and to other
systems. So for me, this will be a "permanent" change, at least until
it bites me for some other reason.
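
For reference, a minimal sketch of that kind of client-side change. The
Host pattern and the 60-second interval are illustrative assumptions, not
the values actually used in this report:

```
# ~/.ssh/config (client side)
# Assumed example values; adjust the Host pattern and interval as needed.
Host *
    # Send a keepalive probe through the encrypted channel after
    # 60 seconds without any data from the server (default 0 = disabled).
    ServerAliveInterval 60
    # Give up after this many unanswered probes (3 is the default).
    ServerAliveCountMax 3
```

With these values the client should keep NAT or firewall state refreshed
roughly once a minute, and would drop the session after about three
minutes of an unresponsive server.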
ServerAliveInterval is not enabled by default because it has negative
consequences for people with connections that are unreliable in a
different way. Bob Proulx recently put it like this on the
openssh-unix-dev mailing list:
8< snip 8<
Also, the solution you propose is on the _client_ side, so is not something
we can fix in the installer. The only thing we could do at this point is
document it.
Setting ClientAliveInterval in the installer's sshd configuration would
have a similar effect, but suffers from the same trade-off mentioned
above. We'd simply get a different set of bugs of approximately the same
severity from a different set of people.
I agree that documenting this is the best approach.
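
For comparison, the server-side counterpart would look roughly like the
fragment below. This is a sketch using generic sshd_config options with
assumed values, not the installer's actual sshd configuration:

```
# sshd_config (server side); values are assumptions for illustration
# Probe an idle client after 60 seconds of silence (default 0 = disabled).
ClientAliveInterval 60
# Disconnect after this many unanswered probes (3 is the default).
ClientAliveCountMax 3
```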
I am happy with a documentation "fix" as well. The wording that has
been floated (and is being tweaked) in other parts of this bug's thread
looks fine to me and would likely have helped me avoid the
originally-reported issue. Thanks!
In response to Frans Pop's comments dated Mon, 7 Jan 2008 00:12:21 +0100:
On Friday 04 January 2008, G. Del Merritt wrote:
> Please note: I was not able to recover "gracefully" from the timeout.
That is expected. As the installation process itself runs almost entirely in
memory and also all "state" information is kept completely in memory, it is
extremely hard to reliably support resuming installations from a random
point.
However, you _could_ certainly have resumed the installation from certain
points after starting a new session:
- it is always safe to resume by starting partitioning again
- it is even possible to restart base installation, though the installer
may warn that "the base system is dirty"
I tried to restart the base install and the system did warn about
possibly being "dirty"; unfortunately I don't think I have any screen
captures aside from what I have provided already. However, after a few
moments of "thinking", the installer always put me back to a red
background saying it could not proceed. This is why I started from scratch.
-Del
p.s. - with regard to the reproducibility status of the issue, my
apologies; I misread the email that noted the change.