
Ethernet interface numbering in etch



Hi,

I have spent the past few days trying to figure out why some of our
machines seem to have ethernet interface numbers that jump around --
eth0 one day, then eth4 or eth5 another.

The culprit is udev.  I've filed bug #416284 against it for this.

Basically, udev tries to assign persistent names to interfaces.  But
when an interface that was given a persistent name no longer exists,
udev keeps that name reserved and prevents other interfaces from using
it.
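A much simplified model of this behaviour may make the "jumping" numbers easier to see.  It is written in Python purely for illustration and is not udev's actual rule code; the function name and the MAC-to-name dictionary are hypothetical stand-ins for the generated persistent-net rules file:

```python
# Hypothetical, simplified model of the current behaviour: once a MAC
# address has been given a name, that name stays reserved forever,
# even after the hardware is gone.  Not udev's real implementation.

def assign_names_reserving(recorded, present_macs):
    """Return (names for present devices, updated record).

    recorded     -- dict of MAC -> interface name, kept across boots
                    (stands in for the generated persistent-net rules)
    present_macs -- MACs of the NICs found on this boot, in probe order
    """
    record = dict(recorded)  # don't mutate the caller's dict
    names = {}
    for mac in present_macs:
        if mac not in record:
            # Pick the lowest ethN not claimed by ANY recorded MAC,
            # including MACs whose cards no longer exist.
            used = set(record.values())
            n = 0
            while f"eth{n}" in used:
                n += 1
            record[mac] = f"eth{n}"
        names[mac] = record[mac]
    return names, record


# Five cards seen on earlier hardware, none of them present any more.
old = {f"00:00:00:00:00:0{i}": f"eth{i}" for i in range(5)}
names, record = assign_names_reserving(old, ["00:00:00:00:00:09"])
print(names)  # {'00:00:00:00:00:09': 'eth5'}
```

In this model the only card actually in the machine ends up as eth5, because eth0 through eth4 are still held by hardware that is long gone.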

This fact is under-documented (it should at least be mentioned in the
interfaces manpage) and causes serious breakage in a number of
situations.  For years it has been Linux practice (and most other Unices
are similar) that eth0 is the first Ethernet card in the system.  Now
that first card is sometimes eth5 or something else.

This causes problems in quite a number of scenarios:

 * I have a 1U dedicated server from a colo provider.  If its
   motherboard fries and they move the hard disk to a new box,
   the network won't come up, because eth0 was configured statically
   and the new card no longer gets that name.

 * dmesg output still refers to the hardware as eth0, even though you
   can't reach it at eth0 and must instead use eth5.  dmesg doesn't
   mention the rename, which makes problems difficult to track down.

 * If I replace a NIC in a box, and the box is running etch in a default
   configuration, it will no longer bring up the network on boot because
   the device name changed.  If the box is using NFS, NIS, or LDAP,
   people may even have trouble logging in to it.

 * If a hard disk is moved from one box to another, the network won't
   come up on boot.

 * If a tool like systemimager is being used to image machines,
   the imaged systems won't work because eth0 is not being brought up.

It's not obvious what causes the problem or how to fix it.  Fixing it
requires knowing about udev and deleting a file and a symlink in
/etc/udev/rules.d, which is not very user-friendly.

I'm posting here because there was a discussion on IRC about the right
approach, and it seems like a question of more general interest.

I think the right thing to do is to apply the persistent names only to
network devices that still exist in the system, and to leave any other
network devices alone.  That would allow systems to still boot and come
up properly when their network hardware changes.
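One possible reading of this proposal, sketched as a hypothetical Python model rather than real udev code: recorded names are honoured only for devices present on this boot, names belonging to missing hardware are not reserved, and every other device simply takes the lowest free ethN, roughly as it would without persistent naming.  The function name and data layout are illustrative assumptions:

```python
# Hypothetical model of the proposed policy: recorded names are honoured
# only for devices present on this boot; names belonging to missing
# hardware are not reserved, and unrecorded devices are left alone
# (here: they take the lowest free ethN in probe order).

def assign_names_present_only(recorded, present_macs):
    """Return a dict of MAC -> interface name for the present devices."""
    # Keep persistent names only for recorded devices that still exist.
    names = {mac: recorded[mac] for mac in present_macs if mac in recorded}
    used = set(names.values())
    n = 0
    for mac in present_macs:
        if mac in names:
            continue
        # Any other device gets the next free ethN, as it would
        # without persistent naming.
        while f"eth{n}" in used:
            n += 1
        names[mac] = f"eth{n}"
        used.add(f"eth{n}")
    return names


old = {f"00:00:00:00:00:0{i}": f"eth{i}" for i in range(5)}
print(assign_names_present_only(old, ["00:00:00:00:00:09"]))
# {'00:00:00:00:00:09': 'eth0'}
```

With the same stale record, the lone card now comes up as eth0.  A real implementation would also have to avoid clashing with names the kernel has already handed out; the model sidesteps that by skipping any name already in use.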

-- John


