
Re: Intel X540-AT2 and Debian: intermittent connection



On Sat, 2022-11-19 at 13:35 -0800, David Christensen wrote:
> On 11/19/22 06:50, hw wrote:
> > On Fri, 2022-11-18 at 17:02 -0800, David Christensen wrote:
> 
> > > ... I suggest trying a Category 6A factory patch cable at least 2 meters
> > > long.
> > 
> > I tried it with a 10m cat6 cable and the connection was intermittent.  It's
> > the same (as in "identical to") cable that works between the other server
> > and a client.
> 
> 
> Okay.  I suggest putting a unique mark/ serial number on each cable for 
> tracking purposes until you resolve the intermittent connection issue.

What for?  All the cables I used except for the new ones are known good.

> > > What OS's for the various machines?
> > 
> > Fedora on the server and Debian on the backup server, Fedora on the client.
> 
> 
> Okay.  If the NIC works correctly in the backup server with Fedora, 
> maybe you should just use Fedora.

Unfortunately, it doesn't work with Fedora anymore either ...  I booted a Fedora
live system to see if it would work, and it didn't.

> > > Do you compile your own kernels and/or NIC drivers?
> > 
> > No, I'm using the kernels that come with the distributions.  
> 
> 
> Okay.  That is the safest approach.

And it's convenient :)

> > I did compile the driver (i.e. module) from the source on Intel's web site
> > to see if a different driver would make a difference, and it didn't, so I
> > restored the "original" module.
> 
> 
> Okay.  Too bad it did not work; that seemed like a good suggestion.

It's good that it didn't make a difference because I won't be able to keep the
Intel source working.  Sooner or later the kernel will be incompatible, and
until then, I might have to recompile it for every new kernel version.  It's
always better to use hardware that is supported by the modules that come with
the kernel.

> > > If you have another Broadcom NIC, what happens if you swap it with the
> > > Intel NIC in the backup server?
> > 
> > I haven't tried yet because when I swap cards around, I'll have to redo the
> > configuration and the server has some network cards passed through to a VM
> > running OPNsense.  I don't want to mess with that.
> 
> 
> Perhaps that is a good reason to do some devops development -- e.g. 
> write a data-driven script that reads a configuration file to 
> interconnect the VM virtual network interfaces and host physical network 
> interfaces.

Why would I do that?  How would a script figure out which interface is which and
how would it guarantee that they will be exactly the same as seen by OPNsense
running in that VM when I switch them around?  I'm not saying it's impossible,
but I'd rather resolve this problem in a timely manner and not in a couple years
when I might have finished the script and tested it in a bunch of servers which
aren't even relevant.
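
That said, the "which interface is which" part could at least start from what
the kernel exposes in sysfs.  Below is a minimal sketch in Python, assuming the
interfaces are visible to the host under /sys/class/net (cards passed through to
the VM typically are not, because no host network driver is bound to them), and
assuming that MAC address plus bus address is enough to tell them apart.
list_interfaces() is a made-up name for illustration, not an existing tool:

```python
import os


def list_interfaces(sysfs_root="/sys/class/net"):
    """Return a sorted list of (name, mac, bus_id) tuples.

    bus_id is the last path component of the interface's 'device' symlink
    (for a PCI card something like '0000:03:00.0'), or None for purely
    virtual interfaces such as 'lo' that have no backing device.
    """
    if not os.path.isdir(sysfs_root):
        return []
    found = []
    for name in sorted(os.listdir(sysfs_root)):
        base = os.path.join(sysfs_root, name)
        try:
            with open(os.path.join(base, "address")) as f:
                mac = f.read().strip()
        except OSError:
            # Not an interface directory (e.g. a plain file); skip it.
            continue
        device = os.path.join(base, "device")
        bus_id = None
        if os.path.exists(device):
            bus_id = os.path.basename(os.path.realpath(device))
        found.append((name, mac, bus_id))
    return found


if __name__ == "__main__":
    for name, mac, bus_id in list_interfaces():
        print(f"{name:<16} {mac:<18} {bus_id or '(no device)'}")
```

This only reports what the host sees.  It says nothing about how OPNsense
numbers the same cards inside the VM, which is the harder half of the question.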

> I prefer to use a dedicated hardware device for my LAN (UniFi Security 
> Gateway).

Ubiquiti sucks.  I'd prefer to run OPNsense on dedicated hardware, but
electricity is insanely expensive here, and OPNsense works fine in this VM with
no issues whatsoever in over a year now.

> > I suspect it's a mainboard issue.  I pulled the Intel card and then the
> > on-board network card quit working.
> 
> 
> With the current Debian installation?

Yes.

> Did you try the d-i rescue shell 

You mean the rescue system that comes with the Debian installer?  No, I haven't.
How would that make a difference?

> or any live sticks?

Only the Fedora one.

> > I plugged the Intel card back in and the on-board card worked again.  I'd
> > try disabling the on-board card but there is no option to do that in the
> > BIOS.
> 
> 
> Okay.  That indicates the issue is software.

How would it be a software issue when pulling a network card from a totally
different manufacturer out of a PCI slot makes the on-board network card quit
working, and the BIOS doesn't even have an option to disable the on-board card?

> > > Do you have any diagnostic information that indicates the Intel NIC is
> > > overheating?
> > 
> > No, the idea that it might overheat is from internet searches revealing that
> > some people had issues with the card overheating and adding a fan blowing on
> > the heatsink fixed the problem.  I always had a fan blowing over it from the
> > top of the card, so that should be fine, and placing another fan directly on
> > the heatsink didn't make a difference.  I took the extra fan out today when
> > I was at it because it's awfully loud --- it's an old Delta fan from 2003
> > that comes from an old IBM server and it makes a good airstream :)
> > 
> > The heatsink looks fine and unfortunately, it's designed in such a way that
> > I can't remove it without breaking the pins holding the heatsink to the
> > card, so I decided not to touch it.  That's how I discovered that the
> > on-board network card quit working when the Intel card wasn't plugged in ...
> > 
> > Perhaps it's some kind of resource conflict or incompatibility, or the board
> > is broken.
> 
> 
> At this point, all I can suggest is a program of A/B testing to isolate 
> the faulty hardware and/or software component(s).  Beware that you may 
> have multiple faults, so be meticulous.

The only thing I can do is try the network card that's in the client now.  It'll
be about a week before I can get to that.
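
To put a number on "intermittent" for each A/B test, the kernel's own count of
link state changes could be logged.  A rough sketch in Python, assuming a kernel
that provides /sys/class/net/<iface>/carrier_changes; the function names and the
interface name in the usage note are only examples:

```python
import os
import time


def read_carrier_changes(iface, sysfs_root="/sys/class/net"):
    """Return how often the link of iface has changed state since it appeared."""
    path = os.path.join(sysfs_root, iface, "carrier_changes")
    with open(path) as f:
        return int(f.read().strip())


def watch(iface, interval=60, samples=10, sysfs_root="/sys/class/net"):
    """Print the number of new link state changes once per interval."""
    last = read_carrier_changes(iface, sysfs_root)
    for _ in range(samples):
        time.sleep(interval)
        now = read_carrier_changes(iface, sysfs_root)
        print(f"{time.strftime('%H:%M:%S')} {iface}: +{now - last} (total {now})")
        last = now
```

Calling watch("enp3s0") would print one line per minute for ten minutes, with
"enp3s0" replaced by whatever the X540 port is called on the machine.  A counter
that keeps climbing with one cable or card and stays flat with another is at
least a repeatable result to compare.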

> I prefer FreeBSD for my servers.  The "Intel ® Ethernet Controller 
> Products 27.7 Release Notes" indicate the "ix" driver is supported and 
> tested on FreeBSD 13 and FreeBSD 12.3 ("Fedora" and "Debian" appear 
> nowhere in that document):

Good idea, maybe I can try this: https://www.nomadbsd.org/

I'd be surprised if it worked, but maybe it does and if it does, I could just as
well use FreeBSD for the backup server.


