VXLAN remote MAC flapping between IPs
I have several Debian servers running multicast VXLAN for a QEMU/KVM
cluster in a fully routed mini-datacenter.
Each server has multiple links to multiple top-of-rack switches,
configured as ECMP point-to-points with /30s.
Additionally, each server runs OSPF via FRR and thus has routes
to every other /30 in this cluster.
Each server also has a lo:1 address set, but I don't think that's
coming into play here...
I've been seeing dmesg on all members getting spammed with messages like these:
[169833.510533] vxlan_1gig1: 46:65:af:13:b7:d1 migrated from 172.16.150.114 to 172.16.150.218
[169833.511121] vxlan_1gig1: 46:65:af:13:b7:d1 migrated from 172.16.150.218 to 172.16.150.114
I'm seeing this across all vxlan interfaces on each server.
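To get a sense of how many MACs and interfaces are involved, the messages can be tallied per interface and MAC. The following is a minimal sketch: the function name `summarize_migrations` is made up for illustration, and it assumes each kernel message sits on a single line, as dmesg prints it.

```shell
# Count "migrated from A to B" kernel messages per VXLAN interface and MAC.
# Reads dmesg-style text on stdin and prints "<count> <interface> <mac>".
summarize_migrations() {
    awk '
        / migrated from / {
            # Find the "migrated" token instead of using fixed field numbers,
            # because the timestamp prefix can contain padding spaces.
            for (i = 3; i <= NF; i++) {
                if ($i == "migrated") {
                    iface = $(i - 2)
                    sub(/:$/, "", iface)        # drop the trailing colon
                    count[iface " " $(i - 1)]++
                }
            }
        }
        END { for (k in count) print count[k], k }
    ' | sort -rn
}
```

It could be fed with `dmesg | summarize_migrations`, or with `journalctl -k` output in place of dmesg.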
The MAC that's flapping is the remote address for the 1gig1 vxlan
interface on one of the other peers, but .218 and .114 are both
10gig addresses on that same remote machine. That doesn't make sense
to me, as they're connected to different switches, and that vxlan is
bound to a 1gig interface...
From the documentation, I thought that kernel vxlan interfaces were
tied to a specific interface.
However, 150.114 is on a 10gig switch, and .218 is as well, whereas
this vxlan interface is bound to one of the 1gig links.
It looks like the kernel is seeing better-cost routes via the 10gig
interfaces, and then... ignoring the device parameter?
There's no packet loss going on, and performance is unaffected. I'd
just like to know more about why this is happening.
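A few read-only iproute2 queries can show what the kernel has configured and learned for one VXLAN device, and which route and source address it would pick towards a given remote. This is only a sketch: the helper names `run` and `inspect_vxlan` and the `DRY_RUN` switch are made up for illustration, while the `ip` and `bridge` invocations themselves are standard.

```shell
# Run a command, or just print it when DRY_RUN is set to a non-empty value.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: inspect_vxlan <vxlan-device> [remote-ip ...]
inspect_vxlan() {
    vx=$1
    shift
    run ip -d link show dev "$vx"     # configured VNI, group, underlay dev, local address (if any)
    run bridge fdb show dev "$vx"     # learned MAC -> remote IP entries
    for remote in "$@"; do
        run ip route get "$remote"    # route and source address chosen for this remote
    done
}
```

For the case above this would be `inspect_vxlan vxlan_1gig1 172.16.150.114 172.16.150.218`. If `ip route get` reports the remotes as reachable over the 10gig links, that would at least be consistent with the learned entries not following the 1gig device.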
I always have syslog and my graylog server, but both the ring buffer
and systemd journal are basically useless because of this constant
This is how I'm building the vxlan interface in
/etc/network/interfaces (identical config across all servers, just
I know I'm doing things less than ideally, but am I missing something
big here? Any tips for a better implementation?
iface lo inet loopback
iface lo:1 inet static
iface 1gig1 inet static
iface 10gig1 inet static
iface 10gig2 inet static
iface vxlan_1gig1 inet manual
up exec `ip link add vxlan_1gig1 type vxlan id 250 group 22.214.171.124 dstport 4789 ttl 2 dev 1gig1; ip link set vxlan_1gig1 up; ip addr add 172.16.250.2/24 dev vxlan_1gig1`
down exec `ip link set vxlan_1gig1 down; ip link del vxlan_1gig1`
# vxlan via 1gig1 - primary corosync network
iface vxlan_10gig1 inet manual
up exec `ip link add vxlan_10gig1 type vxlan id 251 group 126.96.36.199 dstport 4789 ttl 2 dev 10gig1; ip link set vxlan_10gig1 up; ip addr add 172.16.251.2/24 dev vxlan_10gig1`
down exec `ip link set vxlan_10gig1 down; ip link del vxlan_10gig1`
# vxlan via 10gig1 - secondary corosync network, vm migration network
iface vxlan_10gig2 inet manual
up exec `ip link add vxlan_10gig2 type vxlan id 252 group 188.8.131.52 dstport 4789 ttl 2 dev 10gig2; ip link set vxlan_10gig2 up; brctl addif vmbr0 vxlan_10gig2`
down exec `ip link set vxlan_10gig2 down; ip link del vxlan_10gig2`
# vxlan via 10gig2 - vm lan
iface vmbr0 inet static
# vm network
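One variation that might be worth testing is to pin the tunnel's source address with the `local` option of `ip link add ... type vxlan`. The stanzas above do not set it, so the outer source address is presumably left to the route lookup. Whether this stops the flapping is an assumption, not something confirmed here. The sketch below only prints the command so it can be reviewed first; the function name `vxlan_add_cmd` is made up, and the local address has to be supplied by the caller because the addresses of the physical interfaces are not shown in this post.

```shell
# Print an `ip link add` command for a multicast VXLAN with a pinned source address.
# Arguments: name, VNI, multicast group, underlay device, local (source) IP.
# Nothing is executed; the command is only written to stdout.
vxlan_add_cmd() {
    printf 'ip link add %s type vxlan id %s group %s dstport 4789 ttl 2 dev %s local %s\n' \
        "$1" "$2" "$3" "$4" "$5"
}
```

A call such as `vxlan_add_cmd vxlan_1gig1 250 "$GROUP" 1gig1 "$LOCAL_IP"` would produce the same command as the first stanza plus `local`, where `GROUP` and `LOCAL_IP` are placeholders for the multicast group and the address configured on 1gig1.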
I'd appreciate any help on this or ideas for further
troubleshooting/improvements. I'm deeply confused at this point, but
that probably just means I have something set up wrong.