Networking Best Practices

Replace Ethernet NICs

You may want to replace the Ethernet NICs of a bare-metal node in a SUSE Virtualization cluster for various reasons, including the following:

Malfunction or damage
Insufficent hardware capacity
Missing features

You can follow the steps below and run them in each node step by step.

Pre-Replacement Checks

Verify that the installed SUSE Virtualization version supports the new NICs.
Test the new NICs in non-production environment.
On the Virtual Machines screen of the SUSE Virtualization UI, verify that the status of all VMs is either Running or Stopped.
On the embedded SUSE Storage UI, verify that the status of all SUSE Storage volumes is Healthy.
(Optional) On the Harvester Support screen, generate a support bundle for comparison purposes.

Collect Information

Before any action is taken, it is important to collect the current network information and status.

SUSE Virtualization network configuration: By default, SUSE Virtualization creates a bond interface named mgmt-bo for the management network and one new bond interface for each cluster network. SUSE Virtualization saves network configuration details in the file /oem/90_custom.yaml.

Example: A NIC named ens3 was added to the mgmt-bo bond interface.

  - path: /etc/sysconfig/network/ifcfg-mgmt-bo
    permissions: 384
    owner: 0
    group: 0
    content: |+
      STARTMODE='onboot'
      BONDING_MASTER='yes'
      BOOTPROTO='none'
      POST_UP_SCRIPT="wicked:setup_bond.sh"
      BONDING_SLAVE_0='ens3'
      BONDING_MODULE_OPTS='miimon=100 mode=active-backup '
      DHCLIENT_SET_DEFAULT_ROUTE='no'
    encoding: ""
    ownerstring: ""

  - path: /etc/sysconfig/network/ifcfg-ens3
    permissions: 384
    owner: 0
    group: 0
    content: |
      STARTMODE='hotplug'
      BOOTPROTO='none'
    encoding: ""
    ownerstring: ""

Physical NICs: You can use the command ip link to retrieve related information, including the state of each NIC and the corresponding master (if applicable).

Example:

  $ ip link | grep master -1

  2: ens3: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master mgmt-bo state UP mode DEFAULT group default qlen 1000
      link/ether 52:54:00:03:3a:e4 brd ff:ff:ff:ff:ff:ff
  --

  4: mgmt-bo: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue master mgmt-br state UP mode DEFAULT group default qlen 1000
      link/ether 52:54:00:03:3a:e4 brd ff:ff:ff:ff:ff:ff

PCI devices: You can use the command lspci to retrieve a list of devices, which allows you to quickly identify the network NICs. To retrieve detailed information about each device, use the command lspci -v.

Example (lspci):

  $ lspci
  00:03.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 03)

Example (lspci -v):

  $ lspci -v
  00:03.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 03)
    Subsystem: Red Hat, Inc. QEMU Virtual Machine
    Physical Slot: 3
    Flags: bus master, fast devsel, latency 0, IRQ 11
    Memory at fc080000 (32-bit, non-prefetchable) [size=128K]
    I/O ports at c000 [size=64]
    Expansion ROM at fc000000 [disabled] [size=512K]
    Kernel driver in use: e1000
    Kernel modules: e1000

Linux kernel log: You can use the command dmesg to display kernel messages, which include most of the required information. If you save the messages to kernel.log, you can check the driver and link status.

SUSE Virtualization places sub-NICs into the bond interfaces. In the following example, an additional bond interface named data-bo is created in the cluster.

  $ grep "(slave" kernel.log  (or: dmesg | grep "(slave")

  Jan 08 00:35:00 localhost kernel: mgmt-bo: (slave eno5): Enslaving as a backup interface with an up link
  Jan 08 00:35:00 localhost kernel: mgmt-bo: (slave ens4f0): Enslaving as a backup interface with an up link
  Jan 08 00:37:34 localhost kernel: data-bo: (slave eno6): Enslaving as a backup interface with an up link
  Jan 08 00:37:35 localhost kernel: data-bo: (slave ens4f1): Enslaving as a backup interface with an up link

The NICs are renamed.

  $ grep "renamed" kernel.log

  Jan 08 00:34:48 localhost kernel: tg3 0000:02:00.0 eno1: renamed from eth2 // eth2 / eno1 is not used by Harvester
  Jan 08 00:34:48 localhost kernel: tg3 0000:02:00.3 eno4: renamed from eth6 // eth6 / eno4 is not used by Harvester
  Jan 08 00:34:48 localhost kernel: tg3 0000:02:00.2 eno3: renamed from eth5 // eth5 / eno3 is not used by Harvester
  Jan 08 00:34:48 localhost kernel: tg3 0000:02:00.1 eno2: renamed from eth3 // eth3 / eno2 is not used by Harvester
  Jan 08 00:34:49 localhost kernel: i40e 0000:5d:00.0 eno5: renamed from eth0
  Jan 08 00:34:49 localhost kernel: i40e 0000:af:00.0 ens4f0: renamed from eth4
  Jan 08 00:34:49 localhost kernel: i40e 0000:5d:00.1 eno6: renamed from eth1
  Jan 08 00:34:49 localhost kernel: i40e 0000:af:00.1 ens4f1: renamed from eth2

The NIC driver of eno5(0000:5d:00.0) is (intel) i40e 10Gbps Full Duplex.

  $ grep "0000:5d:00.0" kernel.log

  Jan 08 00:34:47 localhost kernel: i40e 0000:5d:00.0: fw 8.71.63306 api 1.11 nvm 10.54.7 [8086:1572] [103c:22fc]
  Jan 08 00:34:47 localhost kernel: i40e 0000:5d:00.0: MAC address: 48:df:37:24:c2:00
  Jan 08 00:34:47 localhost kernel: i40e 0000:5d:00.0: FW LLDP is enabled
  Jan 08 00:34:47 localhost kernel: i40e 0000:5d:00.0 eth0: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
  Jan 08 00:34:47 localhost kernel: i40e 0000:5d:00.0: PCI-Express: Speed 8.0GT/s Width x8
  Jan 08 00:34:47 localhost kernel: i40e 0000:5d:00.0: Features: PF-id[0] VFs: 64 VSIs: 66 QP: 112 RSS FD_ATR FD_SB NTUPLE DCB VxLAN Geneve PTP VEPA
  Jan 08 00:34:49 localhost kernel: i40e 0000:5d:00.0 eno5: renamed from eth0

The enabled NICs are detected.

  $ grep "is Up" kernel.log

  Jan 08 00:34:47 localhost kernel: i40e 0000:5d:00.0 eth0: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
  Jan 08 00:34:48 localhost kernel: i40e 0000:5d:00.1 eth1: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
  Jan 08 00:34:48 localhost kernel: i40e 0000:af:00.0 eth4: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None
  Jan 08 00:34:49 localhost kernel: i40e 0000:af:00.1 eth2: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None

Enable Maintenance Mode

(Optional) Stop VMs that cannot or must not be migrated.
Enable maintenance mode on the target node to automatically migrate all VMs to other nodes.
- Wait for everything to become ready, and then repeat the steps in the Pre-check section.
- Manually stop a VM in the following situations:
  - The VM fails to migrate.
  - The VM has selectors that prevent it from migrating to other nodes.
  - The VM has special hardware (for example, PCI passthrough or vGPUs) that prevent it from migrating to other nodes.

(Optional) Update the Network Config

There are one or more Network Config under every Cluster Network on SUSE Virtualization. Each Network Config is backed by a VlanConfig CRD object.

Updating the Network Config is required if the new NICs will be placed in different physical slots or will have different uplink parameters.

Check the node.

When a SUSE Virtualization cluster node belongs to a Network Config, the Node object has a label with the key network.harvesterhci.io/vlanconfig.

Example:
```
 apiVersion: v1
 kind: Node
 metadata:
   labels:
     ...
     network.harvesterhci.io/vlanconfig: vlan123
```

Remove this node from the Network Config.

When the new NICs are placed in different slots, you must change the Network Config to exclude this node. You can delete the VlanConfig if the Network Config object selects only this node from nodeSelector.

Example:

 apiVersion: network.harvesterhci.io/v1beta1
 kind: VlanConfig
 spec:
   clusterNetwork: data
   nodeSelector:
     kubernetes.io/hostname: node123  // select one or more nodes
   uplink:
     bondOptions:
       miimon: 100
       mode: 802.3ad
     linkAttributes:
       mtu: 1500
       txQLen: -1
     nics:
     - enp0s1
     - enp0s2

When VMs are still running on an affected node, the network webhook returns an error.

Check the Node object.

Depending on the situation, either the label network.harvesterhci.io/vlanconfig changes or is removed.

Check the VlanStatus object.

Depending on the situation, either the status of the VlanStatus object’s ready condition changes to "True" or the object is deleted.

Example:

 apiVersion: network.harvesterhci.io/v1beta1
 kind: VlanStatus
 metadata:
 ...
 status:
   clusterNetwork: data
   conditions:
   - lastUpdateTime: "2024-02-03T18:32:41Z"
     status: "True"
     type: ready
   linkMonitor: public
   localAreas:
   - cidr: 10.190.186.0/24
     vlanID: 2013
   node: node123
   vlanConfig: vlan123

(Optional) Drain the Node

You may find that some SUSE Storage replicas remain active on the node even after completing the previously outlined procedures.

Drain the node. (This is optional in SUSE Virtualization.)
- Scenario 1: The numReplicas value of all volumes is 3, which means that each SUSE Storage volume has three active replicas.
  
  The Longhorn Engine recognizes that it can no longer communicate with the replica on the drained node, and then marks that replica as failed. None of the replicas hold any special significance to SUSE Storage so it functions as long as it can communicate with at least one replica.
- Scenario 2: Some SUSE Storage volumes have fewer than three active replicas, or you manually attached volumes using the SUSE Virtualization UI or SUSE Storage UI.
  
  You must manually detach the replicas or move them to other nodes, and then drain the node using the command kubectl drain --ignore-daemonsets <node name>. The option --ignore-daemonsets is required because SUSE Storage deploys daemonsets such as Longhorn Manager, Longhorn CSI plugin, and Longhorn Engine image.
  
  Replicas running on the node are stopped and marked as Failed. Longhorn Engine processes running on the node are migrated with the pod to other nodes. Once the node is fully drained, no replicas and engine processes should remain running on the node.
Replenish replicas.

After a node is shut down, SUSE Storage does not start rebuilding the replicas on other nodes until the replica-replenishment-wait-interval (default value: 600 seconds) is reached. If the node comes back online before the wait interval value is reached, SUSE Storage reuses the replicas. Otherwise, SUSE Storage rebuilds the replicas on another node.

During system maintenance, you can modify the replica-replenishment-wait-interval value using the embedded SUSE Storage UI to enable faster replica rebuilding.

Replace the Nics

Shut the node down.
Replace the NICs.
Restart the node.
Collect Information about the current network configuration and status.

If you observe any abnormalities, generate a support bundle for troubleshooting purposes.

(Optional) Update the Network Config Again

Updating the Network Config is required if the new NICs will be placed in different physical slots.

Add the node to the Network Config.

You must create a new Network Config or change the Network Config to include this node.
Check the Node object.

The label network.harvesterhci.io/vlanconfig reflects the specific Network Config used.
Check the VlanStatus object.

The status of the VlanStatus object’s ready condition changes to "True".

Disable Maintenance Mode

Wait for the node to be moved back to the cluster.
Disable maintenance mode.
(Optional) Start the VMs that you manually stopped.
(Optional) Manually migrate VMs to this node.

Troubleshooting

SUSE Virtualization uses multiple network-related pods and CRDs. When troubleshooting, check the pod logs and the status of CRD objects.

Pods:

$ kubectl get pods -n harvester-system
NAME                                                    READY   STATUS    RESTARTS      AGE
harvester-network-controller-cnf22                      1/1     Running   2 (60m ago)   3d22h  // Network controller agent daemonSet, deployed in each node
harvester-network-controller-manager-859c4bd874-xcllf   1/1     Running   2 (60m ago)   3d22h  // Network controller
harvester-network-webhook-56b877d5d5-z42dp              1/1     Running   2 (60m ago)   3d22h

CRDs:

clusternetworks.network.harvesterhci.io
linkmonitors.network.harvesterhci.io
vlanconfigs.network.harvesterhci.io
vlanstatuses.network.harvesterhci.io