
Configuring a Subnet Manager

Publication Date: 12 Dec 2024

WHAT?

opensm is an InfiniBand-compliant implementation of the Subnet Manager.

WHY?

To learn about the opensm implementation of the Subnet Manager and how to configure it.

EFFORT

Less than 30 minutes and basic knowledge of Linux administration.

1 Introduction to opensm

The following sections explain what opensm is and outline its role in the InfiniBand networking environment.

1.1 What is InfiniBand?

InfiniBand is a standard for computer networking communications. It offers a high-speed, low-latency networking technology commonly used in high-performance computing (HPC) environments. InfiniBand provides a high-bandwidth interconnect for connecting servers, storage and other network devices, for example, network switches. In the InfiniBand context, the group of connected network devices is called a fabric.

1.2 What is a Subnet Manager?

A Subnet Manager is a component of InfiniBand architecture. It manages and configures InfiniBand switches and connected devices. The Subnet Manager is important for discovering network devices, assigning addresses to them, and maintaining the overall health and configuration of the network.

1.3 What is opensm?

opensm is an InfiniBand-compliant Subnet Manager that performs all required tasks for initializing InfiniBand hardware. InfiniBand switches usually come with a Subnet Manager embedded in their firmware. Because the embedded Subnet Manager may be outdated or have limited control over network devices, Dolomite offers opensm. At least one Subnet Manager must be running for each InfiniBand subnet.
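
To confirm that a Subnet Manager is active on the subnet, the sminfo command from the infiniband-diags package can be queried from any host in the fabric. The following is a minimal sketch, not an official procedure: sm_master_info is a hypothetical helper name, and the line format it parses ("sminfo: sm lid ... sm guid ..., activity count ... priority ... state ... SMINFO_...") is an assumption based on typical sminfo output that may differ between versions.

```shell
# Hypothetical helper: reduce one line of `sminfo` output (read from stdin)
# to "lid=... guid=... priority=... state=...".
# Assumed input format (verify against your sminfo version):
#   sminfo: sm lid 1 sm guid 0x..., activity count 1234 priority 15 state 3 SMINFO_MASTER
sm_master_info() {
    # Lines that do not match the assumed format produce no output.
    sed -n 's/^sminfo: sm lid \([0-9][0-9]*\) sm guid \(0x[0-9a-fA-F]*\),.* priority \([0-9][0-9]*\) state [0-9][0-9]* \(SMINFO_[A-Z_]*\).*$/lid=\1 guid=\2 priority=\3 state=\4/p'
}
```

A possible invocation is `sminfo | sm_master_info`. Empty output suggests that either no Subnet Manager answered or the output format differs from the assumed one.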

1.4 How does opensm work?

opensm attaches to a specific InfiniBand port on the local host and configures only the devices connected to that port. If the local machine has other InfiniBand ports, opensm ignores the devices connected to them. If no port is specified, opensm selects the first available port. By default, opensm logs its activity to /var/log/opensm.log. Treat any error reported in this file as an indicator of an InfiniBand fabric health issue.
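
Because errors in the log point to fabric health issues, it can be useful to count them quickly. The following is a minimal sketch under one assumption: that error entries contain the token "ERR" followed by a code and a colon (for example, "ERR 1234:"). The function name count_opensm_errors is hypothetical, so adjust the pattern if your log format differs.

```shell
# Hypothetical helper: print the number of error lines in an opensm log.
# Argument 1 (optional): path to the log, defaulting to /var/log/opensm.log.
# Assumption: error entries look like "... ERR <code>: message".
count_opensm_errors() {
    logfile="${1:-/var/log/opensm.log}"
    # grep -c prints the count of matching lines. It exits non-zero when
    # nothing matches, so "|| true" keeps the function's exit status at 0.
    grep -c -E 'ERR [0-9A-Fa-f]+:' "$logfile" || true
}
```

Reading the default log usually requires root privileges. A result of 0 only means that no line matched the assumed pattern.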

1.5 Responsibilities of opensm

Key responsibilities of opensm include:

Device discovery

Identifying and managing the devices, such as nodes or switches, within the InfiniBand fabric.

Address assignment

Assigning a unique InfiniBand address to each device in the fabric. InfiniBand uses a unique addressing scheme known as the local identifier (LID) to identify each device.

Routing

Determining the optimal paths for data transmission within the fabric. InfiniBand supports both direct and routed communication between devices.

Topology management

Managing the topology of the InfiniBand fabric, including configuring and maintaining the connections between switches and devices.

Error handling

Monitoring the fabric for errors and handling them appropriately to ensure the reliability of communication.
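
The results of discovery and address assignment can be observed from a host, because ibstat reports the state, the assigned LID ("Base lid") and the LID of the active Subnet Manager ("SM lid") for each port. The sketch below condenses that output to one line per port. The function name summarize_ports is hypothetical, and the field labels are assumed to match the usual ibstat output.

```shell
# Hypothetical helper: summarize each port block of `ibstat` output
# read from stdin as "port=N state=... lid=... sm_lid=...".
summarize_ports() {
    awk '
        # "Port 1:" opens a port block; "Port GUID:" does not match here.
        /^[ \t]*Port [0-9]+:/ { port = $2; sub(":", "", port) }
        # "State:" is the logical state; "Physical state:" is skipped.
        /^[ \t]*State:/       { state = $2 }
        /^[ \t]*Base lid:/    { lid = $3 }
        # "SM lid:" follows "Base lid:" in each block, so print here.
        /^[ \t]*SM lid:/      { printf "port=%s state=%s lid=%s sm_lid=%s\n", port, state, lid, $3 }
    '
}
```

A possible invocation is `ibstat | summarize_ports`. A port that reports a LID of 0 has typically not yet been configured by a Subnet Manager.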

2 Configuring opensm

Tip

The opensm command stores its main configuration in /etc/rdma/opensm.conf. Because the main configuration file gets updated on opensm upgrades, we recommend editing /etc/rdma/opensm instead, to prevent the need to merge changes manually.

  1. Install the opensm package on any hosts that will be running the Subnet Manager.

    > sudo transactional-update pkg install opensm
  2. Use the ibstat -p command to find GUID0 and GUID1 of the host channel adapter (HCA) ports, for example:

    > ibstat -p
    0x248a070300a80a80
    0x248a070300a80a81
  3. If you are using a single switch, follow these steps:

    1. Start the opensm service.

      > sudo systemctl start opensm
    2. Enable the opensm service on boot.

      > sudo systemctl enable opensm
    3. Edit the /etc/rdma/opensm file to add the identifier for the corresponding port.

      > sudo opensm -c /etc/rdma/opensm
      # The port GUID on which the OpenSM is running
      guid 0x248a070300a80a80
  4. If you are using a direct connect method or if you have multiple switches, enable the Subnet Manager on each port of the connected HCA on the host by adding the following lines to /etc/rc.d/after.local. Replace GUID0 and GUID1 with the values you discovered with ibstat -p. Replace P0 and P1 with the Subnet Manager priorities, where 0 is the lowest and 15 the highest.

    opensm -B -g GUID0 -p P0 -f /var/log/opensm-ib0.log
    opensm -B -g GUID1 -p P1 -f /var/log/opensm-ib1.log

    For example:

    > sudo cat /etc/rc.d/after.local
    opensm -B -g 0x248a070300a80a80 -p 15 -f /var/log/opensm-ib0.log
    opensm -B -g 0x248a070300a80a81 -p 1 -f /var/log/opensm-ib1.log
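
The per-port lines shown in the last step can be generated from the GUID list instead of being typed by hand. The following is a minimal sketch, not part of the opensm package: gen_opensm_lines is a hypothetical helper that reads port GUIDs from standard input (one per line, as printed by ibstat -p) and takes one priority per GUID as arguments. It assumes that a port GUID is "0x" followed by 16 hexadecimal digits and that priorities fall in the range 0 to 15 given in the opensm man page.

```shell
# Hypothetical helper: print one "opensm -B ..." line per port GUID.
# stdin:     port GUIDs, one per line (for example from `ibstat -p`)
# arguments: one priority per GUID, in the same order
gen_opensm_lines() {
    idx=0
    while read -r guid; do
        # Each GUID needs a matching priority argument.
        [ "$#" -gt 0 ] || { echo "missing priority for $guid" >&2; return 1; }
        prio=$1
        shift
        # Assumed GUID format: 64-bit value written as 0x + 16 hex digits.
        printf '%s\n' "$guid" | grep -Eq '^0x[0-9a-fA-F]{16}$' \
            || { echo "invalid GUID: $guid" >&2; return 1; }
        # Assumed priority range: 0 (lowest) to 15 (highest).
        printf '%s\n' "$prio" | grep -Eq '^([0-9]|1[0-5])$' \
            || { echo "invalid priority: $prio" >&2; return 1; }
        echo "opensm -B -g $guid -p $prio -f /var/log/opensm-ib$idx.log"
        idx=$((idx + 1))
    done
}
```

A possible invocation is `ibstat -p | gen_opensm_lines 15 1`. Review the printed lines before adding them to the startup script.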

3 For more information

Find detailed information in the following sources: