Configuring a Subnet Manager
- WHAT?
opensm is an InfiniBand-compliant implementation of the Subnet Manager.
- WHY?
To learn about the opensm implementation of the Subnet Manager and how to configure it.
- EFFORT
Less than 30 minutes and basic knowledge of Linux administration.
1 Introduction to opensm #
The following sections explain what opensm is and outline its role in the InfiniBand networking environment.
1.1 What is InfiniBand? #
InfiniBand is a standard for computer networking communications. It offers a high-speed, low-latency networking technology commonly used in high-performance computing (HPC) environments. InfiniBand provides a high-bandwidth interconnect for connecting servers, storage and other network devices, for example, network switches. In the InfiniBand context, the group of connected network devices is called a fabric.
1.2 What is a Subnet Manager? #
A Subnet Manager is a component of InfiniBand architecture. It manages and configures InfiniBand switches and connected devices. The Subnet Manager is important for discovering network devices, assigning addresses to them, and maintaining the overall health and configuration of the network.
1.3 What is opensm? #
opensm is an InfiniBand-compliant Subnet Manager that performs all required tasks for initializing InfiniBand hardware. InfiniBand switches usually come with a Subnet Manager embedded in their firmware. Because the embedded Subnet Manager may be outdated or may offer only limited control over network devices, Dolomite offers opensm. At least one Subnet Manager must be running for each InfiniBand subnet.
1.4 How does opensm work? #
opensm attaches to a specific InfiniBand port on the local host and configures only devices connected to it. If the local machine has other InfiniBand ports, opensm ignores devices connected to them. If no port is specified, opensm selects the first available port. By default, the opensm run is logged to /var/log/opensm.log. All errors reported in this file should be treated as indicators of InfiniBand fabric health issues.
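As a quick way to check the log for reported errors, the following minimal sketch counts the lines that look like error records. It assumes that error records contain the token ERR followed by a code and a colon. The function name count_opensm_errors is hypothetical and not part of opensm.

```shell
#!/bin/sh
# Hypothetical helper: count lines that look like opensm error records.
# Assumption: error records contain "ERR <code>:" somewhere in the line.
count_opensm_errors() {
    # Use the default log location unless a path is passed as the first argument.
    log_file="${1:-/var/log/opensm.log}"
    # grep -c prints the number of matching lines. It exits non-zero when
    # nothing matches, so "|| true" keeps the function's exit status at 0.
    grep -c -E 'ERR [0-9A-Fa-f]+:' "$log_file" || true
}
```

Calling count_opensm_errors without arguments checks /var/log/opensm.log. A result other than 0 suggests that the log deserves a closer look.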
1.5 Responsibilities of opensm #
Key responsibilities of opensm include:
- Device discovery
Identifying and managing the devices, such as nodes or switches, within the InfiniBand fabric.
- Address assignment
Assigning a unique InfiniBand address to each device in the fabric. InfiniBand uses a unique addressing scheme known as the local identifier (LID) to identify each device.
- Routing
Determining the optimal paths for data transmission within the fabric. InfiniBand supports both direct and routed communication between devices.
- Topology management
Managing the topology of the InfiniBand fabric, including configuring and maintaining the connections between switches and devices.
- Error handling
Monitoring the fabric for errors and handling them appropriately to ensure the reliability of communication.
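To illustrate the address assignment item above: a LID is a 16-bit value, and the InfiniBand architecture divides the value space into a reserved value, a unicast range, a multicast range and a permissive value. The following sketch classifies a LID by these ranges. The function name classify_lid is hypothetical and serves only as an illustration.

```shell
#!/bin/sh
# Hypothetical helper: classify a 16-bit InfiniBand LID by its value range.
# Accepts decimal or 0x-prefixed hexadecimal input.
classify_lid() {
    lid=$(( $1 ))
    if [ "$lid" -lt 0 ] || [ "$lid" -gt 65535 ]; then
        echo invalid        # outside the 16-bit range
    elif [ "$lid" -eq 0 ]; then
        echo reserved       # 0x0000
    elif [ "$lid" -le 49151 ]; then
        echo unicast        # 0x0001 to 0xBFFF
    elif [ "$lid" -le 65534 ]; then
        echo multicast      # 0xC000 to 0xFFFE
    else
        echo permissive     # 0xFFFF
    fi
}
```

The LIDs that a Subnet Manager assigns to devices for regular addressing come from the unicast range.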
2 Configuring opensm #
The opensm command stores its main configuration in /etc/rdma/opensm.conf. Because the main configuration file gets updated on opensm upgrades, we recommend editing /etc/rdma/opensm instead, to prevent the need to merge changes manually.
Install the opensm package on any hosts that will be running the Subnet Manager.
> sudo transactional-update pkg install opensm
Use the ibstat -p command to find GUID0 and GUID1 of the host channel adapter (HCA) ports, for example:
> ibstat -p
0x248a070300a80a80
0x248a070300a80a81
If you are using a single switch, follow these steps:
Start the opensm service.
> sudo systemctl start opensm
Enable the opensm service on boot.
> sudo systemctl enable opensm
Edit the /etc/rdma/opensm file to add the identifier for the corresponding port.
> sudo opensm -c /etc/rdma/opensm
# The port GUID on which the OpenSM is running
guid 0x248a070300a80a80
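The guid entry can also be set from a script. The following minimal sketch replaces an existing guid line in a configuration file or appends one if none exists. The function name set_opensm_guid is hypothetical, and the sketch assumes GNU sed (for the -i option) and a GUID written as 0x followed by 16 hexadecimal digits, as printed by ibstat -p.

```shell
#!/bin/sh
# Hypothetical helper: set the "guid" entry in an opensm configuration file.
# Usage: set_opensm_guid CONFIG_FILE PORT_GUID
set_opensm_guid() {
    conf_file="$1"
    port_guid="$2"
    # Accept only 0x followed by exactly 16 hexadecimal digits.
    if ! printf '%s\n' "$port_guid" | grep -q -E '^0x[0-9a-fA-F]{16}$'; then
        echo "invalid GUID: $port_guid" >&2
        return 1
    fi
    if grep -q -E '^guid[[:space:]]' "$conf_file" 2>/dev/null; then
        # A guid line already exists: rewrite it in place (GNU sed).
        sed -i -E "s/^guid[[:space:]].*/guid $port_guid/" "$conf_file"
    else
        # No guid line yet: append one at the end of the file.
        printf 'guid %s\n' "$port_guid" >> "$conf_file"
    fi
}
```

Run the function with root privileges when the target is /etc/rdma/opensm, for example from a root shell: set_opensm_guid /etc/rdma/opensm 0x248a070300a80a80.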
If you are using a direct connect method or if you have multiple switches, enable the Subnet Manager on each port of the connected HCA on the host by adding the following lines to /etc/rc.d/after.local. Substitute GUID0 and GUID1 with your discovered values. For P0 and P1, use the Subnet Manager priorities, which range from 0 (the lowest) to 15 (the highest).
opensm -B -g GUID0 -p P0 -f /var/log/opensm-ib0.log
opensm -B -g GUID1 -p P1 -f /var/log/opensm-ib1.log
For example:
> sudo cat /etc/rc.d/after.local
opensm -B -g 0x248a070300a80a80 -p 15 -f /var/log/opensm-ib0.log
opensm -B -g 0x248a070300a80a81 -p 1 -f /var/log/opensm-ib1.log
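The startup lines can also be generated from a list of port GUIDs instead of being typed by hand. The following sketch reads one GUID per line from standard input and pairs each with a priority passed as an argument, in order. The function name generate_opensm_lines is hypothetical. The log file naming follows the pattern used in this section.

```shell
#!/bin/sh
# Hypothetical helper: print one opensm startup line per port GUID.
# GUIDs are read from standard input, one per line.
# Priorities are taken from the arguments, in the same order.
generate_opensm_lines() {
    index=0
    while IFS= read -r guid; do
        # Skip empty input lines.
        [ -n "$guid" ] || continue
        if [ "$#" -eq 0 ]; then
            echo "no priority given for $guid" >&2
            return 1
        fi
        printf 'opensm -B -g %s -p %s -f /var/log/opensm-ib%s.log\n' \
            "$guid" "$1" "$index"
        shift                   # move on to the next priority
        index=$((index + 1))    # next log file suffix
    done
}
```

For example, ibstat -p | generate_opensm_lines 15 1 prints lines in the same form as the example in this section. Review the output before adding it to the startup file.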
3 For more information #
Find detailed information in the following sources:
InfiniBand home page
NVIDIA documentation on Subnet Manager
NVIDIA documentation on opensm
4 Legal Notice #
Copyright © 2006–2024 SUSE LLC and contributors. All rights reserved.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or (at your option) version 1.3; with the Invariant Section being this copyright notice and license. A copy of the license version 1.2 is included in the section entitled “GNU Free Documentation License”.
For SUSE trademarks, see https://www.suse.com/company/legal/. All other third-party trademarks are the property of their respective owners. Trademark symbols (®, ™ etc.) denote trademarks of SUSE and its affiliates. Asterisks (*) denote third-party trademarks.
All information found in this book has been compiled with utmost attention to detail. However, this does not guarantee complete accuracy. Neither SUSE LLC, its affiliates, the authors, nor the translators shall be held liable for possible errors or the consequences thereof.