SUSE Linux Enterprise High Performance Computing 15 SP3

Administration Guide

This guide covers system administration tasks such as remote administration, workload management, and monitoring.

Publication Date: January 14, 2022
Preface
Available documentation
Giving feedback
Documentation conventions
Support
1 Introduction
1.1 Components provided
1.2 Hardware platform support
1.3 Support and life cycle
1.4 Documentation and other information
2 Installation and upgrade
2.1 System roles for SUSE Linux Enterprise High Performance Computing 15 SP3
2.2 Upgrading to SUSE Linux Enterprise High Performance Computing 15 SP3
3 Remote administration
3.1 Genders — static cluster configuration database
3.2 pdsh — parallel remote shell program
3.3 PowerMan — centralized power control for clusters
3.4 MUNGE authentication
3.5 mrsh/mrlogin — remote login using MUNGE authentication
4 Hardware
4.1 cpuid
4.2 hwloc — portable abstraction of hierarchical architectures for high-performance computing
5 Slurm — utility for HPC workload management
5.1 Installing Slurm
5.2 Slurm administration commands
5.3 Upgrading Slurm
5.4 Frequently asked questions
6 Monitoring and logging
6.1 ConMan — the console manager
6.2 Monitoring HPC clusters with Prometheus and Grafana
6.3 Ganglia — system monitoring
6.4 rasdaemon — utility to log RAS error tracings
7 HPC user libraries
7.1 Lmod — Lua-based environment modules
7.2 GNU Compiler Toolchain Collection for HPC
7.3 High Performance Computing libraries
7.4 File format libraries
7.5 MPI libraries
7.6 Profiling and benchmarking libraries and tools
8 Spack package management tool
8.1 Installing spack
8.2 Using spack: simple example with netcdf-cxx4
8.3 Using spack: complex example with mpich
8.4 Using a specific compiler
9 Dolly clone tool
9.1 Dolly cloning process
9.2 Using dolly
9.3 Dolly configuration file
9.4 Dolly limitations
A GNU licenses
A.1 GNU Free Documentation License

Copyright © 2020–2022 SUSE LLC and contributors. All rights reserved.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or (at your option) version 1.3; with the Invariant Section being this copyright notice and license. A copy of the license version 1.2 is included in the section entitled GNU Free Documentation License.

For SUSE trademarks, see http://www.suse.com/company/legal/. All third-party trademarks are the property of their respective owners. Trademark symbols (®, ™ etc.) denote trademarks of SUSE and its affiliates. Asterisks (*) denote third-party trademarks.

All information found in this book has been compiled with utmost attention to detail. However, this does not guarantee complete accuracy. Neither SUSE LLC, its affiliates, the authors nor the translators shall be held liable for possible errors or the consequences thereof.
