Salt Connectivity
Depending on the size and distribution of the environment in which SUSE Manager is used, connectivity issues can occur. There are no general recommendations that cover every use case, because environments can differ greatly, especially when public cloud instances are involved.
1. Minions Connectivity
If clients lose the connection to the SUSE Manager Server (or to the SUSE Manager Proxy, if one is involved), they become unreachable from the SUSE Manager Server Web UI and from command line tools. To diagnose such a connection issue, check whether the client is unreachable from the SUSE Manager Server with salt MINION_ID test.ping, while venv-salt-call test.ping (or salt-call test.ping, if the non-bundled Salt is used on the client) works fine on the client side. If this is the case, it is recommended to set the tcp_keepalive parameters.
The parameters and values in the example below can be used as a starting point for finding a better combination, one that prevents connections from being lost without being restored in a given environment. It is recommended to put the parameters in a separate drop-in configuration file, such as /etc/venv-salt-minion/minion.d/tuning-keepalives.conf or /etc/salt/minion.d/tuning-keepalives.conf, depending on the minion type used on the client side.
###### Keepalive settings ######
############################################
# ZeroMQ now includes support for configuring SO_KEEPALIVE if supported by
# the OS. If connections between the minion and the master pass through
# a state tracking device such as a firewall or VPN gateway, there is
# the risk that it could tear down the connection the master and minion
# without informing either party that their connection has been taken away.
# Enabling TCP Keepalives prevents this from happening.

# Overall state of TCP Keepalives, enable (1 or True), disable (0 or False)
# or leave to the OS defaults (-1), on Linux, typically disabled. Default True, enabled.
tcp_keepalive: True

# How long before the first keepalive should be sent in seconds. Default 300
# to send the first keepalive after 5 minutes, OS default (-1) is typically 7200 seconds
# on Linux see /proc/sys/net/ipv4/tcp_keepalive_time.
tcp_keepalive_idle: 10

# How many lost probes are needed to consider the connection lost. Default -1
# to use OS defaults, typically 9 on Linux, see /proc/sys/net/ipv4/tcp_keepalive_probes.
tcp_keepalive_cnt: 3

# How often, in seconds, to send keepalives after the first one. Default -1 to
# use OS defaults, typically 75 seconds on Linux, see
# /proc/sys/net/ipv4/tcp_keepalive_intvl.
tcp_keepalive_intvl: 10
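To compare candidate combinations, it can help to estimate how long a silently dropped connection would go unnoticed. The sketch below is only an approximation: it assumes the usual TCP keepalive behaviour, where the first probe is sent after the idle time and the connection is declared dead after the configured number of unanswered probes, each spaced one interval apart. The function name is made up for this illustration and is not part of Salt.

```python
def keepalive_detection_seconds(idle: int, cnt: int, intvl: int) -> int:
    """Approximate time for the OS to declare a silent peer dead.

    The first probe goes out after `idle` seconds without traffic; the
    connection is dropped once `cnt` probes, sent `intvl` seconds apart,
    have gone unanswered.
    """
    return idle + cnt * intvl


# Values from the example keepalive settings above.
tuned = keepalive_detection_seconds(idle=10, cnt=3, intvl=10)

# Typical Linux defaults quoted in the comments above (7200 s, 9 probes, 75 s).
os_defaults = keepalive_detection_seconds(idle=7200, cnt=9, intvl=75)

print(f"tuned: {tuned} s, OS defaults: {os_defaults} s")
```

Under these assumptions, the example values detect a dead connection in well under a minute, while the typical OS defaults take more than two hours. Lower values detect failures faster at the cost of more probe traffic, which is why the right combination depends on the environment.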
2. Proxies Connectivity
The salt-broker service is used on SUSE Manager Proxies to forward Salt traffic between the SUSE Manager Server and the salt-minion service on the client side. The salt-broker and all the clients behind it can be affected by the same issue as clients connected directly to the SUSE Manager Server. The issue can be fixed with the same parameters as recommended for the minions, but specified in /etc/salt/broker on each SUSE Manager Proxy.
Another issue that can affect a SUSE Manager Proxy occurs when the connectivity to the SUSE Manager Server is lost for a long interval, so that the clients behind the proxy start to retry authentication to the salt-master service on the SUSE Manager Server. This situation is potentially dangerous: a large number of ZeroMQ messages with authentication attempts can accumulate in the internal buffers of the ZeroMQ sockets used inside the salt-broker service, and when the connection to the salt-master is restored, all of these messages are pushed to it. This can cause problems for the salt-master service on the SUSE Manager Server, which may be unable to serve all the cached requests in an appropriate time, or may even suffer a complete denial of service.
To avoid this situation, a set of extra parameters was introduced. The most important one is wait_for_backend, which should be set to True. This prevents opening the sockets for the clients behind the proxy while the connection to the salt-master service is not established, so the messages from the clients are not collected in the internal buffers. drop_after_retries sets the number of retries before the sockets are closed to drop the cached messages. The other parameters can help to fine-tune the behaviour for the environment.
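The effect of wait_for_backend and drop_after_retries can be pictured with a toy model. The sketch below is not the salt-broker implementation; the ToyBroker class, its methods, and the outage helper are all hypothetical, and the model assumes that a dropped frontend stays closed until the backend reconnects. It only counts how many cached client messages would be pushed to the salt-master when the connection is restored.

```python
from collections import deque


class ToyBroker:
    """Hypothetical model of the gating logic; not the salt-broker code."""

    def __init__(self, wait_for_backend: bool, drop_after_retries: int = -1):
        self.drop_after_retries = drop_after_retries
        self.retries = 0
        # Without wait_for_backend the frontend accepts clients immediately.
        self.frontend_open = not wait_for_backend
        self.buffer = deque()

    def client_message(self, msg: str) -> bool:
        """Cache a client message; return False if the frontend is closed."""
        if not self.frontend_open:
            return False
        self.buffer.append(msg)
        return True

    def backend_retry_failed(self) -> None:
        """Record one failed attempt to reconnect to the backend."""
        self.retries += 1
        # -1 disables dropping; otherwise drop once the limit is reached.
        if 0 <= self.drop_after_retries <= self.retries:
            self.frontend_open = False
            self.buffer.clear()

    def backend_connected(self) -> int:
        """Backend is back: open the frontend and flush the backlog."""
        self.retries = 0
        self.frontend_open = True
        flushed = len(self.buffer)
        self.buffer.clear()
        return flushed


def outage(broker: ToyBroker, rounds: int, msgs_per_round: int) -> int:
    """Simulate an outage; return the number of messages flushed afterwards."""
    for _ in range(rounds):
        for i in range(msgs_per_round):
            broker.client_message(f"auth-{i}")
        broker.backend_retry_failed()
    return broker.backend_connected()


defaults = outage(ToyBroker(wait_for_backend=False), 8, 100)
gated = outage(ToyBroker(wait_for_backend=True, drop_after_retries=5), 8, 100)
dropping = outage(ToyBroker(wait_for_backend=False, drop_after_retries=5), 8, 100)
print(defaults, gated, dropping)
```

In this model, the default settings push the whole backlog to the salt-master at once, while either of the two parameters keeps the backlog from reaching it. The round and message counts are arbitrary illustration values, not measurements.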
Setting timeouts, intervals and |
###### ZeroMQ connection options ######
############################################
# For more details about the following parameters check ZeroMQ documentation:
# http://api.zeromq.org/4-2:zmq-setsockopt
# All of these parameters will be set to the backend sockets
# (from the salt-broker to the salt-master)

# connect_timeout (sets ZMQ_CONNECT_TIMEOUT)
# default: 0
# value unit: milliseconds
# Sets how long to wait before timing-out a connect to the remote socket.
# 0 could take much time, so it could be better to set to more strict value
# for particular environment depending on the network conditions.
# The value equal to 10000 is setting 10 seconds connect timeout.
connect_timeout: 3000

# reconnect_ivl (sets ZMQ_RECONNECT_IVL)
# default: 100
# value unit: milliseconds
# Sets the interval of time before reconnection attempt on connection drop.
reconnect_ivl: 1000

# heartbeat_ivl (sets ZMQ_HEARTBEAT_IVL)
# default: 0
# value unit: milliseconds
# This parameter is important for detection of losing the connection.
# In case of value equal to 0 it is not sending heartbeats.
# It's better to set to more relevant value for the particular environment,
# depending on possible network issues.
# The value equal to 20000 (20 seconds) works well for most cases.
heartbeat_ivl: 5000

# heartbeat_timeout (sets ZMQ_HEARTBEAT_TIMEOUT)
# default: 0
# value unit: milliseconds
# Sets the interval of time to consider that the connection is timed out
# after sending the heartbeat and not getting the response on it.
# The value equal to 60000 (1 minute) is considering the connection is down
# after 1 minute of no response to the heartbeat.
heartbeat_timeout: 10000

###### Other connection options ######
# The following parameters are not related to ZeroMQ,
# but the internal parameters of the salt-broker.

# drop_after_retries
# default: -1
# value unit: number of retries
# Drop the frontend sockets of the salt-broker in case if it reaches
# the number of retries to reconnect to the backend socket.
# -1 means not drop the frontend sockets
# It's better to choose more relevant value for the particular environment.
# 10 can be a good choice for most of the cases.
drop_after_retries: 5

# wait_for_backend
# default: False
# The main aim of this parameter is to prevent collecting the messages
# with the open frontend socket and prevent pushing them on connecting
# the backend socket to prevent large number of messages to be pushed
# at once to salt-master.
# It's better to set it to True if there is significant number of minions
# behind the salt-broker.
wait_for_backend: True
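When choosing heartbeat values, a rough upper bound for how long the salt-broker may take to notice a dead backend connection can be useful. The sketch below assumes the worst case is one full heartbeat interval followed by one full heartbeat timeout. That is a simplification of the ZeroMQ behaviour, and the function name is made up for this illustration.

```python
from typing import Optional


def heartbeat_detection_ms(heartbeat_ivl: int, heartbeat_timeout: int) -> Optional[int]:
    """Rough upper bound, in milliseconds, for noticing a dead backend.

    Returns None when heartbeat_ivl is 0, because no heartbeats are sent
    and a lost connection is not detected this way at all.
    """
    if heartbeat_ivl == 0:
        return None
    return heartbeat_ivl + heartbeat_timeout


# Values set in the example configuration above.
example = heartbeat_detection_ms(heartbeat_ivl=5000, heartbeat_timeout=10000)

# Values mentioned in the comments of the example (20 s and 60 s).
commented = heartbeat_detection_ms(heartbeat_ivl=20000, heartbeat_timeout=60000)

# ZeroMQ default: heartbeats disabled.
disabled = heartbeat_detection_ms(heartbeat_ivl=0, heartbeat_timeout=0)

print(example, commented, disabled)
```

Under this assumption, the example configuration notices a lost backend within about 15 seconds, and the values mentioned in the comments within about 80 seconds. With the default of 0, heartbeats play no part in detection.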