Salt timeouts
1. General Salt timeouts
Salt features two timeout parameters, timeout and gather_job_timeout, that are relevant during the execution of Salt commands and jobs, regardless of whether they are triggered from the command line interface or the API. This article explains both parameters.
This is the normal workflow when all clients are reachable:
- A Salt command or job is executed:
  salt '*' test.ping
- The Salt master publishes the job, together with the targeted clients, to the Salt PUB channel.
- Clients pick up the job and start working on it.
- The Salt master watches the Salt RET channel to gather responses from the clients.
- If the Salt master receives responses from all targeted clients, the job is complete and the Salt master returns a response containing all the client responses.
If some of the clients are down during this process, the workflow continues as follows:
- If timeout is reached before all expected responses have arrived from the clients, the Salt master triggers an additional job (a Salt find_job job) that targets only the pending clients, to check whether the job is already running on the client.
- Now gather_job_timeout is evaluated, and a new counter starts.
- If this find_job job reports that the original job is actually running on the client, the Salt master waits for that client’s response for another gather_job_timeout interval before issuing the next find_job job.
- If gather_job_timeout is reached without any response from the client (neither for the initial test.ping nor for the find_job job), the Salt master returns with only the responses gathered from the responding clients.
By default, SUSE Manager globally sets timeout and gather_job_timeout to 120 seconds each. So, in the worst case, a Salt call targeting unreachable clients waits 240 seconds (120 + 120) before returning a response.
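The timing described above can be sketched as a small model. This is a simplified illustration under stated assumptions, not Salt's actual implementation: the function name `master_wait` and the `job_duration` parameter are hypothetical, and the model considers a single client that is either unreachable (`job_duration=None`) or answers every find_job check promptly while the original job is still running.

```python
import math
from typing import Optional, Tuple


def master_wait(
    timeout: int = 120,
    gather_job_timeout: int = 120,
    job_duration: Optional[float] = None,
) -> Tuple[float, int]:
    """Return (seconds the master waits, number of find_job jobs issued).

    job_duration is the time the client needs to answer the original job,
    or None if the client is unreachable and never answers anything.
    This is a toy model of the workflow, not Salt code.
    """
    # Unreachable client: wait for `timeout`, issue one find_job,
    # wait for `gather_job_timeout`, then return without this client.
    if job_duration is None:
        return timeout + gather_job_timeout, 1

    # Fast client: the response arrives before `timeout` expires,
    # so no find_job is needed.
    if job_duration <= timeout:
        return job_duration, 0

    # Slow but alive client: a find_job is issued at `timeout` and then
    # once per `gather_job_timeout` interval until the response arrives.
    find_jobs = math.ceil((job_duration - timeout) / gather_job_timeout)
    return job_duration, find_jobs
```

With the default values, `master_wait()` yields a 240-second wait for an unreachable client, which matches the worst case stated above. A client that needs 300 seconds, `master_wait(job_duration=300)`, is kept alive in this model by two find_job checks, at 120 and 240 seconds.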
You can configure these values differently by creating a /etc/salt/master.d/custom.conf configuration file that follows the syntax of /etc/salt/master.conf.
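As an illustration, the following Python sketch renders the contents of such an override file. It relies on the Salt master configuration using YAML `key: value` syntax. The values 60 and 30 are arbitrary examples rather than recommendations, and `render_timeout_overrides` is a hypothetical helper name. The sketch only builds the text; it does not write to /etc/salt/master.d/ or restart the Salt master.

```python
def render_timeout_overrides(timeout: int, gather_job_timeout: int) -> str:
    """Build the text of a Salt master override file for the two timeouts.

    Hypothetical helper for illustration; both values are in seconds.
    """
    for name, value in (
        ("timeout", timeout),
        ("gather_job_timeout", gather_job_timeout),
    ):
        # Reject values that cannot be a sensible number of seconds.
        if not isinstance(value, int) or value <= 0:
            raise ValueError(f"{name} must be a positive integer number of seconds")
    return f"timeout: {timeout}\ngather_job_timeout: {gather_job_timeout}\n"


if __name__ == "__main__":
    # Example values only; choose values that suit your environment.
    print(render_timeout_overrides(60, 30), end="")
```

In this model, the worst-case wait for unreachable clients with these example values would be 90 seconds instead of the default 240.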
2. Presence Ping Timeouts
Before Actions are executed on Salt clients, whether they are scheduled via the Web UI or the API, SUSE Manager performs a "presence ping" command to ensure the respective salt-minion processes are active and able to respond. Then, a ping gather job runs on the Salt master to handle the incoming pings from the clients. Actual commands begin only after all clients have either responded to the ping or timed out.
The presence ping is an ordinary Salt command, but is not subject to the same timeout parameters as all other Salt commands (timeout/gather_job_timeout, described above). Rather, it has its own parameters (presence_ping_timeout/presence_ping_gather_job_timeout) that can be set in /etc/rhn/rhn.conf.
To allow for quicker detection of unresponsive clients, the timeout values for presence pings are by default significantly shorter than the general defaults. You can configure the presence ping parameters in /etc/rhn/rhn.conf, but the default values should be sufficient in most cases.
A lower total presence ping timeout increases the chance of false negatives: a client might be marked as non-responding when it is responding, just not quickly enough. Setting the total presence ping timeout too low could also result in a client hanging at the boot screen. A higher total presence ping timeout increases the accuracy of the test, because even slow clients can respond to the presence ping before it times out. However, a higher value could limit throughput if you are targeting a large number of clients and some of them are slow.
If a client does not reply to a ping within the allocated time, it is marked as not available, and is excluded from the command. The Web UI shows a "minion is down or could not be contacted" message in this case.
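The exclusion rule and the false-negative trade-off described above can be illustrated with a toy model. The sketch below is not SUSE Manager code: `split_by_presence`, the client names, and the reply latencies are made up, and a single `allowed_seconds` deadline stands in for the total presence ping timeout.

```python
from typing import Dict, List, Optional, Tuple


def split_by_presence(
    reply_latency: Dict[str, Optional[float]],
    allowed_seconds: float,
) -> Tuple[List[str], List[str]]:
    """Split clients into (available, not_available) lists.

    reply_latency maps a client name to the seconds it takes to answer
    the presence ping, or None if it never answers.
    """
    available: List[str] = []
    not_available: List[str] = []
    for client, latency in sorted(reply_latency.items()):
        if latency is not None and latency <= allowed_seconds:
            available.append(client)
        else:
            # Too slow or silent: excluded from the actual command.
            not_available.append(client)
    return available, not_available


# Hypothetical clients: one fast, one slow but alive, one down.
latencies = {"web01": 0.3, "db01": 6.0, "old01": None}
```

With a 5-second deadline, the slow but healthy `db01` lands in the not-available list alongside `old01`, which is a false negative. With a 10-second deadline, only `old01` is excluded, at the cost of waiting longer before the actual command starts.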
To change the timeout for the presence ping itself, adjust the java.salt_presence_ping_timeout parameter. The value is in seconds and defaults to 4 seconds.
To change the timeout for gathering the presence ping responses, adjust the java.salt_presence_ping_gather_job_timeout parameter. The value is in seconds and defaults to 1 second.
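The following Python sketch shows how the two settings and their defaults fit together. It parses rhn.conf-style `key = value` text supplied as a string. The parsing rules (one setting per line, `#` starting a comment) are an assumption made for illustration, and `presence_ping_settings` is a hypothetical helper, not part of SUSE Manager.

```python
from typing import Dict

# Defaults as documented above, in seconds.
DEFAULTS: Dict[str, int] = {
    "java.salt_presence_ping_timeout": 4,
    "java.salt_presence_ping_gather_job_timeout": 1,
}


def presence_ping_settings(conf_text: str) -> Dict[str, int]:
    """Return the effective presence ping settings for the given text.

    Settings that are absent from conf_text keep their default value.
    """
    settings = dict(DEFAULTS)
    for raw_line in conf_text.splitlines():
        # Assumption: everything after '#' is a comment.
        line = raw_line.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key in settings:
            settings[key] = int(value)
    return settings


# Hypothetical excerpt that raises only the ping timeout.
example = """
# presence ping tuning
java.salt_presence_ping_timeout = 6
"""
```

For the excerpt above, `presence_ping_settings(example)` reports 6 seconds for the ping timeout and keeps the 1-second default for the gather job timeout.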
3. Salt SSH Clients (SSH Push)
Salt SSH clients differ slightly from regular (ZeroMQ) clients. They do not use the Salt PUB/RET channels; instead, a wrapper Salt command runs inside an SSH call. The Salt timeout and gather_job_timeout parameters play no role here.
SUSE Manager defines a timeout for SSH connections in /etc/rhn/rhn.conf:
# salt_ssh_connect_timeout = 180