Salt features two timeout parameters called
gather_job_timeout that are relevant during the execution of Salt commands and jobs—it does not matter whether they are triggered using the command line interface or API. These two parameters are explained in the following article.
This is a normal workflow when all clients are well reachable:
A salt command or job is executed:
salt '*' test.ping
Salt master publishes the job with the targeted clients into the Salt PUB channel.
Clients take that job and start working on it.
Salt master is looking at the Salt RET channel to gather responses from the clients.
If Salt master gets all responses from targeted clients, then everything is completed and Salt master will return a response containing all the client responses.
If some of the clients are down during this process, the workflow continues as follows:
timeoutis reached before getting all expected responses from the clients, then Salt master would trigger an aditional job (a Salt
find_jobjob) targeting only pending clients to check whether the job is already running on the client.
gather_job_timeoutis evaluated. A new counter is now triggered.
If this new
find_jobjob responds that the original job is actually running on the client, then Salt master will wait for that client’s response for another
gather_job_timeoutinterval before issuing the next
In case of reaching
gather_job_timeoutwithout having any response from the client (neither for the initial
test.pingnor for the
find_jobjob), Salt master will return with only the gathered responses from the responding clients.
By default, SUSE Manager globally sets
gather_job_timeout to 120 seconds. So, in the worst case, a Salt call targeting unreachable clients will end up with 240 seconds of waiting until getting a response.
You can configure these values differently by creating a
/etc/salt/master.d/custom.conf configuration file according to syntax in
Before Actions are executed on Salt clients, whether they scheduled via the Web UI or the API, SUSE Manager performs a "presence ping" command to ensure the respective
salt-minion processes are active and able to respond. Then, a ping gather job runs on the Salt master to handle the incoming pings from the clients. Actual commands will begin only after all clients have either responded to the ping, or timed out.
The presence ping is an ordinary Salt command, but is not subject to the same timeout parameters as all other Salt commands (
gather_job_timeout, described above). Rather, it has its own parameters (
presence_ping_gather_job_timeout) that can be set in
To allow for quicker detection of unresponsive clients, the timeout values for presence pings are by default significantly shorter than the general defaults. You can configure the presence ping parameters in
/etc/rhn/rhn.conf, however the default values should be sufficient in most cases.
A lower total presence ping timeout value will increase the chance of false negatives. In some cases, a client might be marked as non-responding, when it is responding but did not respond quickly enough. Additionally, setting this total presence ping timeout value too low could result in a client hanging at the boot screen. A higher total presence ping timeout will increase the accuracy of the test, as even slow clients will respond to the presence ping before timing out. Additionally, a higher presence ping timeout could limit throughput if you are targeting a large number of clients, when some of them are slow.
If a client does not reply to a ping within the allocated time, it is marked as
not available, and is excluded from the command. The Web UI shows a
minion is down or could not be contacted message in this case.
The presence ping timeout parameter changes the timeout setting for the presence ping, in seconds. Adjust the
java.salt_presence_ping_timeout parameter. Defaults to 4 seconds.
The presence ping gather job parameter changes the timeout setting for gathering the presence ping, in seconds. Adjust the
java.salt_presence_ping_gather_job_timeout parameter. Defaults to 1 second.
Salt SSH clients are slightly different than regular clients (zeromq). Salt SSH clients do not use Salt PUB/RET channels but a wrapper Salt command inside of an SSH call. Salt
gather_job_timeout are not playing a role here.
SUSE Manager defines a timeout for SSH connections in
# salt_ssh_connect_timeout = 180