
6 Salt Minion Scalability

6.1 Salt Minion Onboarding Rate

The rate at which SUSE Manager can on-board minions (accept Salt keys) is limited and depends on hardware resources. On-boarding minions faster than SUSE Manager is configured to handle builds up a backlog of unprocessed keys, which slows the process and can exhaust resources. We recommend limiting the key acceptance rate programmatically. A safe starting point is to on-board one minion every 15 seconds, which you can do with the following command:

for k in $(salt-key -l un | grep -v Unaccepted); do salt-key -y -a "$k"; sleep 15; done
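The 15-second interval also determines roughly how long on-boarding takes. The following shell sketch multiplies the number of pending keys by the interval. The function name onboard_eta is illustrative and not part of SUSE Manager. It ignores the time salt-key itself needs, so treat the result as a lower bound.

```shell
#!/bin/sh
# Rough lower bound for throttled on-boarding time.
# Assumes one key is accepted every INTERVAL seconds and ignores the
# run time of salt-key itself (onboard_eta is a hypothetical helper).
onboard_eta() (
    count=$1       # number of pending minion keys
    interval=$2    # seconds to sleep after each accepted key
    total=$((count * interval))
    printf '%d minions at %ds each: %ds (~%dh %dm)\n' \
        "$count" "$interval" "$total" $((total / 3600)) $((total % 3600 / 60))
)

onboard_eta 1000 15
```

On a SUSE Manager Server, you could pass in the current backlog with `onboard_eta "$(salt-key -l un | grep -vc Unaccepted)" 15`. Here `grep -vc` counts every line except the "Unaccepted Keys:" header.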

6.2 Minions Running with Unaccepted Salt Keys

Minions that have not been on-boarded (minions running with unaccepted Salt keys) consume resources, in particular about 2.5 Kb/s of inbound network bandwidth per minion. 1000 idle minions consume around 2.5 Mb/s, and this drops to almost 0 once on-boarding is complete. For optimal performance, limit the number of systems that have not been on-boarded.
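The per-minion figure makes the cost of a backlog easy to estimate. This sketch assumes the average of about 2.5 Kb/s applies equally to every unaccepted minion. The function name unaccepted_bandwidth and the variable PER_MINION_KBPS are hypothetical.

```shell
#!/bin/sh
# Approximate inbound bandwidth used by minions with unaccepted keys.
# PER_MINION_KBPS is the ~2.5 Kb/s average from this section (an assumption
# that every idle minion behaves the same), not a measured value.
PER_MINION_KBPS=2.5

unaccepted_bandwidth() {
    # $1: number of minions running with unaccepted keys
    awk -v n="$1" -v rate="$PER_MINION_KBPS" \
        'BEGIN { kbps = n * rate; printf "%.1f Kb/s (%.2f Mb/s)\n", kbps, kbps / 1000 }'
}

unaccepted_bandwidth 1000
```

To estimate the current load, you could call it as `unaccepted_bandwidth "$(salt-key -l un | grep -vc Unaccepted)"`.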

6.3 Salt Timeouts

Salt has two timeout parameters, timeout and gather_job_timeout, that apply during the execution of Salt commands and jobs, whether they are triggered from the command line interface or through the API. This section explains both parameters.

This is the normal workflow when all minions are reachable:

  • A Salt command or job is executed:

    salt '*' test.ping
  • The Salt master publishes the job, together with the targeted minions, on the Salt PUB channel.

  • The minions receive the job and start working on it.

  • The Salt master listens on the Salt RET channel to gather responses from the minions.

  • If the Salt master receives responses from all targeted minions, the job is complete and the Salt master returns a response containing all the minion responses.

If some of the minions are down during this process, the workflow continues as follows:

  1. If timeout is reached before all expected responses have arrived, the Salt master triggers an additional job (a Salt find_job job) that targets only the pending minions, to check whether the original job is already running on them.

  2. gather_job_timeout now applies, and a new countdown starts.

  3. If the find_job job reports that the original job is running on a minion, the Salt master waits for that minion's response.

  4. If gather_job_timeout is reached without any response from the minion (neither for the initial test.ping nor for the find_job job), the Salt master returns only the responses gathered from the minions that did respond.

By default, SUSE Manager globally sets both timeout and gather_job_timeout to 120 seconds. In the worst case, a Salt call targeting unreachable minions therefore waits 240 seconds (120 + 120) before returning a response.
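The worst-case wait is simply timeout plus gather_job_timeout. The sketch below reads both values from /etc/salt/master.d/custom.conf and falls back to the 120-second SUSE Manager defaults. It assumes flat `key: value` lines with plain integer values. It reads only the single file it is given, although the Salt master merges every file in /etc/salt/master.d/. The function name salt_worst_case_wait is hypothetical.

```shell
#!/bin/sh
# Worst-case wait for a Salt call that targets unreachable minions:
# the master waits `timeout`, then up to `gather_job_timeout` for find_job.
# Assumes top-level "key: value" lines with integer values (sketch only).
salt_worst_case_wait() (
    conf=${1:-/etc/salt/master.d/custom.conf}
    get_value() {
        # Print the value of top-level key $1, or the default $2 if it is unset
        v=$(awk -F: -v k="$1" '$1 == k { gsub(/[ \t]/, "", $2); print $2; exit }' "$conf" 2>/dev/null)
        echo "${v:-$2}"
    }
    t=$(get_value timeout 120)
    g=$(get_value gather_job_timeout 120)
    echo $((t + g))
)

salt_worst_case_wait
```

With no overrides in the file, this prints 240, matching the worst case described above.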

6.3.1 Presence Ping Timeout

There are two parameters that control how presence pings from the Salt master are handled, one for the ping timeout, and one for the ping gather job.

Salt batch calls begin with the Salt master performing a presence ping on the target minions. A ping gather job runs on the Salt master to handle the incoming ping responses from the minions. Batched commands begin only after all minions have either responded to the ping or timed out.

The presence ping is an ordinary Salt command, but it is not subject to the timeout parameters used by all other Salt commands (timeout/gather_job_timeout). Instead, it has its own parameters (presence_ping_timeout/presence_ping_gather_job_timeout). You can configure the global timeout values in the /etc/salt/master.d/custom.conf configuration file. To allow quicker detection of unresponsive minions, however, the presence ping timeouts are by default significantly shorter than those used elsewhere. You can configure the presence ping parameters in /etc/rhn/rhn.conf, but the default values should be sufficient in most cases.

A lower total presence ping timeout increases the chance of false negatives: a minion that is responding, but not quickly enough, might be marked as non-responding. A higher total presence ping timeout increases the accuracy of the test, because even slow minions can respond before the timeout expires. However, a higher presence ping timeout can limit throughput when you target a large number of minions and some of them are slow.

If a minion does not reply to a ping within the allocated time, it is marked as not available and excluded from the command. In this case, the Web UI shows a "minion is down" message.

For more information on minion timeouts, see scale-minions.xml.

The presence ping timeout parameter changes the timeout setting for the presence ping, in seconds. Adjust the java.salt_presence_ping_timeout parameter. Defaults to 4 seconds.

The presence ping gather job parameter changes the timeout setting for gathering the presence ping, in seconds. Adjust the java.salt_presence_ping_gather_job_timeout parameter. Defaults to 1 second.
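To check which presence ping values are in effect, a small helper can read them from /etc/rhn/rhn.conf and fall back to the documented defaults. This is a sketch. It assumes plain `key = value` lines with integer values, and the function names are hypothetical. Treating the total window as the sum of the two parameters is also an assumption, modeled on the timeout + gather_job_timeout behavior of regular Salt calls.

```shell
#!/bin/sh
# Read a key from an rhn.conf-style "key = value" file (hypothetical helper).
# Commented-out lines ("# key = value") are ignored because "#key" never matches.
rhn_conf_get() (
    # $1: key, $2: default, $3: config file (defaults to /etc/rhn/rhn.conf)
    v=$(awk -F= -v k="$1" '{ gsub(/[ \t]/, "", $1) } $1 == k { gsub(/[ \t]/, "", $2); print $2; exit }' \
        "${3:-/etc/rhn/rhn.conf}" 2>/dev/null)
    echo "${v:-$2}"
)

# Show the effective presence ping settings; assumes integer seconds.
presence_ping_window() (
    conf=${1:-/etc/rhn/rhn.conf}
    ping=$(rhn_conf_get java.salt_presence_ping_timeout 4 "$conf")
    gather=$(rhn_conf_get java.salt_presence_ping_gather_job_timeout 1 "$conf")
    echo "ping=${ping}s gather=${gather}s total=$((ping + gather))s"
)

presence_ping_window
```

With the defaults, this reports a total window of about 5 seconds, far shorter than the 240-second worst case for regular Salt calls.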

6.4 Batching

There are two parameters that control how actions are sent to clients, one for the batch size, and one for the delay.

When the Salt master sends a batch of actions to the target minions, it sends them to as many minions as the batch size parameter allows. After the specified delay period, it sends the commands to the next batch of minions. The number of minions in each subsequent batch equals the number of minions that completed in the previous batch.

Choosing a lower batch size will reduce system load and parallelism, but might reduce overall performance for processing actions.

The batch size parameter sets the maximum number of clients that can execute a single action at the same time. Adjust the java.salt_batch_size parameter. Defaults to 100.

Increasing the delay increases the chance that multiple minions have completed before the next action is issued, which results in fewer overall commands and lower load.

The batch delay parameter sets the amount of time, in seconds, to wait after a command is processed before beginning to process the command on the next minion. Adjust the java.salt_batch_delay parameter. Defaults to 1.0 seconds.
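The interaction between batch size and delay can be estimated with a little arithmetic. This sketch assumes every minion in a batch finishes within the delay. That gives the minimum number of batches, ceil(N / batch size), and the minimum accumulated delay between them. Slower minions lead to smaller follow-up batches and therefore more batches. The function name batch_estimate is hypothetical.

```shell
#!/bin/sh
# Lower-bound estimate for batched actions (hypothetical helper).
# Assumes each batch fully completes before the next one is sent, so
# batches = ceil(N / size) and there are (batches - 1) delay periods.
batch_estimate() {
    # $1: number of target minions
    # $2: java.salt_batch_size
    # $3: java.salt_batch_delay in seconds
    awk -v n="$1" -v size="$2" -v delay="$3" 'BEGIN {
        batches = int((n + size - 1) / size)
        waits = (batches > 0) ? batches - 1 : 0
        printf "%d batches, >= %.1f s of batch delay\n", batches, waits * delay
    }'
}

batch_estimate 1000 100 1.0
```

With the defaults (batch size 100, delay 1.0 seconds), an action targeting 1000 minions needs at least 10 batches.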

6.4.1 Salt SSH Minions (SSH Push)

Salt SSH minions are slightly different from regular (ZeroMQ) minions. Instead of using the Salt PUB/RET channels, they run a wrapper Salt command inside an SSH call. The Salt timeout and gather_job_timeout parameters do not play a role here.

SUSE Manager defines a timeout for SSH connections in /etc/rhn/rhn.conf:

# salt_ssh_connect_timeout = 180

The presence ping mechanism also works with SSH minions. In this case, SUSE Manager uses salt_presence_ping_timeout to override the default timeout value for SSH connections.
