Tuning Large Scale Deployments

SUSE Manager is designed by default to work on small and medium scale installations. For installations with more than 1000 clients per SUSE Manager Server, adequate hardware sizing and parameter tuning must be performed.

The instructions in this section can have severe and catastrophic performance impacts when improperly used. In some cases, they can cause SUSE Manager to completely cease functioning. Always test changes before implementing them in a production environment. During implementation, take care when changing parameters. Monitor performance before and after each change, and revert any steps that do not produce the expected result.

We strongly recommend that you contact SUSE Consulting for assistance with tuning.

SUSE will not provide support for catastrophic failure when these advanced parameters are modified without consultation.

Tuning is not required on installations of fewer than 1000 clients. Do not perform these instructions on small or medium scale installations.

1. The Tuning Process

Any SUSE Manager installation is subject to a number of design and infrastructure constraints that, for the purposes of tuning, we call environmental variables. Environmental variables can include the total number of clients, the number of different operating systems under management, and the number of software channels.

Environmental variables influence, either directly or indirectly, the value of most configuration parameters. During the tuning process, the configuration parameters are manipulated to improve system performance.

Before you begin tuning, you will need to estimate the best setting for each environmental variable, and adjust the configuration parameters to suit.

To help you with the estimation process, we have provided you with a dependency graph. Locate the environmental variables on the dependency graph to determine how they will influence other variables and parameters.

Environmental variables are represented by graph nodes in a rectangle at the top of the dependency graph. Each node is connected to the relevant parameters that might need tuning. Consult the relevant sections in this document for more information about recommended values.

Tuning one parameter might require tuning other parameters, or changing hardware, or the infrastructure. When you change a parameter, follow the arrows from that node on the graph to determine what other parameters might need adjustment. Continue through each parameter until you have visited all nodes on the graph.

Figure: Tuning dependency graph

Key to the Dependency Graph
  • 3D boxes are hardware design variables or constraints

  • Oval-shaped boxes are software or system design variables or constraints

  • Rectangle-shaped boxes are configurable parameters, color-coded by configuration file:

    • Red: Apache httpd configuration files

    • Blue: Salt configuration files

    • Brown: Tomcat configuration files

    • Grey: PostgreSQL configuration files

    • Purple: /etc/rhn/rhn.conf

  • Dashed connecting lines indicate a variable or constraint that might require a change to another parameter

  • Solid connecting lines indicate that changing a configuration parameter requires checking another one to prevent issues

After the initial tuning has been completed, you will need to consider tuning again in these cases:

  • If your tuning inputs change significantly

  • If special conditions arise that require a certain parameter to be changed. For example, if specific warnings appear in a log file.

  • If performance is not satisfactory

To re-tune your installation, you will need to use the dependency graph again. Start from the node where significant change has happened.
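
The walk through the graph can be sketched in code. The following is a minimal illustration only: the edges are taken from the "After changing" notes in the parameter sections of this document, not from the full dependency graph, and the names CHECK_AFTER and parameters_to_review are hypothetical.

```python
from collections import deque

# Partial edge list, assumed from the "After changing" notes in this document.
# Changing the key might require checking each parameter in the list.
CHECK_AFTER = {
    "MaxClients": ["ServerLimit", "maxThreads"],
    "java.message_queue_thread_pool_size": [
        "hibernate.c3p0.max_size", "thread_pool", "Tomcat -Xmx"],
    "java.salt_event_thread_pool_size": [
        "hibernate.c3p0.max_size", "thread_pool", "Tomcat -Xmx"],
    "java.taskomatic_channel_repodata_workers": ["taskomatic.java.maxmemory"],
    "org.quartz.threadPool.threadCount": ["hibernate.c3p0.max_size", "thread_pool"],
    "hibernate.c3p0.max_size": ["max_connections"],
    "thread_pool": ["worker_threads"],
}


def parameters_to_review(changed):
    """Return every parameter reachable from `changed`, breadth first."""
    seen = {changed}
    order = []
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for dependent in CHECK_AFTER.get(node, []):
            if dependent not in seen:
                seen.add(dependent)
                order.append(dependent)
                queue.append(dependent)
    return order


print(parameters_to_review("org.quartz.threadPool.threadCount"))
```

Starting from the node that changed, the function lists each parameter once, in the order in which you would reach it by following the arrows.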

2. Environmental Variables

This section contains information about environmental variables (inputs to the tuning process).

Network Bandwidth

A measure of the typically available egress bandwidth from the SUSE Manager Server host to the clients or SUSE Manager Proxy hosts. This should take into account network hardware and topology as well as possible capacity limits on switches, routers, and other network equipment between the server and clients.

Channel count

The number of expected channels to manage. Includes any vendor-provided, third-party, and cloned or staged channels.

Client count

The total number of actual or expected clients. It is important to tune any parameters in advance of a client count increase, whenever possible.

OS mix

The number of distinct operating system versions that managed clients have installed. This is ordered by family (SUSE Linux Enterprise, openSUSE, Red Hat Enterprise Linux, or Ubuntu based). Storage and computing requirements are different in each case.

User count

The expected maximum number of concurrent users interacting with the Web UI, plus the number of programs simultaneously using the XMLRPC API. This includes spacecmd, spacewalk-clone-by-date, and similar programs.

3. Parameters

This section contains information about the available parameters.

3.1. MaxClients


The maximum number of HTTP requests served simultaneously by Apache httpd. Proxies, Web UI, and XMLRPC API clients each consume one. Requests exceeding the parameter will be queued and might result in timeouts.

Tune when

User count and proxy count increase significantly and this line appears in /var/log/apache2/error_log: […​] [mpm_prefork:error] [pid …​] AH00161: server reached MaxRequestWorkers setting, consider raising the MaxRequestWorkers setting.

Value default


Value recommendation



/etc/apache2/server-tuning.conf, in the prefork.c section


MaxClients 200

After changing

Immediately change ServerLimit and check maxThreads for possible adjustment.


This parameter was renamed to MaxRequestWorkers. Both names are valid.
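
To see how often the limit is reached, you can count the AH00161 lines in the Apache error log. The following is a minimal sketch: the function name is hypothetical, and the sample text is made up for illustration rather than copied from a real log.

```python
def count_worker_limit_hits(log_text):
    """Count error-log lines that report AH00161 (worker limit reached)."""
    return sum(1 for line in log_text.splitlines() if "AH00161" in line)


# Made-up sample input. On a real server you would read the contents of
# /var/log/apache2/error_log instead.
sample = (
    "[mpm_prefork:error] [pid 1234] AH00161: server reached "
    "MaxRequestWorkers setting, consider raising the MaxRequestWorkers setting\n"
    "an unrelated log line\n"
)

print(count_worker_limit_hits(sample))
```

A count that keeps growing during load peaks suggests that the parameter is worth raising.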

More information


3.2. ServerLimit


The number of Apache httpd processes serving HTTP requests simultaneously. The number must equal MaxClients.

Tune when

MaxClients changes

Value default


Value recommendation

The same value as MaxClients


/etc/apache2/server-tuning.conf, in the prefork.c section


ServerLimit 200

More information


3.3. maxThreads


The number of Tomcat threads dedicated to serving HTTP requests

Tune when

MaxClients changes. maxThreads must always be equal to or greater than MaxClients.

Value default


Value recommendation

The same value as MaxClients




<Connector port="8009" protocol="AJP/1.3" redirectPort="8443" URIEncoding="UTF-8" address="" maxThreads="200" connectionTimeout="20000"/>
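
The relationship between MaxClients, ServerLimit, and maxThreads can be checked mechanically. The following is a minimal sketch with a hypothetical function name. It encodes only the two rules stated in this document and does not read any configuration file.

```python
def check_http_limits(max_clients, server_limit, max_threads):
    """Return a list of rule violations; an empty list means consistent."""
    problems = []
    if server_limit != max_clients:
        problems.append("ServerLimit must equal MaxClients")
    if max_threads < max_clients:
        problems.append("maxThreads must be equal to or greater than MaxClients")
    return problems


# Values from the examples in this document.
print(check_http_limits(max_clients=200, server_limit=200, max_threads=200))
```

Run a check like this whenever MaxClients changes, because both of the other values depend on it.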

More information


3.4. connectionTimeout


The number of milliseconds before a non-responding AJP connection is forcibly closed.

Tune when

Client count increases significantly and AH00992, AH00877, and AH01030 errors appear in Apache error logs during a load peak.

Value default


Value recommendation





<Connector port="8009" protocol="AJP/1.3" redirectPort="8443" URIEncoding="UTF-8" address="" maxThreads="200" connectionTimeout="1000000" keepAliveTimeout="300000"/>

More information


3.5. keepAliveTimeout


The number of milliseconds without data exchange from the JVM before a non-responding AJP connection is forcibly closed.

Tune when

Client count increases significantly and AH00992, AH00877, and AH01030 errors appear in Apache error logs during a load peak.

Value default


Value recommendation





<Connector port="8009" protocol="AJP/1.3" redirectPort="8443" URIEncoding="UTF-8" address="" maxThreads="200" connectionTimeout="1000000" keepAliveTimeout="400000"/>

More information


3.6. Tomcat’s -Xmx


The maximum amount of memory Tomcat can use

Tune when

java.message_queue_thread_pool_size is increased or OutOfMemoryException errors appear in /var/log/rhn/rhn_web_ui.log

Value default

1 GiB

Value recommendation

4-8 GiB




JAVA_OPTS="…​ -Xmx8G …​"

After changing

Check memory usage

More information


3.7. java.disable_list_update_status


Disable displaying the update status for clients of a system group

Tune when

Displaying the update status causes timeouts.

Value default


Value recommendation




java.disable_list_update_status = true

After changing



More information

man rhn.conf

3.8. java.message_queue_thread_pool_size


The maximum number of threads in Tomcat dedicated to asynchronous operations

Tune when

Client count increases significantly

Value default


Value recommendation

50 - 150




java.message_queue_thread_pool_size = 50

After changing

Check hibernate.c3p0.max_size: each thread consumes a PostgreSQL connection, so starvation might occur if the allocated connection pool is insufficient. Check thread_pool: each thread might perform Salt API calls, so starvation might occur if the allocated Salt thread pool is insufficient. Check Tomcat’s -Xmx: each thread consumes memory, so OutOfMemoryException might be raised if memory is insufficient.


Incoming Salt events are handled in a separate thread pool. See java.salt_event_thread_pool_size.

More information

man rhn.conf

3.9. java.salt_batch_size


The maximum number of minions concurrently executing a scheduled action.

Tune when

Client count reaches several thousands and actions are not executed quickly enough.

Value default


Value recommendation





java.salt_batch_size = 300

After changing

Check memory usage. Monitor memory usage closely before and after the change.

More information

Salt Rate Limiting

3.10. java.salt_event_thread_pool_size


The maximum number of threads in Tomcat dedicated to handling of incoming Salt events.

Tune when

The number of queued Salt events grows. Typically, this can happen when onboarding a large number of minions with a high value of java.salt_presence_ping_timeout. The number of queued events can be queried with: echo "select count(*) from susesaltevent;" | spacewalk-sql --select-mode-direct -

Value default


Value recommendation





java.salt_event_thread_pool_size = 50

After changing

Check the length of the Salt event queue. Check hibernate.c3p0.max_size: each thread consumes a PostgreSQL connection, so starvation might occur if the allocated connection pool is insufficient. Check thread_pool: each thread might perform Salt API calls, so starvation might occur if the allocated Salt thread pool is insufficient. Check Tomcat’s -Xmx: each thread consumes memory, so OutOfMemoryException might be raised if memory is insufficient.

More information

man rhn.conf

3.11. java.salt_presence_ping_timeout


Before any action is executed on a client, a presence ping is executed to make sure the client is reachable. This parameter sets the number of seconds before a second command (in most cases state.apply or any other Salt function) is sent to the client to verify its presence. Having many clients typically means some will respond faster than others, so this timeout can be raised to accommodate the slower ones.

Tune when

Client count increases significantly, or some clients are responding correctly but too slowly, and SUSE Manager excludes them from calls. This line appears in /var/log/rhn/rhn_web_ui.log: "Got no result for <COMMAND> on minion <MINION_ID> (minion did not respond in time)"

Value default

4 seconds

Value recommendation

4-20 seconds




java.salt_presence_ping_timeout = 10

After changing

A large java.salt_presence_ping_timeout value can reduce overall throughput. This can be compensated for by increasing java.salt_event_thread_pool_size.

More information

Salt Timeouts

3.12. java.salt_presence_ping_gather_job_timeout


Before any action is executed on a client, a presence ping is executed to make sure the client is reachable. After java.salt_presence_ping_timeout seconds have elapsed without a response, a second command (in most cases state.apply or any other Salt function) is sent to the client. If the client does not respond within the number of seconds specified by this parameter, one more call (saltutil.find_job) is sent as a final check. This parameter therefore sets the number of seconds after the second command after which the client is definitively considered to have timed out. Having many clients typically means some will respond faster than others, so this timeout can be raised to accommodate the slower ones.

Tune when

Client count increases significantly, or some clients are responding correctly but too slowly, and SUSE Manager excludes them from calls. This line appears in /var/log/rhn/rhn_web_ui.log: "Got no result for <COMMAND> on minion <MINION_ID> (minion did not respond in time)"

Value default

1 second

Value recommendation

1-50 seconds




java.salt_presence_ping_gather_job_timeout = 20

More information

Salt Timeouts

3.13. java.taskomatic_channel_repodata_workers


Whenever content is changed in a software channel, its metadata needs to be recomputed before clients can use it. Channel-altering operations include the addition of a patch, the removal of a package or a repository synchronization run. This parameter specifies the maximum number of Taskomatic threads that SUSE Manager will use to recompute the channel metadata. Channel metadata computation is both CPU-bound and memory-heavy, so raising this parameter and operating on many channels simultaneously could cause Taskomatic to consume significant resources, but channels will be available to clients sooner.

Tune when

Channel count becomes larger than 50, or more concurrent operations on channels are expected.

Value default


Value recommendation





java.taskomatic_channel_repodata_workers = 4

After changing

Check taskomatic.java.maxmemory for adjustment, as every new thread will consume memory

More information

man rhn.conf

3.14. taskomatic.java.maxmemory


The maximum amount of memory Taskomatic can use. Generation of metadata, especially for some OSs, can be memory-intensive, so this parameter might need raising depending on the managed OS mix.

Tune when

java.taskomatic_channel_repodata_workers increases, OSs are added to SUSE Manager (particularly Red Hat Enterprise Linux or Ubuntu), or OutOfMemoryException errors appear in /var/log/rhn/rhn_taskomatic_daemon.log.

Value default

4096 MiB

Value recommendation

4096-16384 MiB




taskomatic.java.maxmemory = 8192

After changing

Check memory usage.

More information

man rhn.conf

3.15. org.quartz.threadPool.threadCount


The number of Taskomatic worker threads. Increasing this value allows Taskomatic to serve more clients in parallel.

Tune when

Client count increases significantly

Value default


Value recommendation





org.quartz.threadPool.threadCount = 100

After changing

Check hibernate.c3p0.max_size and thread_pool for adjustment

More information


3.16. org.quartz.scheduler.idleWaitTime


Cycle time for Taskomatic. Decreasing this value lowers the latency of Taskomatic.

Tune when

Client count is in the thousands.

Value default

5000 ms

Value recommendation

1000-5000 ms




org.quartz.scheduler.idleWaitTime = 1000

More information


3.17. MinionActionExecutor.parallel_threads


Number of Taskomatic threads dedicated to sending commands to Salt clients as a result of actions being executed.

Tune when

Client count is in the thousands.

Value default


Value recommendation





taskomatic.minion_action_executor.parallel_threads = 10

3.18. SSHMinionActionExecutor.parallel_threads


Number of Taskomatic threads dedicated to sending commands to Salt SSH clients as a result of actions being executed.

Tune when

Client count is in the hundreds.

Value default


Value recommendation





taskomatic.sshminion_action_executor.parallel_threads = 40

3.19. hibernate.c3p0.max_size


Maximum number of PostgreSQL connections simultaneously available to both Tomcat and Taskomatic. If any of those components requires more concurrent connections, their requests will be queued.

Tune when

java.message_queue_thread_pool_size or maxThreads increase significantly, or when org.quartz.threadPool.threadCount has changed significantly. Each thread consumes one connection in Taskomatic and Tomcat, so having more threads than connections might result in starvation.

Value default


Value recommendation

100 to 200, and higher than the larger of these two values: the sum of java.message_queue_thread_pool_size and maxThreads, or org.quartz.threadPool.threadCount




hibernate.c3p0.max_size = 100
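
The lower bound in the recommendation can be worked out as follows. This is a minimal sketch with a hypothetical function name. It assumes the recommendation means the larger of the Tomcat thread total (java.message_queue_thread_pool_size plus maxThreads) and the Taskomatic thread count.

```python
def c3p0_lower_bound(message_queue_threads, max_threads, quartz_threads):
    """Return the thread count that hibernate.c3p0.max_size should exceed."""
    tomcat_threads = message_queue_threads + max_threads
    return max(tomcat_threads, quartz_threads)


# Example values used elsewhere in this document: 50, 200, and 100.
bound = c3p0_lower_bound(50, 200, 100)
print(bound)  # 250
```

Choose a value for hibernate.c3p0.max_size that is higher than the result.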

After changing

Check max_connections for adjustment.

More information


3.20. rhn-search.java.maxmemory


The maximum amount of memory that the rhn-search service can use.

Tune when

Client count increases significantly, and OutOfMemoryException errors appear in journalctl -u rhn-search.

Value default

512 MiB

Value recommendation

512-4096 MiB




rhn-search.java.maxmemory = 4096

After changing

Check memory usage.

3.21. shared_buffers


The amount of memory reserved for PostgreSQL shared buffers, which contain caches of database tables and index data.

Tune when

RAM changes

Value default

25% of total RAM

Value recommendation

25-40% of total RAM




shared_buffers = 8192MB
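
The recommended range can be derived from the total RAM. The following is a minimal sketch with a hypothetical function name. The 32 GiB of RAM is an assumed figure chosen for illustration.

```python
def shared_buffers_range_mb(total_ram_mb):
    """Return (25%, 40%) of total RAM in MB, rounded down."""
    return (total_ram_mb * 25 // 100, total_ram_mb * 40 // 100)


# Assumed example: a server with 32 GiB (32768 MB) of RAM.
low, high = shared_buffers_range_mb(32768)
print(low, high)  # 8192 13107
```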

After changing

Check memory usage.

More information


3.22. max_connections


Maximum number of PostgreSQL connections available to applications. More connections allow for more concurrent threads/workers in various components (in particular Tomcat and Taskomatic), which generally improves performance. However, each connection consumes resources, in particular work_mem megabytes per sort operation per connection.

Tune when

hibernate.c3p0.max_size changes significantly, as that parameter determines the maximum number of connections available to Tomcat and Taskomatic

Value default


Value recommendation

Depends on other settings. Use /usr/lib/susemanager/bin/susemanager-connection-check to obtain a recommendation.




max_connections = 250

After changing

Check memory usage. Monitor memory usage closely before and after the change.

More information


3.23. work_mem


The amount of memory allocated by PostgreSQL every time a connection needs to do a sort or hash operation. Every connection (as specified by max_connections) might make use of an amount of memory equal to a multiple of work_mem.

Tune when

Database operations are slow because of excessive temporary file disk I/O. To test if that is happening, add log_temp_files = 5120 to /var/lib/pgsql/data/postgresql.conf, restart PostgreSQL, and monitor the PostgreSQL log files. If you see lines containing "LOG: temporary file:", try raising this parameter’s value to help reduce disk I/O and speed up database operations.

Value recommendation

2-20 MB




work_mem = 10MB
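
To judge how much temporary file I/O is taking place, you can summarize the temporary file entries in the PostgreSQL log. The following is a minimal sketch with hypothetical names. It assumes that each entry has the form temporary file: path "...", size N with the size in bytes, and the sample lines are made up for illustration.

```python
import re

# Assumed entry format: temporary file: path "<path>", size <bytes>
TEMP_FILE = re.compile(r'temporary file: path "([^"]+)", size (\d+)')


def summarize_temp_files(log_text):
    """Return (number of temporary files, total size in bytes)."""
    sizes = [int(match.group(2)) for match in TEMP_FILE.finditer(log_text)]
    return len(sizes), sum(sizes)


# Made-up sample log content.
sample_log = (
    'LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp100.0", size 10485760\n'
    'LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp100.1", size 20971520\n'
    'LOG:  checkpoint starting: time\n'
)

print(summarize_temp_files(sample_log))  # (2, 31457280)
```

Comparing the totals before and after a change to work_mem shows whether the change reduced temporary file usage.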

After changing

Check if the SUSE Manager Server might need additional RAM.

More information


3.24. effective_cache_size


Estimation of the total memory available to PostgreSQL for caching. It is the explicitly reserved memory (shared_buffers) plus any memory used by the kernel as cache/buffer.

Tune when

Hardware RAM or memory usage increases significantly

Value recommendation

Start with 75% of total RAM. For finer settings, use shared_buffers + free memory + buffer/cache memory. Free and buffer/cache can be determined via the free -m command (free and buff/cache in the output respectively)




effective_cache_size = 24GB

After changing

Check memory usage


This is an estimation for the query planner, not an allocation.
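
The finer calculation from the recommendation can be sketched as follows. This is an illustration only: the function name is hypothetical, the sample output is made up, and the parser assumes a free -m header that contains free and buff/cache columns. Older versions of free use different column names.

```python
def effective_cache_size_mb(free_output, shared_buffers_mb):
    """Return shared_buffers + free + buff/cache, all in MB."""
    lines = free_output.strip().splitlines()
    header = lines[0].split()
    mem_line = next(line for line in lines if line.strip().startswith("Mem:"))
    values = [int(value) for value in mem_line.split()[1:]]
    row = dict(zip(header, values))
    return shared_buffers_mb + row["free"] + row["buff/cache"]


# Made-up sample of `free -m` output.
SAMPLE_FREE_OUTPUT = """
               total        used        free      shared  buff/cache   available
Mem:           32000        8000        4000         500       20000       23000
Swap:           2047           0        2047
"""

print(effective_cache_size_mb(SAMPLE_FREE_OUTPUT, 8192))  # 32192
```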

More information


3.25. thread_pool


The number of worker threads serving Salt API HTTP requests. A higher number can improve parallelism of SUSE Manager Server-initiated Salt operations, but will consume more memory.

Tune when

java.message_queue_thread_pool_size or org.quartz.threadPool.threadCount are changed. Starvation can occur when there are more Tomcat or Taskomatic threads making simultaneous Salt API calls than there are Salt API worker threads.

Value default


Value recommendation

100-500, but should be higher than the sum of java.message_queue_thread_pool_size and org.quartz.threadPool.threadCount


/etc/salt/master.d/susemanager.conf, in the rest_cherrypy section.


thread_pool: 100
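
A starting value can be derived from the two thread counts named in the recommendation. The following is a minimal sketch with a hypothetical function name. It assumes that the smallest acceptable value is one more than the sum, and never below the lower end of the recommended range.

```python
def suggest_thread_pool(message_queue_threads, quartz_threads, floor=100):
    """Return the smallest value above the sum, but never below `floor`."""
    return max(floor, message_queue_threads + quartz_threads + 1)


# Example values used elsewhere in this document: 50 and 100.
print(suggest_thread_pool(50, 100))  # 151
```

A result above 500 falls outside the recommended range and suggests reviewing the two input parameters first.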

After changing

Check worker_threads for adjustment.

More information


3.26. worker_threads


The number of salt-master worker threads that process commands and replies from minions and the Salt API. Increasing this value, assuming sufficient resources are available, allows Salt to process more data in parallel from minions without timing out, but will consume significantly more RAM (typically about 70 MiB per thread). Setting this parameter to a very high value can have the opposite effect: the workers compete with each other for CPU resources, and performance can drop significantly.

Tune when

Client count increases significantly, thread_pool increases significantly, or SaltReqTimeoutError or Message timed out errors appear in /var/log/salt/master. These errors can be a sign that the value of this parameter is too low or too high.

Value default


Value recommendation

8-32, depending on the number of CPU cores available to the server. It is recommended to keep the value slightly lower than the number of CPU cores.




worker_threads: 16
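
The trade-off between CPU cores and RAM can be estimated before changing the value. The following is a minimal sketch with a hypothetical function name. It assumes that "slightly less than the number of CPU cores" means one fewer, caps the result at the upper end of the recommended range, and uses the typical figure of 70 MiB per thread.

```python
def worker_threads_estimate(cpu_cores, mib_per_thread=70):
    """Return (suggested worker_threads, approximate RAM in MiB)."""
    threads = min(32, max(1, cpu_cores - 1))
    return threads, threads * mib_per_thread


# Assumed example: a server with 16 CPU cores.
print(worker_threads_estimate(16))  # (15, 1050)
```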

After changing

Check memory usage. Monitor memory usage closely before and after the change. It makes sense to monitor the salt-master stats event by enabling master_stats and adjusting master_stats_event_iter to fine tune the value of this parameter.

More information


3.27. auth_events


Determines whether the master will fire authentication events. Authentication events are fired when a minion performs an authentication check with the master. Disabling them helps to reduce the number of events published by the Salt Master Event Publisher and the workload on Event Publisher subscribers.

Tune when

A large number of salt/auth events are published on the Salt event bus. In most cases, these events are of no use to subscribers.

Value default


Value recommendation





auth_events: False

More information


3.28. minion_data_cache_events


Determines whether the master will fire minion data cache events (minion/refresh/*). Minion data cache events are fired when a minion requests a minion data cache refresh. Disabling them helps to reduce the number of events published by the Salt Master Event Publisher and the workload on Event Publisher subscribers.

Tune when

A large number of minion/refresh/* events are published on the Salt event bus. In most cases, these events are of no use to subscribers.

Value default


Value recommendation





minion_data_cache_events: False

More information


3.29. pub_hwm


The maximum number of outstanding messages sent by salt-master. If more than this number of messages need to be sent concurrently, communication with clients slows down, potentially resulting in timeout errors during load peaks.

Tune when

Client count increases significantly and "Salt request timed out. The master is not responding." errors appear when pinging minions during a load peak.

Value default


Value recommendation





pub_hwm: 10000

More information

https://docs.saltproject.io/en/latest/ref/configuration/master.html#pub-hwm, https://zeromq.org/socket-api/#high-water-mark

3.30. zmq_backlog


The maximum number of allowed client connections that have started but not concluded the opening process. If more than this number of clients connect in a very short time frame, connections are dropped and clients experience a delay when reconnecting.

Tune when

Client count increases significantly, very many clients reconnect in a short time frame, and TCP connections to the salt-master process get dropped by the kernel.

Value default


Value recommendation





zmq_backlog: 2000

More information

https://docs.saltproject.io/en/latest/ref/configuration/master.html#zmq-backlog, http://api.zeromq.org/3-0:zmq-getsockopt (ZMQ_BACKLOG)

3.31. swappiness


How aggressively the kernel moves unused data from memory to the swap partition. Setting a lower value typically reduces swap usage and results in better performance, especially when RAM is abundant.

Tune when

RAM increases, or swap is used when RAM is sufficient.

Value default


Value recommendation

1-60. For 128 GB of RAM, 10 is expected to give good results.




vm.swappiness = 20

More information


3.32. wait_for_backend


Determines whether the salt-broker service should wait for backend sockets to be connected before opening the sockets that listen for connections from salt-minions. When enabled, it helps to prevent ZeroMQ messages from accumulating in the internal buffers of the sockets and being pushed to the salt-master once the connection is restored.

Tune when

Unstable connectivity between the SUSE Manager Proxy and the SUSE Manager Server.

Value default


Value recommendation





wait_for_backend: True

More information

Proxies Connectivity

3.33. tcp_keepalive


The TCP keepalive interval to set on TCP ports. This setting can be used to tune Salt connectivity issues in messy network environments with misbehaving firewalls.

Tune when

Unstable connectivity between managed clients and the SUSE Manager Proxy or the SUSE Manager Server.

Value default


Value recommendation



/etc/venv-salt-minion/minion.d/tuning.conf or /etc/salt/minion.d/tuning.conf, depending on the minion type.


tcp_keepalive: True

After changing

See Minions Connectivity for details on fine-tuning additional keepalive parameters.

More information

https://docs.saltproject.io/en/latest/ref/configuration/minion.html#tcp-keepalive, Minions Connectivity

4. Memory Usage

Adjusting some of the parameters listed in this section can result in a higher amount of RAM being used by various components. It is important that the amount of hardware RAM is adequate after any significant change.

To determine how RAM is being used, you will need to check each process that consumes it.

Operating system

Stop all SUSE Manager services and inspect the output of free -h.

Java-based components

This includes Taskomatic, Tomcat, and rhn-search. These services support a configurable memory cap.

The SUSE Manager Server

Depends on many factors and can only be estimated. PostgreSQL permanently reserves the amount of memory set in shared_buffers. For a worst-case estimate of per-query RAM, multiply work_mem by max_connections, and multiply the result by three. You will also need to check the operating system buffers and caches, which PostgreSQL uses to host copies of database data. These often automatically occupy any available RAM.
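
The worst-case estimate described above can be worked through with the example values from the parameter sections. The following is a minimal sketch with a hypothetical function name. It adds the reserved shared_buffers to the per-query figure and does not account for operating system buffers and caches.

```python
def postgres_worst_case_mb(shared_buffers_mb, work_mem_mb, max_connections):
    """Return shared_buffers plus three times work_mem per connection, in MB."""
    per_query_mb = work_mem_mb * max_connections * 3
    return shared_buffers_mb + per_query_mb


# Example values from this document:
# shared_buffers = 8192MB, work_mem = 10MB, max_connections = 250.
print(postgres_worst_case_mb(8192, 10, 250))  # 15692
```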

It is important that the SUSE Manager Server has sufficient RAM to accommodate all of these processes, especially OS buffers and caches, to have reasonable PostgreSQL performance. We recommend you keep several gigabytes available at all times, and add more as the database size on disk increases.

Whenever the expected amount of memory available for OS buffers and caches changes, update the effective_cache_size parameter so that PostgreSQL uses it correctly. To calculate the amount available, subtract the expected memory usage of all other components from the total RAM.

To get a live breakdown of the memory used by services on the SUSE Manager Server, use this command:

pidstat -p ALL -r --human 1 60 | tee pidstat-memory.log

This command will save a copy of displayed data in the pidstat-memory.log file for later analysis.