Tuning Large Deployments
The following sections cover considerations for large-scale deployments. In this context, large scale means 1000 clients or more.
SUSE recommends the following for a large-scale SUSE Manager deployment:
SUSE Manager servers should have at least 8 recent x86 cores, 32 GiB of RAM, and, most importantly, fast I/O devices: at least one SSD (two SSDs in RAID-0 are strongly recommended).
Proxies with many clients (hundreds) should have at least 2 recent x86 cores and 16 GiB of RAM.
Use one SUSE Manager Proxy per 500-1000 clients. Take into account that download time depends on network capacity. A rough estimate of the download time in minutes is:
Size of updates * Number of clients / Theoretical download speed / 60
For example, with a physical link speed of 1 Gbit/s (a theoretical download speed of about 119 Megabyte/s):
400 Megabytes * 3000 / 119 Megabyte/s / 60 = approx. 168 Minutes
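The calculation above can be sketched in Python. This is only an illustration of the formula; the function name download_minutes and its parameter names are assumptions made for this example, not part of SUSE Manager.

```python
def download_minutes(update_mb: float, clients: int, throughput_mb_s: float) -> float:
    """Estimate the total download time in minutes.

    Mirrors the formula in the text:
    size of updates * number of clients / theoretical download speed / 60
    """
    if throughput_mb_s <= 0:
        raise ValueError("throughput_mb_s must be positive")
    total_seconds = update_mb * clients / throughput_mb_s
    return total_seconds / 60


if __name__ == "__main__":
    # Values from the example: 400 MB of updates, 3000 clients, 119 MB/s.
    print(f"{download_minutes(400, 3000, 119):.0f} minutes")
```

Real transfers are usually slower than the theoretical link speed, so the result is best treated as a lower bound.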
Depending on hardware you can accept hundreds of client keys.
Plan time for onboarding clients: at least one hour per 1000 clients.
Onboarding more than approximately 1000 clients directly to the SUSE Manager server is not recommended; use proxies instead. Every client can use up to 3 TCP connections simultaneously, and too many TCP connections can cause performance issues.
If the following error appears in the output of dmesg, you probably have too many clients attached to a single SUSE Manager server or proxy for the ARP cache to hold all of their addresses:
kernel: neighbour table overflow
In that case, increase the ARP cache values via sysctl, for example, by adding the following lines to the sysctl configuration:
net.ipv4.neigh.default.gc_thresh1 = 4096
net.ipv4.neigh.default.gc_thresh2 = 8192
net.ipv4.neigh.default.gc_thresh3 = 16384
net.ipv4.neigh.default.gc_interval = 60
net.ipv4.neigh.default.gc_stale_time = 120
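The sizing rules in this list (one proxy per 500-1000 clients, at least one hour of onboarding per 1000 clients, up to 3 TCP connections per client) can be turned into a rough planning aid. The following Python sketch only restates that arithmetic; all function names are hypothetical and chosen for this example.

```python
import math


def proxy_range(clients: int) -> tuple[int, int]:
    """Return (fewest, most) proxies, assuming one proxy per 500-1000 clients."""
    if clients < 0:
        raise ValueError("clients must not be negative")
    return math.ceil(clients / 1000), math.ceil(clients / 500)


def min_onboarding_hours(clients: int) -> float:
    """Lower bound for onboarding time: one hour per 1000 clients."""
    return clients / 1000


def peak_tcp_connections(clients_on_node: int) -> int:
    """Upper bound: each client can hold up to 3 TCP connections at once."""
    return 3 * clients_on_node


if __name__ == "__main__":
    total = 3000  # example fleet size, same as in the download calculation
    low, high = proxy_range(total)
    print(f"proxies: {low} to {high}")
    print(f"onboarding: at least {min_onboarding_hours(total):.1f} hours")
    print(f"peak TCP connections for 1000 clients on one node: {peak_tcp_connections(1000)}")
```

These figures are planning bounds only; actual behavior depends on hardware and network conditions.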
Start Small and Scale Up
Always start small and scale up gradually. Monitor the server to identify possible issues early.
SUSE proposes the following tuning settings for a large-scale SUSE Manager deployment:
Increase the maximum Tomcat heap memory to handle a potentially long queue of Salt return results. Set it to 8 GiB instead of the current default of 1 GiB in /etc/sysconfig/tomcat (affects onboarding and Action execution).
Increase the number of Taskomatic workers so that a large number of separate jobs can be processed in parallel. Set parameter org.quartz.threadPool.threadCount = 100 in /etc/rhn/rhn.conf (affects onboarding and staging).
Allow Taskomatic to check for runnable jobs more frequently to reduce latency. Set parameter org.quartz.scheduler.idleWaitTime = 1000 in /etc/rhn/rhn.conf (affects onboarding, staging and Action execution).
Increase Tomcat's Salt return result workers so that a large number of Salt return results can be processed in parallel. Set parameter java.message_queue_thread_pool_size = 100.
Increase the number of PostgreSQL connections available to Java applications (Tomcat, Taskomatic) in line with the previous parameters; otherwise the extra workers will starve waiting for a connection. Set parameter hibernate.c3p0.max_size = 150 in /etc/rhn/rhn.conf (affects all client operations). Make sure enough PostgreSQL connections are configured before changing this parameter. Refer to smdba system-check autotuning --help to tune the PostgreSQL configuration file automatically while changing the number of available connections. Additional manual tuning is usually not necessary, but might be required depending on scale and exact use cases.
Increase the number of Taskomatic's minion-action-executor worker threads so that Actions can be scheduled to clients in parallel. Set parameter taskomatic.com.redhat.rhn.taskomatic.task.MinionActionExecutor.parallel_threads = 8 in /etc/rhn/rhn.conf (affects all client operations, especially staging).
Increase Salt's presence ping timeouts if responses might come back later than the defaults allow. Set parameters java.salt_presence_ping_timeout = 20 and java.salt_presence_ping_gather_job_timeout = 20 in /etc/rhn/rhn.conf (affects all client operations).
Increase the number of Salt master workers so that more requests can run in parallel (otherwise Tomcat and Taskomatic workers will starve waiting for the Salt API, and Salt will not be able to serve files in a timely manner). Set the worker count in /etc/salt/master.d/susemanager.conf (affects onboarding and patching).
Increase this parameter further if file management states fail with the error "Unable to manage file: Message timed out".
Note that Salt master workers can consume significant amounts of RAM (typically about 70 MB per worker). Monitor memory usage when increasing this value, and increase it in relatively small increments (for example, 20) until the failures no longer occur.
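To get a feel for the memory cost of each increment, the note above can be expressed as a small Python sketch. It assumes the typical figure of about 70 MB per worker quoted in the text; the function names and the start and target values are hypothetical and for illustration only.

```python
MB_PER_WORKER = 70  # typical figure quoted in the text; actual usage varies


def worker_ram_mb(workers: int, mb_per_worker: int = MB_PER_WORKER) -> int:
    """Estimate RAM in MB consumed by the given number of Salt master workers."""
    return workers * mb_per_worker


def ramp_plan(start: int, target: int, step: int = 20) -> list[tuple[int, int]]:
    """List (worker count, estimated RAM in MB) for each increment up to target."""
    if step <= 0:
        raise ValueError("step must be positive")
    if start > target:
        raise ValueError("start must not exceed target")
    # Intermediate counts in steps, always ending exactly at the target.
    counts = list(range(start, target, step)) + [target]
    return [(n, worker_ram_mb(n)) for n in counts]


if __name__ == "__main__":
    # Hypothetical ramp from 20 to 80 workers in increments of 20.
    for workers, ram in ramp_plan(20, 80):
        print(f"{workers} workers: about {ram} MB")
```

Comparing each estimate with the free RAM on the server before applying the next increment helps keep the system out of swap.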
Increase the maximum heap memory for the search daemon so that it can index many clients. Set 4 GiB instead of the current default of 512 MB by adding the corresponding setting to /etc/rhn/rhn.conf (affects background indexing only).
Consider disabling Taskomatic jobs, especially if the provided functionality is not used:
Disable daily comparison of configuration files. Click the compare-configs-default link, then the Disable Schedule button, and finally Delete Schedule.
Disable hourly synchronization of Cobbler files. Click the cobbler-sync-default link, then the Disable Schedule button, and finally Delete Schedule.
Disable daily run of Gatherer and Subscription Matcher. Click the gatherer-matcher-default link, then the Disable Schedule button, and finally Delete Schedule.
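Before and after tuning, it can help to see which of the suggested values are already set. The following Python sketch compares the text of a configuration file with the values listed above. It assumes simple key = value lines and covers only the parameters for which the text names /etc/rhn/rhn.conf as the file; the function names are hypothetical and not part of SUSE Manager.

```python
from __future__ import annotations

# Suggested values taken from the list above (guidelines, not hard rules).
SUGGESTED = {
    "org.quartz.threadPool.threadCount": "100",
    "org.quartz.scheduler.idleWaitTime": "1000",
    "hibernate.c3p0.max_size": "150",
    "taskomatic.com.redhat.rhn.taskomatic.task.MinionActionExecutor.parallel_threads": "8",
    "java.salt_presence_ping_timeout": "20",
    "java.salt_presence_ping_gather_job_timeout": "20",
}


def parse_conf(text: str) -> dict[str, str]:
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def differences(settings: dict[str, str]) -> dict[str, tuple[str | None, str]]:
    """Return {key: (current, suggested)} for keys that are unset or differ."""
    return {
        key: (settings.get(key), wanted)
        for key, wanted in SUGGESTED.items()
        if settings.get(key) != wanted
    }
```

For example, passing the contents of /etc/rhn/rhn.conf to parse_conf and the result to differences lists every parameter that is missing or set to another value. A differing value is not necessarily wrong; it only marks a setting worth reviewing.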
Note that increasing the number of PostgreSQL connections requires more RAM. Make sure the SUSE Manager server is monitored and that swap is never used.
Also note that the above settings should be regarded as guidelines. They have been tested to be safe, but care should be exercised when changing them, and consulting support is highly recommended.