Tuning Large Deployments

In the following sections find considerations about a big scale deployment. In this context, a big scale comprises 1000 clients or more.

General Recommendations

SUSE recommends the following in a big scale SUSE Manager deployment:

  • SUSE Manager servers should have at least 8 recent x86 cores, 32 GiB of RAM, and, most important, fast I/O devices such as at least an SSD (2 SSDs in RAID-0 are strongly recommended).

  • Proxies with many clients (hundreds) should have at least 2 recent x86 cores and 16 GiB of RAM.

  • Use one SUSE Manager Proxy per 500-1000 clients. Keep into account that download time depends on network capacity. Here is a rough example calculation with physical link speed of 1 GB/s:

    400 Megabytes  *        3000       /      119 Megabyte/s        / 60
    = 169 Minutes

    This is:

    Size of updates * Number of clients / Theoretical download speed / 60
  • Depending on hardware you can accept hundreds of client keys.

  • Plan time for onboarding clients- at least one hour per 1000 clients.

  • It is not recommended onboarding more than approx. 1000 clients directly to the SUSE Manager server- proxies should be used instead. This is because every client can use up to 3 TCP connections simultaneously, and too many TCP connections can cause performance issues.

  • If the following error appears in output of dmesg, you probably have an excessive number of clients attached to a single SUSE Manager server or proxy for the ARP cache to contain all of their addresses:

    kernel: neighbour table overflow

    In that case, increase the ARP cache values via sysctl, for example, by adding the following lines to /etc/sysctl.conf:

    net.ipv4.neigh.default.gc_thresh1 = 4096
    net.ipv4.neigh.default.gc_thresh2 = 8192
    net.ipv4.neigh.default.gc_thresh3 = 16384
    net.ipv4.neigh.default.gc_interval = 60
    net.ipv4.neigh.default.gc_stale_time = 120
Start Small and Scale Up

Always start small and scale up gradually. Keep the server monitored in order to identify possible issues early.

Tuning Proposals

SUSE proposes the following tuning settings in a big scale SUSE Manager deployment:

  • Increase the maximum Tomcat heap memory to face a potentially long queue of Salt return results. Set 8 GiB instead of the current default 1 GiB: parameter Xmx1G in /etc/sysconfig/tomcat (affects onboarding and Action execution).

  • Increase the number of Taskomatic workers, allowing to parallelize work on a high number of separate jobs. Set parameter org.quartz.threadPool.threadCount = 100 in /etc/rhn/rhn.conf (affects onboarding and staging).

  • Allow Taskomatic to check for runnable jobs more frequently to reduce latency. Set parameter org.quartz.scheduler.idleWaitTime = 1000 in /etc/rhn/rhn.conf (affects onboarding, staging and Action execution).

  • Increase Tomcat’s Salt return result workers to allow parallelizing work on a high number of Salt return results. Set parameter java.message_queue_thread_pool_size = 100 in /etc/rhn/rhn.conf (affects patching).

  • Increase the number of PostgreSQL connections available to Java applications (Tomcat, Taskomatic) according to the previous parameters, otherwise extra workers will starve waiting for a connection. Set parameter hibernate.c3p0.max_size = 150 in /etc/rhn/rhn.conf (affects all client operations). Make sure enough PostgreSQL connections are configured before changing this parameter - refer to smdba system-check autotuning --help to get automatic tuning of the PostgreSQL configuration file while changing the number of available connections. Additional manual tuning is usually not necessary but might be required depending on scale and exact use cases.

  • Increase the number of Taskomatic’s minion-action-executor worker threads allowing to parallelize the scheduling of Actions to clients. Set parameter taskomatic.com.redhat.rhn.taskomatic.task.MinionActionExecutor.parallel_threads = 8 in /etc/rhn/rhn.conf (affects all client operations, especially staging).

  • Increase Salt’s presence ping timeouts if responses might come back later than the defaults. Set parameters java.salt_presence_ping_timeout = 20 and java.salt_presence_ping_gather_job_timeout = 20 in /etc/rhn/rhn.conf (affects all client operations).

  • Increase the number of Salt master workers so that more requests can run in parallel (otherwise Tomcat and Taskomatic workers will starve waiting for the Salt API, and Salt will not be able to serve files timely). Set parameter worker_threads: 100 in /etc/salt/master.d/susemanager.conf (affects onboarding and patching).

    • Increase this parameter further if file management states fail with the error "Unable to manage file: Message timed out"

    • Note that Salt master workers can consume significant amounts of RAM (typically about 70 MB per worker). It is recommended to keep usage monitored when increasing this value and to do so in relatively small increments (eg. 20) until failures are no longer produced.

  • Increase the maximum heap memory for the search daemon to be able to index many clients. Set 4 GiB instead of the current default 512 MB: add rhn-search.java.maxmemory=4096 in /etc/rhn/rhn.conf (affects background indexing only).

  • Consider disabling Taskomatic jobs, especially if the provided functionality is not used:

    • Disable daily comparison of configuration files. Click Admin  Task Schedules, then the compare-configs-default link, then the Disable Schedule button and finally Delete Schedule.

    • Disable hourly synchronization of Cobbler files. Click Admin  Task Schedules, then the cobbler-sync-default link, then the Disable Schedule button and finally Delete Schedule.

    • Disable daily run of Gatherer and Subscription Matcher. Click Admin  Task Schedules, then the gatherer-matcher-default link, then the Disable Schedule button and finally Delete Schedule.

Note that increasing the number of PostgreSQL connections will require more RAM, make sure the SUSE Manager server is monitored and swap is never used.

Also note the above settings should be regarded as guidelines-they have been tested to be safe but care should be exercised when changing them, and consulting support is highly recommended.