Tuning Large Scale Deployments
- 1. The Tuning Process
- 2. Environmental Variables
- 3. Parameters
- 3.1. MaxClients
- 3.2. ServerLimit
- 3.3. maxThreads
- 3.4. connectionTimeout
- 3.5. keepAliveTimeout
- 3.6. Tomcat’s -Xmx
- 3.7. java.disable_list_update_status
- 3.8. java.message_queue_thread_pool_size
- 3.9. java.salt_batch_size
- 3.10. java.salt_event_thread_pool_size
- 3.11. java.salt_presence_ping_timeout
- 3.12. java.salt_presence_ping_gather_job_timeout
- 3.13. java.taskomatic_channel_repodata_workers
- 3.14. taskomatic.java.maxmemory
- 3.15. org.quartz.threadPool.threadCount
- 3.16. org.quartz.scheduler.idleWaitTime
- 3.17. MinionActionExecutor.parallel_threads
- 3.18. SSHMinionActionExecutor.parallel_threads
- 3.19. hibernate.c3p0.max_size
- 3.20. rhn-search.java.maxmemory
- 3.21. shared_buffers
- 3.22. max_connections
- 3.23. work_mem
- 3.24. effective_cache_size
- 3.25. thread_pool
- 3.26. worker_threads
- 3.27. auth_events
- 3.28. minion_data_cache_events
- 3.29. pub_hwm
- 3.30. zmq_backlog
- 3.31. swappiness
- 3.32. wait_for_backend
- 3.33. tcp_keepalive
- 4. Memory Usage
SUSE Manager is designed by default to work on small and medium scale installations. For installations with more than 1000 clients per SUSE Manager Server, adequate hardware sizing and parameter tuning must be performed.
The instructions in this section can have severe and catastrophic performance impacts when improperly used. In some cases, they can cause SUSE Manager to completely cease functioning. Always test changes before implementing them in a production environment. During implementation, take care when changing parameters. Monitor performance before and after each change, and revert any steps that do not produce the expected result.
We strongly recommend that you contact SUSE Consulting for assistance with tuning. SUSE will not provide support for catastrophic failure when these advanced parameters are modified without consultation.
Tuning is not required on installations of fewer than 1000 clients. Do not perform these instructions on small or medium scale installations.
1. The Tuning Process
Any SUSE Manager installation is subject to a number of design and infrastructure constraints that, for the purposes of tuning, we call environmental variables. Environmental variables can include the total number of clients, the number of different operating systems under management, and the number of software channels.
Environmental variables influence, either directly or indirectly, the value of most configuration parameters. During the tuning process, the configuration parameters are manipulated to improve system performance.
Before you begin tuning, you will need to estimate the best setting for each environmental variable, and adjust the configuration parameters to suit.
To help you with the estimation process, we have provided you with a dependency graph. Locate the environmental variables on the dependency graph to determine how they will influence other variables and parameters.
Environmental variables are represented by graph nodes in a rectangle at the top of the dependency graph. Each node is connected to the relevant parameters that might need tuning. Consult the relevant sections in this document for more information about recommended values.
Tuning one parameter might require tuning other parameters, or changing hardware, or the infrastructure. When you change a parameter, follow the arrows from that node on the graph to determine what other parameters might need adjustment. Continue through each parameter until you have visited all nodes on the graph.
- 3D boxes are hardware design variables or constraints
- Oval-shaped boxes are software or system design variables or constraints
- Rectangle-shaped boxes are configurable parameters, color-coded by configuration file:
  - Red: Apache httpd configuration files
  - Blue: Salt configuration files
  - Brown: Tomcat configuration files
  - Grey: PostgreSQL configuration files
  - Purple: /etc/rhn/rhn.conf
- Dashed connecting lines indicate a variable or constant that might require a change to another parameter
- Solid connecting lines indicate that changing a configuration parameter requires checking another parameter to prevent issues
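After the initial tuning is complete, you will need to consider tuning again in these cases:
- The tuning inputs change significantly.
- Special conditions arise that require a certain parameter to be changed. For example, specific warnings appear in a log file.
- Performance is not satisfactory.
To re-tune your installation, you will need to use the dependency graph again. Start from the node where the significant change has happened.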
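The walk through the dependency graph can be sketched as a breadth-first traversal. This is a minimal illustration only: the edges in `DEPENDS` are a hypothetical excerpt invented for the example, not the actual dependency graph, and `parameters_to_review` is an illustrative helper name.

```python
from collections import deque

# Hypothetical excerpt of a dependency graph: each key points to the
# nodes that should be reviewed when the key changes. These edges are
# assumptions for illustration, not the real graph.
DEPENDS = {
    "Client count": ["java.salt_batch_size", "worker_threads"],
    "java.salt_batch_size": ["Tomcat -Xmx"],
    "worker_threads": ["RAM"],
    "Tomcat -Xmx": ["RAM"],
    "RAM": [],
}


def parameters_to_review(start, graph=DEPENDS):
    """Return every node reachable from `start`, in breadth-first order."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:  # visit each node only once
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


print(parameters_to_review("Client count"))
```

Starting from the node that changed, the function lists each downstream node once, which mirrors following the arrows until every affected node has been visited.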
2. Environmental Variables
This section contains information about environmental variables (inputs to the tuning process).
- Network Bandwidth
-
A measure of the typically available egress bandwidth from the SUSE Manager Server host to the clients or SUSE Manager Proxy hosts. This should take into account network hardware and topology as well as possible capacity limits on switches, routers, and other network equipment between the server and clients.
- Channel count
-
The number of expected channels to manage. Includes any vendor-provided, third-party, and cloned or staged channels.
- Client count
-
The total number of actual or expected clients. It is important to tune any parameters in advance of a client count increase, whenever possible.
- OS mix
-
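The number of distinct operating system versions that managed clients have installed. This is ordered by family (SUSE Linux Enterprise, openSUSE, Red Hat Enterprise Linux, or Ubuntu based). Storage and computing requirements vary in each case.
- User count
-
The expected maximum number of concurrent users interacting with the Web UI, plus the number of programs simultaneously using the XMLRPC API. This includes `spacecmd`, `spacewalk-clone-by-date`, and similar.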
3. Parameters
This section contains information about the available parameters.
3.1. MaxClients
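Description |
The maximum number of HTTP requests served simultaneously by Apache httpd. Proxies, Web UI, and XMLRPC API clients each consume one request. Requests exceeding this parameter are queued and might result in timeouts. |
Tune when |
User count and proxy count increase significantly and this line appears in |
Value default |
150 |
Value recommendation |
150-500 |
Location |
|
Example |
|
After changing |
|
Notes |
This parameter was renamed to `MaxRequestWorkers`. Both names are valid. |
More information |
https://httpd.apache.org/docs/2.4/en/mod/mpm_common.html#maxrequestworkers |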
3.2. ServerLimit
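Description |
The number of Apache httpd processes serving HTTP requests simultaneously.
This number |
Tune when |
|
Value default |
150 |
Value recommendation |
|
Location |
|
Example |
|
More information |
https://httpd.apache.org/docs/2.4/en/mod/mpm_common.html#serverlimit |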
3.3. maxThreads
Description |
The number of Tomcat threads dedicated to serving HTTP requests. |
Tune when |
|
Value default |
150 |
Value recommendation |
|
Location |
|
Example |
|
More information |
3.4. connectionTimeout
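Description |
The number of milliseconds before a non-responding AJP connection is forcibly closed. |
Tune when |
Client count increases significantly and, during load peaks, the Apache error log shows |
Value default |
900000 |
Value recommendation |
20000-3600000 |
Location |
|
Example |
|
More information |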
3.5. keepAliveTimeout
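Description |
The number of milliseconds without data exchange from the JVM before a non-responding AJP connection is forcibly closed. |
Tune when |
Client count increases significantly and, during load peaks, the Apache error log shows |
Value default |
300000 |
Value recommendation |
20000-600000 |
Location |
|
Example |
|
More information |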
3.6. Tomcat’s -Xmx
Description |
The maximum amount of memory Tomcat can use. |
Tune when |
|
Value default |
1 GiB |
Value recommendation |
4-8 GiB |
Location |
|
Example |
|
After changing |
Check memory usage. |
More information |
https://docs.oracle.com/javase/8/docs/technotes/tools/windows/java.html |
3.7. java.disable_list_update_status
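Description |
Disables displaying the update status for clients that belong to a system group. |
Tune when |
Displaying the update status causes a timeout. |
Value default |
|
Value recommendation |
|
Location |
|
Example |
|
After changing |
? |
Notes |
|
More information |
|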
3.8. java.message_queue_thread_pool_size
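Description |
The maximum number of threads in Tomcat dedicated to asynchronous operations. |
Tune when |
Client count increases significantly. |
Value default |
5 |
Value recommendation |
50-150 |
Location |
|
Example |
|
After changing |
|
Notes |
Incoming Salt events are handled in a separate thread pool. |
More information |
|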
3.9. java.salt_batch_size
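Description |
The maximum number of minions concurrently executing a scheduled action. |
Tune when |
Client count reaches several thousand and actions are not executed quickly enough. |
Value default |
200 |
Value recommendation |
200-500 |
Location |
|
Example |
|
After changing |
Check memory usage. Monitor memory usage closely before and after the change. |
More information |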
3.10. java.salt_event_thread_pool_size
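Description |
The maximum number of threads in Tomcat dedicated to handling incoming Salt events. |
Tune when |
The number of queued Salt events grows. Typically, this happens when a large number of minions
|
Value default |
8 |
Value recommendation |
20-100 |
Location |
|
Example |
|
After changing |
Check the length of the Salt event queue.
|
More information |
|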
3.11. java.salt_presence_ping_timeout
Description |
Before any action is executed on a client, a presence ping is executed to make sure the client is reachable.
This parameter sets the amount of time before a second command (in most cases |
Tune when |
Client count increases significantly, or some clients are responding correctly but too slowly, and SUSE Manager excludes them from calls.
This line appears in |
Value default |
4 seconds |
Value recommendation |
4-20 seconds |
Location |
|
Example |
|
After changing |
Large |
More information |
3.12. java.salt_presence_ping_gather_job_timeout
Description |
Before any action is executed on a client, a presence ping is executed to make sure the client is reachable.
After |
Tune when |
Client count increases significantly, or some clients are responding correctly but too slowly, and SUSE Manager excludes them from calls.
This line appears in |
Value default |
1 second |
Value recommendation |
1-50 seconds |
Location |
|
Example |
|
More information |
3.13. java.taskomatic_channel_repodata_workers
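Description |
Whenever the content of a software channel changes, its metadata needs to be recomputed before clients can use it. Channel-altering operations include adding a patch, removing a package, or running a repository synchronization. This parameter specifies the maximum number of Taskomatic threads that SUSE Manager uses to recompute channel metadata. Channel metadata computation is both CPU-bound and memory-heavy, so raising this parameter and operating on many channels simultaneously causes Taskomatic to consume significant resources, but channels become available to clients sooner. |
Tune when |
Channel count increases significantly (more than 50), or more concurrent operations on channels are expected. |
Value default |
2 |
Value recommendation |
2-10 |
Location |
|
Example |
|
After changing |
Because every new thread consumes memory, |
More information |
|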
3.14. taskomatic.java.maxmemory
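Description |
The maximum amount of memory Taskomatic can use. Generating metadata, especially for some operating systems, can be memory-intensive, so this parameter might need to be raised depending on the managed OS mix. |
Tune when |
|
Value default |
4096 MiB |
Value recommendation |
4096-16384 MiB |
Location |
|
Example |
|
After changing |
Check memory usage. |
More information |
|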
3.15. org.quartz.threadPool.threadCount
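Description |
The number of Taskomatic worker threads. Increasing this value allows Taskomatic to serve more clients in parallel. |
Tune when |
Client count increases significantly. |
Value default |
20 |
Value recommendation |
20-200 |
Location |
|
Example |
|
After changing |
|
More information |
http://www.quartz-scheduler.org/documentation/2.4.0-SNAPSHOT/configuration.html |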
3.16. org.quartz.scheduler.idleWaitTime
Description |
The cycle time of Taskomatic. Decreasing this value lowers the latency of Taskomatic. |
Tune when |
Client count is in the thousands. |
Value default |
5000 ms |
Value recommendation |
1000-5000 ms |
Location |
|
Example |
|
More information |
http://www.quartz-scheduler.org/documentation/2.4.0-SNAPSHOT/configuration.html |
3.17. MinionActionExecutor.parallel_threads
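Description |
The number of Taskomatic threads dedicated to sending commands to Salt clients as a result of actions being executed. |
Tune when |
Client count is in the thousands. |
Value default |
1 |
Value recommendation |
1-10 |
Location |
|
Example |
|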
3.18. SSHMinionActionExecutor.parallel_threads
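Description |
The number of Taskomatic threads dedicated to sending commands to Salt SSH clients as a result of actions being executed. |
Tune when |
Client count is in the thousands. |
Value default |
20 |
Value recommendation |
20-100 |
Location |
|
Example |
|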
3.19. hibernate.c3p0.max_size
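Description |
The maximum number of PostgreSQL connections simultaneously available to both Tomcat and Taskomatic. If either of these components requires more concurrent connections, its requests are queued. |
Tune when |
|
Value default |
20 |
Value recommendation |
100-200 |
Location |
|
Example |
|
After changing |
|
More information |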
3.20. rhn-search.java.maxmemory
Description |
The maximum amount of memory that the |
Tune when |
Client count increases significantly, and |
Value default |
512 MiB |
Value recommendation |
512-4096 MiB |
Location |
|
Example |
|
After changing |
Check memory usage. |
3.21. shared_buffers
Description |
The amount of memory reserved for PostgreSQL shared buffers, which contain caches of database tables and index data. |
Tune when |
RAM changes |
Value default |
25% of total RAM |
Value recommendation |
25-40% of total RAM |
Location |
|
Example |
|
After changing |
Check memory usage. |
More information |
https://www.postgresql.org/docs/15/runtime-config-resource.html#GUC-SHARED-BUFFERS |
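As a rough illustration of the 25-40% recommendation, the range can be computed from the total RAM of the host. This is a sketch only: `shared_buffers_range_mib` is an illustrative helper name, and the 64 GiB host is an assumed example value.

```python
def shared_buffers_range_mib(total_ram_mib):
    """Return (low, high) for shared_buffers: 25% and 40% of total RAM, in MiB."""
    return total_ram_mib * 25 // 100, total_ram_mib * 40 // 100


# Assumed example: a host with 64 GiB of RAM.
low, high = shared_buffers_range_mib(64 * 1024)
print(f"shared_buffers between {low} MiB and {high} MiB")
```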
3.22. max_connections
Description |
Maximum number of PostgreSQL connections available to applications.
More connections allow for more concurrent threads/workers in various components (in particular Tomcat and Taskomatic), which generally improves performance.
However, each connection consumes resources, in particular |
Tune when |
|
Value default |
400 |
Value recommendation |
Depends on other settings, use |
Location |
|
Example |
|
After changing |
Check memory usage. Monitor memory usage closely before and after the change. |
More information |
https://www.postgresql.org/docs/15/runtime-config-connection.html#GUC-MAX-CONNECTIONS |
3.23. work_mem
Description |
The amount of memory allocated by PostgreSQL every time a connection needs to do a sort or hash operation.
Every connection (as specified by |
Tune when |
Database operations are slow because of excessive temporary file disk I/O.
To test if that is happening, add |
Value recommendation |
2-20 MB |
Location |
|
Example |
|
After changing |
Check if the SUSE Manager Server might need additional RAM. |
More information |
https://www.postgresql.org/docs/15/runtime-config-resource.html#GUC-WORK-MEM |
3.24. effective_cache_size
Description |
Estimation of the total memory available to PostgreSQL for caching.
It is the explicitly reserved memory ( |
Tune when |
Hardware RAM or memory usage increases significantly. |
Value recommendation |
Start with 75% of total RAM.
For finer settings, use |
Location |
|
Example |
|
After changing |
Check memory usage |
Notes |
This is an estimation for the query planner, not an allocation. |
More information |
https://www.postgresql.org/docs/15/runtime-config-query.html#GUC-EFFECTIVE-CACHE-SIZE |
3.25. thread_pool
Description |
The number of worker threads serving Salt API HTTP requests. A higher number can improve parallelism of SUSE Manager Server-initiated Salt operations, but will consume more memory. |
Tune when |
|
Value default |
100 |
Value recommendation |
100-500, but should be higher than the sum of |
Location |
|
Example |
|
After changing |
Check |
More information |
3.26. worker_threads
Description |
The number of |
Tune when |
Client count increases significantly, |
Value default |
8 |
Value recommendation |
8-32, depending on the number of CPU cores available to the server. Keep the value slightly lower than the number of CPU cores. |
Location |
|
Example |
|
After changing |
Check memory usage.
Monitor memory usage closely before and after the change.
It makes sense to monitor the |
More information |
https://docs.saltstack.com/en/latest/ref/configuration/master.html#worker-threads |
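One way to turn the recommendation into a starting value is sketched here. The reading of "slightly less" as one core of headroom is an assumption, as is the helper name `suggest_worker_threads`. Note that the lower bound of 8 exceeds the core count on hosts with fewer than nine cores.

```python
import os


def suggest_worker_threads(cpu_cores, headroom=1):
    """Clamp (cores - headroom) into the recommended 8-32 range.

    `headroom` is an assumed interpretation of "slightly less than
    the number of CPU cores".
    """
    return max(8, min(32, cpu_cores - headroom))


cores = os.cpu_count() or 1  # os.cpu_count() can return None
print(suggest_worker_threads(cores))
```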
3.27. auth_events
Description |
Determines whether the master will fire authentication events. Authentication events are fired when a minion performs an authentication check with the master. It helps to reduce the number of events published with the Salt Master Event Publisher and reduce the workload on Event Publisher subscribers. |
Tune when |
Large amount of |
Value default |
True |
Value recommendation |
False |
Location |
|
Example |
|
More information |
https://docs.saltproject.io/en/latest/ref/configuration/master.html#auth-events |
3.28. minion_data_cache_events
Description |
Determines whether the master will fire minion data cache events ( |
Tune when |
Large amount of |
Value default |
True |
Value recommendation |
False |
Location |
|
Example |
|
More information |
https://docs.saltproject.io/en/latest/ref/configuration/master.html#minion-data-cache-events |
3.29. pub_hwm
Description |
The maximum number of outstanding messages sent by |
Tune when |
Client count increases significantly and |
Value default |
1000 |
Value recommendation |
10000-100000 |
Location |
|
Example |
|
More information |
https://docs.saltstack.com/en/latest/ref/configuration/master.html#pub-hwm, https://zeromq.org/socket-api/#high-water-mark |
3.30. zmq_backlog
Description |
The maximum number of allowed client connections that have started but not concluded the opening process. If more than this number of clients connects in a very short time frame, connections are dropped and clients experience a delay re-connecting. |
Tune when |
Client count increases significantly and very many clients reconnect in a short time frame, TCP connections to the |
Value default |
1000 |
Value recommendation |
1000-5000 |
Location |
|
Example |
|
More information |
https://docs.saltstack.com/en/latest/ref/configuration/master.html#zmq-backlog, http://api.zeromq.org/3-0:zmq-getsockopt ( |
3.31. swappiness
Description |
How aggressively the kernel moves unused data from memory to the swap partition. Setting a lower value typically reduces swap usage and results in better performance, especially when RAM is abundant. |
Tune when |
RAM increases, or swap is used even though RAM is sufficient. |
Value default |
60 |
Value recommendation |
1-60. For 128 GB of RAM, 10 is expected to give good results. |
Location |
|
Example |
|
More information |
https://documentation.suse.com/sles/15-SP4/html/SLES-all/cha-tuning-memory.html#cha-tuning-memory-vm |
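Before changing the value, it can help to record the current setting. This sketch reads it from `/proc/sys/vm/swappiness`, where the Linux kernel exposes it, and checks it against the 1-60 range given above. The helper names are illustrative.

```python
from pathlib import Path


def read_swappiness(path="/proc/sys/vm/swappiness"):
    """Return the current vm.swappiness value as an integer."""
    return int(Path(path).read_text().strip())


def in_recommended_range(value, low=1, high=60):
    """True if the value lies within the 1-60 range recommended above."""
    return low <= value <= high


proc_file = Path("/proc/sys/vm/swappiness")
if proc_file.exists():  # only present on Linux
    current = read_swappiness(proc_file)
    print(current, in_recommended_range(current))
```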
3.32. wait_for_backend
Description |
Determines whether the |
Tune when |
Unstable connectivity between the SUSE Manager Proxy and the SUSE Manager Server. |
Value default |
False |
Value recommendation |
True |
Location |
|
Example |
|
More information |
3.33. tcp_keepalive
Description |
The TCP keepalive interval to set on TCP ports. This setting can be used to tune Salt connectivity issues in messy network environments with misbehaving firewalls. |
Tune when |
Unstable connectivity between managed clients and the SUSE Manager Proxy or the SUSE Manager Server. |
Value default |
True |
Value recommendation |
True |
Location |
|
Example |
|
After changing |
See Minions Connectivity for details on fine-tuning additional keepalive parameters. |
More information |
https://docs.saltproject.io/en/latest/ref/configuration/minion.html#tcp-keepalive, Minions Connectivity |
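When planning changes to several parameters at once, a small check against the recommended ranges can catch values that drift outside them. This sketch covers only a few integer parameters whose ranges are stated in this section. The `planned` values and the helper name `out_of_range` are assumptions for illustration.

```python
# Recommended ranges quoted from the parameter descriptions (inclusive).
RECOMMENDED = {
    "java.message_queue_thread_pool_size": (50, 150),
    "java.salt_batch_size": (200, 500),
    "java.salt_event_thread_pool_size": (20, 100),
    "java.taskomatic_channel_repodata_workers": (2, 10),
    "org.quartz.threadPool.threadCount": (20, 200),
    "hibernate.c3p0.max_size": (100, 200),
}


def out_of_range(settings, recommended=RECOMMENDED):
    """Return {name: (value, low, high)} for values outside their range."""
    problems = {}
    for name, value in settings.items():
        if name in recommended:
            low, high = recommended[name]
            if not low <= value <= high:
                problems[name] = (value, low, high)
    return problems


# Assumed example values, not recommendations.
planned = {"java.salt_batch_size": 800, "hibernate.c3p0.max_size": 150}
print(out_of_range(planned))
```

A value outside the range is not necessarily wrong, but it is a prompt to re-read the relevant parameter description before applying the change.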
4. Memory Usage
Adjusting some of the parameters listed in this section can result in a higher amount of RAM being used by various components. It is important that the amount of hardware RAM is adequate after any significant change.
To determine how RAM is being used, you will need to check each process that consumes it.
- Operating system
-
Stop all SUSE Manager services and inspect the output of free -h.
- Java-based components
-
This includes Taskomatic, Tomcat, and rhn-search. These services support a configurable memory cap.
- The SUSE Manager Server
-
Depends on many factors and can only be estimated. Measure PostgreSQL reserved memory by checking shared_buffers, which PostgreSQL holds permanently. You can also multiply work_mem and max_connections, and multiply by three for a worst case estimate of per-query RAM. You will also need to check the operating system buffers and caches, which are used by PostgreSQL to host copies of database data. These often automatically occupy any available RAM.
It is important that the SUSE Manager Server has sufficient RAM to accommodate all of these processes, especially OS buffers and caches, to have reasonable PostgreSQL performance. We recommend you keep several gigabytes available at all times, and add more as the database size on disk increases.
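The worst case estimate for PostgreSQL can be written out as simple arithmetic. This is a sketch under stated assumptions: the input values are examples only, and adding the per-query estimate to `shared_buffers` to get a single figure is an illustrative choice rather than an official formula.

```python
def postgres_worst_case_mib(shared_buffers_mib, work_mem_mib, max_connections):
    """Return shared_buffers plus work_mem * max_connections * 3, in MiB."""
    per_query = work_mem_mib * max_connections * 3
    return shared_buffers_mib + per_query


# Assumed example: 16 GiB shared_buffers, 10 MB work_mem, 400 connections.
print(postgres_worst_case_mib(16 * 1024, 10, 400))
```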
Whenever the expected amount of memory available for OS buffers and caches changes, update the effective_cache_size parameter to have PostgreSQL use it correctly. You can calculate the total available by finding the total RAM available, less the expected memory usage.
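The subtraction can be sketched as follows. All figures in `usage` are assumed example values in MiB, and `effective_cache_size_mib` is an illustrative helper name.

```python
def effective_cache_size_mib(total_ram_mib, expected_usage_mib):
    """Return total RAM minus the summed expected usage, never below zero."""
    return max(0, total_ram_mib - sum(expected_usage_mib.values()))


usage = {  # assumed example values, in MiB
    "operating system": 2048,
    "Tomcat": 8192,
    "Taskomatic": 4096,
    "rhn-search": 512,
    "PostgreSQL per-query worst case": 12000,
}
print(effective_cache_size_mib(128 * 1024, usage))
```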
To get a live breakdown of the memory used by services on the SUSE Manager Server, use this command:
pidstat -p ALL -r --human 1 60 | tee pidstat-memory.log
This command will save a copy of displayed data in the pidstat-memory.log file for later analysis.
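For later analysis, the saved log can be summarized per command. The sketch below assumes a column layout in which each header row contains the `RSS` and `Command` columns and in which `--human` values carry a `k`, `M`, `G`, or `T` suffix. Check this against your own log before relying on it. The `SAMPLE` text is invented for illustration, and commands containing spaces are not handled.

```python
UNITS = {"k": 1, "m": 1024, "g": 1024**2, "t": 1024**3}


def to_kib(text):
    """Convert '512', '304.0k', '12.5M' or '1.2G' to KiB (plain numbers are taken as KiB)."""
    suffix = text[-1].lower()
    if suffix in UNITS:
        return float(text[:-1]) * UNITS[suffix]
    return float(text)


def peak_rss_by_command(log_text):
    """Return {command: peak RSS in KiB} from pidstat -r style output."""
    rss_pos = None
    peaks = {}
    for line in log_text.splitlines():
        fields = line.split()
        if "RSS" in fields and "Command" in fields:
            # Remember where RSS sits, counted from the end of the row.
            rss_pos = fields.index("RSS") - len(fields)
            continue
        if rss_pos is None or len(fields) < -rss_pos or line.startswith("Average:"):
            continue
        command = fields[-1]
        peaks[command] = max(peaks.get(command, 0.0), to_kib(fields[rss_pos]))
    return peaks


# Invented sample in the assumed layout.
SAMPLE = """\
12:00:01      UID       PID  minflt/s  majflt/s     VSZ     RSS   %MEM  Command
12:00:02      468      1234      0.00      0.00    9.1G    2.0G  12.50  java
12:00:02       26      2345      0.00      0.00    4.2G  512.0M   3.10  postgres
12:00:03      468      1234      0.00      0.00    9.1G    2.5G  15.60  java
"""
print(peak_rss_by_command(SAMPLE))
```

To analyze a real capture, pass the contents of pidstat-memory.log to `peak_rss_by_command` instead of `SAMPLE`.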