Fine Tuning Error Detection

SUSE® Rancher Prime Continuous Delivery monitors the status field of deployed resources to determine whether a Bundle is healthy or in error. In certain cases, Fleet may interpret a condition in the status field as an error, even if it is expected or harmless.

You can adjust this behavior in two ways:

  • Ignore conditions in fleet.yaml

  • Customize error mappings with environment variables

A visual flow of fine tuning error detections

NOTE: You should rarely need to configure readiness detection in SUSE® Rancher Prime Continuous Delivery with environment variables. If you do, open an issue or submit a pull request to help improve the default readiness detection.

Ignore conditions in fleet.yaml

Use the ignore.conditions setting in the fleet.yaml file to tell SUSE® Rancher Prime Continuous Delivery to ignore specific conditions.

# from https://fleet.rancher.io/ref-fleet-yaml

# Ignore fields when monitoring a Bundle. This can be used when {product_name} thinks
# some conditions in Custom Resources makes the Bundle to be in an error state
# when it shouldn't.
ignore:

  # Conditions to be ignored
  conditions:

    # In this example a condition will be ignored if it contains
    # {"type": "Active", "status", "False"}
    - type: Active
      status: "False"

This method is useful when a custom resource or controller sets conditions that cause SUSE® Rancher Prime Continuous Delivery to mark a Bundle as failed, even though the resource is healthy.

A visual flow of ignore condition during fine tuning error detection

Configure error mapping with environment variables

In SUSE® Rancher Prime Continuous Delivery v0.13, error detection was enhanced to give you more control. You can use the environment variable CATTLE_WRANGLER_CHECK_GVK_ERROR_MAPPING to customize how resource conditions are interpreted.

This variable lets you define, by Group, Version, Kind (GVK), which condition values should be treated as errors or explicitly not treated as errors.

Set this variable in your Fleet Helm chart deployment (values.yaml) using extraEnv. The value must be JSON.

# Extra environment variables passed to the fleet pods.
# extraEnv:
# - name: OCI_STORAGE
#   value: "false"

NOTE: This setting is global to all Fleet controllers and applies to every GitRepo. If you need to adjust error handling only for a specific Bundle, use the ignoreConditions option in fleet.yaml instead.

Merging behavior

When you override mappings with CATTLE_WRANGLER_CHECK_GVK_ERROR_MAPPING:

  • New Conditions are merged with predefined conditions.

  • Condition values are replaced for any condition you redefine.

For example, consider the Default mapping:

HelmChart.Failed=["True"]

This means Failed=True is treated as an error.

When you override with:

  • HelmChart.Failed=["False"]

  • HelmChart.Ready=["False"]

This results in:

  • Failed=["False"] replaces the default mapping. This means Failed=False is now treated as an error.

  • Ready=["False"] is added, so Ready=False is also treated as an error.

  • Other conditions unchanged.

Disable error interpretation example

Assume that every value of Failed was previously interpreted as an error, for example:

{ "type": "Failed", "status": ["True", "False"] }

You can narrow this mapping to treat only Failed=True as an error by setting:

[
  {
    "gvk": "sample.cattle.io/v1, Kind=Sample",
    "conditionMappings": [
      { "type": "Failed", "status": ["True"] }
    ]
  }
]

This configuration means only Failed=True is treated as an error. Failed=False is no longer considered an error.

You can also disable errors for any value of Failed by:

{ "type": "Failed", "status": [""] }

This configuration ensures that no value of Failed is treated as an error.

NOTE: Overriding conditions only affects the default error mappings (refer to Default error mappings). SUSE® Rancher Prime Continuous Delivery may still mark a resource as an error because other checks, such as those from the kstatus library, continue to run after your customization.

Enable error interpretation example

[
  {
    "gvk": "sample.cattle.io/v1, Kind=Sample",
    "conditionMappings": [
      { "type": "Failed", "status": ["True"] }
    ]
  }
]

Here, Failed=True is treated as an error.

Default error mappings

SUSE® Rancher Prime Continuous Delivery adds default error mappings to interpret certain resource conditions in the status field as errors. These mappings are applied besides other readiness checks, such as those performed by the Kubernetes kstatus library.

The following default mappings apply:

  • HelmChart (helm.cattle.io/v1)

  • JobCreated: Neither True nor False is considered an error.

  • Failed: True is considered an error.

  • Node (v1)

  • OutOfDisk: True is considered an error.

  • MemoryPressure: True is considered an error.

  • DiskPressure: True is considered an error.

  • NetworkUnavailable: True is considered an error.

  • Deployment (apps/v1)

  • ReplicaFailure: True is considered an error.

  • Progressing: False is considered an error.

  • ReplicaSet (apps/v1)

  • ReplicaFailure: True is considered an error.

Fallback mapping

If a resource does not match the listed GVKs, SUSE® Rancher Prime Continuous Delivery applies a fallback mapping:

  • Any Group and Version with any kind

  • Stalled: True is considered an error.

  • Failed: True is considered an error.