Applies to SUSE OpenStack Cloud Crowbar 8

17 SUSE OpenStack Cloud Maintenance

17.1 Keeping the Nodes Up-To-Date

Keeping the nodes in SUSE OpenStack Cloud Crowbar up-to-date requires an appropriate setup of the update and pool repositories and the deployment of either the Updater barclamp or the SUSE Manager barclamp. For details, see Section 5.2, “Update and Pool Repositories”, Section 11.4.1, “Deploying Node Updates with the Updater Barclamp”, and Section 11.4.2, “Configuring Node Updates with the SUSE Manager Client Barclamp”.

If one of those barclamps is deployed, patches are installed on the nodes. Patches that do not require a reboot will not cause a service interruption. If a patch (for example, a kernel update) requires a reboot after the installation, services running on the machine that is rebooted will not be available within SUSE OpenStack Cloud. Therefore, we strongly recommend installing those patches during a maintenance window.

Note
Note: Maintenance Mode

As of SUSE OpenStack Cloud Crowbar 8, it is not possible to put your entire SUSE OpenStack Cloud into Maintenance Mode (such as limiting all users to read-only operations on the control plane), as OpenStack does not support this. However when Pacemaker is deployed to manage HA clusters, it should be used to place services and cluster nodes into Maintenance Mode before performing maintenance functions on them. For more information, see https://documentation.suse.com/sle-ha/12-SP5/single-html/SLE-HA-guide/#cha-conf-hawk2.
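
For example, with crmsh a single cluster node can be placed into maintenance mode before working on it and taken out of it again afterwards. A minimal sketch (NODE_NAME is a placeholder for the actual node name):

root # crm node maintenance NODE_NAME
root # crm node ready NODE_NAME

Individual resources can be handled similarly with crm resource unmanage RESOURCE and crm resource manage RESOURCE, as shown in Section 17.6, “Updating MariaDB with Galera”.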

Consequences when Rebooting Nodes
Administration Server

While the Administration Server is offline, it is not possible to deploy new nodes. However, rebooting the Administration Server has no effect on starting instances or on instances already running.

Control Nodes

The consequences of a reboot of a Control Node depend on the services running on that node:

Database, Keystone, RabbitMQ, Glance, Nova:  No new instances can be started.

Swift:  No object storage data is available. If Glance uses Swift, it will not be possible to start new instances.

Cinder, Ceph:  No block storage data is available.

Neutron:  No new instances can be started. On running instances the network will be unavailable.

Horizon:  Horizon will be unavailable. Starting and managing instances can be done with the command line tools.

Compute Nodes

Whenever a Compute Node is rebooted, all instances running on that particular node will be shut down and must be manually restarted. Therefore, it is recommended to evacuate the node by migrating instances to another node before rebooting it (see the example below).
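
A possible way to drain a Compute Node before a planned reboot, using the same commands that appear in Section 17.3.5, “Simultaneous Upgrade of Multiple Nodes” (a sketch; NODE_NAME is a placeholder and live-migration must be configured):

tux > openstack compute service set --disable "NODE_NAME" nova-compute
tux > nova host-evacuate-live "NODE_NAME"

After the reboot, re-enable the node with openstack compute service set --enable "NODE_NAME" nova-compute.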

17.2 Service Order on SUSE OpenStack Cloud Start-up or Shutdown

In case you need to restart your complete SUSE OpenStack Cloud (after a complete shutdown or a power outage), ensure that the external Ceph cluster is started, available, and healthy (a quick health check example follows the start-up list below). Then start the nodes and services in the following order:

Service Order on Start-up
  1. Control Node/Cluster on which the Database is deployed

  2. Control Node/Cluster on which RabbitMQ is deployed

  3. Control Node/Cluster on which Keystone is deployed

  4. For Swift:

    1. Storage Node on which the swift-storage role is deployed

    2. Storage Node on which the swift-proxy role is deployed

  5. Any remaining Control Node/Cluster. The following additional rules apply:

    • The Control Node/Cluster on which the neutron-server role is deployed needs to be started before starting the node/cluster on which the neutron-l3 role is deployed.

    • The Control Node/Cluster on which the nova-controller role is deployed needs to be started before starting the node/cluster on which Heat is deployed.

  6. Compute Nodes

If multiple roles are deployed on a single Control Node, the services are automatically started in the correct order on that node. If you have more than one node with multiple roles, make sure they are started as closely as possible to the order listed above.
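
A quick way to verify that the external Ceph cluster is healthy before starting any cloud nodes; a minimal sketch, assuming the ceph CLI and an admin keyring are available on one of the Ceph nodes:

root # ceph health
root # ceph -s

ceph health should report HEALTH_OK before you continue with the start-up order listed above.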

If you need to shut down SUSE OpenStack Cloud, the nodes and services need to be terminated in the reverse order of start-up:

Service Order on Shut-down
  1. Compute Nodes

  2. Control Node/Cluster on which Heat is deployed

  3. Control Node/Cluster on which the nova-controller role is deployed

  4. Control Node/Cluster on which the neutron-l3 role is deployed

  5. All Control Nodes/Clusters on which none of the following services is deployed: Database, RabbitMQ, and Keystone.

  6. For Swift:

    1. Storage Node on which the swift-proxy role is deployed

    2. Storage Node on which the swift-storage role is deployed

  7. Control Node/Cluster on which Keystone is deployed

  8. Control Node/Cluster on which RabbitMQ is deployed

  9. Control Node/Cluster on which the Database is deployed

  10. If required, gracefully shut down an external Ceph cluster

17.3 Upgrading from SUSE OpenStack Cloud Crowbar 7 to SUSE OpenStack Cloud Crowbar 8

Upgrading from SUSE OpenStack Cloud Crowbar 7 to SUSE OpenStack Cloud Crowbar 8 can be done either via a Web interface or from the command line. A non-disruptive upgrade is supported when the requirements listed in Non-Disruptive Upgrade Requirements are met. The non-disruptive upgrade provides fully functional SUSE OpenStack Cloud operation during most of the upgrade procedure.

If the requirements for a non-disruptive upgrade are not met, the upgrade procedure will be done in normal mode. When live-migration is set up, instances will be migrated to another node before the respective Compute Node is updated to ensure continuous operation.

Important
Important: STONITH and Administration Server

Make sure that the STONITH mechanism in your cloud does not rely on the state of the Administration Server (for example, no SBD devices are located there, and IPMI does not use a network connection that depends on the Administration Server). Otherwise, the clusters may be affected when the Administration Server is rebooted during the upgrade procedure.

17.3.1 Requirements

When starting the upgrade process, several checks are performed to determine whether the SUSE OpenStack Cloud is in an upgradeable state and whether a non-disruptive update would be supported:

General Upgrade Requirements
  • All nodes need to have the latest SUSE OpenStack Cloud Crowbar 7 updates and the latest SLES 12 SP2 updates installed. If this is not the case, refer to Section 11.4.1, “Deploying Node Updates with the Updater Barclamp” for instructions on how to update.

  • All allocated nodes need to be turned on and have to be in state ready.

  • All barclamp proposals need to have been successfully deployed. If a proposal is in state failed, the upgrade procedure will refuse to start. Fix the issue or—if possible—remove the proposal.

  • If the Pacemaker barclamp is deployed, all clusters need to be in a healthy state.

  • The upgrade will not start when Ceph is deployed via Crowbar. Only external Ceph is supported. Documentation for SUSE Enterprise Storage is available at https://documentation.suse.com/ses/5.5.

  • Upgrade is only possible if the SQL Engine in the Database barclamp is set to MariaDB. For further information, see Section 17.3.2, “Preparing PostgreSQL-Based SUSE OpenStack Cloud Crowbar 7 for Upgrade”.

  • The following repositories need to be available on a server that is accessible from the Administration Server. The HA repositories are only needed if you have an HA setup. It is recommended to use the same server that also hosts the respective repositories of the current version.

    SUSE-OpenStack-Cloud-Crowbar-8-Pool
    SUSE-OpenStack-Cloud-Crowbar-8-Updates
    SLES12-SP3-Pool
    SLES12-SP3-Updates
    SLE-HA12-SP3-Pool (for HA setups only)
    SLE-HA12-SP3-Updates (for HA setups only)
    Important
    Important

    Do not add these repositories to the SUSE OpenStack Cloud repository configuration yet. This needs to be done during the upgrade procedure.

  • If SUSE Enterprise Storage is deployed using Crowbar, migrate it to an external cluster. If you want to upgrade SUSE Enterprise Storage, refer to https://documentation.suse.com/ses/5.5/single-html/ses-deployment/#ceph-upgrade-4to5crowbar.

  • Run the command nova-manage db archive_deleted_rows to purge deleted instances from the database tables. This can significantly reduce the time required for the database migration procedure.

  • Run the commands cinder-manage db purge and heat-manage purge_deleted to purge database entries that are marked as deleted.
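
    For example, the purge commands mentioned in the two items above might be run as follows on the nodes hosting the respective services (the age arguments are examples only; adjust them to your retention needs):

    root # nova-manage db archive_deleted_rows
    root # cinder-manage db purge 30
    root # heat-manage purge_deleted 30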

Non-Disruptive Upgrade Requirements
  • All Control Nodes need to be set up highly available.

  • A non-disruptive upgrade is not supported if Cinder has been deployed with the raw devices or local file back-end. In this case, you have to perform a regular upgrade, or change the Cinder back-end to allow a non-disruptive upgrade.

  • A non-disruptive upgrade is prevented if the cinder-volume service is placed on a Compute Node. For a non-disruptive upgrade, cinder-volume should either be HA-enabled or placed on non-compute nodes.

  • A non-disruptive upgrade is prevented if the manila-share service is placed on a Compute Node. For more information, see Section 12.15, “Deploying Manila”.

  • Live-migration support needs to be configured and enabled for the Compute Nodes. The amount of free resources (CPU and RAM) on the Compute Nodes needs to be sufficient to evacuate the nodes one by one.

  • For a non-disruptive upgrade, Glance must be configured to use shared storage if the Default Storage Store value in the Glance barclamp is set to File.

  • For a non-disruptive upgrade, only KVM-based Compute Nodes with the nova-compute-kvm role are allowed in SUSE OpenStack Cloud Crowbar 7.

  • Non-disruptive upgrade is limited to the following cluster configurations:

    • Single cluster that has all supported controller roles on it

    • Two clusters where one only has neutron-network and the other one has the rest of the controller roles.

    • Two clusters where one only has neutron-server plus neutron-network and the other one has the rest of the controller roles.

    • Two clusters, where one cluster runs the database and RabbitMQ

    • Three clusters, where one cluster runs database and RabbitMQ, another cluster runs APIs, and the third cluster has the neutron-network role.

    If your cluster configuration is not supported by the non-disruptive upgrade procedure, you can still perform a normal upgrade.

17.3.2 Preparing PostgreSQL-Based SUSE OpenStack Cloud Crowbar 7 for Upgrade

Upgrading SUSE OpenStack Cloud Crowbar 7 is only possible when it uses MariaDB deployed with the Database barclamp. This means that before you can proceed with upgrading SUSE OpenStack Cloud Crowbar 7, you must migrate PostgreSQL to MariaDB first. The following description covers several possible scenarios.

17.3.2.1 Non-HA Setup or HA Setup with More Than 2 Nodes in the Cluster and PostgreSQL Database Backend

Install the latest maintenance updates on SUSE OpenStack Cloud Crowbar 7. In the Crowbar Web interface, switch to the Database barclamp. You should see the new mysql-server role in the Deployment section. Do not change the sql_engine at this point. Add your Database Node or cluster to the mysql-server role and apply the barclamp. MariaDB is now deployed and running, but it is still not used as a back end for OpenStack services.

Follow Procedure 17.1, “Data Migration” to migrate the data from PostgreSQL to MariaDB.

Procedure 17.1: Data Migration
  1. Run the /opt/dell/bin/prepare-mariadb script on the Admin Node to prepare the MariaDB instance by creating the required users, databases, and tables.

  2. After the script is finished, you will find a list of all databases and URLs that are ready for data migration in the /etc/pg2mysql/databases.yaml file located on one of the Database Nodes. The script's output may look as follows:

    Preparing node d52-54-77-77-01-01.vo6.cloud.suse.de
    Adding recipe[database::pg2mariadb_preparation] to run_list
    Running chef-client on d52-54-77-77-01-01.vo6.cloud.suse.de...
    Log: /var/log/crowbar/db-prepare.chef-client.log on d52-54-77-77-01-01.vo6.cloud.suse.de
    Run time: 444.725193199s
    Removing recipe[database::pg2mariadb_preparation] from run_list
    Prepare completed for d52-54-77-77-01-01.vo6.cloud.suse.de
    Summary of used databases: /etc/pg2mysql/databases.yaml on
    d52-54-77-77-01-01.vo6.cloud.suse.de

    The Summary of used databases: line shows the exact location of the /etc/pg2mysql/databases.yaml file.

    The /etc/pg2mysql/databases.yaml file contains a list of databases along with their source and target connection strings:

    keystone:
      source: postgresql://keystone:vZn3nfxXzv97@192.168.243.87/keystone
      target: mysql+pymysql://keystone:vZn3nfxXzv97@192.168.243.88/keystone?charset=utf8
    glance:
      source: postgresql://glance:cOau7NhaA54N@192.168.243.87/glance
      target: mysql+pymysql://glance:cOau7NhaA54N@192.168.243.88/glance?charset=utf8
    cinder:
      source: postgresql://cinder:idRll2gJPodv@192.168.243.87/cinder
      target: mysql+pymysql://cinder:idRll2gJPodv@192.168.243.88/cinder?charset=utf8
  3. Install the python-psql2mysql package on the Database Node (preferably the one with the /etc/pg2mysql/databases.yaml file).
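
    For example (assuming the package is provided by the configured SUSE OpenStack Cloud update repositories):

    root # zypper in python-psql2mysql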

  4. To determine whether the PostgreSQL databases contain data that cannot be migrated to MariaDB, run psql2mysql with the precheck option:

    tux > psql2mysql \
    --source postgresql://keystone:vZn3nfxXzv97@192.168.243.87/keystone \
    --target mysql+pymysql://keystone:vZn3nfxXzv97@192.168.243.88/keystone?charset=utf8 \
    precheck

    To run the precheck on all databases in a single operation, use the --batch option and the /etc/pg2mysql/databases.yaml file as follows:

    tux > psql2mysql --batch /etc/pg2mysql/databases.yaml precheck

    If the precheck indicates that there is data that cannot be imported into MariaDB, modify the offending data manually to fix the problems. The example below shows what an output containing issues may look like:

    tux > psql2mysql --source postgresql://cinder:idRll2gJPodv@192.168.243.86/cinder precheck
    Table 'volumes' contains 4 Byte UTF8 characters which are incompatible with the 'utf8' encoding used by MariaDB
    The following rows are affected:
    +-----------------------------------------+-----------------+-------+
    |               Primary Key               | Affected Column | Value |
    +-----------------------------------------+-----------------+-------+
    | id=5c6b0274-d18d-4153-9fda-ef3d74ab4500 |   display_name  |   💫   |
    +-----------------------------------------+-----------------+-------+
    Error during prechecks. 4 Byte UTF8 characters found in the source database.
  5. Stop the chef-client service on the nodes to prevent regular chef-client runs from starting the database-related OpenStack services again. To do this from the Admin Node, you can use the knife ssh command shown below. Also stop all OpenStack services that make use of the database to prevent them from writing new data during the migration.
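
    Quoting the search query and the remote command keeps knife from splitting the arguments:

    root # knife ssh "roles:dns-client" "systemctl stop chef-client"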

    Note
    Note: Testing Migration Procedure

    If you want to perform a dry run of the migration procedure, you can run psql2mysql migrate without stopping the database-related OpenStack services. This way, if the test migration fails due to errors that were not caught by the precheck procedure, you can fix them with the OpenStack services still running, thus minimizing the required downtime. When you perform the actual migration, the data in the target databases will be replaced with the latest data from the source databases.

    After the test migration and before the actual migration, it is recommended to run the psql2mysql purge-tables command to purge tables in the target database. While this step is optional, it speeds up the migration process.

    On an HA setup, shut down all services that make use of the database. To do this, use the crm command for example:

    crm resource stop apache2 keystone cinder-api glance-api \
          neutron-server swift-proxy nova-api magnum-api sahara-api heat-api ceilometer-collector
    Note
    Note

    If the Manage stateless active/active services with Pacemaker option in the Pacemaker barclamp is set to false, the OpenStack services must be stopped on each cluster node using the systemctl command.

    From this point, OpenStack services become unavailable.

  6. You can now migrate the databases using the psql2mysql tool. However, before performing the migration, make sure that the target Database Nodes have enough free space to accommodate the migrated data. To migrate a single database, use the following command format:

    tux > psql2mysql \
    --source postgresql://neutron:secret@192.168.1.1/neutron \
    --target mysql+pymysql://neutron:evenmoresecret@192.168.1.2/neutron?charset=utf8 \
    migrate

    To migrate all databases in one operation, use the --batch option and the /etc/pg2mysql/databases.yaml file as follows:

    tux > psql2mysql --batch /etc/pg2mysql/databases.yaml migrate
  7. In the Crowbar Web interface, switch to the Database barclamp. Enable the raw view and set the value of sql_engine to mysql. Apply the barclamp. After this step, OpenStack services should be running again and reconfigured to use the MariaDB database back end.

  8. To prevent the PostgreSQL-related chef code from running, remove the node or cluster assigned to the database-server role in the Database barclamp, and apply the barclamp.

  9. Start chef-client services on the nodes again.

  10. Stop PostgreSQL on the Database Nodes. Uninstall PostgreSQL packages.

    1. To stop the postgresql service, run the following commands on one cluster node:

      tux > crm resource stop postgresql
      tux > crm resource stop fs-postgresql
      tux > crm resource stop drbd-postgresql

      Run the last command only if the previous setup used DRBD.

    2. Remove the packages on all cluster nodes:

      root # zypper rm postgresql94 postgresql94-server
    3. If you choose not to upgrade to SUSE OpenStack Cloud Crowbar 8 right away, delete the unused Pacemaker resources from one cluster node:

      tux > crm conf delete drbd-postgresql
      tux > crm conf delete fs-postgresql
      tux > crm conf delete postgresql
      Note
      Note

      Run the crm conf delete drbd-postgresql command only if the cloud setup you are upgrading uses DRBD.

  11. If DRBD is not used as a backend for RabbitMQ, it is possible to remove it at this point, using the following command:

    tux > sudo zypper rm drbd drbd-utils

    You can then reclaim the disk space used by Crowbar for DRBD. To do this, edit the node data using knife:

    tux > knife node edit -a DRBD_NODE

    Search for claimed_disks and remove the entry with owner set to LVM_DRBD.

    Otherwise, skip this step until after the full upgrade is done, since the RabbitMQ setup will be automatically switched from DRBD during the upgrade procedure.

17.3.2.2 HA Cluster with 2 Control Nodes

Before you proceed, extend the 2-node cluster with an additional node that has no roles assigned to it. Make sure that the new node has enough memory to serve as a Control Node.

  1. In the Crowbar Web interface, switch to the Pacemaker barclamp, enable the raw view mode, find the allow_larger_cluster option, and set its value to true. Note that this is relevant only for DRBD clusters.

  2. Add the pacemaker-cluster-member role to the new node and apply the barclamp.

  3. Proceed with the migration procedure as described in Section 17.3.2.1, “Non-HA Setup or HA Setup with More Than 2 Nodes in the Cluster and PostgreSQL Database Backend”.

17.3.3 Upgrading Using the Web Interface

The Web interface features a wizard that guides you through the upgrade procedure.

Note
Note: Canceling Upgrade

You can cancel the upgrade process by clicking Cancel Upgrade. Note that the upgrade operation can be canceled only before the Administration Server upgrade is started. When the upgrade has been canceled, the nodes return to the ready state. However, any user modifications must be undone manually. This includes reverting the repository configuration.

  1. To start the upgrade procedure, open the Crowbar Web interface on the Administration Server and choose Utilities › Upgrade. Alternatively, point the browser directly to the upgrade wizard on the Administration Server, for example http://192.168.124.10/upgrade/.

  2. On the first screen of the Web interface you will run preliminary checks, get information about the upgrade mode and start the upgrade process.

  3. Perform the preliminary checks to determine whether the upgrade requirements are met by clicking Check in Preliminary Checks.

    The Web interface displays the progress of the checks. Make sure all checks are passed (you should see a green marker next to each check). If errors occur, fix them and run the Check again. Do not proceed until all checks are passed.

  4. When all checks in the previous step have passed, Upgrade Mode shows the result of the upgrade analysis. It will indicate whether the upgrade procedure will continue in non-disruptive or in normal mode.

  5. To start the upgrade process, click Begin Upgrade.

  6. While the upgrade of the Administration Server is prepared, the upgrade wizard prompts you to Download the Backup of the Administration Server. When the backup is done, move it to a safe place. If something goes wrong during the upgrade procedure of the Administration Server, you can restore the original state from this backup using the crowbarctl backup restore NAME command.

  7. Check that the repositories required for upgrading the Administration Server are available and updated. To do this, click the Check button. If the checks fail, add the software repositories as described in Chapter 5, Software Repository Setup of the Deployment Guide. Run the checks again, and click Next.

  8. Click Upgrade Administration Server to upgrade and reboot the admin node. Note that this operation may take a while. When the Administration Server has been updated, click Next.

  9. Check that the repositories required for upgrading all nodes are available and updated. To do this click the Check button. If the check fails, add the software repositories as described in Chapter 5, Software Repository Setup of the Deployment Guide. Run the checks again, and click Next.

  10. Stop the OpenStack services. Before you proceed, be aware that no changes can be made to your cloud during and after stopping the services. The OpenStack API will not be available until the upgrade process is completed. When you are ready, click Stop Services. Wait until the services are stopped and click Next.

  11. Before upgrading the nodes, the wizard prompts you to Back up OpenStack Database. The MariaDB database backup will be stored on the Administration Server. It can be used to restore the database in case something goes wrong during the upgrade. To back up the database, click Create Backup. When the backup operation is finished, click Next.

  12. Start the upgrade by clicking Upgrade Nodes. The number of nodes determines how long the upgrade process will take. When the upgrade is completed, press Finish to return to the Dashboard.

Note
Note

Only systems already using MariaDB as their OpenStack database can be upgraded. If your deployment still uses PostgreSQL, migrate it to MariaDB first as described in Section 17.3.2, “Preparing PostgreSQL-Based SUSE OpenStack Cloud Crowbar 7 for Upgrade”.

Note
Note: Dealing with Errors

If an error occurs during the upgrade process, the wizard displays a message with a description of the error and a possible solution. After fixing the error, re-run the step where the error occurred.

17.3.4 Upgrading from the Command Line

The upgrade procedure on the command line is performed by using the program crowbarctl. For general help, run crowbarctl help. To get help on a certain subcommand, run crowbarctl COMMAND help.

To review the progress of the upgrade procedure, you can call crowbarctl upgrade status at any time. Steps may have three states: pending, running, and passed.

  1. To start the upgrade procedure from the command line, log in to the Administration Server as root.

  2. Perform the preliminary checks to determine whether the upgrade requirements are met:

    root # crowbarctl upgrade prechecks

    The command's result is shown in a table. Make sure the column Errors does not contain any entries. If there are errors, fix them and restart the precheck command afterwards. Do not proceed before all checks are passed.

    root # crowbarctl upgrade prechecks
    +-------------------------------+--------+----------+--------+------+
    | Check ID                      | Passed | Required | Errors | Help |
    +-------------------------------+--------+----------+--------+------+
    | network_checks                | true   | true     |        |      |
    | cloud_healthy                 | true   | true     |        |      |
    | maintenance_updates_installed | true   | true     |        |      |
    | compute_status                | true   | false    |        |      |
    | ha_configured                 | true   | false    |        |      |
    | clusters_healthy              | true   | true     |        |      |
    +-------------------------------+--------+----------+--------+------+

    Depending on the outcome of the checks, it is automatically decided whether the upgrade procedure will continue in non-disruptive or in normal mode.

    Tip
    Tip: Forcing Normal Mode Upgrade

    The non-disruptive upgrade takes longer than an upgrade in normal mode, because the normal mode performs certain tasks in parallel that are done sequentially during the non-disruptive upgrade. In addition, live-migrating guests to other Compute Nodes during the non-disruptive upgrade takes extra time.

    Therefore, if a non-disruptive upgrade is not a requirement for you, you may want to switch to the normal upgrade mode, even if your setup supports the non-disruptive method. To force the normal upgrade mode, run:

    root # crowbarctl upgrade mode normal

    To query the current upgrade mode run:

    root # crowbarctl upgrade mode

    To switch back to the non-disruptive mode run:

    root # crowbarctl upgrade mode non_disruptive

    It is possible to call this command at any time during the upgrade process until the services step is started. After that point the upgrade mode can no longer be changed.

  3. Prepare the nodes by transitioning them into the upgrade state and stopping the chef daemon:

    root # crowbarctl upgrade prepare

    Depending on the size of your SUSE OpenStack Cloud deployment, this step may take some time. Use the command crowbarctl upgrade status to monitor the status of the step steps.prepare.status. It needs to be in state passed before you proceed:

    root # crowbarctl upgrade status
    +--------------------------------+----------------+
    | Status                         | Value          |
    +--------------------------------+----------------+
    | current_step                   | backup_crowbar |
    | current_substep                |                |
    | current_substep_status         |                |
    | current_nodes                  |                |
    | current_node_action            |                |
    | remaining_nodes                |                |
    | upgraded_nodes                 |                |
    | crowbar_backup                 |                |
    | openstack_backup               |                |
    | suggested_upgrade_mode         | non_disruptive |
    | selected_upgrade_mode          |                |
    | compute_nodes_postponed        | false          |
    | steps.prechecks.status         | passed         |
    | steps.prepare.status           | passed         |
    | steps.backup_crowbar.status    | pending        |
    | steps.repocheck_crowbar.status | pending        |
    | steps.admin.status             | pending        |
    | steps.repocheck_nodes.status   | pending        |
    | steps.services.status          | pending        |
    | steps.backup_openstack.status  | pending        |
    | steps.nodes.status             | pending        |
    +--------------------------------+----------------+
  4. Create a backup of the existing Administration Server installation. In case something goes wrong during the upgrade procedure of the Administration Server, you can restore the original state from this backup with the command crowbarctl backup restore NAME. To create the backup, run:

    root # crowbarctl upgrade backup crowbar

    To list all existing backups including the one you have just created, run the following command:

    root # crowbarctl backup list
    +----------------------------+--------------------------+--------+---------+
    | Name                       | Created                  | Size   | Version |
    +----------------------------+--------------------------+--------+---------+
    | crowbar_upgrade_1534864741 | 2018-08-21T15:19:03.138Z | 219 KB | 4.0     |
    +----------------------------+--------------------------+--------+---------+
  5. This step prepares the upgrade of the Administration Server by checking the availability of the update and pool repositories for SUSE OpenStack Cloud Crowbar 8 and SUSE Linux Enterprise Server 12 SP3. Run the following command:

    root # crowbarctl upgrade repocheck crowbar
    +----------------------------------------+-------------------------------------+-----------+
    | Repository                             | Status                              | Type      |
    +----------------------------------------+-------------------------------------+-----------+
    | SLE12-SP3-HA-Pool                      | missing (x86_64), inactive (x86_64) | ha        |
    | SLE12-SP3-HA-Updates                   | available                           | ha        |
    | SLES12-SP3-Pool                        | available                           | os        |
    | SLES12-SP3-Updates                     | available                           | os        |
    | SUSE-OpenStack-Cloud-Crowbar-8-Pool    | available                           | openstack |
    | SUSE-OpenStack-Cloud-Crowbar-8-Updates | available                           | openstack |
    +----------------------------------------+-------------------------------------+-----------+

    The output above indicates that the SLE12-SP3-HA-Pool repository is missing, because it has not yet been added to the Crowbar configuration. To add it to the Administration Server proceed as follows.

    Note that this step is for setting up the repositories for the Administration Server, not for the nodes in SUSE OpenStack Cloud (this will be done in a subsequent step).

    1. Start yast repositories and proceed with Continue. Replace the repositories SLES12-SP2-Pool and SLES12-SP2-Updates with the respective SP3 repositories.

      If you prefer zypper over YaST, you can alternatively make the change on the command line, as sketched below.
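
      A sketch of the zypper variant; the URLs are placeholders for your SMT/RMT server or local mirror, and the repository aliases may differ on your system (check with zypper lr):

      root # zypper mr --disable SLES12-SP2-Pool SLES12-SP2-Updates
      root # zypper ar http://smt.example.com/repo/SUSE/Products/SLE-SERVER/12-SP3/x86_64/product/ SLES12-SP3-Pool
      root # zypper ar http://smt.example.com/repo/SUSE/Updates/SLE-SERVER/12-SP3/x86_64/update/ SLES12-SP3-Updates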

    2. Next, replace the SUSE-OpenStack-Cloud-7 update and pool repositories with the respective SUSE OpenStack Cloud Crowbar 8 versions.

    3. Check for other (custom) repositories. All SLES SP2 repositories need to be replaced with the respective SLES SP3 version. In case no SP3 version exists, disable the repository—the respective packages from that repository will be deleted during the upgrade.

    Once the repository configuration on the Administration Server has been updated, run the command to check the repositories again. If the configuration is correct, the result should look like the following:

    root # crowbarctl upgrade repocheck crowbar
    +---------------------+----------------------------------------+
    | Status              | Value                                  |
    +---------------------+----------------------------------------+
    | os.available        | true                                   |
    | os.repos            | SLES12-SP3-Pool                        |
    |                     | SLES12-SP3-Updates                     |
    | openstack.available | true                                   |
    | openstack.repos     | SUSE-OpenStack-Cloud-Crowbar-8-Pool    |
    |                     | SUSE-OpenStack-Cloud-Crowbar-8-Updates |
    +---------------------+----------------------------------------+
  6. Now that the repositories are available, the Administration Server itself will be upgraded. The update will run in the background using zypper dup. Once all packages have been upgraded, the Administration Server will be rebooted and you will be logged out. To start the upgrade run:

    root # crowbarctl upgrade admin
  7. After the Administration Server has been successfully updated, the Control Nodes and Compute Nodes will be upgraded. First, the availability of the repositories used to provide packages for the SUSE OpenStack Cloud nodes is tested.

    Note
    Note: Correct Metadata in the PTF Repository

    When adding new repositories to the nodes, make sure that the new PTF repository also contains correct metadata (even if it is empty). To do this, run the createrepo-cloud-ptf command.

    Note that the configuration for these repositories differs from the one for the Administration Server that was already done in a previous step. In this step the repository locations are made available to Crowbar rather than to libzypp on the Administration Server. To check the repository configuration run the following command:

    root # crowbarctl upgrade repocheck nodes
    +---------------------------------+----------------------------------------+
    | Status                          | Value                                  |
    +---------------------------------+----------------------------------------+
    | ha.available                    | false                                  |
    | ha.repos                        | SLES12-SP3-HA-Pool                     |
    |                                 | SLES12-SP3-HA-Updates                  |
    | ha.errors.x86_64.missing        | SLES12-SP3-HA-Pool                     |
    |                                 | SLES12-SP3-HA-Updates                  |
    | os.available                    | false                                  |
    | os.repos                        | SLES12-SP3-Pool                        |
    |                                 | SLES12-SP3-Updates                     |
    | os.errors.x86_64.missing        | SLES12-SP3-Pool                        |
    |                                 | SLES12-SP3-Updates                     |
    | openstack.available             | false                                  |
    | openstack.repos                 | SUSE-OpenStack-Cloud-Crowbar-8-Pool    |
    |                                 | SUSE-OpenStack-Cloud-Crowbar-8-Updates |
    | openstack.errors.x86_64.missing | SUSE-OpenStack-Cloud-Crowbar-8-Pool    |
    |                                 | SUSE-OpenStack-Cloud-Crowbar-8-Updates |
    +---------------------------------+----------------------------------------+

    To update the locations for the listed repositories, start yast crowbar and proceed as described in Section 7.4, “Repositories”.

    Once the repository configuration for Crowbar has been updated, run the command to check the repositories again to determine whether the current configuration is correct.

    root # crowbarctl upgrade repocheck nodes
    +---------------------+----------------------------------------+
    | Status              | Value                                  |
    +---------------------+----------------------------------------+
    | ha.available        | true                                   |
    | ha.repos            | SLE12-SP3-HA-Pool                      |
    |                     | SLE12-SP3-HA-Updates                   |
    | os.available        | true                                   |
    | os.repos            | SLES12-SP3-Pool                        |
    |                     | SLES12-SP3-Updates                     |
    | openstack.available | true                                   |
    | openstack.repos     | SUSE-OpenStack-Cloud-Crowbar-8-Pool    |
    |                     | SUSE-OpenStack-Cloud-Crowbar-8-Updates |
    +---------------------+----------------------------------------+
    Important
    Important: Shut Down Running Instances in Normal Mode

    If the upgrade is done in normal mode (prechecks compute_status and ha_configured have not been passed), you need to shut down all running instances now.

    Important
    Important: Product Media Repository Copies

    To PXE boot new nodes, an additional SUSE Linux Enterprise Server 12 SP3 repository (a copy of the installation system) is required. Although not required during the upgrade procedure, it is recommended to set up this directory now. Refer to Section 5.1, “Copying the Product Media Repositories” for details. If you had also copied the SUSE OpenStack Cloud Crowbar 7 installation media (optional), you may want to provide the SUSE OpenStack Cloud Crowbar 8 media the same way.

    Once the upgrade procedure has been successfully finished, you may delete the previous copies of the installation media in /srv/tftpboot/suse-12.2/x86_64/install and /srv/tftpboot/suse-12.2/x86_64/repos/Cloud.

  8. To ensure the status of the nodes does not change during the upgrade process, the majority of the OpenStack services will be stopped on the nodes. As a result, the OpenStack API will no longer be accessible. The instances, however, will continue to run and will also be accessible. Run the following command:

    root # crowbarctl upgrade services

    This step takes a while to finish. Monitor the process by running crowbarctl upgrade status. Do not proceed before steps.services.status is set to passed.

  9. The last step before upgrading the nodes is to make a backup of the OpenStack MariaDB database. The database dump will be stored on the Administration Server and can be used to restore the database in case something goes wrong during the upgrade.

    root # crowbarctl upgrade backup openstack
  10. The final step of the upgrade procedure is upgrading the nodes. To start the process, enter:

    root # crowbarctl upgrade nodes all

    The upgrade process runs in the background and can be queried with crowbarctl upgrade status. Depending on the size of your SUSE OpenStack Cloud it may take several hours, especially when performing a non-disruptive update. In that case, the Compute Nodes are updated one-by-one after instances have been live-migrated to other nodes.

    Instead of upgrading all nodes you may also upgrade the Control Nodes first and individual Compute Nodes afterwards. Refer to crowbarctl upgrade nodes --help for details. If you choose this approach, you can use the crowbarctl upgrade status command to monitor the upgrade process. The output of this command contains the following entries:

    current_node_action

    The current action applied to the node.

    current_substep

    Shows the current substep of the node upgrade step. For example, for the crowbarctl upgrade nodes controllers command, the current_substep entry displays controller_nodes while the controllers are being upgraded.

    After the controllers have been upgraded, the steps.nodes.status entry in the output displays the running status. Then check the current_substep_status entry. If it displays finished, you can move on to upgrading the Compute Nodes.

    Postponing the Upgrade

    It is possible to stop and postpone the upgrade of the Compute Nodes with the following command:

    root # crowbarctl upgrade nodes postpone

    After the upgrade of the Compute Nodes has been postponed, you can go to the Crowbar Web interface and check the configuration. You can also apply some changes, provided they do not affect the Compute Nodes. During the postponed upgrade, all OpenStack services should be up and running, but the Compute Nodes are still running the old version of the services.

    To resume the upgrade, issue the command:

    root # crowbarctl upgrade nodes resume

    Then finish the upgrade with either crowbarctl upgrade nodes all, or upgrade the nodes one by one with crowbarctl upgrade nodes NODE_NAME.

    When upgrading individual Compute Nodes using the crowbarctl upgrade nodes NODE_NAME command, the current_substep_status entry changes to node_finished when the upgrade of a single node is done. After all nodes have been upgraded, the current_substep_status entry displays finished.
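
    Put together, upgrading the controllers first and then a single Compute Node might look like the following sketch (the node name is a placeholder):

    root # crowbarctl upgrade nodes controllers
    root # crowbarctl upgrade status
    root # crowbarctl upgrade nodes compute1.example.com

    Between the commands, wait until crowbarctl upgrade status reports current_substep_status as finished (for the controllers) or node_finished (for an individual Compute Node).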

Note
Note: Dealing with Errors

If an error occurs during the upgrade process, the output of the crowbarctl upgrade status provides a detailed description of the failure. In most cases, both the output and the error message offer enough information for fixing the issue. When the problem has been solved, run the previously-issued upgrade command to resume the upgrade process.

17.3.5 Simultaneous Upgrade of Multiple Nodes

It is possible to select several Compute Nodes for the upgrade instead of just one. Upgrading multiple nodes simultaneously significantly reduces the time required for the upgrade.

To upgrade multiple nodes simultaneously, use the following command:

root # crowbarctl upgrade nodes NODE_NAME_1,NODE_NAME_2,NODE_NAME_3

Node names can be separated by a comma, a semicolon, or a space. When using space as the separator, put the part containing the node names in quotes, as shown in the example below.
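
For example, the same call with space-separated, quoted node names (the names are placeholders):

root # crowbarctl upgrade nodes "NODE_NAME_1 NODE_NAME_2 NODE_NAME_3"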

Use the following command to find the names of the nodes that haven't been upgraded:

root # crowbarctl upgrade status nodes

Since the simultaneous upgrade is intended to be non-disruptive, all Compute Nodes targeted for a simultaneous upgrade must be cleared of any running instances.

Note
Note

You can check what instances are running on a specific node using the following command:

tux > nova list --all-tenants --host NODE_NAME

This means that it is not possible to pick an arbitrary number of Compute Nodes for the simultaneous upgrade operation: you have to make sure that it is possible to live-migrate every instance away from the batch of nodes that are supposed to be upgraded in parallel. In case of high load on all Compute Nodes, it might not be possible to upgrade more than one node at a time. Therefore, it is recommended to perform the following steps for each node targeted for the simultaneous upgrade prior to running the crowbarctl upgrade nodes command.

  1. Disable the Compute Node so it's not used as a target during live-evacuation of any other node:

    tux > openstack compute service set --disable "NODE_NAME" nova-compute
  2. Evacuate all running instances from the node:

    tux > nova host-evacuate-live "NODE_NAME"

After completing these steps, you can perform a simultaneous upgrade of the selected nodes.

17.3.6 Troubleshooting Upgrade Issues

Q: 1. Upgrade of the admin server has failed.

Check for empty, broken, and unsigned repositories in the Administration Server upgrade log file /var/log/crowbar/admin-server-upgrade.log. Fix the repository setup, then upgrade the remaining packages manually to SUSE Linux Enterprise Server 12 SP3 and SUSE OpenStack Cloud Crowbar 8 using the zypper dup command. Reboot the Administration Server.

Q: 2. An upgrade step repeatedly fails due to timeout.

Timeouts for most upgrade operations can be adjusted in the /etc/crowbar/upgrade_timeouts.yml file. If the file doesn't exist, use the following template, and modify it to your needs:

        :prepare_repositories: 120
        :pre_upgrade: 300
        :upgrade_os: 1500
        :post_upgrade: 600
        :shutdown_services: 600
        :shutdown_remaining_services: 600
        :evacuate_host: 300
        :chef_upgraded: 1200
        :router_migration: 600
        :lbaas_evacuation: 600
        :set_network_agents_state: 300
        :delete_pacemaker_resources: 600
        :delete_cinder_services: 300
        :delete_nova_services: 300
        :wait_until_compute_started: 60
        :reload_nova_services: 120
        :online_migrations: 1800

The following entries may require higher values (all values are specified in seconds):

  • upgrade_os: Time allowed for upgrading all packages of one node.

  • chef_upgraded: Time allowed for the initial crowbar_join and chef-client run on a node that has been upgraded and rebooted.

  • evacuate_host: Time allowed for live-migrating all VMs from a host.

Q: 3. Node upgrade has failed during live migration.

The problem may occur when it is not possible to live migrate certain VMs anywhere. It may be necessary to shut down or suspend other VMs to make room for migration. Note that the Bash shell script that starts the live migration for the Compute Node is executed from the Control Node. An error message generated by the crowbarctl upgrade status command contains the exact names of both nodes. Check the /var/log/crowbar/node-upgrade.log file on the Control Node for the information that can help you with troubleshooting. You might also need to check OpenStack logs in /var/log/nova on the Compute Node as well as on the Control Nodes.

It is possible that the live migration of a certain VM takes too long. This can happen if instances are very large or the network connection between compute hosts is slow or overloaded. In this case, try raising the corresponding timeout in /etc/crowbar/upgrade_timeouts.yml.

We recommend performing the live migration manually first. After it has completed successfully, call the crowbarctl upgrade command again.

The following commands can be helpful for analyzing issues with live migrations:

        nova server-migration-list
        nova server-migration-show
        nova instance-action-list
        nova instance-action

Note that these commands require OpenStack administrator privileges.

The following log files may contain useful information:

  • /var/log/nova/nova-compute.log on the Compute Nodes that the migration is performed from and to.

  • /var/log/nova/*.log (especially log files for the conductor, scheduler and placement services) on the Control Nodes.

It can happen that active instances and instances with heavy loads cannot be live migrated in a reasonable time. In that case, you can abort a running live-migration operation using the nova live-migration-abort MIGRATION-ID command. You can then perform the upgrade of the specific node at a later time.

Alternatively, it is possible to force the completion of the live migration by using the nova live-migration-force-complete MIGRATION-ID command. However, this might pause the instances for a prolonged period of time and have a negative impact on the workload running inside the instance.

Q: 4. Node has failed during OS upgrade.

Possible reasons include an incorrect repository setup or package conflicts. Check the /var/log/crowbar/node-upgrade.log log file on the affected node. Check the repositories on the node using the zypper lr command. Make sure the required repositories are available. To test the setup, install a package manually or run the zypper dup command (this command is executed by the upgrade script). Fix the repository setup and run the failed upgrade step again. If custom package versions or version locks are in place, make sure that they do not interfere with the zypper dup command.

Q: 5. Node does not come up after reboot.

In some cases, a node can take too long to reboot, causing a timeout. We recommend checking the node manually, making sure it is online, and repeating the step.

Q: 6. N nodes were provided to the Compute Node upgrade using crowbarctl upgrade nodes node_1,node_2,...,node_N, but fewer than N were actually upgraded.

If the live migration cannot be performed for certain nodes due to a timeout, Crowbar upgrades only the nodes that it was able to live-evacuate in the specified time. Because some nodes have been upgraded, it is possible that more resources will be available for live migration when you run this step again. See also Node upgrade has failed during live migration.

Q: 7. Node has failed at the initial chef client run stage.

An unsupported entry in the configuration file may prevent a service from starting. This causes the node to fail at the initial chef client run stage. Checking the /var/log/crowbar/crowbar_join/chef.* log files on the node is a good starting point.

Q: 8. I need to change OpenStack configuration during the upgrade but I cannot access Crowbar.

The Crowbar Web interface is accessible only when the upgrade is completed or postponed. Postponing the upgrade can be done only after upgrading all Control Nodes, using the crowbarctl upgrade nodes postpone command. You can then access Crowbar and save your modifications. Before you can continue with the upgrade of the rest of the nodes, resume the upgrade using the crowbarctl upgrade nodes resume command.

Q: 9. Failure occurred when evacuating routers.

Check the /var/log/crowbar/node-upgrade.log file on the node that performs the router evacuation (it should be mentioned in the error message). The ID of the router that failed to migrate (or the affected network port) is logged to /var/log/crowbar/node-upgrade.log. Use the OpenStack CLI tools to check the state of the affected router and its ports. Fix manually, if necessary. This can be done by bringing the router or port up and down again. The following commands can be useful for solving the issue:

        openstack router show ID
        openstack port list --router ROUTER-ID
        openstack port show PORT-ID
        openstack port set

Resume the upgrade by running the failed upgrade step again to continue with the router migration.

Q: 10. Some non-controller nodes were upgraded after performing crowbarctl upgrade nodes controllers.

In the current upgrade implementation, OpenStack nodes are divided into Compute Nodes and other nodes. The crowbarctl upgrade nodes controllers command starts the upgrade of all the nodes that do not host compute services. This includes the controllers.

17.4 Recovering from Compute Node Failure

The following procedure assumes that there is at least one Compute Node already running. Otherwise, see Section 17.5, “Bootstrapping Compute Plane”.

Procedure 17.2: Procedure for Recovering from Compute Node Failure
  1. If the Compute Node failed, it should have been fenced. Verify that this is the case. Otherwise, check /var/log/pacemaker.log on the Designated Coordinator to determine why the Compute Node was not fenced. The most likely reason is a problem with STONITH devices.

  2. Determine the cause of the Compute Node's failure.

  3. Rectify the root cause.

  4. Boot the Compute Node again.

  5. Check whether the crowbar_join script ran successfully on the Compute Node. If this is not the case, check the log files to find out the reason. Refer to Section 19.2, “On All Other Crowbar Nodes” to find the exact location of the log file.

  6. If the chef-client agent triggered by crowbar_join succeeded, confirm that the pacemaker_remote service is up and running.

  7. Check whether the remote node is registered and considered healthy by the core cluster. If this is not the case check /var/log/pacemaker.log on the Designated Coordinator to determine the cause. There should be a remote primitive running on the core cluster (active/passive). This primitive is responsible for establishing a TCP connection to the pacemaker_remote service on port 3121 of the Compute Node. Ensure that nothing is preventing this particular TCP connection from being established (for example, problems with NICs, switches, firewalls etc.). One way to do this is to run the following commands:

    tux > lsof -i tcp:3121
    tux > tcpdump tcp port 3121
  8. If Pacemaker can communicate with the remote node, it should start the nova-compute service on it as part of the cloned group cl-g-nova-compute using the NovaCompute OCF resource agent. This cloned group will block startup of nova-evacuate until at least one clone is started.

    A necessary, related but different procedure is described in Section 17.5, “Bootstrapping Compute Plane”.

  9. It may happen that NovaCompute has been launched correctly on the Compute Node by lrmd, but the openstack-nova-compute service is still not running. This usually happens when nova-evacuate did not run correctly.

    If nova-evacuate is not running on one of the core cluster nodes, make sure that the service is marked as started (target-role="Started"). If this is the case, then your cloud does not have any Compute Nodes already running as assumed by this procedure.

    If nova-evacuate is started but it is failing, check the Pacemaker logs to determine the cause.

    If nova-evacuate is started and functioning correctly, it should call Nova's evacuate API to release resources used by the Compute Node and resurrect elsewhere any VMs that died when it failed.

  10. If openstack-nova-compute is running, but VMs are not booted on the node, check that the service is not disabled or forced down using the nova service-list command. In case the service is disabled, run the nova service-enable SERVICE_ID command. If the service is forced down, run the following commands:

    tux > fence_nova_param () {
        key="$1"
        cibadmin -Q -A "//primitive[@id='fence-nova']//nvpair[@name='$key']" | \
        sed -n '/.*value="/{s///;s/".*//;p}'
    }
    tux > fence_compute \
        --auth-url=`fence_nova_param auth-url` \
        --endpoint-type=`fence_nova_param endpoint-type` \
        --tenant-name=`fence_nova_param tenant-name` \
        --domain=`fence_nova_param domain` \
        --username=`fence_nova_param login` \
        --password=`fence_nova_param passwd` \
        -n COMPUTE_HOSTNAME \
        --action=on

The above steps should be performed automatically after the node is booted. If that does not happen, try the following debugging techniques.

Check the evacuate attribute for the Compute Node in the Pacemaker cluster's attrd service using the command:

tux > attrd_updater -p -n evacuate -N NODE

Possible results are the following:

  • The attribute is not set. Refer to Step 1 in Procedure 17.2, “Procedure for Recovering from Compute Node Failure”.

  • The attribute is set to yes. This means that the Compute Node was fenced, but nova-evacuate never initiated the recovery procedure by calling Nova's evacuate API.

  • The attribute contains a time stamp, in which case the recovery procedure was initiated at the time indicated by the time stamp, but has not completed yet.

  • The attribute is set to no. The recovery procedure completed successfully and the cloud is ready for the Compute Node to rejoin.

If the attribute is stuck with the wrong value, it can be set to no using the command:

tux > attrd_updater -n evacuate -U no -N NODE

After standard fencing has been performed, fence agent fence_compute should activate the secondary fencing device (fence-nova). It does this by setting the attribute to yes to mark the node as needing recovery. The agent also calls Nova's force_down API to notify it that the host is down. You should be able to see this in /var/log/nova/fence_compute.log on the node in the core cluster that was running the fence-nova agent at the time of fencing. During the recovery, fence_compute tells Nova that the host is up and running again.

17.5 Bootstrapping Compute Plane

If the whole compute plane is down, it is not always obvious how to boot it up, because it can be subject to deadlock if evacuate attributes are set on every Compute Node. In this case, manual intervention is required. Specifically, the operator must manually choose one or more Compute Nodes to bootstrap the compute plane, and then run the attrd_updater -n evacuate -U no -N NODE command for each of those Compute Nodes to indicate that they do not require the resurrection process and can have their nova-compute start up straight away. Once these Compute Nodes are up, this breaks the deadlock allowing nova-evacuate to start. This way, any other nodes that require resurrection can be processed automatically. If no resurrection is desired anywhere in the cloud, then the attributes should be set to no for all nodes.
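
A minimal sketch of this manual bootstrap, assuming two Compute Nodes named compute1 and compute2 (placeholder names) have been chosen to bring the compute plane up; run the commands from a node in the core Pacemaker cluster:

tux > attrd_updater -n evacuate -U no -N compute1
tux > attrd_updater -n evacuate -U no -N compute2

Once nova-compute is running on these nodes, nova-evacuate can start and the remaining Compute Nodes are processed automatically.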

Important
Important

If Compute Nodes are started too long after the remote-* resources are started on the control plane, they are liable to be fenced. This should be avoided.

17.6 Updating MariaDB with Galera

Updating MariaDB with Galera must be done manually. Crowbar does not install updates automatically. Updates can be done with Pacemaker or with the CLI. In particular, manual updating applies to upgrades to MariaDB 10.2.17 or higher from MariaDB 10.2.16 or earlier. See MariaDB 10.2.22 Release Notes - Notable Changes.

Note
Note

In order to run the following update steps, the database cluster needs to be up and healthy.

Using the Pacemaker GUI, update MariaDB with the following procedure:

  1. Put the cluster into maintenance mode. Detailed information about the Pacemaker GUI and its operation is available at https://documentation.suse.com/sle-ha/12-SP5/single-html/SLE-HA-guide/#cha-conf-hawk2.

  2. Perform a rolling upgrade to MariaDB following the instructions at Upgrading Between Minor Versions with Galera Cluster.

    The process involves the following steps:

    1. Stop MariaDB

    2. Uninstall the old versions of MariaDB and the Galera wsrep provider

    3. Install the new versions of MariaDB and the Galera wsrep provider

    4. Change configuration options if necessary

    5. Start MariaDB

    6. Run mysql_upgrade with the --skip-write-binlog option

  3. Each node must be upgraded individually so that the cluster remains operational at all times.

  4. Using the Pacemaker GUI, take the cluster out of maintenance mode.

When updating with the CLI, the database cluster must be up and healthy. Update MariaDB with the following procedure:

  1. Mark Galera as unmanaged:

    crm resource unmanage galera

    Or put the whole cluster into maintenance mode:

    crm configure property maintenance-mode=true
  2. Pick a node other than the one currently targeted by the loadbalancer and stop MariaDB on that node:

    crm_resource --wait --force-demote -r galera -V
  3. Perform updates with the following steps:

    1. Uninstall the old versions of MariaDB and the Galera wsrep provider.

    2. Install the new versions of MariaDB and the Galera wsrep provider. Select the appropriate instructions at Installing MariaDB with zypper.

    3. Change configuration options if necessary.

  4. Start MariaDB on the node.

    crm_resource --wait --force-promote -r galera -V
  5. Run mysql_upgrade with the --skip-write-binlog option.
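
    For example, on the node that was just updated (a sketch; authentication options depend on your setup):

    root # mysql_upgrade --skip-write-binlog -u root -p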

  6. On the other nodes, repeat the process detailed above: stop MariaDB, perform updates, start MariaDB, run mysql_upgrade.

  7. Mark Galera as managed:

    crm resource manage galera

    Or take the cluster out of maintenance mode.

17.7 Periodic OpenStack Maintenance Tasks

The heat-manage tool helps manage Heat-specific database operations. The associated database should be periodically purged to save space. Set up the following as a cron job at /etc/cron.weekly/local-cleanup-heat on the servers where the Heat service is running:

  #!/bin/bash
  su heat -s /bin/bash -c "/usr/bin/heat-manage purge_deleted -g days 14" || :

The nova-manage db archive_deleted_rows command moves deleted rows from production tables to shadow tables. Including --until-complete makes the command run continuously until all deleted rows are archived. It is recommended to set up this task as /etc/cron.weekly/local-cleanup-nova on the servers where the Nova service is running, with the following content:

  #!/bin/bash
  su nova -s /bin/bash -c "/usr/bin/nova-manage db archive_deleted_rows --until-complete" || :

17.8 Rotating Fernet Tokens

Fernet tokens should be rotated frequently for security purposes. It is recommended to set up this task as a cron job in /etc/cron.weekly/openstack-keystone-fernet on the Keystone server designated as the master node in a highly available setup, with the following content:

  #!/bin/bash
  su keystone -s /bin/bash -c "keystone-manage fernet_rotate"

  /usr/bin/keystone-fernet-keys-push.sh 192.168.81.168; /usr/bin/keystone-fernet-keys-push.sh 192.168.81.169;

The IP addresses in the above example (192.168.81.168 and 192.168.81.169) are the IP addresses of the other two nodes of a three-node cluster. Be sure to use the correct IP addresses when configuring the cron job. Note that if the master node goes offline and a new master is elected, the cron job needs to be removed from the previous master node and re-created on the new master node. Do not run the fernet_rotate cron job on multiple nodes.

For a non-HA setup, the cron job should be configured at /etc/cron.weekly/openstack-keystone-fernet on the keystone server as follows:

  #!/bin/bash
  su keystone -s /bin/bash -c "keystone-manage fernet_rotate"