
SUSE Enterprise Storage with Veeam Backup & Replication

Implementation Guide

Technical Reference Documentation
Getting Started
Authors
Masood Noori, ISV Solutions Architect (SUSE)
Alex Zacharow, ISV Certification Engineer (SUSE)
SUSE Enterprise Storage 5.5
Veeam Backup & Replication 9.5
Date: 2020-10-20
This guide provides instructions on how to implement SUSE Enterprise Storage 5.5 with Veeam Backup & Replication as both a Linux Repository and an S3 target as part of a Scale Out Backup Repository.

Disclaimer: The articles and individual documents published in the SUSE Best Practices series were contributed voluntarily by SUSE employees and by third parties. If not stated otherwise inside the document, the articles are intended only to be one example of how a particular action could be taken. Also, SUSE cannot verify either that the actions described in the articles do what they claim to do or that they do not have unintended consequences. All information found in this article has been compiled with utmost attention to detail. However, this does not guarantee complete accuracy. Therefore, we need to specifically state that neither SUSE LLC, its affiliates, the authors, nor the translators may be held liable for possible errors or the consequences thereof.

1 Introduction

The objective of this guide is to present instructions on how to implement SUSE Enterprise Storage (v5.5) with Veeam Backup & Replication as both a Linux repository and an S3 target as part of a Scale Out Backup Repository. It is suggested that the document be read in its entirety, along with the supplemental Appendix A, OS Networking Configuration information, before attempting the process.

The deployment presented in this guide aligns with architectural best practices of both SUSE and Veeam.

Upon completion of the steps in this document, a working SUSE Enterprise Storage (v5.5) cluster will be operational as described in the SUSE Enterprise Storage Deployment Guide and integrated with Veeam Backup & Replication.

2 Solution Description

The solution outlined in this guide enables a customer to deploy a disk-to-disk target that is orchestrated through Veeam software. SUSE Enterprise Storage can be used as a backup target via a Veeam proxy over a common network. The result is a high-performing and flexible backup target with exabyte scalability.

3 Business Value

SUSE Enterprise Storage

SUSE Enterprise Storage provides a Veeam disk-to-disk backup solution with:

  • commodity hardware for minimal hardware cost.

  • open source software for minimal software cost and maximum flexibility.

  • a self-managing, self-healing architecture for minimal management cost.

  • a flexible, cluster-based design for graceful and inexpensive upgrades, and an innovative licensing model that avoids per-gigabyte storage charges, so you do not pay more for storing more data.

  • the lowest-price solution for enterprise archive and backup implementations with minimal acquisition, management, and upgrade cost.

Veeam Backup & Replication

Veeam Backup & Replication delivers availability for all your virtual, physical and cloud-based workloads. Through a single management console, you can manage fast, flexible and reliable backup, recovery and replication of all your applications and data to eliminate legacy backup solutions. The solution includes native, certified SAP support for backups and recoveries.

Together, Veeam and SUSE deliver the flexibility and near-unlimited scalability you want for long-term data retention, plus a single storage architecture that delivers the various performance requirements a Veeam backup solution needs. It is ideal for mission-critical applications and platforms such as SAP HANA, and allows you to recover vital data fast when failures occur.

Solution Value Propositions

  • Issue: Customer needs to handle more simultaneous backup streams but not capacity.

    Solution: In large environments with many backup streams, adding more Veeam proxies and Linux repository servers allows more simultaneous backups to be handled.

  • Issue: Customer needs more storage, but not more simultaneous streams.

    Solution: Add more Object Storage Devices (OSDs) to the SUSE Enterprise Storage cluster.

  • Issue: Customer wants to add S3 repository on-prem for long term archive.

    Solution: Deploy Rados Gateways (RGWs) for SUSE Enterprise Storage and implement the S3 repo.

  • Issue: Customer has multiple sites with small local Veeam repositories, but wants to replicate to a central Veeam location.

    Solution: Deploy a Veeam server and Linux repositories backed by SUSE Enterprise Storage at the central site, OR deploy SUSE Enterprise Storage with S3 to act as the central S3 repository.

A major advantage of this architecture is that every Veeam backup benefits from the aggregated throughput of the cluster, which brings both performance and storage efficiency. Instead of being limited to the throughput of a single server, the I/O is spread across all the storage nodes. It also means that no single storage enclosure is maxed out on I/O capability while another sits idle. All of this is achieved without using Veeam's Scale Out Backup Repository.

4 Requirements

The solution has the following requirements:

  • Simple to setup and deploy

  • Able to meet the documented guidelines for system hardware, networking and environmental prerequisites

  • Adaptable to the physical and logical constraints needed by the business, both initially and as needed over time for performance, security, and scalability concerns

  • Resilient to changes in physical infrastructure components, caused by failure or required maintenance

  • Capable of providing optimized object and block services to client access nodes, either directly or through gateway services

  • Data protection configurable to meet the customer’s individual needs at a granular level

5 Architectural Overview

SUSE Enterprise Storage provides everything that Veeam needs for storage, from the high-performance tier to the Cloud Tier repository.

Veeam and SES Architecture

5.1 Solution Architecture - RBD

SUSE Enterprise Storage can be used as a storage location for Veeam via a Veeam Linux repository. The architecture and settings described below were used during testing to achieve the Veeam Ready Repository level of validation.

The architecture used to achieve compliance with the Veeam Ready program uses a RADOS Block Device (RBD) on a Veeam Linux repository server. This guide also briefly discusses CephFS as a potential storage mechanism.

5.2 Solution Architecture - S3

Veeam Backup & Replication enables the usage of storage targets that are compatible with specific S3 API calls. SUSE Enterprise Storage provides a target that is certified with Veeam software.

5.3 Architectural Notes and Discussion

For both RBD and CephFS, running both the Veeam proxy and the Linux repository server as virtual machines on the ESX host can offer several benefits:

  1. The Veeam Proxy server located on the ESX server is able to directly mount the VMware snapshot images, resulting in the highest possible streaming read performance for the backups. The figure below illustrates the traffic flow of a backup process.

  2. In large deployments, network communication from the Veeam Proxy to the Linux Repository server flows across the ESX server without traversing the physical network infrastructure. This results in very high network performance between these two critical pieces of infrastructure for the Veeam Backup and Restore environment.

Veeam Data Flow

6 Pool Configuration

When configuring the SUSE Enterprise Storage cluster for use as a backup target, the data protection scheme is an important consideration. There are two main options for data protection, each with advantages and disadvantages.

The first scheme is replication. It works by writing each data chunk to the specified number of unique devices; the default is three. If the failure domain is assumed to be at the storage host level, this means the cluster can survive the loss of two storage servers without data loss. The downside of replication is the space overhead: with three copies, the overhead is 200 percent of the stored data, or two thirds of the total cluster capacity.

In terms of performance, replication has lower latency than erasure coding. This is especially true where the I/O pattern consists of small random I/O.

The second scheme is erasure coding (EC). It works by splitting the data into the specified number of chunks (k) and then performing a mathematical calculation to create the requested number of EC chunks (m). Again, assuming the failure domain is at the host level, a system using an EC scheme of k=6, m=3 has an overhead of only 50 percent, or one third of the total cluster capacity. Because EC actually writes less data, it is sometimes faster than replication for writes, but slower on the reads due to the requirement to reassemble the data from multiple nodes.
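
As a quick worked example of the difference in usable capacity (our arithmetic, applied to a hypothetical 900 TB of raw capacity):

usable capacity (3x replication) = 900 TB * 1/3 = 300 TB
usable capacity (EC k=6, m=3) = 900 TB * 6/(6+3) = 600 TB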

Another aspect to consider is the total cluster size. In general, it is not recommended to use EC with a cluster of fewer than seven storage nodes. When using EC with SUSE Enterprise Storage, it is recommended that the data chunks + (2x erasure coding chunks) are less than or equal to the cluster node count.

Expressed in a formula this looks as follows:

data chunks [k] + (coding chunks [m] * 2) <= cluster node count

A cluster size of seven would thus allow for three data chunks plus two erasure coding chunks, plus two spare nodes to allow for device failures. In a larger cluster, EC profiles of 8+3, 6+4, 9+3 and the like are not uncommon and offer a higher percentage of raw capacity available for data.
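
Applying the formula above to one of these larger profiles (our arithmetic, for illustration), an 8+3 profile requires:

data chunks [8] + (coding chunks [3] * 2) = 14 <= cluster node count

That is, an 8+3 profile is only recommended for clusters of 14 or more storage nodes.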

An additional consideration is the availability of hardware accelerators for erasure coding. Intel CPUs provide such an accelerator, which is specified with the plug-in option when creating the erasure coding profile for the pool.

ceph osd erasure-code-profile set veeam_ec plugin=isa k=8 m=3
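
To confirm the profile was created as intended, it can be read back (an optional quick check):

ceph osd erasure-code-profile get veeam_ec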

6.1 Ceph Protocol – RBD

The RBD protocol is the native block protocol for Ceph. Clients using RBD can be termed "intelligent" because they use the CRUSH algorithm to determine where data will be placed, and can therefore communicate directly with each individual storage device. The result is performance that scales horizontally with the cluster.

As a client protocol, RBD has numerous tuning options that can be controlled on each client, or for the cluster as a whole. These include things like caching type, size, etc. For this effort, some tuning was performed for the caching parameters to optimize performance for the I/O patterns being tested. These are outlined in the deployment section below.

The Veeam Linux repository maps the created RBD as a block device, and a file system is then placed on it. This allows filesystem-specific tuning to be applied to accelerate performance.
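
As an illustration of such client-side cache tuning (example values only, not necessarily the parameters used for the validation testing), RBD cache settings can be placed in the [client] section of ceph.conf on the Linux repository server:

[client]
# Illustrative values - adjust for your workload
rbd cache = true
rbd cache size = 67108864                  # 64 MB cache per RBD client
rbd cache max dirty = 50331648             # allow up to 48 MB of dirty data before writeback
rbd cache writethrough until flush = true  # stay in writethrough mode until the first flush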

6.2 Ceph protocol – CephFS

CephFS is the distributed file system provided by Ceph and can also be used as the backing store for a Veeam Linux repository. Our testing showed similar or better performance compared with RBD. An advantage of this protocol choice is that multiple repositories can be hosted on the same massively scalable distributed file system. This also means that if a backup server disappears or fails, it is quite simple to attach the repository to another server.
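
As a minimal sketch of how a CephFS-backed repository might be mounted on the Linux repository server (the monitor host, secret file, and mount point below are placeholders; see the SUSE Enterprise Storage documentation for the full procedure):

# Assumption: mon1 is a monitor host and the client key is stored locally
mkdir /veeam-cephfs
mount -t ceph mon1:6789:/ /veeam-cephfs -o name=admin,secretfile=/etc/ceph/admin.secret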

6.3 Ceph Protocol - S3

The S3 protocol has become the de facto standard for use in developing web-scale friendly applications that store and retrieve data. The protocol uses either HTTP or HTTPS as the data transport protocol. This makes it capable of leveraging standard load-balancing and proxy technologies to ensure scalability and improved security.

7 Deployment Recommendations

This deployment section should be seen as a supplement to the available online documentation. This is specifically the case for the SUSE Enterprise Storage 5 Deployment Guide and the SUSE Linux Enterprise Server Administration Guide.

7.1 Network Deployment Overview

When working with a backup environment, there are multiple considerations when it comes to designing the network to support horizontally scaling storage. These include single stream throughput, aggregate write throughput, verification job requirements, and any replication traffic that may be needed. It is important to identify the maximum simultaneous throughput that is required to support the backup traffic and then account for a back-end operation like reconstruction of a failed node.

If two physically separate networks are used, it is somewhat simple to calculate and allocate an appropriate amount of network bandwidth for back-end reconstruction for a replicated storage environment.

[back-end network throughput] = [front-end network] * 3

Sizing the network in this way ensures that there is sufficient back-end bandwidth for the primary OSD to write to the two replica OSDs while a reconstruction operation is also taking place.

For an environment where the networks are all sharing the same physical paths, but segmented using VLANs, the calculation would be similar.

[total network throughput required] = [backup throughput required] * 4
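
For example, assuming the backup streams require 10 Gb/s of sustained write throughput (our arithmetic, for illustration):

[back-end network throughput] = 10 Gb/s * 3 = 30 Gb/s (physically separate networks)
[total network throughput required] = 10 Gb/s * 4 = 40 Gb/s (shared, VLAN-segmented network)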

8 RBD/CephFS Deployment

This section outlines the steps required to deploy an environment similar in architecture to the tested environment.

8.1 Deploying and Preparing the SUSE Enterprise Storage Environment

Build and deploy a SUSE Enterprise Storage Cluster as described in the SUSE Enterprise Storage Deployment Guide (https://www.suse.com/documentation/suse-enterprise-storage-5/book_storage_deployment/data/book_storage_deployment.html)

  • Create an EC profile from the command line on the admin node.

    ceph osd erasure-code-profile set veeam_ec plugin=isa k=4 m=2

8.1.1 Creating Pools

  • Create one pool for each protocol being supported. To create an EC pool, type the following:

    ceph osd pool create ecpool 512 512 erasure veeam_ec
  • Create the RBD.

    rbd create reppool/veeam --size 5T --data-pool ecpool
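  • Note: the rbd create command above assumes a replicated pool named reppool already exists to hold the image metadata, and recent Ceph releases require overwrites to be enabled on an erasure-coded pool before it can serve as an RBD data pool. A hedged sketch of these preparatory commands (pool name and PG counts are examples only):

    # Assumption: the replicated metadata pool reppool does not yet exist
    ceph osd pool create reppool 128 128 replicated
    # RBD data on an EC pool requires overwrites (BlueStore OSDs, Luminous or later)
    ceph osd pool set ecpool allow_ec_overwrites true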

8.2 Creating and Configuring Linux Repository Virtual Machines

Perform the following steps:

  1. Create virtual machines on ESX.

    • Configure resource reservations.

  2. Perform a base Linux install.

    SLES 12 SPx
    1. Select the KVM Host install pattern.

    2. Unselect the KVM Host from Software Selection on the summary screen.

    3. Enable Multi-queue block IO (blk-mq).

    SLES 15 SPx
    1. Select the base server pattern.

    2. Select the option to disable CPU mitigations during installation.

    All
    1. Set network tuning parameters in /etc/sysctl.conf for SUSE Enterprise Storage nodes and Linux target(s) as found in Appendix A, OS Networking Configuration.

  3. Add repositories and packages for Veeam.

  4. Modify /etc/ssh/sshd_config to enable the Veeam services to work correctly.

    1. See https://www.veeam.com/kb1512.

    2. Find the PasswordAuthentication parameter and set the value to "yes".

    3. Save and restart the SSHD daemon.

      systemctl restart sshd.service
  5. To install the required perl-SOAP-Lite package, the repository that provides it needs to be added: the Software Development Kit (SDK) module on SLES 12, or PackageHub on SLES 15.

    For SLES12SP3
    SUSEConnect -p sle-sdk/12.3/x86_64
    zypper in perl-SOAP-Lite
    For SLES15
    SUSEConnect -p PackageHub/15/x86_64
    zypper in perl-SOAP-Lite
  6. The following script can be used to validate that all packages/perl modules are installed. If any are missing, they should be added.

    #!/bin/bash
    # Check that the Perl modules required by Veeam are available.
    for i in constant Carp Cwd Data::Dumper Encode Encode::Alias \
       Encode::Config Encode::Encoding Encode::MIME::Name Exporter \
       Exporter::Heavy File::Path File::Spec File::Spec::Unix \
       File::Temp List::Util Scalar::Util SOAP::Lite Socket Storable threads
    do
      echo "Checking for perl module $i ..."
      if perldoc -lm "$i" >/dev/null 2>&1
      then
        echo "Installed"
      else
        echo "MISSING - install the perl module $i"
      fi
    done
  7. Add the package ceph-common to the Linux target.

    zypper in ceph-common
  8. Add the client key and ceph.conf file to the directory /etc/ceph.

    1. From the admin node, run:

      scp /etc/ceph/* root@vtarget:/etc/ceph/
  9. Edit the file /etc/ceph/rbdmap on the Linux repository nodes and add the RBD.

    RbdDevice Parameters
    poolname/imagename  id=client,keyring=/etc/ceph/ceph.client.keyring
    reppool/veeam       id=admin,keyring=/etc/ceph/ceph.client.admin.keyring
  10. Enable and start the systemd rbdmap service.

    systemctl enable rbdmap
    systemctl start rbdmap
  11. Use the command mkfs.xfs to create an XFS file system on the target.

    mkfs.xfs /dev/rbd0
  12. Add the mount point.

    mkdir /veeam
  13. Add your entry to the file systems table configuration file fstab and include any tuning desired.

    /dev/rbd0 /veeam xfs _netdev 1 1
  14. Mount the file system.

    mount -a
  15. Verify it mounted.

    mount

    The output should be as follows:

    /dev/rbd0 on /veeam type xfs (rw,relatime,attr2,inode64,sunit=8192,swidth=8192,noquota,_netdev)

9 Adding Veeam Linux Repository

  1. Within the Veeam Console, click "Backup Infrastructure" on the left-hand menu bar. Right-click "Backup Repositories" followed by "Add Backup Repository".

  2. Provide a friendly name to distinguish the multiple repositories.

  3. Choose a repository type and click "Next".

  4. Click "Add New", enter the details and click "Next".

  5. Click "Add" to add credentials that have Read, Write, and Execute permissions to the mounted storage location and the ability to execute Perl code. Then click "OK" and "Finish".

  6. Ensure the credentials are selected and click "Next".

  7. Click "Browse" and select the path to the mounted RBD with the XFS file system. Then click "Advanced" to select Use per-VM backup files.

  8. Finish the process by selecting a mount server (Veeam Backup Server or proxy) and enabling a vPower NFS service as desired. Then select "Finish".

For the most accurate steps, review the latest Veeam documentation.

9.1 Disabling Multiple Streams

Multiple streams are designed to enhance performance for higher latency environments. It may be desirable to disable this for the local deployment. This can be done when defining the job, by setting it for the proxy, or globally. In all cases, it involves selecting the "Network Traffic Rules" and de-selecting "Multiple Streams".

9.2 Defining a Backup Job

Create a backup job. On the Storage step of the wizard, select the correct proxy and repository, then click "Advanced". On the Storage tab of the advanced settings, set the appropriate rules for your environment.

10 S3 Environment for Scale-Out Backup Repository

Object storage repositories augment your scale-out backup abilities and simplify offloading existing backup data directly to object storage. Veeam can offload backup data to object storage such as Amazon S3, Microsoft Azure Blob Storage, and IBM Cloud Object Storage; in our case, SUSE Enterprise Storage provides an S3-compatible on-premises target.

Configuring SUSE Enterprise Storage as an S3 target for a Veeam Object Storage Repository usually requires only a few steps.

10.1 SUSE Enterprise Storage Preparation

The following steps need to be completed to prepare SUSE Enterprise Storage:

  1. Install Rados Gateway

    • You can add a Rados Gateway role to an existing monitor node, or to a dedicated node for larger environments (recommended).

    • In our case, we used a monitor node named "example.ses5". Replace this with the name you set for your own Rados Gateway node.

    • For more information, see the Rados Gateway Installation Guide.

  2. Edit /srv/pillar/ceph/proposals/policy.cfg and assign the new role to the existing host by adding a line such as the following:

    role-rgw/cluster/example.ses5.sls
  3. Run stage 2 to update the pillar.

    root@master # salt-run state.orch ceph.stage.2
  4. After making these custom changes, run stages 3 and 4 to apply the updates. For additional details, see the SUSE Enterprise Storage Guide.

    root@master # salt-run state.orch ceph.stage.3
    root@master # salt-run state.orch ceph.stage.4

10.1.1 Installing and Configuring the RGW Daemons

  • IMPORTANT: Ensure that HTTPS/SSL is enabled on the Object Gateway so that the Veeam Object Storage Repository can connect. This allows for the secure communication supported by Veeam. For more details, see the following section about enabling HTTPS/SSL for Object Gateways.

  • Modify the rgw.conf file to allow port 443 (or 80 + 443). Navigate to the /srv/salt/ceph/configuration/files/ceph.conf.d directory to edit the rgw.conf file.

    root@master # cd /srv/salt/ceph/configuration/files/ceph.conf.d
    root@master # vi rgw.conf
  • Edit the contents of this file with the appropriate information. The following example represents what was used during our testing process (parameter values will vary by environment):

    [client.{{ client }}]
    rgw frontends = "civetweb port=80+443s ssl_certificate=/etc/ceph/rgw.pem"
    rgw dns name = {{ fqdn }}
    rgw enable usage log = false
    rgw thread pool size = 512
    rgw ops log rados = false
    rgw max chunk size = 4194304
    rgw num rados handles = 4
    rgw usage max user shards = 4
    rgw cache lru size = 100000
  • Validate that the Rados Gateway is in the "active (running)" state by running systemctl. In our example, the Rados Gateway is named "example.ses5"; replace that with the name of your own Rados Gateway.

    systemctl status ceph-radosgw@example.ses5
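  • Optionally, verify that the gateway answers on the configured port (a hedged check; the host name is from our test environment, and -k skips verification of a self-signed certificate):

    curl -k https://example.ses5

    A small XML response listing the (empty) bucket list for the anonymous user indicates that the gateway is serving S3 requests.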

10.1.2 Configuring the Storage Pools

  • Storage pools need to be created to host the Object Storage Repository data. Create the Ceph object pools for the Rados Gateway. You can do this via the openATTIC dashboard or the command line; the dashboard can help with the Placement Group (PG) calculation. In our example, we use 2048 placement groups, but the appropriate value depends on your environment. A command-line example follows:

SES Pool
  • Create an erasure code profile from the command line on the admin node.

    ceph osd erasure-code-profile set veeam_ec plugin=isa k=4 m=2
  • Create the required pools.

    ceph osd pool create default.rgw.veeam.data 2048 2048 erasure veeam_ec
    ceph osd pool create default.rgw.veeam.index 2048 2048 erasure veeam_ec
    ceph osd pool create default.rgw.veeam.non-ec 2048 2048 replicated
10.1.2.1 Creating an S3 User
  • When accessing the Object Gateway through the S3 interface, you need to create an S3 user by running the command below, adjusting the options in <> brackets. This can also be done in the openATTIC dashboard on the 'Object Gateway > User' tab.

    root@master # radosgw-admin user create --uid=<username> \
    --display-name=<display-name> --email=<email>
  • Configure a placement policy and set the user placement.

    radosgw-admin zonegroup placement add --rgw-zonegroup default --placement-id veeam
    
    radosgw-admin zone placement add --rgw-zone default --placement-id veeam --data-pool default.rgw.veeam.data --index-pool default.rgw.veeam.index --data-extra-pool default.rgw.veeam.non-ec
    
    radosgw-admin metadata get user:<user-id> > user.json
  • Edit the user.json and change default_placement to the placement ID created.

    "default_placement":"veeam"
  • Next, save the changes and commit them.

    radosgw-admin metadata put user:<user-id> < user.json
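  • To confirm the change was applied, the user metadata can be read back (an optional check); the output should now show "default_placement": "veeam".

    radosgw-admin metadata get user:<user-id>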

10.2 Veeam Object Storage Repository Configuration

After successfully completing the SUSE Enterprise Storage preparation steps above, you can proceed to configure the Veeam Object Storage Repository. Veeam documents this process very well; follow the step-by-step instructions at the Veeam Help Center.

Tips and reminders
  • Veeam will prompt you for a service point. Use the IP of the gateway node.

  • Provide the access and secret keys, which can be found in the openATTIC dashboard on the 'Object Gateway > User' tab.

  • The Veeam wizard may prompt you to accept a self-signed certificate. You will receive an error if the self-signed certificate is not properly imported on the Veeam server.

  • A bucket can be created with the openATTIC dashboard or any S3-compatible client.

  • Verify the connection to your bucket via S3 browser or any S3-compatible tools.
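
  • As one hedged example of such a check using the AWS CLI (the endpoint host, bucket name, and credentials below are placeholders for your own values; add --no-verify-ssl if you use a self-signed certificate):

    # Store the S3 user's keys in a dedicated CLI profile
    aws configure set aws_access_key_id <access-key> --profile ses
    aws configure set aws_secret_access_key <secret-key> --profile ses
    # Create a test bucket on the Rados Gateway endpoint and list the buckets
    aws s3 mb s3://veeam-test-bucket --endpoint-url https://example.ses5 --profile ses
    aws s3 ls --endpoint-url https://example.ses5 --profile ses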

11 Object Storage Repository Tuning Parameters

The following steps can be implemented to improve performance specific to your workload. Refer to the SUSE Enterprise Storage Tuning guide for additional instructions.

  • Modify the rgw.conf file found at /srv/salt/ceph/configuration/files/ceph.conf.d/rgw.conf.

    [client.{{ client }}]
    rgw frontends = "civetweb port=80+443s ssl_certificate=/etc/ceph/rgw.pem error_log_file=/var/log/ceph/dl360-3.rgw.error.log"
    #rgw frontends = "beast port=80 ssl_port=443 ssl_certificate=/etc/ceph/rgw.pem"
    rgw dns name = {{ fqdn }}
    rgw enable usage log = false
    rgw thread pool size = 512
    rgw max chunk size = 4194304
    rgw_obj_stripe_size = 4194304 # (default 4M for luminous)
    rgw_list_bucket_min_readahead = 4000 #(default 1000)
    rgw_max_listing_results = 4000
    rgw_cache_expiry_interval = 1800 #(default 900s)
    rgw_enable_usage_log = false
    rgw_enable_ops_log = false
    rgw dynamic resharding = false
    rgw override bucket index max shards = 50 # alternatively we reshard the bucket manually after creation
    rgw bucket index max aio = 16 # default 8
    rgw cache lru size = 50000
    # GC settings
    rgw_gc_obj_min_wait = 21600 # default 2 hr; decreasing this purges deleted objects more actively
    rgw gc processor period = 7200 # default 1 hr; decreasing this purges deletions more actively
    rgw objexp gc interval = 3600 # default 10 min; we do not run Swift object expiration, so this does not need to run often
    objecter inflight op bytes = 1073741824 # default 100 MB
    objecter inflight ops = 24576
  • This configuration needs to be pushed out to all Rados Gateways that may be running in the SUSE Enterprise Storage environment.

    salt 'salt_master_hostname' state.apply ceph.configuration.create
    salt '*' state.apply ceph.configuration

12 Special Note

When using SSDs directly on the ESX server as a restoration target, you may want to disable VAAI for VMware to perform optimally. For more information, see https://kb.vmware.com/s/article/1033665.

13 Conclusion

Veeam Backup & Replication represents a strong option for data center backup when combined with SUSE Enterprise Storage. The benefits to customers include increased efficiency and performance, combined with industry-leading cost effectiveness.

14 Resources