Applies to SUSE Linux Enterprise Server 12 SP4

24 Persistent Memory

This chapter contains additional information about using SUSE Linux Enterprise Server with non-volatile main memory, also known as Persistent Memory, comprising one or more NVDIMMs.

24.1 Introduction

Persistent memory is a new type of computer storage. It combines speeds approaching those of normal dynamic RAM (DRAM) with RAM's byte-by-byte addressability and the permanence of solid-state disks (SSDs).

Like conventional RAM, it is installed directly into motherboard memory slots. As such, it is supplied in the same physical form factor as RAM—as DIMMs. These are known as NVDIMMs: non-volatile dual inline memory modules.

Unlike RAM, though, persistent memory also resembles flash-based SSDs in several ways. Both are built from solid-state memory circuitry, yet both provide non-volatile storage: their contents are retained when the system is powered off or restarted. For both media, writing data is slower than reading it, and both support a limited number of rewrite cycles. Finally, as with SSDs, sector-level access to persistent memory is possible if that is more suitable for a particular application.

Different models use different forms of electronic storage medium, such as Intel 3D XPoint, or a combination of NAND-flash and DRAM. New forms of non-volatile RAM are also in development. This means that different vendors and models of NVDIMM offer different performance and durability characteristics.

Because the storage technologies involved are in an early stage of development, different vendors' hardware may impose different limitations. Thus, the following statements are generalizations.

Persistent memory is up to ten times slower than DRAM, but around a thousand times faster than flash storage. It can be rewritten on a byte-by-byte basis rather than flash memory's whole-sector erase-and-rewrite process. Finally, while rewrite cycles are limited, most forms of persistent memory can handle millions of rewrites, compared to the thousands of cycles of flash storage.

This has two important consequences:

  • It is not possible with current technology to run a system with only persistent memory and thus achieve completely non-volatile main memory. You must use a mixture of both conventional RAM and NVDIMMs. The operating system and applications will execute in conventional RAM, with the NVDIMMs providing very fast supplementary storage.

  • The performance characteristics of different vendors' persistent memory mean that it may be necessary for programmers to be aware of the hardware specifications of the NVDIMMs in a particular server, including how many NVDIMMs there are and in which memory slots they are fitted. This will obviously impact hypervisor use, migration of software between different host machines, and so on.

This new storage subsystem is defined in version 6 of the ACPI standard. However, libnvdimm supports pre-standard NVDIMMs and they can be used in the same way.

24.2 Terms


Region

A region is a block of persistent memory that can be divided up into one or more namespaces. You cannot access the persistent memory of a region without first allocating it to a namespace.

Namespace

A single contiguously-addressed range of non-volatile storage, comparable to NVM Express SSD namespaces, or to SCSI Logical Units (LUNs). Namespaces appear in the server's /dev directory as separate block devices. Depending on the method of access required, namespaces can either amalgamate storage from multiple NVDIMMs into larger volumes, or allow it to be partitioned into smaller volumes.

Mode

Each namespace has a mode that defines which NVDIMM features are enabled for that namespace. Sibling namespaces of the same parent region will always have the same type, but might be configured to have different modes. Namespace modes include:


raw

A memory disk. Does not support DAX. Compatible with other operating systems.

sector

For legacy file systems which do not checksum metadata. Suitable for small boot volumes. Compatible with other operating systems.

fsdax

File system-DAX mode. Default if no other mode is specified. Creates a block device (/dev/pmemX[.Y]) which supports DAX for ext4 or XFS.

devdax

Device-DAX mode. Creates a single character device file (/dev/daxX.Y). Does not require file system creation.


Type

Each namespace and region has a type that defines the way in which the persistent memory associated with that namespace or region can be accessed. A namespace always has the same type as its parent region. There are two different types: Persistent Memory and Block Mode.

Persistent Memory (PMEM)

PMEM storage offers byte-level access, just like RAM. This enables Direct Access (DAX), meaning that accessing the memory bypasses the kernel's page cache and goes direct to the medium. Additionally, using PMEM, a single namespace can include multiple interleaved NVDIMMs, allowing them all to be accessed as a single device.

Block Mode (BLK)

BLK access is in sectors, usually of 512 bytes, through a defined access window, the aperture. This behavior is more like a traditional disk drive. This also means that both reads and writes are cached by the kernel. With BLK access, each NVDIMM is accessed as a separate namespace.

Some devices support both PMEM and BLK modes. Additionally, some allow the storage to be split into separate namespaces, so that some can be accessed using PMEM and some using BLK.

Apart from devdax namespaces, all other namespaces must be formatted with a file system such as ext2, ext4 or XFS, just as with a conventional drive.

Direct Access (DAX)

DAX allows persistent memory to be directly mapped into a process's address space, for example using the mmap system call. This is suitable for directly accessing large amounts of PMEM without using any additional RAM, for registering blocks of PMEM for RDMA, or for directly assigning it to virtual machines.

DIMM Physical Address (DPA)

A memory address as an offset into a single DIMM's memory; that is, starting from zero as the lowest addressable byte on that DIMM.


Label

Metadata stored on the NVDIMM, such as namespace definitions. This can be accessed using DSMs.

Device-specific method (DSM)

ACPI method to access the firmware on an NVDIMM.

24.3 Use Cases

24.3.1 PMEM with DAX

It is important to note that this form of memory access is not transactional. In the event of a power outage or other system failure, data may not be completely written into storage. PMEM storage is only suitable if the application can handle the situation of partially-written data.

24.3.1.1 Applications That Benefit from Large Amounts of Byte-Addressable Storage

If the server will host an application that can directly use large amounts of fast storage on a byte-by-byte basis, the programmer can use the mmap system call to place blocks of persistent memory directly into the application's address space, without using any additional system RAM.

24.3.1.2 Avoiding Use of the Kernel Page Cache

You may wish to conserve the RAM that would otherwise be used for the page cache and give it to your applications instead. For instance, non-volatile memory could be dedicated to holding virtual machine (VM) images. As these would not be cached, this would reduce the cache usage on the host, allowing more VMs per host.

24.3.2 PMEM with BTT

This is useful when you want to use the persistent memory on a set of NVDIMMs as a disk-like pool of very fast storage.

To applications, such devices just appear as very fast SSDs and can be used like any other storage device. For example, LVM can be layered on top of the non-volatile storage and will work as normal.

The advantage of BTT is that sector write atomicity is guaranteed, so even sophisticated applications that depend on data integrity will keep working. Media error reporting works through standard error-reporting channels.

24.3.3 BLK storage

Although BLK storage is more robust against single-device failure, it requires additional management, as each NVDIMM appears as a separate device. Thus, PMEM with BTT is generally preferred.


BLK storage is deprecated and is not supported in later versions of SUSE Linux Enterprise Server.

24.4 Tools for Managing Persistent Memory

To manage persistent memory, it is necessary to install the ndctl package. This also installs the libndctl package, which provides a set of user-space libraries to configure NVDIMMs.

These tools work via the libnvdimm kernel subsystem, which supports three types of NVDIMMs:

  • PMEM

  • BLK

  • Simultaneous PMEM and BLK

The ndctl utility has a helpful set of man pages, accessible with the command:

ndctl help subcommand

To see a list of available subcommands, use:

ndctl --list-cmds

The available subcommands include:


version

Displays the current version of the NVDIMM support tools.

enable-namespace

Makes the specified namespace available for use.

disable-namespace

Prevents the specified namespace from being used.

create-namespace

Creates a new namespace from the specified storage devices.

destroy-namespace

Removes the specified namespace.

enable-region

Makes the specified region available for use.

disable-region

Prevents the specified region from being used.

zero-labels

Erases the metadata from a device.

read-labels

Retrieves the metadata of the specified device.

list

Displays available devices.

help

Displays information about using the tool.

24.5 Setting Up Persistent Memory

24.5.1 Viewing Available NVDIMM Storage

The ndctl list command can be used to list all available NVDIMMs in a system.

In the following example, the system has three NVDIMMs which are in a single, triple-channel interleaved set.

root # ndctl list --dimms


With a different parameter, ndctl list will also list the available regions.


Regions may not appear in numerical order.

Note that although there are only three NVDIMMs, they appear as four regions.

root # ndctl list --regions


The space is available in two different forms: either as three separate 64 GB regions of type BLK, or as one combined 189 GB region of type PMEM which presents all the space on the three interleaved NVDIMMs as a single volume.

Note that the displayed value for available_size is the same as that for size. This means that none of the space has been allocated yet.
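The relationship between size and available_size can be turned into a quick check. The following is a minimal sketch, not part of ndctl: the helper name region_allocated_bytes is hypothetical, and the byte values passed to it are placeholders rather than output from a real system. It assumes you have already read the two values from the ndctl list --regions output.

```shell
# Hypothetical helper: given a region's "size" and "available_size" values
# (in bytes, as reported by ndctl list --regions), print how many bytes
# have already been allocated to namespaces.
region_allocated_bytes() {
    size=$1
    available_size=$2
    echo $(( size - available_size ))
}

# Placeholder values for illustration only: an untouched region reports
# identical values, so the difference is zero.
region_allocated_bytes 1073741824 1073741824
```

A result of 0 corresponds to the situation described above, where none of the region's space has been allocated yet.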

24.5.2 Configuring the Storage as a Single PMEM Namespace with DAX

For the first example, we will configure our three NVDIMMs into a single PMEM namespace with Direct Access (DAX).

The first step is to create a new namespace.

root # ndctl create-namespace --type=pmem --mode=fsdax --map=dev

This creates a block device /dev/pmem3, which supports DAX. The 3 in the device name is inherited from the parent region number, in this case region3.

The --map=dev option sets aside part of the PMEM storage space on the NVDIMMs so that it can be used to allocate internal kernel data structures called struct pages. This allows the new PMEM namespace to be used with features such as O_DIRECT I/O and RDMA. (The alternative, --map=mem, places these structures in regular system RAM instead.)

The reservation of some persistent memory for kernel data structures is why the resulting PMEM namespace has a smaller capacity than the parent PMEM region.
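As a rough illustration of that reservation, the ndctl-create-namespace man page quotes an overhead of 64 bytes of metadata per 4 KiB page on x86 (about 16 GB per 1 TB). The sketch below simply applies that ratio. The function name is hypothetical, the 189 GiB input is an assumption based on the region size quoted earlier, and real namespaces may lose a little more space to alignment, so treat the result as an estimate.

```shell
# Hypothetical helper: estimate the bytes reserved for struct page metadata,
# assuming 64 bytes of metadata for every 4096-byte page (x86 figure).
struct_page_overhead_bytes() {
    region_bytes=$1
    echo $(( region_bytes / 4096 * 64 ))
}

# Assumed region size: 189 GiB, expressed in bytes.
region_bytes=$(( 189 * 1024 * 1024 * 1024 ))
overhead=$(struct_page_overhead_bytes "$region_bytes")
echo "estimated reservation: $overhead bytes"
echo "estimated usable size: $(( region_bytes - overhead )) bytes"
```

With the assumed 189 GiB input, the estimated usable size comes out roughly 2 MiB above the size that fdisk reports for /dev/pmem3 in this section; the small remainder may be due to alignment.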

Next, we verify that the new block device is available to the operating system:

root # fdisk -l /dev/pmem3
Disk /dev/pmem3: 186 GiB, 199764213760 bytes, 390164480 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes

Before it can be used, like any other drive, it must be formatted. In this example, we format it with XFS:

root # mkfs.xfs /dev/pmem3
meta-data=/dev/pmem3      isize=256    agcount=4, agsize=12192640 blks
         =                sectsz=4096  attr=2, projid32bit=1
         =                crc=0        finobt=0, sparse=0
data     =                bsize=4096   blocks=48770560, imaxpct=25
         =                sunit=0      swidth=0 blks
naming   =version 2       bsize=4096   ascii-ci=0 ftype=1
log      =internal log    bsize=4096   blocks=23813, version=2
         =                sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none            extsz=4096   blocks=0, rtextents=0

Next, we can mount the new drive onto an existing directory (create /mnt/pmem3 first if it does not exist):

root # mount -o dax /dev/pmem3 /mnt/pmem3

Then we can verify that we now have a DAX-capable device:

root # mount | grep dax
/dev/pmem3 on /mnt/pmem3 type xfs (rw,relatime,attr2,dax,inode64,noquota)

The result is that we now have a PMEM namespace formatted with the XFS file system and mounted with DAX.
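For scripts, the mount | grep dax check can be narrowed to a single mount point. The following is a minimal sketch under stated assumptions: the helper name is_dax_mounted is hypothetical, it looks only for the plain dax option shown in the output above, and it reads /proc/mounts unless another file in the same format is supplied.

```shell
# Hypothetical helper: succeed if the given mount point is listed with the
# "dax" option. Reads /proc/mounts by default; a different file in the same
# format can be passed as the second argument (useful for testing).
is_dax_mounted() {
    mount_point=$1
    mounts_file=${2:-/proc/mounts}
    # Fields in /proc/mounts: device, mount point, fs type, options, ...
    awk -v mp="$mount_point" '
        $2 == mp {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "dax") found = 1
        }
        END { exit(found ? 0 : 1) }
    ' "$mounts_file"
}

if is_dax_mounted /mnt/pmem3; then
    echo "/mnt/pmem3 is mounted with DAX"
else
    echo "/mnt/pmem3 is not mounted with DAX"
fi
```

Matching the option as a whole word within the options field avoids false positives from other mount points or from paths that merely contain the string dax.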

Any mmap() calls to files in that file system will return virtual addresses that directly map to the persistent memory on our NVDIMMs, completely bypassing the page cache.

Any fsync or msync calls on files in that file system will still ensure that modified data has been fully written to the NVDIMMs. These calls flush the processor cache lines associated with any pages that have been modified in userspace via mmap mappings.

24.5.2.1 Removing a Namespace

Before creating any other type of volume that uses the same storage, we must unmount and then remove this PMEM volume.

First, unmount it:

root # umount /mnt/pmem3

Then disable the namespace:

root # ndctl disable-namespace namespace3.0
disabled 1 namespace

Then delete it:

root # ndctl destroy-namespace namespace3.0
destroyed 1 namespace
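The three removal steps can be wrapped into one function so that a failed unmount stops the sequence before anything is destroyed. This is a sketch only: the run and remove_namespace names and the DRY_RUN variable are hypothetical conveniences, and the mount point and namespace name are the ones used in this example.

```shell
# Hypothetical wrapper: print each command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper combining the three steps: unmount, disable, destroy.
# The && chain stops at the first failing step, so a namespace that is
# still mounted is never destroyed.
remove_namespace() {
    mount_point=$1
    namespace=$2
    run umount "$mount_point" &&
    run ndctl disable-namespace "$namespace" &&
    run ndctl destroy-namespace "$namespace"
}

# Preview the commands without touching the system.
DRY_RUN=1
remove_namespace /mnt/pmem3 namespace3.0
```

With DRY_RUN=1 the function only prints the three commands in order; unset it (as root) to actually execute them.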

24.5.3 Creating a PMEM Namespace with BTT

In the next example, we create a PMEM namespace that uses BTT.

root # ndctl create-namespace --type=pmem --mode=sector

Next, verify that the new device is present:

root # fdisk -l /dev/pmem3s
Disk /dev/pmem3s: 188.8 GiB, 202738135040 bytes, 49496615 sectors
Units: sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes

Like the DAX-capable PMEM namespace we previously configured, this BTT-capable PMEM namespace consumes all the available storage on the NVDIMMs.


The trailing s in the device name (/dev/pmem3s) stands for sector and can be used to easily distinguish PMEM and BLK namespaces that are configured to use the BTT.

The volume can be formatted and mounted as in the previous example.

The PMEM namespace shown here cannot use DAX. Instead it uses the BTT to provide sector write atomicity. On each sector write through the PMEM block driver, the BTT will allocate a new sector to receive the new data. The BTT atomically updates its internal mapping structures after the new data is fully written so the newly written data will be available to applications. If the power fails at any point during this process, the write will be completely lost and the application will have access to its old data, still intact. This prevents the condition known as "torn sectors".
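The write-elsewhere-then-switch idea described here has a familiar file-level analogy: write the new contents to a temporary file, then rename it over the old one. The sketch below shows only that analogy. It is not how the BTT is implemented, the function name atomic_replace is hypothetical, and it relies on rename being atomic for files on the same file system.

```shell
# Analogy only: replace a file's contents so that readers see either the
# complete old version or the complete new version, never a mixture.
atomic_replace() {
    target=$1
    new_contents=$2
    # Write the new data to a separate temporary file next to the target.
    tmp=$(mktemp "$target.XXXXXX") || return 1
    printf '%s\n' "$new_contents" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Switch over in one step; mv within one file system is a rename.
    mv -f "$tmp" "$target"
}

# Demonstration in a scratch directory.
demo_dir=$(mktemp -d)
atomic_replace "$demo_dir/data" "version 1"
atomic_replace "$demo_dir/data" "version 2"
cat "$demo_dir/data"
rm -rf "$demo_dir"
```

As with the BTT's map update, the old data stays intact until the final switch, so an interruption before that point leaves the previous contents in place.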

This BTT-enabled PMEM namespace can be formatted and used with a file system just like any other standard block device. It cannot be used with DAX; instead, mmap mappings for files on this block device will use the page cache.


In both these examples, space from all the NVDIMMs is combined into a single volume. Just as with a non-redundant disk array, this means that if any individual NVDIMM suffers an error, the contents of the entire volume could be lost. The more NVDIMMs are included in the volume, the higher the chance of such an error.

24.5.3.1 Removing the PMEM Volume

As in the previous example, before re-allocating the space, we must first remove the volume and the namespace:

root # ndctl disable-namespace namespace3.0
disabled 1 namespace

root # ndctl destroy-namespace namespace3.0
destroyed 1 namespace

24.5.4 Creating BLK Namespaces

In this example, we will create three separate BLK devices: one per NVDIMM.

One advantage of this approach is that if any individual NVDIMM fails, the other volumes will be unaffected.


The commands must be repeated for each namespace.

root # ndctl create-namespace --type=blk --mode=sector
root # ndctl create-namespace --type=blk --mode=sector
root # ndctl create-namespace --type=blk --mode=sector

Next, we can verify that the new devices exist:

root # fdisk -l /dev/ndblk*
Disk /dev/ndblk0.0s: 63.4 GiB, 68115001344 bytes, 16629639 sectors
Units: sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes

Disk /dev/ndblk1.0s: 63.4 GiB, 68115001344 bytes, 16629639 sectors
Units: sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes

Disk /dev/ndblk2.0s: 63.4 GiB, 68115001344 bytes, 16629639 sectors
Units: sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes

The block devices generated for BLK namespaces are named /dev/ndblkX.Y where X is the parent region number and Y is a unique namespace number within that region. So, /dev/ndblk2.0s is child namespace number 0 of region 2.

As in the previous example, the trailing s means that this namespace is configured to use the BTT—in other words, for sector-based access. Because they are accessed via a block window, programs cannot use DAX, but accesses will be cached.
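The naming scheme is regular enough to decode in a script. The following is a minimal sketch: the helper name parse_ndblk_name is hypothetical, and it assumes device names follow exactly the /dev/ndblkX.Y pattern with an optional trailing s, as described above.

```shell
# Hypothetical helper: split a BLK device name such as "ndblk2.0s" into its
# parent region number, namespace number, and whether the BTT is in use.
parse_ndblk_name() {
    name=${1#/dev/}            # accept "/dev/ndblk2.0s" or "ndblk2.0s"
    case $name in
        ndblk[0-9]*.[0-9]*) ;;
        *) echo "not a BLK namespace device: $1" >&2; return 1 ;;
    esac
    rest=${name#ndblk}         # e.g. "2.0s"
    region=${rest%%.*}         # e.g. "2"
    ns=${rest#*.}              # e.g. "0s"
    btt=no
    case $ns in
        *s) btt=yes; ns=${ns%s} ;;   # trailing "s" marks sector (BTT) mode
    esac
    echo "region=$region namespace=$ns btt=$btt"
}

parse_ndblk_name /dev/ndblk2.0s
```

For /dev/ndblk2.0s this prints region=2 namespace=0 btt=yes, matching the description of child namespace number 0 of region 2.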

As ever, these devices must all be formatted and mounted before they can be used.

24.6 For More Information

More about this topic can be found in the following list:

  • Persistent Memory Wiki

    Contains instructions for configuring NVDIMM systems, information about testing, and links to specifications related to NVDIMM enabling. The site continues to evolve as NVDIMM support in Linux develops.

  • Persistent Memory Programming

    Information about configuring, using and programming systems with non-volatile memory under Linux and other operating systems. Covers the NVM Library (NVML), which aims to provide useful APIs for programming with persistent memory in userspace.

  • LIBNVDIMM: Non-Volatile Devices

    Aimed at kernel developers, this is part of the Documentation folder in the current Linux kernel tree. It talks about the different kernel modules involved in NVDIMM enablement, lays out some technical details of the kernel implementation, and talks about the sysfs interface to the kernel that is used by the ndctl tool.

  • GitHub: pmem/ndctl

    Utility library for managing the libnvdimm subsystem in the Linux kernel. Also contains userspace libraries, as well as unit tests and documentation.