Filesystem Recommendations
Introduction
CloudBD disks have higher performance and are more cost efficient when used with a recommended filesystem type and formatting options. Ideal filesystems for CloudBD can support large disk capacities, issue trim/discard ops, and have low metadata overhead.
CloudBD recommends using Ext4 with the following mkfs.ext4 and mount options:
# example ext4 filesystem format
mkfs.ext4 -b 4096 -T largefile \
-E stride=512,stripe_width=512,lazy_itable_init=0,lazy_journal_init=0,packed_meta_blocks=1 \
/dev/mapper/remote:testdisk
# example /etc/fstab entry
/dev/mapper/remote:testdisk /mnt ext4 _netdev,discard,commit=30
Enable periodic fstrim on your server (through either cron or systemd) to reclaim unused disk space.
General Mount Options
_netdev
This mount option is REQUIRED for all filesystem types. It tells Linux not to mount this filesystem until after the network has started. CloudBD disks are network devices and will fail to start without the network.
discard
This mount option is recommended for all filesystem types. It tells the filesystem to remove unused storage blocks after files are deleted. This reduces the remote storage usage and lowers your storage costs more quickly. Periodic fstrim is also recommended.
Configuring fstab
After creating a filesystem on Linux, you can add an entry to the /etc/fstab file so that the filesystem mounts automatically during startup.
CloudBD disks are network devices, and the fstab line for a filesystem on a network device REQUIRES the _netdev mount option. Omitting the _netdev option in the fstab entry for a filesystem on a CloudBD disk (or any network disk) can cause the server to fail to finish booting and enter its emergency maintenance mode. Some cloud VMs become unreachable by ssh in emergency mode. Always add the _netdev option to CloudBD fstab mount entries to avoid this problem.
Example CloudBD fstab entries:
/dev/mapper/remote:disk1 /mnt/disk1 ext4 _netdev,discard,commit=30
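Because a missing _netdev option can leave a server unbootable, it may be worth checking fstab before rebooting. The following is a minimal sketch, not an official CloudBD tool: the function name fstab_missing_netdev is hypothetical, and it assumes CloudBD devices can be recognized by the /dev/mapper/ path prefix used in the examples above (pass a different prefix as the second argument if that does not hold on your system).

```shell
#!/bin/sh
# Print the device of every fstab entry whose device path starts with PREFIX
# but whose mount options (4th field) do not include _netdev.
# Returns non-zero if at least one such entry is found.
# Usage: fstab_missing_netdev [FSTAB_FILE] [PREFIX]
fstab_missing_netdev() {
    fstab="${1:-/etc/fstab}"
    prefix="${2:-/dev/mapper/}"   # assumption: CloudBD disks appear under this path
    awk -v prefix="$prefix" '
        /^[[:space:]]*#/ { next }          # skip comment lines
        NF < 4           { next }          # skip blank or incomplete lines
        index($1, prefix) == 1 {
            n = split($4, opts, ",")
            found = 0
            for (i = 1; i <= n; i++)
                if (opts[i] == "_netdev") found = 1
            if (!found) { print $1; bad = 1 }
        }
        END { exit bad }
    ' "$fstab"
}
```

Running fstab_missing_netdev /etc/fstab prints nothing and returns success when every matching entry carries _netdev.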
Periodic fstrim
Periodic fstrim is strongly recommended in addition to mounting with the discard mount option. Depending on the filesystem type and the size/alignment of deleted data, the discard mount option may not be able to delete all possible blocks. fstrim is able to identify and free all unused blocks on the disk.
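As one way to schedule this from cron, the sketch below prints a weekly /etc/cron.d entry for a single mount point. The schedule (Sundays at 03:00), the /sbin/fstrim path, and the helper name fstrim_cron_entry are assumptions for illustration; adjust them to your distribution. On systemd hosts, the fstrim.timer unit shipped with util-linux can usually be enabled instead with systemctl enable --now fstrim.timer.

```shell
#!/bin/sh
# Print a weekly /etc/cron.d entry that runs fstrim on one mount point.
# Assumptions: fstrim is installed at /sbin/fstrim and a Sunday 03:00
# schedule is acceptable; neither value comes from CloudBD.
fstrim_cron_entry() {
    mountpoint="$1"
    printf '0 3 * * 0 root /sbin/fstrim %s\n' "$mountpoint"
}

fstrim_cron_entry /mnt/disk1
# prints: 0 3 * * 0 root /sbin/fstrim /mnt/disk1
```

The output can be redirected, as root, into a file such as /etc/cron.d/fstrim-cloudbd (a hypothetical file name).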
Filesystem Types
Recommended
Ext4
Supports disks up to 128 TiB, trim/discard, low metadata overhead, excellent crash recovery, handles large quantities of files well
NOT Recommended
Btrfs
Metadata writes scale in size and quantity over time
ZFS
Currently no trim/discard support for Linux
NOT Supported
XFS
XFS can deadlock in the kernel when used by a user space device driver.
Ext4 Recommendations
Mkfs Options
-b 4096
Set the fundamental block size of the filesystem to 4k.
-T largefile or -T largefile4 or -i <bytes-per-inode>
Set the bytes-per-inode ratio. A larger ratio will have less metadata overhead on the filesystem. A conservative ratio is half the average expected file size.
For filesystem sizes <= 16 TiB, use -T largefile (equivalent to -i 1048576)
For filesystem sizes > 16 TiB, use -T largefile4 (equivalent to -i 4194304)
-E stride=512,stripe_width=512,lazy_itable_init=0,lazy_journal_init=0,packed_meta_blocks=1
Match the filesystem stride and stripe_width sizes (in filesystem blocks) to CloudBD's blocksize. Set the stride to the CloudBD blocksize (2 MiB by default) divided by the filesystem block size (4 KiB, set with the -b option); with the defaults this is 2 MiB / 4 KiB = 512. Set stripe_width equal to the stride. Note: if you use a non-default blocksize for your CloudBD disk, adjust the stride and stripe_width values to match it.
Disable the lazy inode and journal init for faster and more efficient writing of the metadata.
Pack all metadata blocks together to help CloudBD be more cost efficient.
For mke2fs tools that do not support the -E packed_meta_blocks option, use:
-G 4096
Set a large flex blockgroup value to group more metadata together. Helps CloudBD to be more cost efficient.
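To see why a larger bytes-per-inode ratio lowers metadata overhead, the space reserved for inode tables can be estimated as (filesystem size / bytes-per-inode) x inode size. The sketch below is a rough estimate only: the helper name inode_table_bytes is hypothetical, and it assumes 256-byte inodes and a 16384 bytes-per-inode baseline, which are common mke2fs defaults but may differ in your mke2fs.conf.

```shell
#!/bin/sh
# Estimate bytes reserved for ext4 inode tables:
#   (filesystem size / bytes-per-inode) * inode size
# Usage: inode_table_bytes FS_BYTES BYTES_PER_INODE [INODE_SIZE]
inode_table_bytes() {
    fs_bytes="$1"            # filesystem size in bytes
    bytes_per_inode="$2"     # value given to -i, or implied by -T
    inode_size="${3:-256}"   # assumed on-disk inode size in bytes
    echo $(( fs_bytes / bytes_per_inode * inode_size ))
}

one_tib=$(( 1024 * 1024 * 1024 * 1024 ))
inode_table_bytes "$one_tib" 16384     # assumed default ratio: 17179869184 (16 GiB)
inode_table_bytes "$one_tib" 1048576   # -T largefile:          268435456 (256 MiB)
inode_table_bytes "$one_tib" 4194304   # -T largefile4:         67108864 (64 MiB)
```

Under these assumptions, -T largefile reserves about 64 times less inode-table space on a 1 TiB filesystem than the assumed default ratio, at the cost of supporting fewer files.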
mkfs.ext4 example:
mkfs.ext4 -b 4096 -T largefile \
-E stride=512,stripe_width=512,lazy_itable_init=0,lazy_journal_init=0,packed_meta_blocks=1 \
/dev/mapper/remote:disk1
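When the disk size or CloudBD blocksize differs from the example, the option values above can be derived rather than copied. The following is a sketch under stated assumptions: the function name cloudbd_mkfs_cmd is hypothetical, sizes are passed in bytes, and the function only prints the command so it can be reviewed before being run.

```shell
#!/bin/sh
# Build (but do not run) an mkfs.ext4 command line for a CloudBD disk.
# Usage: cloudbd_mkfs_cmd DEVICE FS_SIZE_BYTES [CLOUDBD_BLOCKSIZE_BYTES]
cloudbd_mkfs_cmd() {
    device="$1"
    fs_bytes="$2"
    cloudbd_blocksize="${3:-2097152}"   # CloudBD default blocksize, 2 MiB
    fs_block=4096                       # filesystem block size, matches -b 4096

    # stride (in filesystem blocks) = CloudBD blocksize / filesystem block size
    stride=$(( cloudbd_blocksize / fs_block ))

    # -T largefile up to and including 16 TiB, -T largefile4 above that
    sixteen_tib=$(( 16 * 1024 * 1024 * 1024 * 1024 ))
    if [ "$fs_bytes" -le "$sixteen_tib" ]; then
        profile=largefile
    else
        profile=largefile4
    fi

    printf 'mkfs.ext4 -b %s -T %s -E stride=%s,stripe_width=%s,lazy_itable_init=0,lazy_journal_init=0,packed_meta_blocks=1 %s\n' \
        "$fs_block" "$profile" "$stride" "$stride" "$device"
}

# 1 TiB disk with the default 2 MiB CloudBD blocksize
cloudbd_mkfs_cmd /dev/mapper/remote:disk1 $(( 1024 * 1024 * 1024 * 1024 ))
```

For the 1 TiB case shown, the printed command matches the example above on a single line (stride=512, -T largefile). Passing 4194304 as the third argument, for a 4 MiB CloudBD blocksize, gives stride=1024.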
Mount Options
_netdev
Required for all filesystems on CloudBD disks. Delays mount until after the network has started.
discard
Recommended for all filesystems on CloudBD disks. Sends trim/discard requests as soon as files are deleted to recover remote storage space sooner. Periodic fstrim is also recommended.
commit=30
Set the journal commit interval to 30 seconds (the ext4 default is 5 seconds) so that the filesystem issues fewer storage ops and is more cost efficient.
/etc/fstab entry example:
/dev/mapper/remote:disk1 /mnt/disk1 ext4 _netdev,discard,commit=30
Manual mount command example:
mount -o _netdev,discard,commit=30 /dev/mapper/remote:disk1 /mnt/disk1