BTIER-1.3.3 and what it can do for you

Introduction

Btier is an automated tiering block device for the Linux kernel. It works with any type of file system, and its purpose is to automatically move data between different block devices in such a way that performance improves while storage costs are reduced.

More information about automated tiered storage can be found here: Automated_tiered_storage

Features

  • Can use up to 16 devices as part of a btier device.
  • Supports using files or (preferred) block devices as part of the btier device.
  • Provides a user-space API that can be used to control data migration or retrieve information about data allocation.
  • Comes with a built-in kernel-space data migration engine as well as example scripts that show how data migration can be controlled from user space.
  • Handles up to 210,000 random 4k IOPS, provided that the underlying hardware supports it.
  • Sequential data transfers up to 1100MB/sec.
  • Sequentializes random IO patterns, since writes are written in a log-structured manner, which may boost performance on IOPS-constrained storage devices.
  • Supports discard / trim when a recent kernel (>= 3.0) is used.
  • Can expand the size of the tier when one of the underlying devices is expanded.
  • Supports kernels ranging from 2.6.32 up to 3.16.
  • Most configuration changes can be made with the help of sysfs.

Maturity

Btier is currently considered to be stable, and people report using it in production. btier-1.2.6 is still recommended for production since active development is taking place on the 1.3.x code. Once btier-1.4.0 is released, the latest btier-1.3.x release will become the recommended version for production.

Using btier

Using btier is fairly straightforward. Make sure that the kernel sources and the packages required for compiling the Linux kernel are installed.
Download btier from SourceForge and build it:

wget http://sourceforge.net/projects/tier/files/btier-1.3.3/btier-1.3.3.tar.gz
tar xvf btier-1.3.3.tar.gz
cd btier-1.3.3
make
--- make output ---
make -Wall -C /lib/modules/3.2.0-64-generic/build M=/tmp/btier-1.3.3/kernel/btier modules
make[1]: Entering directory `/usr/src/linux-headers-3.2.0-64-generic'
CC [M] /tmp/btier-1.3.3/kernel/btier/btier_common.o
CC [M] /tmp/btier-1.3.3/kernel/btier/btier_sysfs.o
CC [M] /tmp/btier-1.3.3/kernel/btier/btier_main.o
LD [M] /tmp/btier-1.3.3/kernel/btier/btier.o
Building modules, stage 2.
MODPOST 1 modules
CC /tmp/btier-1.3.3/kernel/btier/btier.mod.o
LD [M] /tmp/btier-1.3.3/kernel/btier/btier.ko
make[1]: Leaving directory `/usr/src/linux-headers-3.2.0-64-generic'
gcc -O2 -D_FILE_OFFSET_BITS=64 cli/btier_setup.c -o cli/btier_setup
gcc -O2 -D_FILE_OFFSET_BITS=64 cli/btier_inspect.c -o cli/btier_inspect
gcc -O2 tools/writetest.c -o tools/writetest
gcc -O2 tools/show_block_details.c -o tools/show_block_details
---
sudo make install
[sudo] password for USER:
install -D -m 755 kernel/btier/btier.ko /lib/modules/`uname -r`/kernel/drivers/block/btier.ko
install -D -m 755 -s cli/btier_setup /sbin/btier_setup
install -D -m 755 -s cli/btier_inspect /sbin/btier_inspect
install -D -m 755 rc/btier /etc/init.d/btier
install -D -m 600 rc/bttab /etc/bttab_example
gzip -c man/btier_setup.1 > man/btier_setup.1.gz
gzip -c man/btier_inspect.1 > man/btier_inspect.1.gz
install -D -m 644 man/btier_setup.1.gz /usr/share/man/man1/btier_setup.1.gz
install -D -m 644 man/btier_inspect.1.gz /usr/share/man/man1/btier_inspect.1.gz
depmod -a

Assuming that a user has an md RAID1 that consists of two SSDs (md0) and an md RAID5/6 that contains a number of SATA drives (md1), we can now create a btier device like this:

btier_setup -f /dev/md0:/dev/md1 -B -c

Note that the -c flag is only used when the btier device is initially created.
Using ‘-c’ writes the initial metadata to the disks.
The system should now show a new device: /dev/sdtiera

Alternatively, we can create a btier device by using files instead of real block devices. You can even place these files on an NFS share. However, the NFS share must be mounted with -o hard in this case.
Using btier with files, or a mix of files and devices, is supported and works like this:

# Create a sparse 1TB file on the NFS share
truncate --size=1TB /nfs/tier-nfs.img
# Pass the same path that was just created as the last tier
btier_setup -f /dev/md0:/dev/md1:/nfs/tier-nfs.img -B -c

This creates a btier device that uses md0 as the first tier, so md0 should be the fastest device that is available. The second tier is md1, and the last tier is the file on the NFS share (tier-nfs.img).
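
Since the tier order follows the order of the paths given to -f, a small helper can make that ordering explicit. The following is only a sketch: the function name build_setup_argv is hypothetical, and it uses nothing beyond the -f, -B and -c flags exactly as they appear in the examples above, plus the 16-device limit from the feature list.

```python
MAX_TIER_DEVICES = 16  # limit stated in the feature list


def build_setup_argv(devices, create=False):
    """Return an argv list for btier_setup (hypothetical helper).

    devices must be ordered from the fastest tier to the slowest.
    """
    if not devices:
        raise ValueError("at least one device or file is required")
    if len(devices) > MAX_TIER_DEVICES:
        raise ValueError("btier supports at most 16 devices")
    if any(":" in dev for dev in devices):
        raise ValueError("paths must not contain ':' (it separates tiers)")
    argv = ["btier_setup", "-f", ":".join(devices), "-B"]
    if create:
        # Only on initial creation: -c writes fresh metadata to the devices.
        argv.append("-c")
    return argv


if __name__ == "__main__":
    tiers = ["/dev/md0", "/dev/md1", "/nfs/tier-nfs.img"]
    print(" ".join(build_setup_argv(tiers, create=True)))
```

The list could be passed to subprocess.run() as root. Leaving create at its default of False for every later start avoids rewriting the metadata by accident.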

Configuring data migration

btier supports two ways of automatically migrating data. The older method is the built-in data migration engine. To use this engine effectively, it needs to be tuned to your data access pattern. The second method is user-controlled data migration through sysfs, which is covered further down.

To see where the data is currently residing:

cat /sys/block/sdtiera/tier/device_usage
TIER  DEVICE    SIZE MB  ALLOCATED MB  AV. READS  AV. WRITES  TOTAL_READS  TOTAL_WRITES
0     ssd.img   96       56            0          37          82           3612
1     sas.img   147      0             0          0           0            0
2     sata.img  197      0             0          0           0            0
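
For scripting, this output can be turned into structured records. The sketch below assumes the layout shown above: one header line followed by rows of eight whitespace-separated columns, with device names that contain no spaces. The names TierUsage and parse_device_usage are made up for this example.

```python
from typing import NamedTuple


class TierUsage(NamedTuple):
    tier: int
    device: str
    size_mb: int
    allocated_mb: int
    avg_reads: int
    avg_writes: int
    total_reads: int
    total_writes: int


def parse_device_usage(text):
    """Parse the contents of the device_usage sysfs file into TierUsage rows."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the header and anything that is not an 8-column data row.
        if len(fields) != 8 or not fields[0].isdigit():
            continue
        rows.append(TierUsage(int(fields[0]), fields[1], *map(int, fields[2:])))
    return rows


SAMPLE = """\
TIER DEVICE SIZE MB ALLOCATED MB AV. READS AV. WRITES TOTAL_READS TOTAL_WRITES
0 ssd.img 96 56 0 37 82 3612
1 sas.img 147 0 0 0 0 0
2 sata.img 197 0 0 0 0 0
"""

if __name__ == "__main__":
    for row in parse_device_usage(SAMPLE):
        used = 100.0 * row.allocated_mb / row.size_mb
        print(f"tier {row.tier} ({row.device}): {used:.1f}% allocated")
```

On a live system the text would come from reading /sys/block/sdtiera/tier/device_usage instead of SAMPLE.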

Built-in data migration

The active data migration policy can be obtained like this:

cat /sys/block/sdtiera/tier/migration_policy
tier  device    max_age  hit_collecttime
0     ssd.img   86400    43200
1     sas.img   86400    43200
2     sata.img  86400    43200

These default values are usually not what we want, so let’s see what they mean. Data on tier device 0 is migrated to tier device 1 when it has not been accessed for max_age seconds, here 86400 (1 day). After being migrated, a block is not considered for movement again for at least hit_collecttime seconds, here 43200 (12 hours), to avoid a ping-pong effect.
To change these values, simply echo the new values to the sysfs file:

echo "1 sas.img 604800 518400" >/sys/block/sdtiera/tier/migration_policy
cat /sys/block/sdtiera/tier/migration_policy
tier  device    max_age  hit_collecttime
0     ssd.img   86400    43200
1     sas.img   604800   518400
2     sata.img  86400    43200
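
The policy values are plain seconds, so 604800 is 7 days and 518400 is 6 days. As a small sketch of that arithmetic, the helper below (policy_line is a hypothetical name) builds the string that is echoed into migration_policy from values given in days. It assumes the "tier device max_age hit_collecttime" format shown above.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def policy_line(tier, device, max_age_days, hit_collecttime_days):
    """Build one migration_policy line, converting days to seconds."""
    max_age = int(max_age_days * SECONDS_PER_DAY)
    hit_collecttime = int(hit_collecttime_days * SECONDS_PER_DAY)
    return f"{tier} {device} {max_age} {hit_collecttime}"


if __name__ == "__main__":
    # Same values as the echo command above: 7 days and 6 days.
    print(policy_line(1, "sas.img", 7, 6))
    # The defaults correspond to 1 day and half a day.
    print(policy_line(0, "ssd.img", 1, 0.5))
```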

There is one more important attribute that needs to be tuned.
This is ‘migration_interval’. It determines how frequently (in seconds) the btier data migration thread runs and inspects the individual data chunks to see if they need to be migrated. Since there is a modest performance impact when this job runs, we should set this parameter so that data migration does not run too often, but still runs frequently enough to get the job done.


cat /sys/block/sdtiera/tier/migration_interval
14400
# Migrate only once per day
echo 86400 >/sys/block/sdtiera/tier/migration_interval
cat /sys/block/sdtiera/tier/migration_interval
86400

We can also choose to disable / enable data migration.
Please note that whenever data migration is enabled it will perform a scan / migration job.

echo 1 >/sys/block/sdtiera/tier/migration_enable
tail -1 /var/log/syslog
Aug 24 18:29:21 sys kernel: [ 6272.812954] btier: migration is enabled for sdtiera

To inspect the details of data migration, enable verbose logging:

echo 1 >/sys/block/sdtiera/tier/migrate_verbose
tail -1 /var/log/syslog
Aug 24 18:31:16 sys kernel: [ 6388.125519] btier: migrate_verbose is enabled

User controlled data migration

A user can query btier to show where a chunk of data is stored. Btier divides the block device into 1MB chunks. Information about each chunk can be obtained like this:

# Show details about block 0
echo 0 >/sys/block/sdtiera/tier/show_blockinfo
#
cat /sys/block/sdtiera/tier/show_blockinfo
0,1048576,1408897761,75,133

The result tells us that block 0 is stored on device 0 at offset 1048576. The comma-separated fields follow the blockinfo structure:

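struct blockinfo {
        unsigned int device;      /* tier device that holds the chunk */
        u64 offset;               /* offset of the chunk on that device */
        time_t lastused;          /* time the chunk was last used */
        unsigned int readcount;
        unsigned int writecount;
};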
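
A user-space tool will usually want these fields as typed values. The sketch below parses one show_blockinfo line. It assumes the five fields appear in the order of the structure and that lastused is a Unix timestamp in seconds. BlockInfo and parse_blockinfo are names invented for this example.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class BlockInfo(NamedTuple):
    device: int
    offset: int
    lastused: datetime
    readcount: int
    writecount: int


def parse_blockinfo(line):
    """Parse one 'device,offset,lastused,readcount,writecount' line."""
    device, offset, lastused, reads, writes = (
        int(field) for field in line.strip().split(",")
    )
    # Assumption: lastused is seconds since the Unix epoch (time_t).
    when = datetime.fromtimestamp(lastused, tz=timezone.utc)
    return BlockInfo(device, offset, when, reads, writes)


if __name__ == "__main__":
    info = parse_blockinfo("0,1048576,1408897761,75,133")
    print(info.device, info.offset, info.lastused.isoformat(),
          info.readcount, info.writecount)
```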

The tools directory in the distribution tarball contains examples that store this information about every block in an SQL database and then use queries to determine where a block should be located.

Migrating a chunk is very simple and can be done like this:

# Move block 0 to device 1
echo 0/1 >/sys/block/sdtiera/tier/migrate_block
tail -f /var/log/syslog
Aug 24 18:40:33 kernel: [ 6944.094873] btier: data_migrator woke up
Aug 24 18:40:33 kernel: [ 6944.094879] btier: do_migrate_direct
Aug 24 18:40:33 kernel: [ 6944.094885] btier: migration is disabled for sdtiera due to user controlled data migration
Aug 24 18:40:33 kernel: [ 6944.094891] btier: sysfs request migrate blocknr 0 to 1
Aug 24 18:40:33 kernel: [ 6944.207499] btier: migrated blocknr 0 from device 0-1048576 to device 1-1048576

NOTE: This logging was obtained by enabling ‘migrate_verbose’.

This API allows you to create your own data migration engine in your favorite programming language. Examples in C, shell and Python are included.
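
To give an idea of what such an engine talks to, here is a minimal Python sketch of the two sysfs operations shown above. It is not one of the bundled examples. The function names and the sysfs_dir parameter are assumptions made for illustration, and only the show_blockinfo and migrate_block files and the "blocknr/device" request format come from the commands in this section.

```python
import os

# Device name taken from the examples above; adjust for your system.
SYSFS_DIR = "/sys/block/sdtiera/tier"


def read_blockinfo(blocknr, sysfs_dir=SYSFS_DIR):
    """Select a block via show_blockinfo, then read back its details."""
    path = os.path.join(sysfs_dir, "show_blockinfo")
    with open(path, "w") as f:
        f.write(f"{blocknr}\n")
    with open(path) as f:
        return f.read().strip()


def migrate_block(blocknr, target_device, sysfs_dir=SYSFS_DIR):
    """Ask btier to move one block to target_device; returns the request."""
    request = f"{blocknr}/{target_device}"
    with open(os.path.join(sysfs_dir, "migrate_block"), "w") as f:
        f.write(request + "\n")
    return request
```

Run as root, migrate_block(0, 1) writes the same request as the echo 0/1 command above. As the log output shows, issuing such a request disables the built-in migration for that device.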

TRIM / DISCARD

btier supports trim when a recent (stock) kernel, 3.0 or later, is used.
By default trim is disabled, and one has to enable it, preferably before formatting the btier device with a file system that supports trim.
To enable trim:

echo 1 >/sys/block/sdtiera/tier/discard
# Now btier uses discard internally.
echo 1 >/sys/block/sdtiera/tier/discard_to_devices
# Now btier passes discard to underlying devices

Please note that it can be beneficial to enable trim even when the underlying devices do not support it. In that case btier only uses the discard information to keep its own metadata up to date.

Conclusion

Btier is still in development, but already has a respectable number of features.

Enjoy!


2 Responses to BTIER-1.3.3 and what it can do for you

  1. Riccardo says:

    Hi Maru!
    Just released a resource agent script for managing BTIER devices in Pacemaker Cluster Environment.
    Hope someone find it useful :)

    http://think-brick.blogspot.it/2014/09/btier-resource-agents-for-pacemaker.html

  2. Berto Obarrio says:

    Hello. One question .
    I set the btier on CentOS , and I did run correctly , two levels of storage, files are copied first to the second , ok .
    The problem : I restarted the machine, and was dismantled everything!
    I had to redo insmod btier , reconfigure the disks, and tier 1 data were lost.
    What I have to do to make when starting the machine is everything I had left ?
    Thank You
