Open Source data deduplication for less.

lessfs – A high performance inline data deduplicating filesystem for Linux.

lessfs is released under the GNU GPLv3 license and can be downloaded from:

Lessfs mailing list :

EPRD – An eventually persistent kernel based ram disk or disk cache.

TIER – A tiered storage blockdevice

Tier mailing list :


Posted in Main | Comments Off

btier-2.0.0 and btier-1.3.10

Two new btier releases have become available.
btier-1.3.10 is hopefully a final bug fix release.
It should be stable and concludes development of the btier-1.3.x series.
Btier-1.3 will still be maintained should bug fixes be necessary.

All new development will take place on 2.0. This release differs in a number of ways.

Btier-2.0 will not support VFS / using files as a backend store.
It will only support recent kernels: whenever ifdefs would be required to make it work with an older kernel, support for that older kernel will be dropped.

The design of btier-2.0 is cleaner and allows much better performance.
This can easily be illustrated by taking two RAM drives and putting either btier-1.3 or btier-2.0 on top.
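
A minimal sketch of such a comparison, assuming the brd module is available and /dev/ram0 and /dev/ram1 are unused, could look like this:

# Create two RAM disks of 4GB each (rd_size is in KB; sizes are an assumption)
modprobe brd rd_nr=2 rd_size=4194304
# Stack btier on top of the two RAM disks
btier_setup -f /dev/ram0:/dev/ram1 -B -c
# Run the same random 4k fio job against /dev/sdtiera for btier-1.3 and btier-2.0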

Btier-2.0 seems to be stable. However, this release will undergo some overhauls in the time to come, so 1.3.10 remains the recommended choice for production.


Mark Ruijter

Posted in Uncategorized | 3 Comments

btier-1.3.9 has been released

btier-1.3.9 solves a number of bugs, including one early free() that would cause random read failures. Ouch!

This release has been extensively tested on 32bit and 64bit systems.
Btier-1.3.9 will become the next stable production version, assuming that no bugs will be reported in the next 4 weeks.



Posted in Uncategorized | Leave a comment

BTIER-1.3.3 and what it can do for you


Btier is an automated tiering block device for the Linux kernel. It works with any type of file-system and its purpose is to automatically move data between different block devices in such a way that performance is improved while the costs for storage are reduced.

More information about automated tiered storage can be found here: Automated_tiered_storage


  • Can use up to 16 devices as part of a btier device.
  • Supports using files or (preferred) block devices as part of the btier device.
  • Provides a user-space API that can be used to control data migration or retrieve information about data allocation.
  • Comes with a built-in kernel space data migration engine as well as example scripts that show how data migration can be controlled from user-space.
  • Handles up to 210,000 random 4k IOPS, provided that the underlying hardware supports it.
  • Sequential data transfers up to 1100MB/sec.
  • Sequentializes random IO patterns, since writes are written in a log-structured manner, which may boost performance on IOPS-constrained storage devices.
  • Supports discard / trim when a recent (>= 3.0) kernel is used.
  • Can expand the size of the tier when one of the underlying devices is expanded.
  • Supports kernels ranging from 2.6.32 up to 3.16.
  • Most configuration changes can be made with the help of sysfs.


Btier is currently considered stable and people report using it in production. btier-1.2.6 is still recommended for production since active development is taking place on the 1.3.x code. Once btier-1.4.0 is released, the latest btier-1.3.x will become the recommended version for production.

Using btier

Using btier is fairly straightforward. Make sure that the packages that are required for compiling the Linux kernel and the kernel sources are installed.
Download btier from sourceforge:

tar xvf btier-1.3.3.tar.gz
cd btier-1.3.3
make
make -Wall -C /lib/modules/3.2.0-64-generic/build M=/tmp/btier-1.3.3/kernel/btier modules
make[1]: Entering directory `/usr/src/linux-headers-3.2.0-64-generic'
CC [M] /tmp/btier-1.3.3/kernel/btier/btier_common.o
CC [M] /tmp/btier-1.3.3/kernel/btier/btier_sysfs.o
CC [M] /tmp/btier-1.3.3/kernel/btier/btier_main.o
LD [M] /tmp/btier-1.3.3/kernel/btier/btier.o
Building modules, stage 2.
MODPOST 1 modules
CC /tmp/btier-1.3.3/kernel/btier/btier.mod.o
LD [M] /tmp/btier-1.3.3/kernel/btier/btier.ko
make[1]: Leaving directory `/usr/src/linux-headers-3.2.0-64-generic'
gcc -O2 -D_FILE_OFFSET_BITS=64 cli/btier_setup.c -o cli/btier_setup
gcc -O2 -D_FILE_OFFSET_BITS=64 cli/btier_inspect.c -o cli/btier_inspect
gcc -O2 tools/writetest.c -o tools/writetest
gcc -O2 tools/show_block_details.c -o tools/show_block_details
sudo make install
[sudo] password for USER:
install -D -m 755 kernel/btier/btier.ko /lib/modules/`uname -r`/kernel/drivers/block/btier.ko
install -D -m 755 -s cli/btier_setup /sbin/btier_setup
install -D -m 755 -s cli/btier_inspect /sbin/btier_inspect
install -D -m 755 rc/btier /etc/init.d/btier
install -D -m 600 rc/bttab /etc/bttab_example
gzip -c man/btier_setup.1 > man/btier_setup.1.gz
gzip -c man/btier_inspect.1 > man/btier_inspect.1.gz
install -D -m 644 man/btier_setup.1.gz /usr/share/man/man1/btier_setup.1.gz
install -D -m 644 man/btier_inspect.1.gz /usr/share/man/man1/btier_inspect.1.gz
depmod -a

Assuming that a user has an md RAID1 that consists of two SSDs (md0) and an md RAID5/6 that contains a number of SATA drives (md1), we can now create a btier device like this:

btier_setup -f /dev/md0:/dev/md1 -B -c

Note that the -c flag is only used when the btier device is initially created.
Using ‘-c’ writes the initial metadata to the disks.
The system should now show a new device: /dev/sdtiera
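
A quick sanity check that the device exists and has the expected size (a minimal sketch):

ls -l /dev/sdtiera
blockdev --getsize64 /dev/sdtiera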

Alternatively, we can also create a btier device by using files instead of real block devices. You can even place these files on an NFS share. However, the NFS share must be mounted with '-o hard' in this case.
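For example, mounting the NFS share with the hard option could look like this (server name and export path are hypothetical):

mount -t nfs -o hard nfsserver:/export /nfs
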
Using btier with files or a mix of files and devices is supported and would work like this:

# Create a file
truncate --size=1TB /nfs/tier-nfs.img
btier_setup -f /dev/md0:/dev/md1:/nfs/tier-nfs.img -B -c

This creates a btier device that uses md0 as the first tier, so md0 should be the fastest device that is available. The second tier is md1, and the last tier is the file on the NFS share (tier-nfs.img).
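
From here on the btier device behaves like any other block device. A minimal sketch of putting a filesystem on it (assuming XFS) would be:

mkfs.xfs /dev/sdtiera
mount /dev/sdtiera /mnt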

Configuring data migration

btier supports two different ways of automatically migrating data. The oldest method is the built-in data migration engine. To use this engine effectively, it needs to be tuned to the data access pattern that you have.

To see where the data is currently residing:

cat /sys/block/sdtiera/tier/device_usage
0 ssd.img 96 56 0 37 82 3612
1 sas.img 147 0 0 0 0 0
2 sata.img 197 0 0 0 0 0

Built-in data migration

The active data migration policy can be obtained like this:

cat /sys/block/sdtiera/tier/migration_policy
tier device max_age hit_collecttime
0 ssd.img 86400 43200
1 sas.img 86400 43200
2 sata.img 86400 43200

These default values are usually not what we want.
Let’s see what all this means. Data from tier device 0 is migrated to tier device 1 when it has not been accessed for 86400 seconds (1 day). After being migrated, the block is not considered for movement again for at least 43200 seconds (12 hours), to avoid a ping-pong effect.
To change these values simply echo the new values to the sysfs file:

echo "1 sas.img 604800 518400" >/sys/block/sdtiera/tier/migration_policy
# cat /sys/block/sdtiera/tier/migration_policy
tier device max_age hit_collecttime
0 ssd.img 86400 43200
1 sas.img 604800 518400
2 sata.img 86400 43200

There is one more important attribute that needs to be tuned: 'migration_interval'. It determines how frequently the btier data migration thread will run and inspect the individual data chunks to see if they need to be migrated. Since there is a modest performance impact when this job runs, we should set this parameter so that data migration does not run too often, but still runs frequently enough to get the job done.

cat /sys/block/sdtiera/tier/migration_interval
# Migrate only once per day
echo 86400 >/sys/block/sdtiera/tier/migration_interval
cat /sys/block/sdtiera/tier/migration_interval

We can also choose to disable / enable data migration.
Please note that whenever data migration is enabled, it will perform a scan / migration job.

echo 1 >/sys/block/sdtiera/tier/migration_enable
tail -1 /var/log/syslog
Aug 24 18:29:21 sys kernel: [ 6272.812954] btier: migration is enabled for sdtiera

Should you want to inspect the details about data migration then:

echo 1 >/sys/block/sdtiera/tier/migrate_verbose
tail -1 /var/log/syslog
Aug 24 18:31:16 sys kernel: [ 6388.125519] btier: migrate_verbose is enabled

User controlled data migration

A user can query btier to show where a chunk of data is stored. Btier divides the block device into 1MB chunks. Information about each chunk can be obtained like this:

# Show details about block 0
echo 0 >/sys/block/sdtiera/tier/show_blockinfo
cat /sys/block/sdtiera/tier/show_blockinfo

The result is that we now know that block 0 is stored on device 0 at offset 1048576. The blockinfo structure looks like this:

struct blockinfo {
    unsigned int device;
    u64 offset;
    time_t lastused;
    unsigned int readcount;
    unsigned int writecount;
};

The tools directory in the distribution tar contains examples that store this information about every block in an SQL database and then use queries to determine where a block should be located.

Migrating a chunk is very simple and can be done like this:

# Move block 0 to device 1
echo 0/1 >/sys/block/sdtiera/tier/migrate_block
tail -f /var/log/syslog
Aug 24 18:40:33 kernel: [ 6944.094873] btier: data_migrator woke up
Aug 24 18:40:33 kernel: [ 6944.094879] btier: do_migrate_direct
Aug 24 18:40:33 kernel: [ 6944.094885] btier: migration is disabled for sdtiera due to user controlled data migration
Aug 24 18:40:33 kernel: [ 6944.094891] btier: sysfs request migrate blocknr 0 to 1
Aug 24 18:40:33 kernel: [ 6944.207499] btier: migrated blocknr 0 from device 0-1048576 to device 1-1048576

NOTE: This logging was obtained by enabling ‘migrate_verbose’.

This API allows you to create your own data migration engine in your favorite programming language. Examples in C, shell and Python are included.
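
As a minimal shell sketch of such an engine (the number of chunks and the policy are assumptions; the sysfs files are the ones shown above), one could walk the chunks, inspect their block info and decide per chunk where it should live:

#!/bin/sh
# Walk the first 1000 chunks (chunk count is an assumption) and inspect each one
DEV=sdtiera
for BLOCK in $(seq 0 999)
do
    echo $BLOCK > /sys/block/$DEV/tier/show_blockinfo
    INFO=$(cat /sys/block/$DEV/tier/show_blockinfo)
    # A real engine would parse $INFO (device, offset, lastused, readcount, writecount)
    # and decide where the chunk belongs. As an illustration, move chunk 0 to tier 1:
    if [ "$BLOCK" -eq 0 ]
    then
        echo "$BLOCK/1" > /sys/block/$DEV/tier/migrate_block
    fi
done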


Discard / trim support

btier supports trim when a recent (stock >= 3.0) kernel is used.
By default trim is disabled and one has to enable it, preferably before formatting the btier device with a file-system that supports trim.
To enable trim:

echo 1 >/sys/block/sdtiera/tier/discard
# Now btier uses discard internally.
echo 1 >/sys/block/sdtiera/tier/discard_to_devices
# Now btier passes discard to underlying devices

Please note that it can be beneficial to enable trim even when the underlying devices do not support it. In that case btier only uses the discard information to keep its own metadata up to date.


Btier is still in development, but already has a respectable number of features.



Posted in Uncategorized | 2 Comments

btier-1.2.2 : Things that have changed


It has been a while since I updated this blog about btier development. At a slow but steady pace I am however still fixing bugs and adding features. This article will explain what has changed as well as what is going to change in the future.


Btier-1.2.x is the new development branch of btier. For some time to come I will keep solving bugs in the older btier-1.1.x releases; however, no new features will be implemented in that branch.

Recently, support was added for doing direct IO to underlying devices without using VFS, for devices that support it. When btier_setup is run with the new -B option, writes to real underlying block devices will be done with submit_bio instead of vfs_write. Should a btier contain both devices and files, it will of course still use vfs_write for the underlying files. The benefit of not using VFS is the reduced overhead, and therefore increased speed, when the underlying devices are flash or RAM based. In this case btier will now support speeds up to 170K IOPS.

Another new feature is that btier now detects zero-filled blocks coming in. When possible it avoids writing them, which easily boosts the speed to more than 1GB/sec even when the underlying disk is a slow USB drive. When a zero-filled block comes in, a check is done to see if there is metadata for this block. When no metadata exists, btier directly returns success without doing anything else. We can do this since reads of blocks without metadata are returned zero-filled anyway.

Things to come:

In the near future more data awareness will be added. When btier migrates a block from one tier to another, it will detect a zero-filled block and, instead of migrating it, remove the metadata and issue a discard to the underlying device. Snapshots and native asynchronous replication will be next.

Posted in Uncategorized | Leave a comment

btier-1.1.2 has been released

This version of btier seems to be very stable indeed. One of the problems that has now been solved is that btier could deadlock under high VFS cache pressure, mostly when btier was used without writethrough enabled on a system with limited memory.

On older releases, tuning vfs_cache_pressure to 150 or higher will greatly reduce the risk of running into this problem:
echo 150 > /proc/sys/vm/vfs_cache_pressure
However, upgrading to the latest version is of course the best option!

The latest release also comes with great performance.
When btier is given a PCI-e SSD as the first tier, it reaches 130k random 4k IOPS with writethrough disabled, and around 80k random 4k IOPS with writethrough enabled.

btier provides an API that provides users full control over the placement of individual data blocks.
Example code in the distribution illustrates how users can create their own data migration scripts or tools to query data placement.


Posted in Uncategorized | Leave a comment

BTIER-1.0.0 stable has been released

For some time people have been using btier in production. Some even use complex configurations that stack drbd and Oracle on top of btier. The good news is that even heavy users have not reported problems with btier.

Therefore the time has come to announce the first stable release.


Posted in Uncategorized | 15 Comments

btier- has been released

This release solves a bug that can lead to loss of data when btier is used with SSDs and a recent kernel that supports discard / trim.

Upgrading is highly recommended.

Posted in Uncategorized | 3 Comments

Btier- has been released

This release dramatically improves random read performance. This has been achieved by introducing a thread pool and implementing aio for random reads.

The performance of the previous release was:

seq_read : io=12288MB, bw=501371KB/s, iops=125342 , runt= 25097msec
rnd_read : io=2985.3MB, bw=50946KB/s, iops=12736 , runt= 60003msec
seq_write: io=12288MB, bw=493912KB/s, iops=123478 , runt= 25476msec
rnd_write: io=9301.5MB, bw=158731KB/s, iops=39682 , runt= 60005msec

With the help of the new aio code, the same test now shows these numbers:

seq_read : io=12288MB, bw=376373KB/s, iops=94093 , runt= 33432msec
rnd_read : io=10419MB, bw=177818KB/s, iops=44454 , runt= 60001msec
seq_write: io=12288MB, bw=496113KB/s, iops=124028 , runt= 25363msec
rnd_write: io=9242.2MB, bw=157724KB/s, iops=39431 , runt= 60003msec

On the same server, bcache (configuration tuned by Kent Overstreet) shows these numbers:

seq_read : io=12288MB, bw=228324KB/s, iops=57080 , runt= 55110msec
rdn_read : io=11250MB, bw=191989KB/s, iops=47997 , runt= 60001msec
seq_write: io=7394.5MB, bw=126195KB/s, iops=31548 , runt= 60002msec
rnd_write: io=7924.2MB, bw=135237KB/s, iops=33809 , runt= 60001msec

New in this release is that migration policy settings are now stored persistently on disk.



Posted in Uncategorized | Leave a comment

TIER has been renamed to BTIER

What’s in a name

TIER has been renamed to BTIER to improve the relevance of results returned by search engines.

BTIER current status

The btier code is now at and I hope to release a stable 1.0 version within weeks.

BTIER performance tested with Vmware IO analyzer 1.5.0

To test the current performance capability of BTIER, I conducted the following test. A server with a single STEC Zeus drive and an LSI controller with 5 Hitachi SAS drives is used to export a btier volume via iSCSI (SCST).

BTIER Server   : Supermicro
Processor      : E5606  @ 2.13GHz
Memory         : 8GB
iSCSI network  : 2 * 10Gbe
LSI controller : MegaRAID SAS 9280-4i4e ( 5 * Hitachi SAS in RAID 5)
LSI controller : SAS2008 PCI-Express Fusion-MPT SAS-2
                 ( 1 * STEC Zeus 800GB SSD)

The native IOPS performance of the 5 Hitachi drives in RAID5 is approx 375 IOPS for writes. The native performance of the SSD can be found here : STEC ZeusIOPS

Vmware server       : Intel 2500HC
Vmware version      : 5.1.0
Vmware io analyzer  :
iSCSI NIC           : 2 * 10Gbe

In this test both bcache and btier are used, to get an idea of how btier compares with others.
bcache was setup with these commands:

make-bcache -B /dev/sda
make-bcache -C -b1M /dev/sdd
modprobe bcache
echo /dev/sda >/sys/fs/bcache/register
echo /dev/sdd >/sys/fs/bcache/register
ls /sys/fs/bcache/
echo a38f0944-e439-4607-8222-7f5dfbbcf05e >/sys/block/sda/bcache/attach
echo 1 >/sys/block/sda/bcache/writeback_running

Setting up btier:

insmod ./btier.ko
./btier_setup -f /dev/sdd:/dev/sda -c
echo 0 >/sys/block/sdtiera/tier/sequential_landing

And finally SCST:

setup_id 0x1234

HANDLER vdisk_blockio {
        DEVICE disk01 {
                t10_dev_id "v-crsimp01 e951d814"
                usn e951d814
                # ONE OF THESE
                #filename /dev/bcache0
                #filename /dev/sdtiera
        }
}

# Fragment of the iSCSI target definition:
        enabled 1
        rel_tgt_id 1

                LUN 0 disk01
                enabled 1

Two VMware guests were started with an iometer IOPS workload, one guest doing 100% random reads and the other 100% random writes.
The test results are shown below.


[Figure: Btier max IOPS]
Testing btier and bcache with fio

To ensure that the test results are valid I also tested both btier and bcache with fio.

---------------------------- BTIER ------------------------------
Jobs: 1 (f=1): [___w] [89.3% done] [0K/145.3M /s] [0 /36.4K iops]   
read : io=12288MB, bw=435651KB/s, iops=108912 , runt= 28883msec
read : io=2398.6MB, bw=40935KB/s, iops=10233 , runt= 60001msec
write: io=12288MB, bw=498412KB/s, iops=124603 , runt= 25246msec
write: io=9218.6MB, bw=157306KB/s, iops=39326 , runt= 60006msec

-----------------------------BCACHE (writeback) -----------------
Jobs: 1 (f=1): [___w] [57.2% done] [0K/6541K /s] [0 /1597  iops]   
read : io=10245MB, bw=174850KB/s, iops=43712 , runt= 60001msec
read : io=146684KB, bw=2443.9KB/s, iops=610 , runt= 60021msec
write: io=7253.4MB, bw=123785KB/s, iops=30946 , runt= 60003msec
write: io=2192.4MB, bw=37410KB/s, iops=9352 , runt= 60008msec

The fio test results confirm the results from the Vmware io analyzer test.

Just before finishing up this post, I read the announcement of EnhanceIO.

A fio test shows that this project is serious competition:

eio_cli create -d /dev/sda4 -s /dev/sdd4 -m wb -c EIO

Jobs: 1 (f=1): [___w] [81.9% done] [0K/135.5M /s] [0 /33.9K iops] [eta 00m:51s]
read : io=12288MB, bw=253913KB/s, iops=63478 , runt= 49556msec
read : io=3885.4MB, bw=66303KB/s, iops=16575 , runt= 60001msec
write: io=7681.1MB, bw=131088KB/s, iops=32772 , runt= 60001msec
write: io=6639.5MB, bw=113312KB/s, iops=28327 , runt= 60001msec




If no major bugs are reported in the weeks to come, a stable btier release can be expected soon. btier performs very well and comes with more than enough features to justify a first major release.

Posted in Uncategorized | 5 Comments

TIER-0.4.3 has been released


This version of TIER comes with a significant number of changes. The metadata format has changed to support resizing of the TIER device; therefore this version of TIER is not compatible with previous releases.

New features

This version of TIER introduces support for resizing the underlying devices. When the underlying devices grow TIER can be instructed to grow as well.

LVM will now work with TIER without modifications to the configuration of the system. The device name that TIER registers has changed from /dev/tierN to /dev/sdtierN; otherwise the LVM device filters would have to be changed to use a tier device with LVM. Although this is possible, it would have been inconvenient for most users. With the new name, pvcreate /dev/sdtiera will work with most distributions.
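
A minimal sketch of using a btier device as an LVM physical volume (volume group and logical volume names are assumptions):

pvcreate /dev/sdtiera
vgcreate tiervg /dev/sdtiera
lvcreate -L 10G -n datalv tiervg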

How resizing TIER works

#First create a tier device
insmod ./tier.ko
dd if=/dev/zero of=/data/ssd.img bs=1M count=100
dd if=/dev/zero of=/data/sas.img bs=1M count=150
./tier_setup -f /data/ssd.img:/data/sas.img -c
mkfs.xfs /dev/sdtiera
mount /dev/sdtiera /mnt

df /mnt
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sdtiera            243008     12548    230460   6% /mnt

truncate --size=10000M /data/sas.img
echo 1 >/sys/block/sdtiera/tier/resize

xfs_growfs /mnt
meta-data=/dev/sdtiera  isize=256    agcount=4, agsize=15488 blks
 =              sectsz=512   attr=2
data     =              bsize=4096   blocks=61952, imaxpct=25
 =              sunit=0      swidth=0 blks
naming   =version 2      bsize=4096   ascii-ci=0
log      =internal       bsize=4096   blocks=1200, version=2
 =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none           extsz=4096   blocks=0, rtextents=0
data blocks changed from 61952 to 2585600

df /mnt
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sdtiera          10337600     17764  10319836   1% /mnt

The example above uses files instead of LVM devices; however, this also works when tier is created on top of LVM devices.

./tier_setup -f /dev/mapper/meta-ssdlv:/dev/mapper/datavg-datalv -c
# And afterwards:
lvextend -L+10G /dev/mapper/meta-ssdlv
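
After extending the logical volume, the same resize and grow steps shown above apply (assuming an XFS filesystem mounted on /mnt):

echo 1 >/sys/block/sdtiera/tier/resize
xfs_growfs /mnt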


The next feature that will be added to TIER is the ability to add and even remove devices from TIER. Performance will be enhanced by loading the metadata into memory whenever sufficient memory is available. Redundant (meta)data and data checksumming are also planned.

Posted in Uncategorized | 9 Comments