Open Source data deduplication for less.

lessfs – A high performance inline data deduplicating filesystem for Linux.

lessfs is released under the GNU GPLv3 license and can be downloaded from:

http://sourceforge.net/project/showfiles.php?group_id=257120

Lessfs mailing list : lessfs@googlegroups.com

EPRD – An eventually persistent kernel based ram disk or disk cache.

http://sourceforge.net/projects/eprd/files/

TIER – A tiered storage blockdevice

http://sourceforge.net/projects/tier/files/

Tier mailing list : tier-users@lists.sourceforge.net

Posted in Main | Comments Off

btier-0.9.9.9-3 has been released

This release fixes a bug that can lead to data loss when btier is used with SSDs and a recent kernel that supports discard / trim.

Upgrading is highly recommended.

Posted in Uncategorized | 1 Comment

Btier-0.9.9.9-1 has been released

This release dramatically improves random read performance. This has been achieved by introducing a thread pool and implementing aio for random reads.

The performance of the 0.9.9.9 release was:

seq_read : io=12288MB, bw=501371KB/s, iops=125342 , runt= 25097msec
rnd_read : io=2985.3MB, bw=50946KB/s, iops=12736 , runt= 60003msec
seq_write: io=12288MB, bw=493912KB/s, iops=123478 , runt= 25476msec
rnd_write: io=9301.5MB, bw=158731KB/s, iops=39682 , runt= 60005msec

With the new aio code, 0.9.9.9-1 shows the numbers below. Random read IOPS rise from 12736 to 44454, while sequential read IOPS drop from 125342 to 94093:

seq_read : io=12288MB, bw=376373KB/s, iops=94093 , runt= 33432msec
rnd_read : io=10419MB, bw=177818KB/s, iops=44454 , runt= 60001msec
seq_write: io=12288MB, bw=496113KB/s, iops=124028 , runt= 25363msec
rnd_write: io=9242.2MB, bw=157724KB/s, iops=39431 , runt= 60003msec

On the same server bcache (configuration tuned by Kent Overstreet) shows these numbers:

seq_read : io=12288MB, bw=228324KB/s, iops=57080 , runt= 55110msec
rnd_read : io=11250MB, bw=191989KB/s, iops=47997 , runt= 60001msec
seq_write: io=7394.5MB, bw=126195KB/s, iops=31548 , runt= 60002msec
rnd_write: io=7924.2MB, bw=135237KB/s, iops=33809 , runt= 60001msec
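
To compare runs like the ones above without reading the numbers by eye, the IOPS figure can be pulled out of each fio summary line. A minimal sketch, assuming the summary-line layout shown in this post; the function name fio_iops is made up for illustration:

```shell
# Print the iops= value from fio summary lines read on stdin.
# Lines without an iops= field are skipped.
fio_iops() {
    sed -n 's/.*iops=\([0-9][0-9]*\).*/\1/p'
}

# Random read IOPS before (0.9.9.9) and after (0.9.9.9-1) the aio change.
old=$(echo 'rnd_read : io=2985.3MB, bw=50946KB/s, iops=12736 , runt= 60003msec' | fio_iops)
new=$(echo 'rnd_read : io=10419MB, bw=177818KB/s, iops=44454 , runt= 60001msec' | fio_iops)
echo "rnd_read iops: $old -> $new"
```

The same function can be fed a whole saved fio report, in which case it prints one number per summary line.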

New in this release is that migration policy settings are now stored persistently on disk.

Enjoy,

Mark

Posted in Uncategorized | Leave a comment

TIER has been renamed to BTIER

What’s in a name

TIER has been renamed to BTIER to improve the relevance of results returned by search engines.

BTIER current status

The btier code is now at 0.9.9.2 and I hope to release a stable 1.0 version within weeks.

BTIER performance tested with Vmware IO analyzer 1.5.0

To test the current performance of BTIER I ran the following test. A server with a single STEC Zeus drive and an LSI controller with 5 Hitachi SAS drives exports a btier volume via iSCSI (SCST).

BTIER Server   : Supermicro
Processor      : E5606  @ 2.13GHz
Memory         : 8GB
iSCSI network  : 2 * 10Gbe
LSI controller : MegaRAID SAS 9280-4i4e ( 5 * Hitachi SAS in RAID 5)
LSI controller : SAS2008 PCI-Express Fusion-MPT SAS-2
                 ( 1 * STEC Zeus 800GB SSD)

The native IOPS performance of the 5 Hitachi drives in RAID5 is approx 375 IOPS for writes. The native performance of the SSD can be found in the STEC ZeusIOPS specifications.

Vmware server       : Intel 2500HC
Vmware version      : 5.1.0
Vmware io analyzer  : http://labs.vmware.com/flings/io-analyzer
iSCSI NIC           : 2 * 10Gbe

Both bcache and btier are tested to get an idea of how btier compares with other solutions.
bcache was set up with these commands:

make-bcache -B /dev/sda
make-bcache -C -b1M /dev/sdd
modprobe bcache
echo /dev/sda >/sys/fs/bcache/register
echo /dev/sdd >/sys/fs/bcache/register
ls /sys/fs/bcache/
echo a38f0944-e439-4607-8222-7f5dfbbcf05e >/sys/block/sda/bcache/attach
echo 1 >/sys/block/sda/bcache/writeback_running
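
The UUID in the attach step is the cache set directory that shows up in the ls /sys/fs/bcache/ output. A small sketch of picking it up automatically rather than copying it by hand; the helper name and the optional directory argument are my own additions, and it assumes exactly one cache set is registered:

```shell
# Print UUID-named entries in the bcache sysfs directory (default: /sys/fs/bcache).
bcache_cset_uuid() {
    dir=${1:-/sys/fs/bcache}
    ls "$dir" | grep -E '^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'
}

# Intended use, mirroring the attach command above (run as root):
#   echo "$(bcache_cset_uuid)" >/sys/block/sda/bcache/attach
```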

Setting up btier:

insmod ./btier.ko
./btier_setup -f /dev/sdd:/dev/sda -c
echo 0 >/sys/block/sdtiera/tier/sequential_landing

And finally SCST:

setup_id 0x1234

HANDLER vdisk_blockio {
        DEVICE disk01 {
                t10_dev_id "v-crsimp01 e951d814"
                usn e951d814
                # ONE OF THESE
                #filename /dev/bcache0
                #filename /dev/sdtiera
                WRITE_THROUGH
        }
}

TARGET_DRIVER iscsi {
	enabled 1
        rel_tgt_id 1

	TARGET iqn.2006-11.net.storagedata:tgt-ctrl02 {
		LUN 0 disk01
                allowed_portal 192.168.1.20
                allowed_portal 192.168.2.20
		enabled 1
	}
}

Two VMware guests were started with an iometer IOPS workload.
One guest doing 100% random reads and the other 100% random writes.
The test results are shown below.

[Figure: VMware I/O Analyzer results]

[Figure: BTIER max IOPS]

[Figure: BCACHE max IOPS]

[Figure: BTIER max latency]

[Figure: BCACHE max latency]

Testing btier and bcache with fio

To ensure that the test results are valid I also tested both btier and bcache with fio.

---------------------------- BTIER ------------------------------
Jobs: 1 (f=1): [___w] [89.3% done] [0K/145.3M /s] [0 /36.4K iops]   
read : io=12288MB, bw=435651KB/s, iops=108912 , runt= 28883msec
read : io=2398.6MB, bw=40935KB/s, iops=10233 , runt= 60001msec
write: io=12288MB, bw=498412KB/s, iops=124603 , runt= 25246msec
write: io=9218.6MB, bw=157306KB/s, iops=39326 , runt= 60006msec

-----------------------------BCACHE (writeback) -----------------
Jobs: 1 (f=1): [___w] [57.2% done] [0K/6541K /s] [0 /1597  iops]   
read : io=10245MB, bw=174850KB/s, iops=43712 , runt= 60001msec
read : io=146684KB, bw=2443.9KB/s, iops=610 , runt= 60021msec
write: io=7253.4MB, bw=123785KB/s, iops=30946 , runt= 60003msec
write: io=2192.4MB, bw=37410KB/s, iops=9352 , runt= 60008msec

The fio test results confirm the results from the Vmware io analyzer test.

Just before finishing this post I read the announcement of EnhanceIO.

A fio test shows that this project is serious competition:

eio_cli create -d /dev/sda4 -s /dev/sdd4 -m wb -c EIO

Jobs: 1 (f=1): [___w] [81.9% done] [0K/135.5M /s] [0 /33.9K iops] [eta 00m:51s]
read : io=12288MB, bw=253913KB/s, iops=63478 , runt= 49556msec
read : io=3885.4MB, bw=66303KB/s, iops=16575 , runt= 60001msec
write: io=7681.1MB, bw=131088KB/s, iops=32772 , runt= 60001msec
write: io=6639.5MB, bw=113312KB/s, iops=28327 , runt= 60001msec

Conclusion

If no major bugs are reported in the coming weeks, a stable btier release can be expected soon. btier performs very well and comes with more than enough features to justify a first major release.

Posted in Uncategorized | 5 Comments

TIER-0.4.3 has been released

Introduction

This version of TIER comes with a significant number of changes. The meta data has changed to support resizing of the TIER device. Therefore this version of TIER is not compatible with previous releases.

New features

This version of TIER introduces support for resizing the underlying devices. When the underlying devices grow TIER can be instructed to grow as well.

LVM now works with TIER without changes to the system configuration. The device name that TIER registers has changed from /dev/tierN to /dev/sdtierN. With the old name the LVM device filters had to be changed before a tier device could be used with LVM. Although this is possible it would have been inconvenient for most users. With the new name pvcreate /dev/sdtiera will work with most distributions.

How resizing TIER works

#First create a tier device
insmod ./tier.ko
dd if=/dev/zero of=/data/ssd.img bs=1M count=100
dd if=/dev/zero of=/data/sas.img bs=1M count=150
./tier_setup -f /data/ssd.img:/data/sas.img -c
mkfs.xfs /dev/sdtiera
mount /dev/sdtiera /mnt

df /mnt
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sdtiera            243008     12548    230460   6% /mnt

truncate --size=10000M /data/sas.img
echo 1 >/sys/block/sdtiera/tier/resize

xfs_growfs /mnt
meta-data=/dev/sdtiera  isize=256    agcount=4, agsize=15488 blks
 =              sectsz=512   attr=2
data     =              bsize=4096   blocks=61952, imaxpct=25
 =              sunit=0      swidth=0 blks
naming   =version 2      bsize=4096   ascii-ci=0
log      =internal       bsize=4096   blocks=1200, version=2
 =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none           extsz=4096   blocks=0, rtextents=0
data blocks changed from 61952 to 2585600

df /mnt
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/sdtiera          10337600     17764  10319836   1% /mnt
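
The df sizes are consistent with the xfs_growfs output: the 1K-blocks column matches the data blocks minus the internal log blocks, times the 4096-byte block size. A quick check with shell arithmetic; the helper name is only for illustration:

```shell
# df 1K-blocks = (data blocks - log blocks) * block size in bytes / 1024
xfs_df_kblocks() {
    echo $(( ($1 - $2) * $3 / 1024 ))
}

xfs_df_kblocks 61952 1200 4096      # before the resize: 243008
xfs_df_kblocks 2585600 1200 4096    # after the resize: 10337600
```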

The example above uses files instead of LVM devices. However this also works when tier is created with LVM devices.

./tier_setup -f /dev/mapper/meta-ssdlv:/dev/mapper/datavg-datalv -c
And afterwards : lvextend -L+10G /dev/mapper/meta-ssdlv
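
For the LVM case the order of operations presumably mirrors the file-based example: extend the logical volume, tell tier to resize, then grow the filesystem. The sketch below only prints the commands so they can be reviewed first. The function name and the default of +10G are illustrative, and the assumption that the same resize trigger applies to LVM-backed devices is taken from the example above, not verified separately:

```shell
# Print the resize steps: $1 = logical volume, $2 = tier device name,
# $3 = mount point, $4 = amount to add (default 10G).
tier_grow_plan() {
    lv=$1; tier=$2; mnt=$3; extra=${4:-10G}
    printf 'lvextend -L+%s %s\n' "$extra" "$lv"
    printf 'echo 1 >/sys/block/%s/tier/resize\n' "$tier"
    printf 'xfs_growfs %s\n' "$mnt"
}

tier_grow_plan /dev/mapper/meta-ssdlv sdtiera /mnt
```

Piping the output to sh as root would run the three steps in order.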

Roadmap

The next feature that will be added to TIER is the ability to add and even remove devices from TIER.
Performance will be enhanced by loading the meta data into memory whenever sufficient memory is available.
Redundant (meta) data and data checksumming.

Posted in Uncategorized | 9 Comments

TIER-0.2.3 is available for download

This release of tier makes it possible to disable or enable data migration via sysfs.

To disable migration:
echo 1 >/sys/block/tiera/tier/disable_migration
To enable migration:
echo 0 >/sys/block/tiera/tier/disable_migration

When migration is re-enabled the migration process wakes up and starts immediately. This makes it possible to schedule block migration at a convenient time. In future releases the sysfs interface will be expanded so that all migration related parameters can be managed via sysfs.
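
Scheduling could look like the sketch below: a small wrapper that toggles the attribute, called from cron at the start and end of a quiet window. The wrapper name, the script path in the cron lines and the chosen hours are all made up; only the sysfs attribute comes from this release.

```shell
# Toggle tier data migration. $1 = on|off, $2 = attribute path (optional).
set_migration() {
    attr=${2:-/sys/block/tiera/tier/disable_migration}
    case $1 in
        on)  echo 0 >"$attr" ;;   # 0 = migration enabled
        off) echo 1 >"$attr" ;;   # 1 = migration disabled
        *)   echo "usage: set_migration on|off [path]" >&2; return 2 ;;
    esac
}

# Hypothetical crontab entries allowing migration between 01:00 and 05:00:
#   0 1 * * * /usr/local/sbin/tier-migration on
#   0 5 * * * /usr/local/sbin/tier-migration off
```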

Enjoy,

Mark Ruijter

Posted in Uncategorized | 13 Comments

TIER OPTIMIZATION EFFECTS

How TIER works

Tier is a Linux kernel block device that aggregates multiple devices of a different nature into one virtual block device. The idea is to combine (expensive) fast and (affordable) slow devices to build a high performance virtual device. TIER is different from Flashcache and Bcache because it does not only use a fast medium for caching. In some ways TIER and bcache use comparable techniques. Both for example will try to handle random writes sequentially. However TIER goes one step further. It keeps track of data access patterns and will over time migrate aged data to a lower tier. It will also detect that some blocks are used more often than others and migrate these up to a higher tier.
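
The migration idea can be illustrated with a toy decision rule. This is not the module's actual policy, which the post does not spell out; the function name, the hit counter and both thresholds are invented for the example:

```shell
# Toy rule: demote blocks that have been idle too long, promote hot ones.
# $1 = access count, $2 = seconds since last access,
# $3 = max idle seconds (default 86400), $4 = hot threshold (default 100).
migration_hint() {
    hits=$1; idle=$2; max_idle=${3:-86400}; hot=${4:-100}
    if [ "$idle" -gt "$max_idle" ]; then
        echo down
    elif [ "$hits" -ge "$hot" ]; then
        echo up
    else
        echo stay
    fi
}

migration_hint 500 60        # frequently used, recent: up
migration_hint 500 172800    # not touched for two days: down
migration_hint 3 60          # recent but rarely used: stay
```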

The effects of data migration on performance

In a previous post I published some performance numbers that compare TIER to bcache and flashcache. This time the test with fio was repeated on TIER for several hours which allowed optimization to take place.

FLASHCACHE

read-seq : io=16635MB, bw=56778KB/s, iops=14194 , runt=300017msec
read-rnd : io=872528KB, bw=2908.4KB/s, iops=727 , runt=300007msec
write-seq: io=8237.5MB, bw=28117KB/s, iops=7029 , runt=300001msec
write-rnd: io=6038.4MB, bw=20611KB/s, iops=5152 , runt=300001msec

BCACHE

read-seq : io=20480MB, bw=103370KB/s, iops=25842 , runt=202878msec
read-rnd : io=936760KB, bw=3122.4KB/s, iops=780 , runt=300014msec
write-seq: io=15604MB, bw=53263KB/s, iops=13315 , runt=300001msec
write-rnd: io=6453.1MB, bw=22025KB/s, iops=5506 , runt=300016msec

TIER : FIRST RUN

read-seq : io=11911MB, bw=203277KB/s, iops=50819 , runt= 60001msec
read-rnd : io=116236KB, bw=1936.1KB/s, iops=484 , runt= 60009msec
write-seq: io=10507MB, bw=179324KB/s, iops=44831 , runt= 60001msec
write-rnd: io=1653.5MB, bw=24989KB/s, iops=6247 , runt= 67756msec

TIER : AFTER SEVERAL HOURS

read-seq : io=13506MB, bw=230496KB/s, iops=57623 , runt= 60001msec
read-rnd : io=273316KB, bw=4554.6KB/s, iops=1138 , runt= 60010msec
write-seq: io=12675MB, bw=216311KB/s, iops=54077 , runt= 60001msec
write-rnd: io=2588.7MB, bw=44117KB/s, iops=11029 , runt= 60085msec
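
The effect of several hours of optimization can be expressed as a percentage gain per test. A small helper for that arithmetic (the name pct_gain is illustrative; it relies on awk for the floating point division):

```shell
# Percentage gain from $1 (first run) to $2 (after optimization), rounded.
pct_gain() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.0f\n", (b - a) / a * 100 }'
}

pct_gain 484 1138      # read-rnd iops
pct_gain 6247 11029    # write-rnd iops
```

With the numbers above this gives a gain of about 135% for random reads and about 77% for random writes.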

The price of optimization

Hardly anything in life comes for free, and optimization has its price as well. When a volume is not used continuously, optimization can take place in periods of relatively low traffic, and in that case it works very well. When a volume is under continuous high load, choices have to be made: optimization will impact performance for as long as it runs, although performance will most likely increase afterwards. The trick is therefore to optimize in such a way that the performance impact is acceptable while still letting optimization run often enough. This part of TIER is still a work in progress and may require different policies for different workloads. The graph below clearly shows the advantages and disadvantages of the optimization process. During this 24 hour test the optimization took place once per hour. There are still a number of things that can be done to reduce the negative impact, and future releases will focus on diminishing this effect as much as possible.

[Figure: TIER optimization]

Posted in Uncategorized | 6 Comments

TIER-0.2.0 has been released

Tier-0.2.0 adds crash recovery and some bug fixes.

A brief benchmark of tier, flashcache and bcache with fio shows these results:
flashcache
read : io=16635MB, bw=56778KB/s, iops=14194 , runt=300017msec
read : io=872528KB, bw=2908.4KB/s, iops=727 , runt=300007msec
write: io=8237.5MB, bw=28117KB/s, iops=7029 , runt=300001msec
write: io=6038.4MB, bw=20611KB/s, iops=5152 , runt=300001msec

bcache
read : io=20480MB, bw=103370KB/s, iops=25842 , runt=202878msec
read : io=936760KB, bw=3122.4KB/s, iops=780 , runt=300014msec
write: io=15604MB, bw=53263KB/s, iops=13315 , runt=300001msec
write: io=6453.1MB, bw=22025KB/s, iops=5506 , runt=300016msec

tier
read : io=20480MB, bw=167819KB/s, iops=41954 , runt=124965msec
read : io=528236KB, bw=1760.8KB/s, iops=440 , runt=300012msec
write: io=20480MB, bw=172857KB/s, iops=43214 , runt=121323msec
write: io=5091.7MB, bw=17371KB/s, iops=4342 , runt=300141msec

The SSD used in this test had a size of 10GB while the SAS drive had a size of 100GB.

The fio configuration file that was used is:
[global]
bs=4k
ioengine=libaio
iodepth=4
size=20g
direct=1
runtime=60
directory=/mnt/fio
filename=test.file
[seq-read]
rw=read
stonewall
[rand-read]
rw=randread
stonewall
[seq-write]
rw=write
stonewall
[rand-write]
rw=randwrite
stonewall
Posted in Uncategorized | 1 Comment

TIER-0.1.7 has been released.

This version of tier comes with some major changes. The caching layer has been removed from the code. EPRD can be used in cases where caching is needed. Also the block size has been changed so that TIER now uses a 1MB block size. This greatly reduces the amount of meta data that has to be stored. TIER will now automatically migrate the data between the different tiers. The policy that determines when a block should be migrated is still hard coded in this release but will be adjustable per tier in future releases. TIER will detect unclean shutdowns and unfinished migrations after unclean shutdown. However this release does not yet handle recovery.
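
To see why a larger block size shrinks the meta data, count the blocks that need tracking. The previous block size is not stated here, so the 4KB figure below is purely an assumption for comparison, as is the 100GB device size:

```shell
# Number of blocks to track: $1 = device size in MB, $2 = block size in KB.
block_count() {
    echo $(( $1 * 1024 / $2 ))
}

block_count 102400 1024   # 100GB device, 1MB blocks: 102400
block_count 102400 4      # same device, assumed 4KB blocks: 26214400
```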

Posted in Uncategorized | 1 Comment

Introducing TIER

Tier is a Linux kernel module that can be used to create a block device that allows automatically tiered storage. Tier can be used to aggregate up to 16 devices as one virtual device. Tier investigates access patterns to decide on which device the data should be written. It keeps track of how frequently data has been accessed as well as when it was used. Tier uses this information to decide if the data needs to be written to for example SSD/SAS or SATA.

One advantage of tier when compared to SSD caching only is that the total capacity of the tiered device is the sum of all attached devices. Kernel modules like flashcache use the SSD as cache only and therefore the capacity of the SSD is not available as part of the total size of the device.
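
As a quick illustration of the capacity difference, with example sizes in GB (160 for the SSD and 900 for the slower tier, chosen to resemble the test setup further down; the function name is illustrative):

```shell
# Usable capacity of a tiered device: the sum of all member devices.
tier_capacity() {
    total=0
    for size in "$@"; do
        total=$(( total + size ))
    done
    echo "$total"
}

tier_capacity 160 900    # tiered: 1060
# With the SSD used purely as a cache the usable capacity would stay at 900.
```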

Because TIER incorporates the RAM caching techniques of EPRD it is very fast, even faster than what can be achieved with SSD only.

To get an impression of TIER performance I tested tier in this configuration.
An Intel SSD with a 160GB size is used as first tier and the second tier is made up of 6 * 300GB SAS in software RAID10.

The iometer test that is used comes from : http://vmktree.org/iometer/
Tier was configured with these parameters:

./tier_setup -f /dev/sdb:/dev/md1 -p 1000M -m 5 -b -c

Test                       : TIER  - SSD  - MD1(R10)
Max-throughput-100%read    : 32540 - 3796 - 2746
Reallife-60%rand-65%read   : 1927  - 3185 - 226
Max-Throughput-50%read     : 6890  - 1753 - 470
Random-8k-70%read          : 937   - 2870 - 401

As the results table above shows, TIER outperforms the MD raid10 on all tests. The SSD is faster than TIER in two of the four tests. TIER can outperform the SSD because it was configured to use 1GB of RAM for caching and because it benefits from the speed advantage that raid10 gives on sequential reads and writes.
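
The comparison can be checked mechanically. The sketch below reads the table rows and prints which column has the highest number for each test. It assumes that higher is better, which the table does not state explicitly, and the function name is made up:

```shell
# Read "name : TIER - SSD - MD1" rows on stdin and print the fastest column.
tier_winners() {
    awk '{
        split($0, a, / *: */)          # a[1] = test name, a[2] = the numbers
        split(a[2], v, / *- */)        # v[1] = TIER, v[2] = SSD, v[3] = MD1
        w = "TIER"; best = v[1] + 0
        if (v[2] + 0 > best) { w = "SSD"; best = v[2] + 0 }
        if (v[3] + 0 > best) { w = "MD1" }
        printf "%s %s\n", a[1], w
    }'
}

tier_winners <<'EOF'
Max-throughput-100%read    : 32540 - 3796 - 2746
Reallife-60%rand-65%read   : 1927  - 3185 - 226
Max-Throughput-50%read     : 6890  - 1753 - 470
Random-8k-70%read          : 937   - 2870 - 401
EOF
```

TIER comes out on top in the two max-throughput tests and the SSD in the two mixed random tests.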

[Figure: tier iometer results]

Posted in Uncategorized | 11 Comments

EPRD & lessfs

To get an idea of the efficiency of EPRD caching I repeated the lessfs benchmark test with EPRD caching the Intel 320 SSD.

The Intel 320SSD was registered as /dev/sdc.
EPRD was set up like this : ./eprd_setup -f /dev/sdc -m 3 -b -p 2048M
The databases eventually reach a size of 8.5 GB during this test.

[Figure: Lessfs with and without EPRD]

[Figure: Lessfs with and without EPRD, 2nd write]

As the graphs show, a user space application like Lessfs speeds up with EPRD even when it is used to cache a relatively fast medium like an Intel 320 SSD. I intend to test EPRD with a number of other applications as well. Candidates that come to mind are for example OpenLDAP and MySQL.

Posted in Uncategorized | 6 Comments