What is lessfs data deduplication and why would I use it?

Data deduplication (often called “intelligent compression” or “single-instance storage”) is a method of reducing storage needs by eliminating redundant data. It is often used for backups and for virtual machine image storage. lessfs determines whether data is redundant by calculating a 192-bit Tiger hash of each block of data that is written; a block whose hash is already known does not need to be stored again. When lessfs has determined that a block needs to be stored, it first compresses the block with LZO or QuickLZ. The combination of these two techniques results in a very high overall compression rate for many types of data. Multimedia files such as mp3, avi or jpg are already compressed, so lessfs cannot reduce their size when they are stored only once on the filesystem; they only benefit once duplicate copies are written.
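The hash-then-compress idea described above can be sketched in a few lines of Python. This is only an illustration, not lessfs code: Python's standard library has neither Tiger nor LZO, so a SHA-256 digest truncated to 24 bytes stands in for the 192-bit Tiger hash and zlib stands in for LZO/QuickLZ. The block size, the dictionary used as a store, and the function names are all assumptions made for the example.

```python
import hashlib
import os
import zlib

BLOCK_SIZE = 4096  # assumed block size, chosen only for this illustration


def block_key(block: bytes) -> bytes:
    # Stand-in for the 192-bit Tiger hash: SHA-256 truncated to 24 bytes.
    return hashlib.sha256(block).digest()[:24]


def store_blocks(data: bytes, store: dict) -> int:
    """Write data block by block and return how many new blocks were stored."""
    new_blocks = 0
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        key = block_key(block)
        if key not in store:
            # Only blocks not seen before are compressed and stored.
            # zlib is a stand-in for LZO/QuickLZ here.
            store[key] = zlib.compress(block)
            new_blocks += 1
    return new_blocks


store = {}
image = b"0123456789abcdef" * 256 * 8   # 8 identical 4096-byte blocks
first = store_blocks(image, store)      # 1 new block: the other 7 are duplicates
second = store_blocks(image, store)     # 0 new blocks: everything is already known
media = os.urandom(BLOCK_SIZE * 4)      # random bytes mimic already compressed media
third = store_blocks(media, store)      # every block is new and barely compressible
```

Writing the same data a second time stores nothing new, which is why repeated backups deduplicate so well, while the random "media" blocks are each stored in full.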

How does lessfs store data?

Lessfs stores each file as a series of entries in the fileblock database, one entry per block. The key of an entry is the combination inode-blocknumber and its value is the Tiger hash of that block. This hash in turn serves as the key into the blockdata database, where the corresponding value is the LZO- or QuickLZ-compressed block of data.
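The two-step lookup can be modelled with two dictionaries. This is a minimal sketch under stated assumptions, not the actual lessfs implementation: plain Python dicts replace the real databases, truncated SHA-256 stands in for the Tiger hash, zlib stands in for LZO/QuickLZ, and the names `write_block` and `read_block` are hypothetical.

```python
import hashlib
import zlib

fileblock_db = {}  # (inode, blocknumber) -> hash of the block
blockdata_db = {}  # hash of the block -> compressed block data


def _hash(block: bytes) -> bytes:
    # Stand-in for the 192-bit Tiger hash: SHA-256 truncated to 24 bytes.
    return hashlib.sha256(block).digest()[:24]


def write_block(inode: int, blocknumber: int, block: bytes) -> None:
    key = _hash(block)
    if key not in blockdata_db:
        # Store the data once; zlib is a stand-in for LZO/QuickLZ.
        blockdata_db[key] = zlib.compress(block)
    # The file itself only records which hash belongs to which block.
    fileblock_db[(inode, blocknumber)] = key


def read_block(inode: int, blocknumber: int) -> bytes:
    key = fileblock_db[(inode, blocknumber)]    # first lookup: find the hash
    return zlib.decompress(blockdata_db[key])   # second lookup: fetch the data


# Two files (inodes 10 and 11) that share one identical block.
write_block(10, 0, b"A" * 4096)
write_block(10, 1, b"B" * 4096)
write_block(11, 0, b"A" * 4096)
```

After these three writes the fileblock dictionary holds three entries but the blockdata dictionary holds only two, because both files point at the same stored block.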

What about performance?

On modern hardware with 12 low-end SATA drives, lessfs can reach speeds of up to 130 MB/s for a 100% unique dataset. It reaches speeds of up to 170 MB/s for previously stored data blocks.

[Figure: iozone write]

The figure above shows the write performance of lessfs, reiserfs, reiserfs on a loop file stored on lessfs, and ietd on a sparse file stored on lessfs. The benchmark was run on a laptop with a modest-speed disk drive (ATA TOSHIBA MK3252GS).

When to use lessfs?

Lessfs has been designed as a filesystem for backup purposes. However, it can also be used as storage for virtual machine images with low to medium storage performance demands. This works best with ietd (http://iscsitarget.sourceforge.net/) as an intermediate layer. See the FAQ file in the distribution tar.gz file for details. Last but not least, lessfs is open source and licensed under GPLv3. Other licenses may be acquired by contacting the author.