lessfs – A high performance inline data deduplicating filesystem for Linux.
lessfs is released under the GNU GPLv3 license and can be downloaded from:
http://sourceforge.net/project/showfiles.php?group_id=257120
lessfs – A high performance inline data deduplicating filesystem for Linux.
lessfs is released under the GNU GPLv3 license and can be downloaded from:
http://sourceforge.net/project/showfiles.php?group_id=257120
Key value databases are becoming more popular lately. And for good reason. Although these databases offer a limited set of features they do come with one important feature : high performance. Not all key value databases are created equal though. You can choose from at least a dozen or so. Since I started to develop lessfs I have tried more then a view of them. Let me share my experiences with the most important key value databases.
A more complete list of key value databases can be found on : NOSQL
Lessfs development started with using Berkeley DB. This database excels in reliability. No matter how the server crashes or goes down, bdb always recovers as long as it has the log files to do so. I have used Berkeley DB extensively with OpenLdap and it never let us down. Berkeley DB does have a dark side though. But a typical Ldap server will not reveal this dark side. Ldap servers are mostly used and optimized for read operations. This is why Berkeley DB and Ldap work fine. Everything changes when the application needs write performance. In this case Berkeley DB is very slow. Way to slow for the kind of performance that Lessfs needs.
These are the old school databases. These databases a both slow and not reliable enough for modern applications. They do not support transactions / ACID operations and are easily damaged upon for example an unexpected power down. See the benchmark below for the performance numbers.
Tokyocabinet is currently used with Lessfs. Tokyocabinet really excels in speed. As far as I can tell it holds the record on speed for a key value database that supports both read and write operations. Dan Berstein’s constant database (cdb) is just a bit faster. But cdb is a constant database and therefore not what we are looking for. Tokyocabinet comes with a very nice API and is very well documented. So this appears to be the perfect key value database, right? Well, no. Tokyocabinet comes with a few problems of it’s own. In the beginning TC did not come with transactions. The author relies on replication to keep his data safe. Later on support for transactions was added to TC making it somewhat more resilient to crashes and unexpected power-downs. I am deliberately using the word ’somewhat’ here. In practice it is still very easy to damage a TC database beyond repair. Just pressing the power switch when the database is under load usually does the trick. I am not complaining that I might loose a some data in this case. But the fact that the database might be damaged beyond repair is not acceptable in my opinion. It also looks like development on TC has stopped or at least slowed down in favor of Kyotocabinet. Last but no least my attempts to discuss problems with the author mostly results in one way traffic. Especially when it comes to discussing data corruption problems the silence is deafening.
Recently I came across hamsterdb. A key value database that targets the embedded market. Hamsterdb comes with very decent performance and is highly crash resilient. At least I have not been able to corrupt hamsterdb with for example deliberate power-downs when the database is under high load. Hamsterdb is well documented and the hamsterdb source code is easy to read, which makes this database very attractive. It is also possible to buy a commercial license for a reasonable price which makes it even better.
Tokyocabinet comes with a set of performance tests in the /bros directory. I have created a few additional tests to be able to test TC/BDB and hamsterdb with and without transaction support.
| NAME | COMMENT | RECORDS | WRITE(SEC) | READ(SEC) |
| gdbm | 1000000 | 7.385 | 2.505 | |
| TC | Without transactions | 1000000 | 0.339 | 0.387 |
| BDB | Without transactions | 1000000 | 3.666 | 2.258 |
| HAMSTERDB | Without transactions | 1000000 | 1.376 | 1.360 |
|
|
|
|
|
|
| TC | With transactions | 10000 | 0.376 | 0.024 |
| TC | HDBOTSYNC+transactions | 10000 | 1440.078 | 0.029 |
| BDB | With transactions | 10000 | 116.897 | 2.312 |
| HAMSTERDB | With transactions | 10000 | 0.484 | 0.068 |
I tested TC twice, once with and without HDBOTSYNC. With HDBOTSYNC TC synchronizes the disk after every transaction. This should ensure that the database survives unexpected termination without loss of data. But since it then takes 1440 seconds to write 10000 records this can hardly be seen as a valid solution to the database corruption problem.
There is a large number of very good open source key value databases available today. Each database has its own benefits and dark sides. Choosing one that is optimal for your application depends on the performance that is required as well as reliability and support that you can expect from the community. As always the perfect solution does not exist. I would have liked to see a key value database that works with a log based approach. PBXT is an example of a database design that works this way but regrettably PBXT is not available as merely a key value database.
This versions adds two new features to lessfs.
A. Synchronous replication.
Lessfs now supports a master / slave configuration. In this case the master will synchronously replicate the data to the slave. The slave is read-only.
The master will freeze for ever when the slave becomes unreachable. Many options will be added in the future, including maintaining a replication backlog and async replication. The replication feature should be considered alpha code.
B. Background delete.
When enabled lessfs will spawn a background thread that will take care of the truncation/delete task. Lessfs will therefore return immediately after a delete or truncate operation. This is a new feature and is therefore not enabled by default.
Enjoy,
Mark Ruijter
Today I decided to upload a pre-release of lessfs-1.1.6. It has a new feature that a number of users have been lobbying fore. The feature is background deletion / truncation of files.
Let me explain this a bit further. Up till now deleting large files has been an operation that required patience. Lessfs will have to go through the deleted chunks to see if they are reference by any other file. If not it can be put on the freelist (file_io) or deleted from the database with the tc data store.
This test release now marks the file as being deleted and spawns a thread that will do the actual work in the background. While the thread is running it sets a lock on the inode and a lock per hash / per block. The only disadvantage is that when you unmount lessfs, the actual lessfs process may stay around for some time until it has finished it’s background tasks. In the future these delete/truncate operations may become restartable but for now they are not.
You can download the new code from:
lessfs_background_delete
Enjoy,
Mark Ruijter
This version improves crash recovery and solves a potential deadlock when lessfs is used with samba.
Lessfs now inserts test transactions in the databases when lessfs is mounted. When a transactions fails the database it automatically optimized. Instead of halting when finding ‘non fatal inconsistency’ lessfs will now repair the problem at runtime and log the inode and blocknr of the file that has been corrupted.
This release fixes a problem where lessfs would leave orphaned data chunks in the system under high load.
A number of issues with lessfschk has also been solved.
Everyone who is using a 1.1.x release is encouraged to upgrade to this version. It solves a race condition that will crash lessfs when it is put under heavy load.
This release also improves performance.
Lessfs-1.1.0 is now available for download.
As with any .0 release there may still be small issues that can pop up.
The good news is that lessfs-1.1.0 is very fast, both for reading and writing. Compression has improved in many ways when compared to lessfs-1.0.8. Lessfs now does not compress a whole (BLKSIZE) block of data when only a small part of the block is actually filled with data. Lessfs-1.0.x would fill a block with zero’s, copy the data in the block and then compress the whole block. This resulted in bad compression for small files.
Lessfs-1.1.x now supports wide a range of compression algorithms, like deflate,gzip,lzo,qlz and zlib.
For people who would like to reduce the possibility of a hash collision to almost nil lessfs now supports up to 64 byte long hashes like whirlpool.
The popular sha256 is now also supported.
A start has been made to provide better compression statistics in .lessfs/lessfs_stats. Lessfs can now deliver very decent performance with ietd, especially when ietd has been patched to write with the same blocksize that lessfs uses.
Enjoy,
Mark Ruijter
PLEASE NOTE THAT LESSFS-1.1.x is not compatible with LESSFS-1.0.x.
Lessfs-1.1.0-beta6 is now available for download. This release brings much improved read performance. Lessfs-1.1.0-beta6 should be the last beta release. When no bugs are reported the coming week this code will be released as 1.1.0. People who would like to download a full comparison between zfs,sdfs and lessfs can download the iozone graphs from this location:
Enjoy,
Mark Ruijter
Lessfs-1.1.0-beta4 is now available for download. This release improves write performance when writes are done with small block sizes. The graph below displays the write performance of XFS, the fastest filesystem that is available for Linux and all other de-duplicating filesystems that are available for Linux. Both Lessfs and ZFS-fuse are deduplicating and compressing the data. SDFS does deduplicate but does not compress the data.

This version of lessfs now comes with a ‘per hash’ lock which protects a hash / data chunk while it is still being written to disk. The way this is done does not effect performance because it is highly granular. This said, I should still check if using a spinlock instead of mutex locking improves performance. Other then that this code looks pretty stable.

For those who are interested in performance. The graph below compares XFS with ZFS-fuse and Lessfs-1.1.0-betaX On read performance Lessfs is outperformed by ZFS. But when it comes to writing, Lessfs is a clear winner. Lessfs will be tuned for metadata operations in the upcoming 2.x release. LFS now uses the high level fuse API but will switch to the lowlevel API for better metadata performance.
