Lessfs-1.1.9.1 has been released.

This is mostly a bugfix release.

Issues that have been solved are:

A memory corruption bug in the replication code which would garble the replication log as well as general replication improvements.

Lessfs now maintains accurate information on st_blocks of the stat structure. When used with file_io lessfs_stats now reports the right amount of bytes for the actual size of a file. Even when this file has been truncated.

Posted in Uncategorized | 3 Comments

Lessfs-1.1.9 has been released.

New features

Lessfs now supports asynchronous master/slave replication. Replication can be enabled and disabled with a simple ‘ echo 1 >/.lessfs/replication/enabled’ where 1 is enabled and 0 is disabled. Lessfs uses CRC32 checksums to ensure data integrity during data transfer between master and slave. See the README.replication file in the distribution for more details. Replication should still be considered alpha code. That said, it works for me. ;-)

Improvements

The background delete/truncation code has stabilized to the point where it will become the default with the next release.

Posted in Uncategorized | 4 Comments

Lessfs-1.1.8 is available for download.

This release contains a number of small bug fixes as well as improved truncation code. Lessfs can now truncate files as a background process. Files that are being truncated are no longer read/write locked but are now only write locked. During truncation a file with the inodenumber of the truncated file as filename will be visible in /.lessfs/locks/ to indicate that the inode is being processed.

Posted in Uncategorized | 3 Comments

A tale about key value databases.

Introduction

Key value databases are becoming more popular lately. And for good reason. Although these databases offer a limited set of features they do come with one important feature : high performance. Not all key value databases are created equal though. You can choose from at least a dozen or so. Since I started to develop lessfs I have tried more then a view of them. Let me share my experiences with the most important key value databases.

A list of the key value databases evaluated for lessfs

A more complete list of key value databases can be found on : NOSQL

Berkeley DB

Lessfs development started with using Berkeley DB. This database excels in reliability. No matter how the server crashes or goes down, bdb always recovers as long as it has the log files to do so. I have used Berkeley DB extensively with OpenLdap and it never let us down. Berkeley DB does have a dark side though. But a typical Ldap server will not reveal this dark side. Ldap servers are mostly used and optimized for read operations. This is why Berkeley DB and Ldap work fine. Everything changes when the application needs write performance. In this case Berkeley DB is very slow. Way to slow for the kind of performance that Lessfs needs.

gdbm/ndbm

These are the old school databases. These databases a both slow and not reliable enough for modern applications. They do not support transactions / ACID operations and are easily damaged upon for example an unexpected power down. See the benchmark below for the performance numbers.

Tokyocabinet

Tokyocabinet is currently used with Lessfs. Tokyocabinet really excels in speed. As far as I can tell it holds the record on speed for a key value database that supports both read and write operations. Dan Berstein’s constant database (cdb) is just a bit faster. But cdb is a constant database and therefore not what we are looking for. Tokyocabinet comes with a very nice API and is very well documented. So this appears to be the perfect key value database, right? Well, no. Tokyocabinet comes with a few problems of it’s own. In the beginning TC did not come with transactions. The author relies on replication to keep his data safe. Later on support for transactions was added to TC making it somewhat more resilient to crashes and unexpected power-downs. I am deliberately using the word ‘somewhat’ here. In practice it is still very easy to damage a TC database beyond repair. Just pressing the power switch when the database is under load usually does the trick. I am not complaining that I might loose a some data in this case. But the fact that the database might be damaged beyond repair is not acceptable in my opinion. It also looks like development on TC has stopped or at least slowed down in favor of Kyotocabinet. Last but no least my attempts to discuss problems with the author mostly results in one way traffic. Especially when it comes to discussing data corruption problems the silence is deafening.

Hamsterdb

Recently I came across hamsterdb. A key value database that targets the embedded market. Hamsterdb comes with very decent performance and is highly crash resilient. At least I have not been able to corrupt hamsterdb with for example deliberate power-downs when the database is under high load. Hamsterdb is well documented and the hamsterdb source code is easy to read, which makes this database very attractive. It is also possible to buy a commercial license for a reasonable price which makes it even better.

Performance numbers.

Tokyocabinet comes with a set of performance tests in the /bros directory. I have created a few additional tests to be able to test TC/BDB and hamsterdb with and without transaction support.

NAME COMMENT RECORDS WRITE(SEC) READ(SEC)
gdbm 1000000 7.385 2.505
TC Without transactions 1000000 0.339 0.387
BDB Without transactions 1000000 3.666 2.258
HAMSTERDB Without transactions 1000000 1.376 1.360





TC With transactions 10000 0.376 0.024
TC HDBOTSYNC+transactions 10000 1440.078 0.029
BDB With transactions 10000 116.897 2.312
HAMSTERDB With transactions 10000 0.484 0.068

I tested TC twice, once with and without HDBOTSYNC. With HDBOTSYNC TC synchronizes the disk after every transaction. This should ensure that the database survives unexpected termination without loss of data. But since it then takes 1440 seconds to write 10000 records this can hardly be seen as a valid solution to the database corruption problem.

Conclusion

There is a large number of very good open source key value databases available today. Each database has its own benefits and dark sides. Choosing one that is optimal for your application depends on the performance that is required as well as reliability and support that you can expect from the community. As always the perfect solution does not exist. I would have liked to see a key value database that works with a log based approach. PBXT is an example of a database design that works this way but regrettably PBXT is not available as merely a key value database.

Posted in Uncategorized | 12 Comments

Lessfs-1.1.6 is available for download.

This versions adds two new features to lessfs.
A. Synchronous replication.
Lessfs now supports a master / slave configuration. In this case the master will synchronously replicate the data to the slave. The slave is read-only.
The master will freeze for ever when the slave becomes unreachable. Many options will be added in the future, including maintaining a replication backlog and async replication. The replication feature should be considered alpha code.
B. Background delete.
When enabled lessfs will spawn a background thread that will take care of the truncation/delete task. Lessfs will therefore return immediately after a delete or truncate operation. This is a new feature and is therefore not enabled by default.

Enjoy,

Mark Ruijter

Posted in Uncategorized | 13 Comments

Lessfs-1.1.6-testing is available for download.

Lessfs with background deletion and truncation.

Today I decided to upload a pre-release of lessfs-1.1.6. It has a new feature that a number of users have been lobbying fore. The feature is background deletion / truncation of files.

Let me explain this a bit further. Up till now deleting large files has been an operation that required patience. Lessfs will have to go through the deleted chunks to see if they are reference by any other file. If not it can be put on the freelist (file_io) or deleted from the database with the tc data store.

This test release now marks the file as being deleted and spawns a thread that will do the actual work in the background. While the thread is running it sets a lock on the inode and a lock per hash / per block. The only disadvantage is that when you unmount lessfs, the actual lessfs process may stay around for some time until it has finished it’s background tasks. In the future these delete/truncate operations may become restartable but for now they are not.

You can download the new code from:
lessfs_background_delete

Enjoy,

Mark Ruijter

Posted in Uncategorized | 7 Comments

Lessfs-1.1.5 is now available for download

This version improves crash recovery and solves a potential deadlock when lessfs is used with samba.
Lessfs now inserts test transactions in the databases when lessfs is mounted. When a transactions fails the database it automatically optimized. Instead of halting when finding ‘non fatal inconsistency’ lessfs will now repair the problem at runtime and log the inode and blocknr of the file that has been corrupted.

Posted in Uncategorized | 6 Comments

Lessfs-1.1.3 is available.

This release fixes a problem where lessfs would leave orphaned data chunks in the system under high load.
A number of issues with lessfschk has also been solved.

Posted in Uncategorized | 16 Comments

Lessfs-1.1.2 has been released.

Everyone who is using a 1.1.x release is encouraged to upgrade to this version. It solves a race condition that will crash lessfs when it is put under heavy load.
This release also improves performance.

Posted in Uncategorized | 25 Comments

Lessfs-1.1.0 has been released.

Lessfs-1.1.0 is now available for download.
As with any .0 release there may still be small issues that can pop up.

The good news is that lessfs-1.1.0 is very fast, both for reading and writing. Compression has improved in many ways when compared to lessfs-1.0.8. Lessfs now does not compress a whole (BLKSIZE) block of data when only a small part of the block is actually filled with data. Lessfs-1.0.x would fill a block with zero’s, copy the data in the block and then compress the whole block. This resulted in bad compression for small files.

Lessfs-1.1.x now supports wide a range of compression algorithms, like deflate,gzip,lzo,qlz and zlib.

For people who would like to reduce the possibility of a hash collision to almost nil lessfs now supports up to 64 byte long hashes like whirlpool.
The popular sha256 is now also supported.

A start has been made to provide better compression statistics in .lessfs/lessfs_stats. Lessfs can now deliver very decent performance with ietd, especially when ietd has been patched to write with the same blocksize that lessfs uses.

Enjoy,

Mark Ruijter

PLEASE NOTE THAT LESSFS-1.1.x is not compatible with LESSFS-1.0.x.

Posted in Uncategorized | 22 Comments