Introduction
Key value databases are becoming more popular lately. And for good reason. Although these databases offer a limited set of features they do come with one important feature : high performance. Not all key value databases are created equal though. You can choose from at least a dozen or so. Since I started to develop lessfs I have tried more then a view of them. Let me share my experiences with the most important key value databases.
A list of the key value databases evaluated for lessfs
A more complete list of key value databases can be found on : NOSQL
Berkeley DB
Lessfs development started with using Berkeley DB. This database excels in reliability. No matter how the server crashes or goes down, bdb always recovers as long as it has the log files to do so. I have used Berkeley DB extensively with OpenLdap and it never let us down. Berkeley DB does have a dark side though. But a typical Ldap server will not reveal this dark side. Ldap servers are mostly used and optimized for read operations. This is why Berkeley DB and Ldap work fine. Everything changes when the application needs write performance. In this case Berkeley DB is very slow. Way to slow for the kind of performance that Lessfs needs.
gdbm/ndbm
These are the old school databases. These databases a both slow and not reliable enough for modern applications. They do not support transactions / ACID operations and are easily damaged upon for example an unexpected power down. See the benchmark below for the performance numbers.
Tokyocabinet
Tokyocabinet is currently used with Lessfs. Tokyocabinet really excels in speed. As far as I can tell it holds the record on speed for a key value database that supports both read and write operations. Dan Berstein’s constant database (cdb) is just a bit faster. But cdb is a constant database and therefore not what we are looking for. Tokyocabinet comes with a very nice API and is very well documented. So this appears to be the perfect key value database, right? Well, no. Tokyocabinet comes with a few problems of it’s own. In the beginning TC did not come with transactions. The author relies on replication to keep his data safe. Later on support for transactions was added to TC making it somewhat more resilient to crashes and unexpected power-downs. I am deliberately using the word ‘somewhat’ here. In practice it is still very easy to damage a TC database beyond repair. Just pressing the power switch when the database is under load usually does the trick. I am not complaining that I might loose a some data in this case. But the fact that the database might be damaged beyond repair is not acceptable in my opinion. It also looks like development on TC has stopped or at least slowed down in favor of Kyotocabinet. Last but no least my attempts to discuss problems with the author mostly results in one way traffic. Especially when it comes to discussing data corruption problems the silence is deafening.
Hamsterdb
Recently I came across hamsterdb. A key value database that targets the embedded market. Hamsterdb comes with very decent performance and is highly crash resilient. At least I have not been able to corrupt hamsterdb with for example deliberate power-downs when the database is under high load. Hamsterdb is well documented and the hamsterdb source code is easy to read, which makes this database very attractive. It is also possible to buy a commercial license for a reasonable price which makes it even better.
Performance numbers.
Tokyocabinet comes with a set of performance tests in the /bros directory. I have created a few additional tests to be able to test TC/BDB and hamsterdb with and without transaction support.
| NAME |
COMMENT |
RECORDS |
WRITE(SEC) |
READ(SEC) |
| gdbm |
|
1000000 |
7.385 |
2.505 |
| TC |
Without transactions |
1000000 |
0.339 |
0.387 |
| BDB |
Without transactions |
1000000 |
3.666 |
2.258 |
| HAMSTERDB |
Without transactions |
1000000 |
1.376 |
1.360 |
|
|
|
|
|
| TC |
With transactions |
10000 |
0.376 |
0.024 |
| TC |
HDBOTSYNC+transactions |
10000 |
1440.078 |
0.029 |
| BDB |
With transactions |
10000 |
116.897 |
2.312 |
| HAMSTERDB |
With transactions |
10000 |
0.484 |
0.068 |
I tested TC twice, once with and without HDBOTSYNC. With HDBOTSYNC TC synchronizes the disk after every transaction. This should ensure that the database survives unexpected termination without loss of data. But since it then takes 1440 seconds to write 10000 records this can hardly be seen as a valid solution to the database corruption problem.
Conclusion
There is a large number of very good open source key value databases available today. Each database has its own benefits and dark sides. Choosing one that is optimal for your application depends on the performance that is required as well as reliability and support that you can expect from the community. As always the perfect solution does not exist. I would have liked to see a key value database that works with a log based approach. PBXT is an example of a database design that works this way but regrettably PBXT is not available as merely a key value database.