How Lessfs plays Tetris and wins
Many de-duplicating file systems and backup solutions struggle with the same problem: deleting data is complicated because a single chunk of data can be used by many files. Things become even more complicated when the file system also compresses the data. In that case the chunks no longer have a nice equal size; they can have any size between zero and the maximum allowed block size. This makes reusing the available space similar to playing Tetris.
Solving the puzzle
The current versions of Lessfs have two ways to handle garbage collection. The file_io backend simply keeps a freelist with offsets that can be reused by the file system. It does not free up space to the underlying file system; the blockdata file just stops growing as long as free space can be found inside it. This strategy comes with a number of drawbacks though. One disadvantage is that finding and filling holes in the file takes time and causes the IO to become very random, which of course is bad for throughput. And sadly your disk will still be filled with a large blockdata file, even when you have removed most of the data from the file system.
The chunk_io backend does not have this disadvantage. When you delete data from the file system the individual chunks are simply removed and all is well. Or is it? The disadvantage in this case is that millions of chunks result in millions of files that have to be stored on the underlying file system. Btrfs handles this pretty efficiently and is therefore usable. However, all file systems suffer when many millions of files have to be stored or deleted.
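To make the file_io trade-off concrete, here is a minimal sketch of a freelist allocator in Python. It is not the Lessfs implementation: the class name, the first-fit policy and the way leftover space is kept as a smaller hole are all assumptions made for illustration.

```python
import bisect


class FreeList:
    """Toy freelist of (offset, length) holes inside one big data file.

    Hypothetical illustration only; Lessfs' real bookkeeping may differ.
    """

    def __init__(self):
        self.holes = []        # sorted list of (offset, length) tuples
        self.end_of_file = 0   # the file grows here when no hole fits

    def release(self, offset, length):
        # Deleting a chunk only records a reusable hole; the file never shrinks.
        bisect.insort(self.holes, (offset, length))

    def allocate(self, length):
        # First fit (an assumed policy): take the lowest hole that is big enough.
        for i, (offset, hole_len) in enumerate(self.holes):
            if hole_len >= length:
                if hole_len == length:
                    del self.holes[i]
                else:
                    # The unused tail of the hole stays behind as a smaller hole.
                    self.holes[i] = (offset + length, hole_len - length)
                return offset
        # No hole fits: append at the end of the file.
        offset = self.end_of_file
        self.end_of_file += length
        return offset
```

Because compressed chunks have arbitrary sizes, holes rarely fit exactly. Small leftovers accumulate, the scan for a fitting hole gets longer, and new writes end up scattered over the file.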
Problem solved: Lessfs multifile_io
Lessfs 1.6 introduces a new backend, multifile_io, that addresses all the problems that were mentioned above.
Data is now stored in chunks that are rounded up to a multiple of 512 bytes. So a compressed chunk with a size of 4000 bytes will allocate 4096 bytes on disk. Lessfs simply opens 256 files: one file for chunks that are 512 bytes in size, one file for chunks that are 1024 bytes in size, and so on. This simplifies our game of Tetris quite a bit, since within one file all chunks have the same size and you can now easily move a block from the top of the file to a hole somewhere at the bottom of the file. However, doing so with a live file system would be rather complicated and not safe at all. Therefore Lessfs opens two sets of 256 files. The first file set is active for writing data, while the second is being optimized. When Lessfs is done optimizing the second file set, the roles are swapped: writes go to the second file set while the first one is optimized. Since Lessfs uses transactions, it switches the file set used for writing at a moment when Lessfs is stable, the transactions are committed and no writes are being done. Lessfs also waits before actually truncating the optimized files, so that it is certain that this can be done safely because the databases have already been committed to disk.
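The size-class idea and the "move the top block into a hole" step can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names are made up, file number 0 is assumed to hold the 512-byte chunks, and one file is modelled as a plain list of slots where None marks a hole.

```python
SECTOR = 512        # chunks are rounded up to a multiple of this
NUM_CLASSES = 256   # one file per rounded size


def size_class(compressed_len):
    """Return (file_index, slot_size) for a compressed chunk.

    Assumed mapping: file 0 holds 512-byte slots, file 1 holds 1024-byte
    slots, and so on. size_class(4000) gives (7, 4096).
    """
    if not 0 < compressed_len <= SECTOR * NUM_CLASSES:
        raise ValueError("chunk does not fit in any size class")
    sectors = -(-compressed_len // SECTOR)   # ceiling division
    return sectors - 1, sectors * SECTOR


def compact(slots):
    """Fill holes (None) in one file with chunks taken from its end.

    Modifies slots in place and returns the list of (old_slot, new_slot)
    moves, which is what a database update would need to record.
    """
    moves = []
    hole = 0
    while True:
        # Trailing holes can simply be cut off: this is the truncate step.
        while slots and slots[-1] is None:
            slots.pop()
        # Look for the lowest remaining hole.
        while hole < len(slots) and slots[hole] is not None:
            hole += 1
        if hole >= len(slots):
            return moves
        # Every slot in this file has the same size, so the last chunk
        # always fits exactly into the hole.
        slots[hole] = slots.pop()
        moves.append((len(slots), hole))
```

For example, compacting `["a", None, "b", None, "c"]` moves "c" from slot 4 into slot 1 and then drops the trailing hole, leaving three slots. The sketch ignores the hard parts that the two file sets solve: it assumes nobody else writes to the file while it is being compacted.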
To be able to relocate a chunk of data that is stored at the end of a file, we need to be able to determine the hash of the data, because the hash is the key under which the chunk is found. In theory it is possible to uncompress the chunk and recalculate the hash. Instead, Lessfs simply stores the hash in front of the data chunk. This also makes it possible to easily verify or even relocate the data with a separate program.
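A record with the hash stored in front of the data could look roughly like the sketch below. The exact on-disk layout of Lessfs is not described here, so everything concrete is an assumption: SHA-256 and zlib stand in for whatever hash and compressor are configured, and the 4-byte length field is a hypothetical addition so that the padding can be stripped again.

```python
import hashlib
import struct
import zlib

# Hypothetical header: 32-byte hash followed by the compressed length.
HEADER = struct.Struct("<32sI")


def pack_record(chunk, slot_size):
    """Build one fixed-size slot: hash, length, compressed data, zero padding."""
    payload = zlib.compress(chunk)
    digest = hashlib.sha256(chunk).digest()   # hash of the uncompressed data
    record = HEADER.pack(digest, len(payload)) + payload
    if len(record) > slot_size:
        raise ValueError("record does not fit in this slot size")
    return record.ljust(slot_size, b"\0")


def read_hash(record):
    """Get the hash without touching the compressed data at all."""
    digest, _length = HEADER.unpack_from(record)
    return digest


def verify_record(record):
    """Uncompress the chunk and check it against the stored hash."""
    digest, length = HEADER.unpack_from(record)
    payload = record[HEADER.size:HEADER.size + length]
    try:
        chunk = zlib.decompress(payload)
    except zlib.error:
        return False
    return hashlib.sha256(chunk).digest() == digest
```

With this kind of layout a relocation only needs read_hash() to know which database entry to update, while an offline checker can walk a file slot by slot and call verify_record() on each one.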
In fact this makes a tiered solution with data automatically migrating between SSD/SAS/SATA very simple to implement, although data migration is even easier with the chunk_io backend.
So there you have it. Lessfs now supports online space reclamation that is safe and performs well, even though Lessfs uses data compression.
Things to come
At the top of my list is now switching to the lowlevel fuse interface. This will make Lessfs much faster in combination with SAMBA or NFS. Improving replication and adding support for data tiering are also high on the list. When Lessfs switches to the lowlevel API, support for tokyocabinet (TC) as the database will most likely be dropped, and support for using TC as the data store will disappear for sure. This removes a lot of obsolete code from the project, which is always a good thing.
Lessfs-1.6.0-alpha0 is the first release that contains multifile_io. This is still alpha-quality code, and replication does not yet work with multifile_io.