Lessfs-1.5.0 has been released

This release contains a number of performance improvements for the Berkeley DB backend. It also comes with an alternative to file_io, named chunk_io. chunk_io has one big advantage over file_io: where file_io stores all the data in one big file, chunk_io stores the data as 4~128k chunks. When data is deleted from lessfs with chunk_io, the space is immediately freed on the underlying filesystem. The file_io backend only marks space in the blockdata file as available for reuse; it does not actually shrink the file. To keep the number of files in a single directory to an acceptable maximum, the depth of the directory structure can be specified with the CHUNK_DEPTH directive. By default the data is hashed into 256 directories, which is also the minimum required depth.
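As an illustration of the idea (a sketch, not the lessfs source: the hash function and path layout here are assumptions), fanning chunks out over 256 bucket directories might look like:

```python
# Hypothetical sketch of CHUNK_DEPTH-style bucketing -- not lessfs code.
# Each chunk is named by its content hash; the first byte of the hash
# picks one of 256 bucket directories, keeping any single directory small.
import hashlib

CHUNK_DEPTH = 256  # default and minimum number of bucket directories

def chunk_path(chunk_data: bytes) -> str:
    digest = hashlib.sha256(chunk_data).hexdigest()
    bucket = int(digest[:2], 16) % CHUNK_DEPTH  # first hash byte -> directory
    return f"{bucket:02x}/{digest}"

print(chunk_path(b"example chunk"))
```

Because deleting a chunk under such a layout is an ordinary file unlink, the space goes straight back to the underlying filesystem.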

The chunk_io backend works well with btrfs and, to some extent, with reiserfs.


Mark Ruijter


17 Responses to Lessfs-1.5.0 has been released

  1. cw says:

    I had some issues with bdb on several 1.4.x releases, where any file delete would result in an out-of-memory error and lessfs would bail. I wasn’t sure whether I was really out of memory, though (it wasn’t swapping). I was trying a smaller block size to get better dedup on vmdk files; theoretically, matching the block size of the filesystem inside the vmdk would give the best dedup. What’s the memory overhead in lessfs per chunk?

  2. cw says:

    Just tried using the same cfg file with 1.5, both with lzo and with bdb.
    I copied a 9GB sparse file that has 1.9GB in it; 1.3 works fine, but with 1.5 I get the following and lessfs crashes:

    Aug 6 15:18:29 file5 lessfs[30046]: bin_write_dbdata : database 2 keylen 16 datalen 24
    Aug 6 15:18:29 file5 lessfs[30046]: Database write failed : Cannot allocate memory

    The OS is Ubuntu 10.04 x64 with 8GB RAM; free shows 7.4GB in buffers+cache and 32KB in swap.
    Debug is at 5 and that’s all the info it’s giving.
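    A sparse file like the 9GB/1.9GB one above shows its full apparent size to ls while occupying far less disk; on Linux, stat’s st_blocks field (in 512-byte units) reveals the real allocation. A small, self-contained sketch:

```python
# Sketch: apparent size vs. blocks actually allocated for a sparse file.
# truncate() grows the file without writing data, so almost nothing is
# allocated even though the size reads as 64 MB.
import os
import tempfile

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.truncate(64 * 1024 * 1024)  # 64 MB apparent size, nothing written
    path = f.name

st = os.stat(path)
print("apparent size:", st.st_size)       # what ls -l reports
print("allocated:", st.st_blocks * 512)   # roughly what du reports
os.unlink(path)
```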

    • Richard says:

      When you tried using 1.5 with bdb, did you put a DB_CONFIG file in /data/mta?
      Also, I am not sure that using the same config file from the 1.3 series would be best, because 1.3 did not support Berkeley DB (bdb).

      • cw says:

        I did not, but I did with 1.4.x; I forgot to this time. Same symptoms, though.

        • Richard says:

          Was there any information in /var/log/lessfs-bdb_err.txt?

          • cw says:

            Sorry for the long delay in responding…
            “Lock table is out of available lock entries”

            bdb is SQL, right? What advantages does it have over sqlite3 in your usage? I know sqlite3 can handle a couple million rows without issue, and it handles locking decently too.

          • Richard says:

            If you get “Lock table is out of available lock entries”, then you need to increase the numbers in /data/mta/DB_CONFIG.

            The default DB_CONFIG in the lessfs-1.4.x series always needed to be modified. I did not have to modify the 1.5.0 DB_CONFIG, but depending on how many files you have, you may still need to increase many of the allowed limits.

            I believe that while bdb does support SQL, SQL would not be optimal in a deduplicating file system, so it is probably being used as a key-based database for faster processing.

            Myself, I am using chunk_io. I don’t know if that bypasses bdb. I’m going to email Mark some of my questions.
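            For reference, raising the lock limits in /data/mta/DB_CONFIG looks like this; the directive names are standard Berkeley DB ones, but these particular values are only illustrative starting points:

```
# /data/mta/DB_CONFIG -- example values only; tune for your file count
set_lk_max_locks   100000
set_lk_max_lockers 100000
set_lk_max_objects 100000
```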

  3. cw says:

    Recompiled without bdb and it worked.

  4. Chris says:

    Is HamsterDB no longer supported? After compiling 1.5.0, the lessfs command does not mount the store any more. It does not show any error either.

    • Richard says:

      I don’t know if it is unsupported, but Mark is not that fond of HamsterDB anymore. See his 1.4.0 announcement:
      “Reason for adding support for Berkeley DB is not that it was sexy to introduce it. Nor that it was fun to write the code. I needed a very reliable back-end and speed was less important than reliability. While hamsterdb still looks promising, for now it does not fulfill the requirements.”

    • Mark Ruijter says:

      Lessfs-1.5.0 will not work with older releases of hamsterdb. The hamsterdb interface changed recently. Lessfs-1.5.0 should work with hamsterdb-1.1.13, if not please let me know.

      • Chris says:

        Unfortunately my backup hard drive holding the copy of the VM crashed and I had to rebuild the whole system. I set it up with BDB, since the database was damaged and had to be refilled anyway. So I won’t be able to test that.

  5. David says:

    I’ve compiled with BerkeleyDB and now have it working. I still have a question to which I cannot find an answer: how do you see free space after deletes?

    If I copy a 3GB ISO to the dedupe store and run ‘df’, I can see the added 3GB. If I delete it, df does not change. I understand that df may not be able to see the changes (for whatever reason), but there must be some way to determine the amount of space the dedupe store is taking up. Or is .lessfs/lessfs_stats the only way? My issue is that the dedupe store gets to 100% with the test backups I’m sending to it, yet I have no way to estimate the disk space I need, as ‘df’ is showing the size of /data/dta/blockdata.dta. Does this file not shrink once files are deleted? Is there a manual way to shrink it?
    Sorry if this has been answered elsewhere, but I have not seen it.

  6. David says:

    Just saw the answer to my question in README.file_io. The blockdata file does not shrink.


    • Richard says:

      You could try chunk_io (see README.chunk_io in the lessfs-1.5.0 dir). I’m testing it right now, but have not yet deleted anything. You would also have to reformat /data as btrfs or reiserfs, so don’t forget to back up your data.

  7. Sylar says:

    I just found lessfs recently and think it’s a cool tool to save disk space.
    I have a question about the usage of lessfs.
    It seems that lessfs can be used on POSIX systems.
    But can lessfs be used on non-POSIX systems as well?
    The filesystem we are using is MogileFS, and it can only be accessed through API calls, not POSIX.
    So I am wondering whether lessfs could fit in with MogileFS.
    If it can’t, will that be available in the future?

  8. David says:

    Another question about disk usage. I am using file_io, and the OS partition holding my dedupe store is slowly creeping towards 100% even though the data in the lessfs dta file is only using 70% of the space. When it reaches 100%, lessfs will ‘crap out’ even though there is space available in the dta file.
    Is there any way around this without using chunk_io?

    If I do use chunk_io, can I use ext4 or xfs reliably? I’m not sure I trust btrfs yet; it crapped out on me in a test VM using lessfs.
