Lessfs-2.0.0 pre-alpha2 has been released

About lessfs-2.0.0-prealpha2

Lessfs-2.0.0 pre-alpha2 is a development release only. Its only purpose is to show what is coming and to receive some early feedback. Do not use this code for valuable data.

Lessfs-2.0.0 pre-alpha2 implements some basic snapshot code. A number of functions are either completely removed, like hardlinks, or are not implemented or only partially implemented, like truncation.

How to use this code.

Untar the code and chdir to it.

cd lessfs-2.0.0-palpha2
./configure; make

You can use the same lessfs configuration file that you use for the lessfs-1.x series.
Format and mount the filesystem:

./mklessfs -f -c /etc/lessfs.cfg
./lessfs /fuse -o configfile=/etc/lessfs.cfg

Copy a file to the filesystem:
root@saturn:/tmp/lessfs-2.0.0-palpha2# cp Makefile /fuse/
root@saturn:/tmp/lessfs-2.0.0-palpha2# ls -al /fuse/
total 87
drwxr--r--  4 root root  4096 2011-01-29 21:50 .
drwxr-xr-x 32 root root  4096 2011-01-27 23:07 ..
drwxr--r--  5 root root  4096 2011-01-29 21:50 .lessfs
-rw-r--r--  1 root root 83503 2011-01-29 21:50 Makefile

Now let’s create a snapshot:

root@saturn:/tmp/lessfs-2.0.0-palpha2# telnet localhost 100
Trying ::1...
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
>snapshot create TEST
create snapshot 2 with label TEST
+OK
>exit
+OK bye

Now we are going to remove the copied data from the root of the filesystem and retrieve it from the snapshot:

root@saturn:/tmp/lessfs-2.0.0-palpha2# rm /fuse/Makefile
root@saturn:/tmp/lessfs-2.0.0-palpha2# cat /fuse/.snapshot-TEST/Makefile | tail -1
.NOEXPORT:
root@saturn:/tmp/lessfs-2.0.0-palpha2# ls -al /fuse/.snapshot-TEST/
total 84
drwxr--r-- 4 root root  4096 2011-01-29 21:50 .
drwxr--r-- 4 root root  4096 2011-01-29 21:52 ..
drwxr--r-- 5 root root  4096 2011-01-29 21:50 .lessfs
-rw-r--r-- 1 root root 83503 2011-01-29 21:50 Makefile

Implementation details

Lessfs-1.x used to store the internal structure of a file as:
(key) inode-blocknr : (value) hash
With lessfs-2.x this has changed to:
(key) inode-blocknr : (value) btree
The btree contains: (key) snapshotnr : (value) hash
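
To make the new layout a bit more concrete, here is a minimal sketch in C of how a lookup through such a per-block map could work. It is not taken from the lessfs source. The names snap_entry, block_versions and lookup_hash, the hash length, and the use of a sorted array instead of a real btree are all assumptions made for illustration. The lookup rule itself (a snapshot context sees the entry with the highest snapshot number that is not larger than the context) is also an assumption, not something lessfs is known to do.

```c
#include <assert.h>
#include <stddef.h>

#define HASH_LEN 24   /* assumed hash length in bytes, for illustration only */

/* One entry of the per-block map: (key) snapshotnr : (value) hash. */
struct snap_entry {
    unsigned int snapshotnr;
    unsigned char hash[HASH_LEN];
};

/* Hypothetical stand-in for the per-block btree:
 * an array of entries sorted by ascending snapshotnr. */
struct block_versions {
    const struct snap_entry *entries;
    size_t count;
};

/*
 * Return the hash that is visible for this block in the given snapshot
 * context. Assumed rule: pick the entry with the highest snapshotnr that
 * is <= context. Returns NULL when no such entry exists.
 */
const unsigned char *lookup_hash(const struct block_versions *bv,
                                 unsigned int context)
{
    assert(bv != NULL);
    size_t lo = 0, hi = bv->count;      /* binary search over [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (bv->entries[mid].snapshotnr <= context)
            lo = mid + 1;               /* candidate, look further right */
        else
            hi = mid;                   /* too new for this context */
    }
    return lo == 0 ? NULL : bv->entries[lo - 1].hash;
}
```

With entries for snapshot 1 and snapshot 3, a lookup in context 2 would return the hash stored for snapshot 1, and a lookup in context 0 would return NULL.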

Now all this is pretty straightforward. The difficult part is how to expose the snapshot data to the filesystem without the need to remount lessfs. As you can see, lessfs now shows the snapshots in the root of the filesystem as: .snapshot-NAME
As soon as you enter .snapshot-NAME, the context of the filesystem has to change to the snapshot that .snapshot-NAME represents. Fuse does have a notion of contexts, but this mechanism is not usable to maintain snapshot context.

After some experimenting I came up with the following solution to the problem.
An inode number has a length of 64 bits, of which we reserve 20 bits for snapshots, which leaves 44 bits for the inodes themselves. Reserving 20 bits for snapshots means that Lessfs can hold 2^20 = 1048576 snapshots.

Now for the clever stuff. When you enter a snapshot directory, this directory should refer to the ‘real’ directory. Let me illustrate this with an example:
We create /mydir/myfile, after which we create a snapshot with name TEST and delete myfile. Let’s assume that snapshot TEST has snapshot number 5. When we want to restore the file we chdir to /.snapshot-TEST. At that point /.snapshot-TEST should refer to /, which has inode 1, in snapshot context 5.
To be able to retrieve the snapshot number as well as the inode number, lessfs changes the inode number 1 to 70368744177668. What has happened is that the 20 bits reserved for the snapshot number have now been added to the inode number. The inode number now appears to be completely different, while in fact it is just a combination of the snapshot and inode number. This makes it possible to retrieve the snapshot number for any given inode at any moment.

Lessfs-2 has two routines that take care of the inode-snapshot conversions:

unsigned long inode_to_snapinode(unsigned int snapshot, fuse_ino_t inode);
INOSNAP snapinode_to_inode(fuse_ino_t snap_inode);

You can find these routines in lib_common.c.
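
The idea of combining a snapshot number and an inode number in one 64 bit value can be sketched as follows. This is not the code from lib_common.c. The names pack_snapinode, unpack_snapinode and snap_inode_pair are hypothetical, and the sketch assumes that the snapshot number sits in the upper 20 bits and the inode number in the lower 44 bits.

```c
#include <assert.h>
#include <stdint.h>

#define SNAP_BITS     20   /* bits reserved for the snapshot number */
#define INODE_BITS    44   /* bits left for the inode number */
#define INODE_MASK    ((UINT64_C(1) << INODE_BITS) - 1)
#define MAX_SNAPSHOTS (UINT64_C(1) << SNAP_BITS)   /* 1048576 */

/* Hypothetical result type: the two halves of a combined inode number. */
struct snap_inode_pair {
    uint32_t snapshot;
    uint64_t inode;
};

/* Combine snapshot and inode number (assumed layout: snapshot in the top bits). */
uint64_t pack_snapinode(uint32_t snapshot, uint64_t inode)
{
    assert(snapshot < MAX_SNAPSHOTS);   /* must fit in 20 bits */
    assert(inode <= INODE_MASK);        /* must fit in 44 bits */
    return ((uint64_t)snapshot << INODE_BITS) | inode;
}

/* Split a combined value back into snapshot and inode number. */
struct snap_inode_pair unpack_snapinode(uint64_t snap_inode)
{
    struct snap_inode_pair p;
    p.snapshot = (uint32_t)(snap_inode >> INODE_BITS);
    p.inode = snap_inode & INODE_MASK;
    return p;
}
```

Under this assumed layout, snapshot 5 and inode 1 combine to 87960930222081, while the value 70368744177668 quoted above splits into snapshot 4 and inode 4. The real lessfs encoding, or the numbers of the actual test run, may therefore differ from this sketch.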

Problems to solve

The problem with the approach that I just explained will be garbage collection when we remove a snapshot. As long as we remove or change data that is created and removed within the same context, everything works identically to lessfs-1.x.
However, when we need to purge a snapshot while higher snapshots exist, the fun begins. Now every inode that has references to the snapshot needs to be checked, and only then can the blocks that are no longer in use be put on the freelist.
Without a doubt this will be a slow process that can be very time consuming. I am afraid that the jury is still out on this one...


Lessfs-1.3.1 has been released

This version of Lessfs introduces hamsterdb as an alternative to tokyocabinet. It also solves a memory leak as well as a problem with encryption.


Lessfs2 development has started

Whenever we create something new for the first time, there is always this feeling afterwards that we could have done better. Lessfs is no exception to the rule, and for some time I have been wanting to rewrite the code. One of the weaknesses of the current lessfs code is that it uses the high level Fuse API. This API works with NFS to some extent, but it is far from optimal. Metadata operations are not as fast as they would be if Lessfs used the low level API. And there are a number of features that Lessfs1 still lacks.

How will lessfs2 be different?

Obsoleted:

  • Support for tokyocabinet as datastore has been removed.

New features:

  • Support for snapshots
  • Uses the FUSE low level API
  • Fast inode cloning
  • Support for hamsterdb and maybe BerkeleyDB to store the metadata as well as tokyocabinet
  • Self healing raid*
  • Multimaster as well as master slave.
  • Automatic storage tiering based upon usage, per chunk.

Self healing raid

Let me explain what I mean by ‘self healing’ raid. Traditional file systems will store any chunk of data that they are requested to store. Since lessfs only stores unique data chunks, it is a catastrophe when such a chunk is lost or corrupted. Lessfs uses strong hashes to identify each chunk of data, and with this hash it can also detect data corruption. However, to be able to repair the corruption it needs to add some redundancy to the data. Traditional raid systems are no longer safe when large amounts of data are stored on today’s high capacity SATA drives. Although we still rely on them today, data corruption will occur and remain unnoticed. Lessfs will support redundancy mechanisms that allow a user to define how many copies of a data chunk have to be stored. Research indicates that keeping 3 copies of your data is needed to be safe. Anyway, lessfs will allow you to decide if and how much parity will be added to the data. And when possible Lessfs2 will repair corrupted chunks.
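
The principle of detecting corruption by hash and repairing from a redundant copy can be sketched in a few lines of C. This is only an illustration of the idea, not the planned lessfs2 implementation. The names chunk_hash and heal_chunk are hypothetical, the copies are plain memory buffers instead of on-disk chunks, and FNV-1a is used purely as a small stand-in. It is not a strong hash and would not be suitable for a real deduplicating filesystem.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* FNV-1a: a simple NON-cryptographic hash, used here only as a stand-in
 * for the strong hash that identifies a chunk. */
uint64_t chunk_hash(const unsigned char *data, size_t len)
{
    uint64_t h = UINT64_C(14695981039346656037);   /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= UINT64_C(1099511628211);               /* FNV prime */
    }
    return h;
}

/*
 * copies[] holds ncopies buffers of len bytes that should all contain the
 * same chunk; expected is the hash under which the chunk is stored.
 * Corrupted copies are overwritten with the first intact copy.
 * Returns the number of repaired copies, or -1 if no intact copy is left.
 */
int heal_chunk(unsigned char *copies[], size_t ncopies, size_t len,
               uint64_t expected)
{
    assert(copies != NULL);
    const unsigned char *good = NULL;

    /* Find the first copy whose hash still matches. */
    for (size_t i = 0; i < ncopies && good == NULL; i++)
        if (chunk_hash(copies[i], len) == expected)
            good = copies[i];
    if (good == NULL)
        return -1;                      /* nothing left to repair from */

    /* Overwrite every copy whose hash no longer matches. */
    int repaired = 0;
    for (size_t i = 0; i < ncopies; i++) {
        if (copies[i] != good && chunk_hash(copies[i], len) != expected) {
            memcpy(copies[i], good, len);
            repaired++;
        }
    }
    return repaired;
}
```

With three copies of which one is damaged, heal_chunk returns 1 and all three copies are identical again. With a single damaged copy it returns -1, which is exactly the situation that the extra redundancy is meant to prevent.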

Lessfs2 will also support backends in addition to tokyocabinet. At least support for hamsterdb will be added. BerkeleyDB support will be added when there is enough demand for it.

Current development status

The lessfs2 project has been created on sourceforge. I have uploaded a lessfs2 pre-alpha release which is in essence a port of lessfs to the low level FUSE API. People who are interested in metadata performance can test it. This code does not yet pass all the POSIX tests, but it does already do everything you would expect a filesystem to do. If you do decide to test it then please select the file_io backend, otherwise it will not work. Many things will change in the metadata structures in the near future. This is needed to implement snapshot support and other new features.


Lessfs-1.2.2.6 is available for download

This release fixes a silly mistake that would lead to freeing a NULL pointer.
Oops..


Lessfs-1.2.2.5 is available for download.

This release improves a part of the cache eviction code in lessfs. When you copy a large number of small files that together exceed the read cache size, lessfs needs to remove entries from the cache. In this case the new code improves the performance by 40%.

For example :
tar xjf /home/maru/Downloads/linux-2.6.34.tar.bz2 now takes 1m11.810s where it took 2m9.838s with lessfs 1.2.2.3.

This release solves a potential problem where a 32bit system could run out of memory when lessfs is mounted and the filesystem contains millions of files. It also solves a problem with lessfs_stats that was introduced with lessfs-1.2.2.3.


Lessfs-1.2.2.3 is available for download

This release improves the performance of copying small files (metadata operations) by at least 30 to 50%. It now takes only 2.3 seconds to copy my /etc directory over to lessfs, where it took 8 seconds with lessfs-1.2.2.2.

time cp -ar /etc /fuse/
real	0m2.250s

Lessfs performance

ZFS-FUSE, XFS, NTFS3G and LESSFS

A PERFORMANCE COMPARISON

Introduction

Once in a while I run a set of tests to compare Lessfs with other filesystems. The results of these tests are used to determine where lessfs can be improved and how it does in general. The upcoming 1.2.2.2 release of lessfs again improves some performance aspects which made it interesting to perform a set of performance tests. This article describes the tests and shows the results.

Iozone

As usual I started with iozone. In the past lessfs would do great on writes but would be weak on reads. The latest code completely changes this as can be observed from these iozone results.

[Two iozone charts: read performance]

Sequential throughput test:

This is a simple test that shows how fast the filesystem can write a data stream to disk.

Test : time (dd if=/dev/sda1 of=/fs bs=1M; sync)

ntfs3g : 999292928 bytes (999 MB) copied, 33.803 s, 29.6 MB/s
real    0m39.394s

lessfs : 999292928 bytes (999 MB) copied, 9.2262 s, 108 MB/s
real    0m17.347s

xfs : 999292928 bytes (999 MB) copied, 2.55242 s, 392 MB/s
real    0m12.722s

zfs-fuse : 999292928 bytes (999 MB) copied, 40.9082 s, 24.4 MB/s
real    0m41.451s
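
Note that the MB/s figure printed by dd does not include the final sync, while the 'real' time does. A small helper like the one below can be used to compare the filesystems on the total time instead. It is just a sketch and the name effective_mb_per_s is made up for this example.

```c
#include <assert.h>

/*
 * Effective throughput in MB/s over the whole run, including the sync.
 * 1 MB is taken as 10^6 bytes, the same unit that dd reports.
 */
double effective_mb_per_s(double bytes, double real_seconds)
{
    assert(real_seconds > 0.0);
    return bytes / real_seconds / 1e6;
}
```

Applied to the 999292928 bytes and the 'real' times above, this gives roughly 25 MB/s for ntfs3g, 58 MB/s for lessfs, 79 MB/s for xfs and 24 MB/s for zfs-fuse.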

Extract kernel archive

Test : time tar xjf /Downloads/linux-2.6.34.tar.bz2

ntfs3g : 0m32.314s

lessfs : 2m51.920s

xfs: 2m24.540s

zfs-fuse: 1m29.172s

Test : cp -ar /etc /fs

ntfs3g: 0m2.338s

lessfs: 0m8.347s

xfs: 0m4.528s

zfs-fuse: 0m2.088s

Conclusion

What does all this tell us? Lessfs was originally designed for backup purposes and has therefore been tuned to store a small number of fairly large files as fast as possible. This is still a point on which lessfs excels. Although the creation and handling of many small files can still be improved, the last releases come with acceptable performance, even on this point. Read and write performance are excellent. The next release will focus on further improvement of file creation / metadata performance. But all things considered it does not look bad at all.


Lessfs-1.2.0 has been released.

This version of lessfs again improves metadata performance by approximately 8%.


Lessfs-1.1.9.10 is available for download.

This release improves the performance of metadata operations. A metadata cache layer has been added, which typically speeds up copying of small files by a factor of 12. It also fixes a memory leak that occurred when lessfs was used with encryption enabled. A routine used for debugging was still active even when lessfs was compiled without --enable-debug. With that fixed, the normal write speed has now increased a bit as well.

Enjoy,

Mark Ruijter


Lessfs-1.1.9.6 has been released.

This release enables lessfs background truncation tasks to resume after an umount or an unexpected shutdown. Background truncation is now enabled by default. Since lessfs uses the freelist database to store the state of truncation this database now needs to be configured in lessfs.cfg even when the tc datastore is used.
