This version of Lessfs adds support for Google's Snappy compression.
http://code.google.com/p/snappy/
Although QuickLZ and LZO are very fast, snappy is truly amazing.
Lessfs with snappy:
time cp /tmp/Some_software_DVD.iso /fuse/
real 0m11.214s
user 0m0.021s
sys 0m2.863s
Lessfs with QuickLZ 1.5.0:
time cp /tmp/some_software_DVD.iso /fuse/
real 0m30.292s
user 0m0.027s
sys 0m3.272s
ls -alh /tmp/some_software_DVD.iso
-rw-r--r-- 1 root root 3.3G Sep 18 02:07 /tmp/some_software_DVD.iso
Other changes
Lessfs-1.5.7 comes with an important bug fix.
A bug introduced in lessfs-1.5.6 can cause deadlocks. If you use lessfs-1.5.6 you are advised to upgrade.
Hello,
I use qlz compression and io_chunk, and I do not understand the stats from lessfs:
/data is the backend (557 MB)
/fuse is the lessfs mount point (1.1 GB)
Compression seems to work, but the stats look strange.
Can we get more information about the dedup stats (how blocks are shared)?
The product is great.
DEV-10.98.98.1:/data# cat /fuse/.lessfs/lessfs_stats
INODE SIZE COMPRESSED_SIZE RATIO FILENAME
10 0 0 0.00 lessfs_stats
14 0 0 0.00 enabled
15 0 0 0.00 backlog
16 0 0 0.00 sequence
17 0 0 0.00 rotate_replog
160 197120 0 1000.00 vl.0pmJQ9
161 197120 0 1000.00 vl.Raujfr
162 197120 0 1000.00 vl.lYeY4B
163 197120 0 1000.00 vl.pCU7fZ
164 197120 0 1000.00 vl.DuWFTU
165 197120 0 1000.00 vl.AGCzJn
166 197120 0 1000.00 vl.bH3vyd
167 197120 0 1000.00 vl.3LKdRL
168 197120 0 1000.00 vl.oAWTP7
169 197120 0 1000.00 vl.Aru20M
170 197120 0 1000.00 vl.8fneq2
171 197120 0 1000.00 vl.MFIkrO
172 197120 0 1000.00 vl.0rpCJD
173 197120 0 1000.00 vl.d42X1v
174 197120 0 1000.00 vl.6XX2L3
175 197120 0 1000.00 vl.1y4fF1
176 197120 0 1000.00 vl.WcFLwj
177 197120 0 1000.00 vl.aZvz3m
178 197120 0 1000.00 vl.MyW93S
179 197120 0 1000.00 vl.evVbC2
DEV-10.98.98.1:/data# ls -lh /fuse/
total 1.1G
-rw------- 1 root root 448K Sep 17 14:57 vl.0pmJQ9
-rw------- 1 root root 448K Sep 17 14:57 vl.0rpCJD
-rw------- 1 root root 111M Sep 17 20:29 vl.1y4fF1
-rw------- 1 root root 111M Sep 17 20:28 vl.3LKdRL
-rw------- 1 root root 448K Sep 17 14:57 vl.6XX2L3
-rw------- 1 root root 448K Sep 17 14:57 vl.8fneq2
-rw------- 1 root root 110M Sep 17 20:27 vl.AGCzJn
-rw------- 1 root root 110M Sep 17 20:29 vl.Aru20M
-rw------- 1 root root 448K Sep 17 14:57 vl.DuWFTU
-rw------- 1 root root 112M Sep 17 20:29 vl.MFIkrO
-rw------- 1 root root 448K Sep 17 14:57 vl.MyW93S
-rw------- 1 root root 110M Sep 17 20:28 vl.Raujfr
-rw------- 1 root root 448K Sep 17 14:57 vl.WcFLwj
-rw------- 1 root root 110M Sep 17 20:29 vl.aZvz3m
-rw------- 1 root root 448K Sep 17 14:57 vl.bH3vyd
-rw------- 1 root root 111M Sep 17 20:25 vl.d42X1v
-rw------- 1 root root 110M Sep 17 20:28 vl.evVbC2
-rw------- 1 root root 448K Sep 17 14:57 vl.lYeY4B
-rw------- 1 root root 448K Sep 17 14:56 vl.oAWTP7
-rw------- 1 root root 111M Sep 17 20:28 vl.pCU7fZ
DEV-10.98.98.1:/data# du -csh .
557M .
557M total
The files that show a ratio of 1:1000 all share 100% of their contents with files that have since been removed. The problem is that it is hard, if not impossible, to update the statistics of files that had chunks in common with a file that was stored on the filesystem first and removed later. When that first file, let's call it the parent, is removed, the chunks used by the other files are obviously not removed. This leaves you with statistics for the children saying that they hardly consume any disk space, while with the parent removed they should report the space they shared with it.
Let me think about a solution to this problem. Lessfs does not keep track of which inodes use a specific chunk; we only keep track of the number of times that a chunk is used.
So updating the children to correct the statistics is not feasible at present.
OK, that’s good.
You said: "We only keep track of the number of times that a chunk is used". How can I see this information?
I am doing some benchmarks before using lessfs on our production server,
and I need an easy way to see the deduplication ratio (without compression) at different block sizes.
Unmount lessfs and type: listdb /etc/lessfs.cfg
Under blockusage you will see the hashes and how often each hash is used.
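For example, with the mount point and config path used earlier in this thread (adjust both to your own setup):
umount /fuse
listdb /etc/lessfs.cfg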
What version of BDB is lessfs supposed to compile against?
It errors out on SL6 (which uses db4-4.7.25):
lib_bdb.c: In function ‘bdb_open':
lib_bdb.c:434: error: ‘DB’ has no member named ‘set_partition’
lib_bdb.c:436: error: ‘DB’ has no member named ‘set_partition’
lib_bdb.c:450: error: ‘DB’ has no member named ‘set_partition’
lib_bdb.c:452: error: ‘DB’ has no member named ‘set_partition’
You need db-4.8 or higher. db-4.7.x does not support database partitioning.
I have tested it with db-4.8.30 and db-5.x.
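If you are not sure which Berkeley DB your build picks up, a quick check is (default header path shown; adjust for your distribution):
grep -m1 DB_VERSION_STRING /usr/include/db.h    # version the headers declare
db_dump -V                                      # version of the installed library/utilities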
I noticed that SL6/RH6 still comes with this _very_ old version of BDB. Can anyone explain why?
The current version is 5.2.28!
When using Lessfs (or any other fuse filesystem) with SL6/RH6 you also need to disable transparent huge pages. I did this by changing grub.conf:
title Scientific Linux (2.6.32-71.el6.x86_64)
root (hd0,0)
kernel /vmlinuz-2.6.32-71.el6.x86_64 ro root=/dev/mapper/VolGroup-lv_root rd_LVM_LV=VolGroup/lv_root rd_DM_UUID=ddf1_Array0 rd_LVM_LV=VolGroup/lv_swap rd_NO_LUKS rd_NO_MD LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYBOARDTYPE=pc KEYTABLE=us crashkernel=auto transparent_hugepage=never rhgb quiet
initrd /initramfs-2.6.32-71.el6.x86_64.img
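For reference, transparent huge pages can also be disabled at runtime without a reboot; only one of these sysfs paths will exist, depending on the kernel build:
echo never > /sys/kernel/mm/transparent_hugepage/enabled           # mainline naming
echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled    # RHEL6/SL6 naming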
Could you please tell us why it is necessary to disable transparent huge pages on RH6 when using lessfs?
Fuse filesystems in general have been known to deadlock with transparent huge pages enabled, and I have seen this happen with Lessfs as well. I have not checked recently whether this is still the case. The problem is not limited to Lessfs; all fuse filesystems are affected:
http://zfs-fuse.net/issues/123
http://us.generation-nt.com/answer/deadlocks-transparent-huge-pages-userspace-fs-daemons-help-200924121.html
Greetings,
For DB4.8 on RHEL6/SL6/CentOS 6, you can use the PUIAS packages:
http://pkgs.org/centos-6-rhel-6/puias-computational-x86_6/db48-4.8.30-2.puias6.x86_64.rpm.html
PUIAS is another RHEL clone, but they provide updated packages for certain components. I think I grabbed the SRPM and rebuilt it on SL6 – worked like a charm! (Personal preference; I don't think a rebuild of the SRPM was actually required.)
Dunno why RHEL *doesn't* provide this, as it can be installed alongside existing versions without breaking apps dependent on the older version.
Cheers,
Rob.
Since you are talking about fast compression, have you tried comparing Snappy with LZ4? It seems LZ4 overtakes it, at least in decompression speed, if not in compression speed and ratio (depending on block size).
Thanks for the tip. I had never heard of LZ4. I’ll take a look to see if it makes sense to support it.
Hmm – I tried with lessfs compiled against db-5.2.36, and lessfs dies after copying about 1 GB of data. Debug output stays quiet – lessfs dies silently, and the rsync I used starts to complain about "transport endpoint not connected".
Heiko,
Did you copy DB_CONFIG into /data/mta?
Usually this happens because there is no DB_CONFIG, or because DB_CONFIG has values that need to be increased. This is a very common issue with Berkeley DB. There should be a DB_CONFIG that you can use in the lessfs-1.5.7 directory you untarred. If you still get lock-ups, check the /var/log/bdg-errors log; you may have to increase some of the lock limits in the DB_CONFIG file.
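As an illustration only (the numbers below are placeholders, not tuned recommendations; start from the DB_CONFIG shipped with lessfs), the lock-related settings in DB_CONFIG look like this:
# Raise these if Berkeley DB reports it has run out of locks, lockers or lock objects.
set_lk_max_locks 100000
set_lk_max_lockers 50000
set_lk_max_objects 100000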
Also, a general FYI: I have heard that it is a very bad idea to use btrfs with VMDKs, because btrfs creates sub-threads for writes (paraphrasing), which makes it work well for big writes but very badly for lots of very small writes.
At what stage do I need to copy DB_CONFIG? Before or after 'mklessfs'?
Before mklessfs, but it can also work afterwards.
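For example (the /data/mta path is the one used in this thread; use the metadata directory from your own lessfs.cfg):
cp DB_CONFIG /data/mta/    # DB_CONFIG taken from the untarred lessfs source tree
# ...then run mklessfs as you normally would.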
Can you share your config and compilation options?
root@saturn:/usr/src/lessfs-1.5.8# while true; do ls -l /proc/21077/fd|wc -l; sleep 10; done
17
17
As you can see, there must be something different on your system.
The conclusion here was that you cannot run lessfs as a non-root user without changing the LISTEN_PORT option to a higher, non-privileged port. If you run it that way anyway, lessfs leaks the file descriptor and crashes.
Aha. Actually I had thought about that before and tried to comment out the LISTEN_* settings, but that didn't work because the default is not to disable this feature :p
Setting the port to a value above 1024 does indeed work and stops the leak, thanks!
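For reference, the relevant lessfs.cfg line then looks like this (any free port above 1024 will do; 10024 is just an example value):
LISTEN_PORT=10024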
What about the compression ratio of snappy?
Are there any comparisons with qlz available?
As far as I can tell qlz compresses approximately 30% better than snappy.
There has been a lengthy discussion about snappy vs. quicklz here.
LessFS crashed
Hi all,
Ubuntu 10.04.2 LTS
LessFS lessfs-1.3.3.1
During a copy operation LessFS crashed.
Rebooting the server doesn't help.
When I run ls -l on the mount point, it just hangs.
Any workaround? Thank you.
The log only shows this:
Sep 23 20:38:17 server1 lessfs[1185]: The selected data store is tokyocabinet.
Sep 23 20:38:17 server1 lessfs[1185]: Lessfs transaction support is enabled.
Sep 23 20:38:17 server1 lessfs[1185]: Hash MHASH_TIGER192 has been selected
Sep 23 20:38:17 server1 lessfs[1185]: Lessfs uses a 24 bytes long hash.
Sep 23 20:38:17 server1 lessfs[1185]: Automatic defragmentation is enabled.
Sep 23 20:38:17 server1 lessfs[1185]: cache 8192 data blocks
Sep 23 20:38:17 server1 lessfs[1185]: Tuning the bucketsize for /data/db/mta/fileblock.tch to 1048576
Sep 23 20:38:18 server1 lessfs[1185]: Tuning the bucketsize for /data/db/mta/blockusage.tch to 1048576
Sep 23 20:38:18 server1 lessfs[1185]: Tuning the bucketsize for /data/db/mta/metadata.tcb to 1048576
Sep 23 20:38:19 server1 lessfs[1185]: Tuning the bucketsize for /data/db/dta/blockdata.tch to 1048576
Sep 23 20:38:20 server1 lessfs[1185]: Tuning the bucketsize for /data/db/mta/symlink.tch to 1048576
Sep 23 20:38:22 server1 lessfs[1187]: Lessfs has not been unmounted cleanly.
Sep 23 20:38:22 server1 lessfs[1187]: Rollback to : Fri Sep 23 19:27:37 2011
Thank you.
Did anyone else have problems getting snappy to run? I’ve compiled 1.5.8 without problems, but if COMPRESSION=snappy is specified the mklessfs command does not create the .dta/mta folders/files.
The command runs and does not return any error or information, but the following lessfs command does not mount anything as there is nothing to mount.
If I change the configuration to another compression or no compression everything works fine.
Any ideas?
Did you make sure you enabled snappy when you compiled? Did you get any warnings during the compile?
Hi Richard,
No, I did not enable it manually; I thought it would be enabled by default like the other compression algorithms.
Is there documentation listing all the switches available for ./configure?
What would be the parameter to enable snappy support?
When you do your configure, use:
./configure --with-snappy
I found this by doing ./configure --help.
FYI, I built mine with rpmbuild and I don't think I had to specify this.
Please note this is two dashes: --with-snappy
(two dashes in front of "with"); the blog may render it as a single dash in my comments.
It's also two dashes in ./configure --help.
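Putting that together, a build from the untarred source looks like this (standard autotools steps; --with-snappy is the only lessfs-specific switch taken from this thread):
cd lessfs-1.5.8
./configure --help            # lists every available switch
./configure --with-snappy     # note: two dashes before "with"
make
make install                  # as root, or build an RPM with rpmbuild instead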
Hello,
Could you show me how I should set the various parameters for the highest performance (lessfs.cfg and DB_CONFIG)?
Currently, when I run the filesystem, I only get 25 MB/s, even on a ramdisk.
I guess this is due to misconfiguration of those files.
Hi Mark,
I tested lessfs a couple of years ago when it was pre 1.0 and couldn’t continue with it because it was pre-release.
So now that it's at 1.5.9 I thought I'd give it a go again and hopefully use it in a production environment, but I'm not too sure where I'm going wrong with performance.
I am trying Berkeley DB 4.8, as you recommend, with LZO. It is sitting on a fairly fast XFS partition to which I can dd from /dev/zero at 850+ MB/s.
However, lessfs 1.5.9 using BLOCKDATA_IO_TYPE=file_io only writes at 30 MB/s and gets slower and slower. With only 200 GB of completely random data, to test a worst-case scenario, it slows down to 7 MB/s. Is this to be expected?
There are 8 cores and 16 GB of RAM.
I would like to get about 5 TB of data in there, but at the rate it keeps slowing down it is just not a viable option.
Hope you can help where I am going wrong.
Thanks,
Mark