A Linux data deduplication wishlist

Open discussion about data deduplication for Linux in general.

Would you like a data deduplicating blockdevice in userspace or in the kernel?

Would you prefer lessfs-2.0 with replication and snapshots?

Your opinion counts!

33 Responses to A Linux data deduplication wishlist

  1. Matt says:

    I’d love to see the ability to go from inline de-duplication to offline. If you’re creating a VTL using cheap disk and a Linux system with Lessfs doing the dedupe, it’d be great to have it dereference duplicated blocks after file operations are complete, or at a scheduled time. This would allow for higher backup throughput.
    Oh, and replication and snapshots would be great, and kernel! Kernel!

    • maru says:

      You can already do this with http://linuxvtl.googlepages.com/ and lessfs.
      Just stage the data to a disk partition and cp the files over to lessfs with a cron job afterwards.
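
      A minimal sketch of that staging setup (the paths and mount point are just examples, adjust them to your own layout):

      # /etc/cron.d/stage-to-lessfs -- move finished backups from the staging
      # partition onto the lessfs mount every night at 02:00
      0 2 * * *  root  rsync -a --remove-source-files /srv/staging/ /mnt/lessfs/backups/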

      Lessfs is not going to be of much use here though. The problem with tape backup software is that data from different streams is combined, which makes deduplication impossible.
      To get dedup to work you will need to unravel this proprietary data stream and write it in a way that enables deduplication.
      If you would like to use tar to archive backups on lessfs, then you will need to patch tar so that it aligns new files on a new block.

      • Matt says:

        Ahh yes, but with a VTL you can create multiple libraries and thus get around multiplexing to a tape, so you would be using data from only one client in each virtual tape. But I like your idea with the VTL. Now if you can point out something for Linux that works like COMSTAR on the OpenSolaris platform, I would then be set for building my own fibre channel VTL. :)

  2. phess says:

    Kernel! Kernel! :)
    Oh, and I guess it should be implemented on the filesystem level (maybe the virtual filesystem?), not the block device level.

    • maru says:

      Implementing a deduplicating blockdevice does have a few advantages.
      1. Relatively easy to implement, even in the kernel.
      2. Any filesystem can be used on top.
      And if we do a proper job we might even be able to implement snapshots in a way that performance remains at a decent level with more than a handful of snapshots, which is currently a huge problem with LVM.

      Of course we can use lessfs as a blockdevice too. Just use nbd, ietd or the loop device driver on top. The only problem with lessfs today is that, because it is implemented with fuse, it suffers from higher latency and bad NFS performance.
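
      A rough example of the loop device approach (file name, size and mount point are only illustrations):

      # create a sparse file on the lessfs mount and expose it as a block device
      dd if=/dev/zero of=/fuse/blockdev.img bs=1 count=0 seek=10G
      losetup /dev/loop0 /fuse/blockdev.img
      mkfs.ext3 /dev/loop0
      mount /dev/loop0 /mnt/dedup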

      Being in the kernel does have some advantages. I think that performance will go up, maybe even by 25%. But the downside is that it will take much longer to develop new features.

      • Sean says:

        I also vote for a blockdevice in the kernel. I already have at least 3 ideas for the use of this in a professional setting, and if we can gain performance by putting it in the kernel, it would be an easier sell.

  3. jordi says:

    Oh… yes, we want more sugar: kernel, replication and snapshots :D

    Btw, how could you implement it on the kernel side? I am completely ignorant about kernel code, but would writing it on the kernel side mean implementing a file system from scratch? Or would that be like making a layer on top of another file system (like eCryptfs does)?

  4. Adam Miller says:

    I think kernel space would be better because we would be able to avoid the overhead of FUSE. That being said, it would be a LOT harder to implement because you would then need to develop a non-FUSE filesystem (or fork one) and apply the dedup logic to it, including some sort of built-in bookkeeping mechanism. So while it would be cool to have this be kernel level, I could see it being a major pain to develop, and it could potentially remove (at least some of) the “fun” factor … and when the fun factor is gone, we don’t see any more amazing code like we do now in the current LessFS.

    -Adam

    • maru says:

      Developing a brand new, in-kernel filesystem would surely be a daunting task.
      Modifying an existing filesystem might be easier. What NetApp did with WAFL is probably the easiest solution, and it should be possible to implement something similar with an existing filesystem.

  5. Alexander Horwatt says:

    Offline block-level dedup would be nice to have. In theory, it should provide us with quick realtime I/O at the expense of slower after-hours “deduping” operations.

  6. Erik Van Hamme says:

    I agree with the “kernel! kernel!” sentiments expressed above.

    I can see advantages and disadvantages to both the FS or Block device setup.

    If a block device is used, the dedup will strip the redundant superblocks out of the filesystem, making RAID1 a requirement.

    If a filesystem is used, you would have to implement a filesystem, which seems like no easy task (think ACLs, meta information, …)

    I would lean towards the block device + raid setup. Offline dedup is not a priority for me as doing the dedup online is more suitable to my workload.

    The coolest possible solution would be to add your dedup functions to ext4 or btrfs, but I do not know if you are interested in joining those projects.

    Anyway, thanks for all the hard work on lessfs. It is much appreciated.

  7. Chris-U says:

    I think it’s time to say thanks one more time ;-)
    Really great work so far. Your project is truly unique!
    I hope your project finds its way into the Linux kernel!

  8. MasterTH says:

    Hi guys,

    I have been looking at this project for a while now. It’s great how the work is progressing. Now my question: isn’t it easier to run the dedup process at a specific time instead of all the time in the background?

    The advantage would be that the filesystem is as fast as other ones, because the dedup is done at a time when the fs is not used so much (root can schedule it with cron or something else).

    “Kernel! kernel!” is my opinion too :)

    • maru says:

      There are lengthy discussions on the Internet about inline and post-process deduplication. Adding post-process deduplication to the current lessfs code would be relatively easy. I would only have to implement a queue which contains the not yet deduped data. When lessfs receives a signal it would start to process this queue.

      This would surely increase the performance of lessfs when deduplication is not turned on. Hmmm, this might actually be something to think about….
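
      Such an interface does not exist today, but if lessfs would start draining that queue on a signal, scheduling the post-process run off-hours would be as simple as a cron entry like this (purely hypothetical, the signal handling is not implemented):

      # hypothetical /etc/cron.d entry: ask lessfs to start post-process deduplication at 01:00
      0 1 * * *  root  kill -USR1 $(pidof lessfs)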

      • MasterTH says:

        Implementing lessfs in the kernel would also make it possible to use the storage as an iSCSI target. At this point (I tried it yesterday) it isn’t possible.

        • maru says:

          Hi,

          I don’t agree with this. You can create a (sparse) file on lessfs and use this file with ietd.

          Example : ietd.conf
          saturn:/usr/src/lessfs-0.8.4 # cat /etc/ietd.conf
          # Example iscsi target configuration
          #
          # Everything until the first target definition belongs
          # to the global configuration.
          # Right now this is only the user configuration used
          # during discovery sessions. "IncomingUser" specifies credentials the
          # initiator has to provide - several of these are supported. If mutual
          # CHAP shall be employed, "OutgoingUser" specifies the user/pass
          # combination the target will provide - only one is supported.
          # Leave them alone (keep them commented out) if you don't want to use
          # authentication for discovery sessions.

          #iSNSServer 192.168.1.16
          #iSNSAccessControl No

          #IncomingUser joe secret
          #OutgoingUser jack 12charsecret

          # Targets definitions start with "Target" and the target name.
          # The target name must be a globally unique name, the iSCSI
          # standard defines the "iSCSI Qualified Name" as follows:
          #
          # iqn.yyyy-mm.<reversed domain name>[:identifier]
          #
          # "yyyy-mm" is the date at which the domain is valid and the identifier
          # is freely selectable. For further details please check the iSCSI spec.

          Target iqn.2001-04.com.example:storage.disk2.sys1.xyz
          # Users, who can access this target. The same rules as for discovery
          # users apply here.
          # Leave them alone if you don't want to use authentication.
          #IncomingUser joe secret
          #OutgoingUser jim 12charpasswd
          # Logical Unit definition
          # You must define one logical unit at least.
          # Block devices, regular files, LVM, and RAID can be offered
          # to the initiators as a block device.
          Lun 0 Path=/fuse/ietddata,Type=fileio
          # Alias name for this target
          # Alias Test
          # various iSCSI parameters
          # (not all are used right now, see also iSCSI spec for details)
          #MaxConnections 1
          #InitialR2T Yes
          #ImmediateData No
          #MaxRecvDataSegmentLength 8192
          #MaxXmitDataSegmentLength 8192

          —————————————–

          dd if=/dev/zero of=/fuse/ietddata bs=1 count=0 seek=5000M
          0+0 records in
          0+0 records out
          0 bytes (0 B) copied, 0.000120756 s, 0.0 kB/s

          saturn:/usr/src/lessfs-0.8.4 # /etc/init.d/iscsi-target start
          saturn:/usr/src/lessfs-0.8.4 # /etc/init.d/open-iscsi start
          Starting iSCSI initiator service: done
          Setting up iSCSI targets: unused

          saturn:/usr/src/lessfs-0.8.4 # iscsiadm -m discovery -p 127.0.0.1 -t sendtargets
          127.0.0.1:3260,1 iqn.2001-04.com.example:storage.disk2.sys1.xyz
          saturn:/usr/src/lessfs-0.8.4 # iscsiadm -m node -T iqn.2001-04.com.example:storage.disk2.sys1.xyz -p 127.0.0.1 -l
          Logging in to [iface: default, target: iqn.2001-04.com.example:storage.disk2.sys1.xyz, portal: 127.0.0.1,3260]
          Login to [iface: default, target: iqn.2001-04.com.example:storage.disk2.sys1.xyz, portal: 127.0.0.1,3260]: successful
          saturn:/usr/src/lessfs-0.8.4 # tail -5 /var/log/messages
          Nov 25 10:30:17 saturn kernel: sd 6:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
          Nov 25 10:30:17 saturn kernel: sdb: unknown partition table
          Nov 25 10:30:17 saturn kernel: sd 6:0:0:0: [sdb] Attached SCSI disk
          Nov 25 10:30:17 saturn kernel: sd 6:0:0:0: Attached scsi generic sg2 type 0
          Nov 25 10:30:17 saturn iscsid: connection1:0 is operational now
          saturn:/usr/src/lessfs-0.8.4 # fdisk -l /dev/sdb

          Disk /dev/sdb: 5242 MB, 5242880000 bytes
          162 heads, 62 sectors/track, 1019 cylinders
          Units = cylinders of 10044 * 512 = 5142528 bytes
          Disk identifier: 0x00000000

          Disk /dev/sdb doesn't contain a valid partition table
          saturn:/usr/src/lessfs-0.8.4 # fdisk /dev/sdb
          Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
          Building a new DOS disklabel with disk identifier 0xca0115d2.
          Changes will remain in memory only, until you decide to write them.
          After that, of course, the previous content won't be recoverable.

          Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

          Command (m for help): n
          Command action
          e extended
          p primary partition (1-4)
          p
          Partition number (1-4): 1
          First cylinder (1-1019, default 1):
          Using default value 1
          Last cylinder or +size or +sizeM or +sizeK (1-1019, default 1019):
          Using default value 1019

          Command (m for help): p

          Disk /dev/sdb: 5242 MB, 5242880000 bytes
          162 heads, 62 sectors/track, 1019 cylinders
          Units = cylinders of 10044 * 512 = 5142528 bytes
          Disk identifier: 0xca0115d2

          Device Boot Start End Blocks Id System
          /dev/sdb1 1 1019 5117387 83 Linux

          Command (m for help): w
          The partition table has been altered!

          Calling ioctl() to re-read partition table.
          Syncing disks.
          saturn:/usr/src/lessfs-0.8.4 # mke2fs /dev/sdb1
          mke2fs 1.40.8 (13-Mar-2008)
          Warning: 256-byte inodes not usable on older systems
          Filesystem label=
          OS type: Linux
          Block size=4096 (log=2)
          Fragment size=4096 (log=2)
          320000 inodes, 1279346 blocks
          63967 blocks (5.00%) reserved for the super user
          First data block=0
          Maximum filesystem blocks=1312817152
          40 block groups
          32768 blocks per group, 32768 fragments per group
          8000 inodes per group
          Superblock backups stored on blocks:
          32768, 98304, 163840, 229376, 294912, 819200, 884736

          Writing inode tables: done
          Writing superblocks and filesystem accounting information: done

          This filesystem will be automatically checked every 35 mounts or
          180 days, whichever comes first. Use tune2fs -c or -i to override.
          saturn:/usr/src/lessfs-0.8.4 # mount /dev/sdb1 /mnt
          saturn:/usr/src/lessfs-0.8.4 # df /mnt
          Filesystem 1K-blocks Used Available Use% Mounted on
          /dev/sdb1 5037000 10008 4771124 1% /mnt

          The only problem that we have here is that ietd will only do synchronous IO with a 4k block size. This is really bad performance-wise.

          saturn:/usr/src/lessfs-0.8.4 # time (dd if=/dev/sda1 of=/mnt/boot.img bs=1M; sync)
          70+1 records in
          70+1 records out
          73995264 bytes (74 MB) copied, 16.5213 s, 4.5 MB/s

          real 0m19.395s

          Directly to lessfs:
          saturn:/usr/src/lessfs-0.8.4 # time (dd if=/dev/sda1 of=/fuse/boot.img bs=1M; sync)
          70+1 records in
          70+1 records out
          73995264 bytes (74 MB) copied, 0.848914 s, 87.2 MB/s

          real 0m4.661s

          It still puzzles me why Linux continues to use 4k as a default block size (with the exception of Itanium I believe). For example FreeBSD uses 16k.

          There are two possible fixes:
          1. Change ietd to support a block size larger than 4k.
          2. Put lessfs in the kernel and queue the data there. This removes the overhead of fuse.

          Mark.

      • MasterTH says:

        I forgot: it would also be cool if there was a possibility to show how much space is saved by the compression & deduplication process.
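
        At the moment I try to get a rough idea by comparing the apparent size of the stored files with what the lessfs databases use on disk, something like this (the database paths depend on your lessfs.cfg):

        du -sh --apparent-size /fuse    # logical size of everything stored on lessfs
        du -sh /data/dta /data/mta      # physical size of the data/metadata databases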

        Another question is:
        is it possible to write a Windows application to access an external hard disk which is used in a server as a backup disk with lessfs?

  9. Alex says:

    Hi folks,

    Two things. First, working at the kernel level may give more performance, but it may not allow everybody to use lessfs early on (as it won’t be included by default on hosted servers). For example, I manage some servers whose kernels are monolithic, so I cannot insert modules or easily set up lessfs as it exists today. Well, it’s only a point of view, but through fuse almost any configuration can use the project without tweaks.

    Second thing, about FS deduplication rather than block-level deduplication. Here comes the question of backing up the lessfs architecture.
    As it exists today, it is mostly “simple” to back up the “storage blocks” using rsync from one server to another (over the Internet mostly, in my case, since I manage servers I’ll never physically touch, only access over network connections).
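
    For example, something along these lines works today (the hostname and paths are only illustrations; run it while lessfs is unmounted or idle so the databases are consistent):

    # copy the lessfs data and metadata databases to a second server
    rsync -a --delete /data/dta /data/mta backup-server:/srv/lessfs-copy/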

    So if lessfs comes in the future with replication based on a kind of rsync-style differential update, there will be no problem. If the only way to back up the lessfs architecture is “dumping” a full block device, it will be difficult to maintain replication of the deduplicated data on application servers. Using this many words ending in “tion” makes me look a bit crazy ;)

    Well, OK for performance enhancements, but keep as much flexibility.
    The important points are, imho:
    – the ability to replicate the deduplication system from one server to another (hot deduplication would be such a good thing, syncing without unmounting!)
    – the ability to use it on most systems
    – keep Mark as cool as he has been until now :p
    – be able to redirect logs out of /var/log/messages :P
    and so on ;)

    But it’s a great piece of software :) and it will make some noise :p

  10. Anonymous says:

    My only wish right now is for somebody to package lessfs for Debian. This would bring you quite a number of users/testers, and would save my lazy ass from compiling it myself.

  11. Roland says:

    I’m missing something like “lessfs scrub” (like “zpool scrub” in Solaris/ZFS), which would tell me whether the whole lessfs repository (data & metadata) is sane.
    It should also report whether there are corrupt data blocks and which files are affected.
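
    For reference, this is roughly the ZFS behaviour I mean (“tank” is just the usual example pool name):

    zpool scrub tank       # walk all data in the background and verify checksums
    zpool status -v tank   # shows checksum errors and lists the files affected by them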

  12. Alex says:

    Just adding to the wishlist, for when all the current fixes are done :p

    A pretty good idea could be to make the lessfsck process a multi-threaded application, as lessfs itself is :)

    I guess, just from reading the lessfsck code (but I’m not a developer :p), that lessfsck seems to be only a single-threaded process.

    As I am now running it on a 40GB storage file (about 400GB deduplicated), it takes ages to check :p ….
    No emergency for this! :p

  13. Robert LeBlanc says:

    I too have been following this project for some time. Although I’d love kernel code, I’d much prefer the option to have lessfs present a block device (much like drbd and LVM) that I can put any file system on. I too am looking for a much better iSCSI or FC target that has dedup. I can also see some benefits for a Lustre OST on lessfs. All of these require block devices for any decent performance. I’ve measured ietd using standard disk and loopback on DataDomain; the performance was terrible. I think lessfs could take the lead in this respect. As far as replication goes, if I wanted it, I’d put it on top of drbd. I don’t think you need to spend your time on it, since someone already specializes in it.

    Robert LeBlanc

  14. thank you very much for LessFS!

    How about providing the ability to use a remote database?
    Multiple users could use the same block database and therefore contribute to improved deduplication efficiency. LessFS would also become a deduplicated and distributed filesystem.

    Snapshots and replication would be great.
    I have no opinion on kernel versus FUSE.

    • Sean says:

      Great idea!!! Even the major deduplication *cough*DataDomain*cough* vendors don’t have this yet.

    • maru says:

      Actually I have already tested this, and it is not too hard to implement. I changed the code where lessfs stores the actual data chunks in tc so that it uses Tokyo Tyrant instead.
      The Tyrant database can of course be anywhere on the network.

      Changing lessfs to use Tokyo Tyrant calls is very straightforward. :-)
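
      For anyone who wants to experiment with the remote database idea, running a Tokyo Tyrant server and talking to it over the network looks roughly like this (the hostname, port and database file are just examples):

      # on the storage node: serve a hash database over the network
      ttserver -host 0.0.0.0 -port 1978 /data/blocks.tch

      # from any other machine: store and fetch a value over the network
      tcrmgr put -port 1978 storage-node somekey somevalue
      tcrmgr get -port 1978 storage-node somekey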

      There are some problems with latency but nothing that can’t be resolved.

      Mark.

  15. Error reporting.

    When something goes wrong, when a configuration is bad, when a directory is missing, or when it crashes — lessfs should say something about what happened.

    I had trouble setting up lessfs because while mklessfs creates the actual databases, it doesn’t create the directories those databases live in. I also had a problem where a corrupt database prevented lessfs from mounting.
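
    What I ended up doing was creating the directories from my lessfs.cfg by hand before formatting (the paths below are just the ones from the example config, yours may differ):

    mkdir -p /data/dta /data/mta
    # then run mklessfs against your lessfs.cfg as usual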

    In all these cases, lessfs simply fails silently. It is very important to give some (any) feedback to the user.

  16. prashant jadhav says:

    I am a BE student and my project is to add deduplication to the Linux kernel. Can you help me or give some guidance on this topic? Tell us what we would require for this.

    • maru says:

      I guess the first question that you should ask yourself is what type of solution you are going to work on.

      A. Create a block device with deduplication.
      or
      B. Create a filesystem with deduplication.

      In both cases you can start on something brand new, but you can also add the feature to existing projects. You could for example enhance btrfs or the Linux loop driver.

      It’s all up to you. ;-)

  17. mac.linux.free says:

    I’d really like to have ACL support, because I’m now using zfs-fuse for my big Samba servers.

    cheers

  18. Dusty says:

    Thanks for the development of this software!
    Data deduplication is a sore spot for Linux.

    My personal wish list:
    * debian packages
    * offline deduplication
    * robust error handling and reporting
    * file system integration, e.g. with ext4 or btrfs
    I don’t know how difficult it would be to develop a common infrastructure that different file systems could use, but it would be great if different FSs could share a common block deduplication layer.
    This way you would not need to bother with snapshots and other advanced features because they would be implemented natively.
