Backup or archive your Equallogic with lessfs.

Using lessfs to backup or archive your favorite SAN.

One example of how lessfs can be put to work is to use it to archive the data of your favorite storage array. This is much more space-efficient than using Equallogic snapshots or replication. A snapshot requires at least 100% extra storage if you want to be certain that the backup is always valid. If you want your data to be copied to another Equallogic, you need at least 105% extra storage. lessfs can do the same job in far less space. The deduplication and compression that lessfs provides compress the data by at least 30% in most cases for the first copy. When a daily backup is needed, lessfs will require very little extra storage to store each additional full backup.

Someone might mention that lessfs is a brand new filesystem and that he doesn’t trust his data to be stored on a new system. This fear is easily removed by creating a simple script that makes it possible to verify the data.

This is how it’s done:

Let’s say the server that runs lessfs has connected an Equallogic snapshot on /dev/sda.

The script would simply do something like:

md5=$(dd if=/dev/sda bs=1M | dd5sum - /pooldata/$VOL/$DATE/$lv 2>&1)

This would read from the Equallogic snapshot and write to /pooldata while calculating the md5 hash of the data that is read from /dev/sda. We can easily store the date, volume name and md5 hash in our favorite database. This makes it possible to prove that we have a valid backup. It also makes it possible to verify that the data is correct after we have restored the backup during a disaster recovery.
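
As a rough illustration of the copy-and-hash step and the later verification, here is a minimal sketch that uses the standard tee and md5sum tools rather than dd5sum. The function names backup_with_md5 and verify_md5 are made up for this example, and it assumes GNU coreutils (dd with bs=1M, md5sum) is available. It is not the script used in production, just one way the idea could be written down.

```shell
#!/bin/sh
# Hypothetical helpers, not part of lessfs. Assumes GNU coreutils.

# backup_with_md5 SRC DEST
# Copies SRC to DEST and prints the MD5 hash of the bytes that were read.
backup_with_md5() {
    src=$1
    dest=$2
    mkdir -p "$(dirname "$dest")" || return 1
    # tee writes the stream to DEST and passes the same bytes on to md5sum,
    # so the source only has to be read once.
    dd if="$src" bs=1M 2>/dev/null | tee "$dest" | md5sum | cut -d' ' -f1
}

# verify_md5 FILE EXPECTED
# Succeeds only if FILE still hashes to the EXPECTED MD5 value.
verify_md5() {
    actual=$(md5sum < "$1" | cut -d' ' -f1)
    [ "$actual" = "$2" ]
}

# Example use (commented out so that nothing touches a real device):
#   md5=$(backup_with_md5 /dev/sda "/pooldata/$VOL/$DATE/$lv")
#   verify_md5 "/pooldata/$VOL/$DATE/$lv" "$md5" && echo "backup verified"
```

The hash printed by backup_with_md5 is the value you would store next to the date and volume name. Note that the pipeline only reports the exit status of its last command, so a real script would also want to check that dd itself finished without read errors.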

It is also possible to copy the lessfs data to another site. This can be done by putting the lessfs databases on DRBD or by copying the databases with rsync. Compared to replication with an Equallogic array this requires only a fraction of the bandwidth, since the data that has to be transferred is already deduplicated and compressed.

Enjoy,

Mark Ruijter


5 Responses to Backup or archive your Equallogic with lessfs.

  1. Pingback: How to setup EqualLogic Replication « Michael Ellerbeck

  2. Ivan Ivanov says:

    In the following line:
    md5=$(dd if=/dev/sda bs=1M | dd5sum – /pooldata/$VOL/$DATE/$lv 2>&1)
    isn’t the “dd5sum” supposed to be “md5sum”? Also wordpress does weird things with the dash.
    And another thing… looking at the md5sum man (if this dd5sum is actually md5sum) I fail to see how md5sum would calculate the hash of the data coming on the pipe from dd, and then write this same data to /pooldata/$VOL/$DATE/$lv. In this case md5sum simply calculates the md5 hash of the stream data and also the /pooldata/$VOL/$DATE/$lv file (if it exists). Am I missing something?

  3. Ivan Ivanov says:

    I played around a bit and got this to work by using tee as in:
    md5=$(dd if=/dev/sda bs=1M count=1 | tee /pooldata/$VOL/$DATE/$lv | md5sum - 2>&1)

  4. Jeff says:

    Since I was super bored today, I wrote a Python tool called ddsum (http://ddsum.com) that is like dd5sum but lets you change the hash algorithm. It lets you calculate the hash of something before and after compression, for example:

    dd if=/dev/sda | ddsum -H sha512 | bzip -9 | ddsum -H sha512 - sda.img 2>sda.img.sha512
