
Fixed Block vs Variable Block Deduplication — A Quick Primer


Deduplication technology is quickly becoming the new hotness in the IT industry. Previously, deduplication was relegated to secondary storage tiers because the controller could not always keep up with the storage IO demand. Those devices were designed to handle streams of data in and out rather than the random IO that may show up on primary storage devices. Heck… deduplication has been around in email environments for some time. Just not in the same form we are seeing it today.

However, deduplication is slowly sneaking into new areas of IT… and we are seeing more and more benefit elsewhere. Backup clients, backup servers, primary storage, and who-knows-where in the future.

As deduplication is being deployed across the IT world, the technology continues to advance and become quicker and more efficient. So, to stay on top of your game, knowing a little about deduplication techniques may add another tool to your tool belt and help you make a better decision for your company/clients.

Deduplication is accomplished by sharing common blocks of data in the storage environment and only storing the changes to the data versus storing a copy of the data AGAIN! This allows for some significant storage savings… especially when you consider that many file changes are minor adjustments versus major data loads (at least as far as corporate IT user data goes).
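To make the "store common blocks once" idea concrete, here is a minimal sketch in Python. The BlockStore class, its method names, and the use of SHA-256 digests as block identifiers are assumptions made for illustration; real products have their own (usually far more elaborate) designs.

```python
import hashlib


class BlockStore:
    """Toy content-addressed store: each unique block is kept only once."""

    def __init__(self):
        self.blocks = {}  # SHA-256 hex digest -> block bytes

    def put(self, block: bytes) -> str:
        """Store the block if it is new and return its digest."""
        digest = hashlib.sha256(block).hexdigest()
        self.blocks.setdefault(digest, block)
        return digest

    def write_file(self, blocks) -> list:
        """Store a file as a 'recipe': the ordered list of block digests."""
        return [self.put(block) for block in blocks]

    def read_file(self, recipe) -> bytes:
        """Rebuild a file by following its recipe."""
        return b"".join(self.blocks[digest] for digest in recipe)


store = BlockStore()
monday = store.write_file([b"report-", b"draft-", b"v1"])
tuesday = store.write_file([b"report-", b"draft-", b"v2"])  # one block differs

print(len(store.blocks))  # 4 unique blocks stored for 6 blocks written
```

Two versions of the file were written, but the blocks they share are stored only once. How the data gets cut into blocks in the first place is what the rest of this primer is about.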

So, how is this magic accomplished? — Great question, I am glad you asked! Enter Fixed Block deduplication and Variable Block deduplication…

Fixed Block deduplication involves determining a block size and segmenting files/data into those block sizes. Then, those blocks are what are stored in the storage subsystem.
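As a rough sketch, fixed block segmentation is just slicing at regular offsets. The function name and the 8-byte block size below are assumptions chosen for illustration (real systems typically use block sizes in the kilobyte range); 8 bytes simply keeps the output small enough to read.

```python
def fixed_blocks(data: bytes, size: int = 8) -> list:
    """Cut data into blocks of `size` bytes; the last block may be shorter."""
    return [data[i:i + size] for i in range(0, len(data), size)]


sentence = b"deduplication technologies are becoming more an more important now."
blocks = fixed_blocks(sentence)

for block in blocks:
    print(block)
print(len(blocks))  # 9
```

Every boundary depends only on the byte offset, never on the content, which makes this approach cheap to compute.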

Variable Block deduplication involves using algorithms to determine a variable block size. The data is split based on the algorithm’s determination. Then, those blocks are stored in the subsystem.
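The key property of variable block segmentation is that boundaries are chosen from the content itself, not from byte offsets. The sketch below uses a deliberately simple stand-in rule, which is an assumption for illustration only: cut right after every space byte. Production systems generally use something stronger, such as a rolling hash over a sliding window, but the principle is the same.

```python
def variable_blocks(data: bytes, boundary: int = ord(" ")) -> list:
    """Cut data right after each `boundary` byte (toy content-defined rule)."""
    blocks, start = [], 0
    for i, byte in enumerate(data):
        if byte == boundary:
            blocks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):  # keep whatever follows the last boundary
        blocks.append(data[start:])
    return blocks


sentence = b"deduplication technologies are becoming more an more important now."
blocks = variable_blocks(sentence)

for block in blocks:
    print(block)
print(len(blocks))  # 9
```

Because a boundary is tied to the bytes around it, inserting or deleting data only disturbs the blocks near the change; the boundaries further along stay where the content puts them.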

Check out the following example based on the following sentence: “deduplication technologies are becoming more an more important now.”

[Image: the example sentence split into fixed blocks and into variable blocks]

Notice how the variable block deduplication has some funky block sizes. While this does not look too efficient compared to fixed block, check out what happens when I make a correction to the sentence. Oops… it looks like I used ‘an’ when it should have been ‘and’. Time to change the file: “deduplication technologies are becoming more and more important now.”   File —> Save

After the file was changed and deduplicated, this is what the storage subsystem saw:

[Image: the fixed and variable block layouts after the edit, with changed blocks shown in red]

The red sections represent the blocks that have changed. By adding a single character to the sentence, a ‘d’, the sentence length shifted and more blocks suddenly changed. The Fixed Block solution saw 4 out of 9 blocks changed. The Variable Block solution saw 1 out of 9 blocks changed. Because fewer changed blocks means fewer new blocks to store, variable block deduplication ends up providing a higher storage density.
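The same experiment can be approximated in a few lines of Python. Everything here is an illustrative assumption rather than how any particular product works: the 8-byte fixed block size, the toy "cut after each space" rule for variable blocks, and the helper names. With those choices, the counts happen to match the 4-of-9 and 1-of-9 result described above.

```python
def fixed_blocks(data, size=8):
    """Cut data at regular byte offsets."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def variable_blocks(data, boundary=ord(" ")):
    """Cut data right after each space byte (toy content-defined rule)."""
    blocks, start = [], 0
    for i, byte in enumerate(data):
        if byte == boundary:
            blocks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        blocks.append(data[start:])
    return blocks


def count_new_blocks(chunker, old, new):
    """Return (blocks not already stored, total blocks) for the new version."""
    known = set(chunker(old))
    blocks = chunker(new)
    return sum(1 for block in blocks if block not in known), len(blocks)


before = b"deduplication technologies are becoming more an more important now."
after = b"deduplication technologies are becoming more and more important now."

print(count_new_blocks(fixed_blocks, before, after))     # (4, 9)
print(count_new_blocks(variable_blocks, before, after))  # (1, 9)
```

With fixed blocks, the inserted ‘d’ shifts every byte after it, so every block from the edit onward is new to the store. With content-defined boundaries, only the block containing the edit is new.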

Now, if you determine you have something doing fixed block deduplication, don’t go and return it right now. It probably rocks and you are definitely seeing value in what you have. However, if you are in the market for something that deduplicates data, it is not going to hurt to ask the vendor if they use fixed block or variable block deduplication. You should find that you get better density and maximize your storage purchase even more.

Happy storing!

About the author

Bill Hill

2 Comments

  • Great article, Bill!

    One thing I would like to highlight for deduplication is the fact that all data is not created equal. Going fixed vs. variable alone is not enough to take advantage of storage optimization provided by data deduplication. The technology should also understand the application well. I have discussed this here: https://www-secure.symantec.com/connect/blogs/power-netbackup-deduplication-application-awareness-and-global-deduplication

    Warm regards,
    Rasheed

    Disclaimer: I work for Symantec; my posts outside Symantec portals need not represent the views of my employer.

  • Hey, great primer on the difference between fixed and variable block de-duplication! One thing to keep in mind is performance. Variable block de-duplication is typically much more CPU intensive than fixed block de-duplication, so in the real world it is still typically relegated to backup devices, while fixed block is typically used for primary storage de-duplication.
