
Why Thin Provisioning Is Not The Holy Grail for Utilisation

Thin Provisioning (Dynamic Provisioning, Virtual Provisioning, or whatever you prefer to call it) is being heavily touted as a method of reducing storage costs. Whilst at the outset it seems to provide some significant storage savings, it isn’t the answer for all our storage ills.

What is it?

Thin Provisioning (TP) is a way of reducing storage allocations by virtualising the storage LUN. Only the sectors of the LUN which have been written to are actually placed on physical disk. This has the benefit of reducing wastage in instances where more storage is provisioned to a host than is actually needed. Look at the following figure. It shows five typical 10GB LUNs allocated from an array. In a “normal” storage configuration, those LUNs would be allocated to a host and configured with a file system. Invariably, the file systems will never run at 100% utilisation (just try it!) as this doesn’t work operationally, and also because users typically order more storage than they actually require, for many reasons. Typically, host volumes are anywhere from 30-50% utilised, and in an environment where the entire LUN is reserved for the host, this results in 50-70% wastage.
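
To put some rough numbers on that (a back-of-the-envelope sketch; the 40% utilisation figure is assumed, not measured), here’s the arithmetic for the five 10GB LUNs in the figure:

```python
# Rough illustration of thick-provisioning wastage (figures assumed for illustration).
lun_size_gb = 10
lun_count = 5
avg_utilisation = 0.40                          # 30-50% is typical; 40% assumed here

allocated_gb = lun_size_gb * lun_count          # 50 GB reserved on the array
used_gb = allocated_gb * avg_utilisation        # 20 GB actually holding data
wasted_gb = allocated_gb - used_gb              # 30 GB idle but unusable elsewhere

print(f"Allocated: {allocated_gb} GB, used: {used_gb:.0f} GB, "
      f"wasted: {wasted_gb:.0f} GB ({wasted_gb / allocated_gb:.0%})")
```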

Now, contrast this to a Thin Provisioned model. Instead of dedicating the physical LUNs to a host, they now form a storage pool; only the data which has actually been written is stored on disk. This has two benefits: either the storage pool can be allocated smaller than the theoretical capacity of the now-virtual LUNs, or more LUNs can be created from the same size storage pool. Either way, the physical storage can be used much more efficiently and with much less waste.
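
As a minimal sketch of the allocate-on-write idea (my own simplification, not any vendor’s implementation), the pool only hands out a physical chunk the first time a given virtual chunk is written:

```python
# Minimal sketch of allocate-on-write thin provisioning (not any vendor's code).
class ThinPool:
    def __init__(self, physical_chunks):
        self.free_chunks = physical_chunks      # physical chunks available in the pool
        self.mappings = {}                      # (lun, virtual_chunk) -> physical chunk id

    def write(self, lun, virtual_chunk):
        """Map a physical chunk the first time a virtual chunk is written."""
        key = (lun, virtual_chunk)
        if key not in self.mappings:
            if self.free_chunks == 0:
                raise RuntimeError("Pool exhausted: over-provisioned LUNs filled the pool")
            self.free_chunks -= 1
            self.mappings[key] = len(self.mappings)   # hand out the next chunk id
        # ...the actual data would be written to self.mappings[key] here...

# Five 10GB virtual LUNs can sit over a 30GB pool; only written chunks consume space.
pool = ThinPool(physical_chunks=30 * 1024 // 42)      # ~30GB pool of 42MB chunks (assumed)
pool.write(lun="LUN1", virtual_chunk=0)
print("Chunks consumed:", len(pool.mappings), "free:", pool.free_chunks)
```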

There are some obvious negatives to the TP model. It is possible to over-provision LUNs and, as data is written to them, exhaust the shared storage pool. This is Not A Good Thing and clearly requires additional management techniques to ensure the scenario doesn’t happen, plus sensible standards for layout and design so that a rogue host writing lots of data can’t impact other storage users.
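
The usual management technique is simply to watch pool utilisation and the over-subscription ratio and act well before the pool fills. A trivial sketch of that kind of check, with the thresholds assumed rather than taken from any vendor’s tooling:

```python
# Sketch of a simple over-subscription / headroom check (thresholds assumed).
def pool_alert(pool_capacity_gb, pool_used_gb, provisioned_gb,
               warn_at=0.70, critical_at=0.85):
    """Return a status string based on how full the physical pool is."""
    usage = pool_used_gb / pool_capacity_gb
    oversubscription = provisioned_gb / pool_capacity_gb
    if usage >= critical_at:
        status = "CRITICAL: add capacity or stop provisioning"
    elif usage >= warn_at:
        status = "WARNING: plan capacity now"
    else:
        status = "OK"
    return f"{status} (pool {usage:.0%} full, {oversubscription:.1f}x over-subscribed)"

print(pool_alert(pool_capacity_gb=30, pool_used_gb=24, provisioned_gb=50))
```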

The next problem with TP in this representation is the apparent concentration of risk and performance of many virtual LUNs to a smaller number of physical devices. In my example, the five LUNs have been stored on only three physical LUNs. This may represent a potential performance bottleneck and consequently vendors have catered for this in their implementations of TP. Rather than there being large chunks of storage provided from fixed volumes, TP is implemented using smaller blocks (or chunks) which are distributed across all disks in the pool. The third image visualises this method of allocation.
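
A minimal sketch of the idea, assuming simple round-robin placement (real arrays use rather more sophisticated layouts):

```python
# Sketch of spreading TP chunks across all disks in a pool (round-robin placement assumed).
def chunk_to_disk(chunk_id, disk_count):
    """Consecutive chunks land on consecutive disks, spreading I/O across the pool."""
    return chunk_id % disk_count

disks = 3
for chunk_id in range(8):
    print(f"chunk {chunk_id} -> disk {chunk_to_disk(chunk_id, disks)}")
```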

So each vendor’s implementation of TP uses a different block size. HDS use 42MB on the USP, EMC use 768KB on DMX, IBM allow a variable size from 32KB to 256KB on the SVC, and 3Par use blocks of just 16KB. The reasons for this are many and varied; for legacy hardware they are largely a reflection of the underlying architecture.
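
The chunk size matters because it sets the granularity at which physical space is committed. A quick sketch, using the figures above and assuming (purely for illustration) a 100KB file that starts on a chunk boundary:

```python
# Sketch: physical space consumed by one small file under different TP chunk sizes.
import math

CHUNK_SIZES_KB = {          # figures as quoted in the article
    "HDS USP": 42 * 1024,   # 42MB
    "EMC DMX": 768,         # 768KB
    "IBM SVC (max)": 256,   # up to 256KB
    "3Par": 16,             # 16KB
}

file_size_kb = 100          # a 100KB file, assumed for illustration
for platform, chunk_kb in CHUNK_SIZES_KB.items():
    chunks = math.ceil(file_size_kb / chunk_kb)   # alignment to chunk boundaries assumed
    print(f"{platform}: {chunks} chunk(s), {chunks * chunk_kb} KB committed to disk")
```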

Unfortunately, the file systems that are created on thin provisioned LUNs typically don’t have a matching block size structure. Windows NTFS, for example, will use a block size of only 4KB on large disks unless explicitly overridden by the user. The mismatch between the TP block size and the file system block size causes a major problem as data is created, amended and deleted over time on these systems. To understand why, we need to examine how file systems are created on disk.

The fourth graphic shows a snapshot from one of the logical drives in my desktop PC. This volume hasn’t been defragmented for nearly 6 months, and consequently many of the files are fragmented and not stored on disk in contiguous blocks. Fragmentation is seen as a problem for physical disks because the head needs to move about frequently to retrieve fragmented files, which adds a delay to reads and writes to and from the device. In a SAN environment, fragmentation is less of an issue as the data is typically read and written through cache, negating most of the physical issues of moving disk heads. However, fragmentation and thin provisioning don’t get along very well, and here’s why.

The Problem of Fragmentation and TP

When files are first created on disk, they will occupy contiguous sections of space. If this data resides on TP LUNs, then a new TP chunk is assigned to the virtual LUN the first time any filesystem block within that region is written. For a Windows system using 4KB blocks on USP storage, this means the first 4KB write into an untouched region commits a whole 42MB chunk (subsequent writes into the same region reuse that chunk). This isn’t a problem as the file continues to be expanded, however it is unlikely this file will end neatly on a 42MB boundary. As more files are created and deleted, each 42MB chunk will become partially populated with 4KB filesystem blocks, leaving “holes” in the filesystem which represent unused storage. Over time, a TP LUN will experience storage utilisation “creep” as new chunks are “touched” and therefore written onto physical disk. Even if data is deleted from an entire 42MB chunk, it won’t be released by the array, as data is usually only “logically deleted” by the operating system. De-fragmenting a volume makes the utilisation creep issue worse; it writes to unused space in order to consolidate files. Once written, these new areas of physical disk space are never reclaimed.
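
A toy simulation makes the creep visible (the workload and figures are invented purely for illustration): random 4KB creates and logical deletes against a small virtual LUN carved into 42MB chunks. The filesystem’s view of used space settles down, but the array’s committed space only ever grows:

```python
# Sketch of utilisation "creep": logical deletes never shrink the array's allocation.
import random

CHUNK_KB = 42 * 1024        # 42MB TP chunk (HDS USP figure from the article)
FS_BLOCK_KB = 4             # NTFS-style 4KB filesystem block
blocks_per_chunk = CHUNK_KB // FS_BLOCK_KB

live_blocks = set()         # filesystem blocks that currently hold data
touched_chunks = set()      # chunks the array has ever had to allocate

random.seed(1)
for _ in range(200_000):
    block = random.randrange(blocks_per_chunk * 50)     # a 50-chunk (~2.1GB) virtual LUN
    if random.random() < 0.6:
        live_blocks.add(block)                          # create/extend a file
        touched_chunks.add(block // blocks_per_chunk)   # first touch allocates the chunk
    else:
        live_blocks.discard(block)                      # "delete" - logical only

fs_used = len(live_blocks) * FS_BLOCK_KB / 1024
array_used = len(touched_chunks) * CHUNK_KB / 1024
print(f"Filesystem sees ~{fs_used:.0f} MB in use; array has committed {array_used:.0f} MB")
```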

So what’s the solution?

Fixing the TP Problem

Making TP useful requires a feature that is already available in USP arrays as Zero Page Reclaim and in 3Par arrays as Thin Built In. When an entire “empty” TP chunk is detected, it is automatically released by the system (in HDS’s case at the touch of a button). So, for example, as fat LUNs are migrated to thin LUNs, unused space can be released.
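
Conceptually the feature is simple; a sketch of the idea (not HDS’s or 3Par’s actual implementation) is just a scan for chunks that contain nothing but binary zeros:

```python
# Sketch of the zero-page-reclaim idea: find TP chunks that contain only zeros.
def reclaimable_chunks(chunk_reader, chunk_count, chunk_size):
    """Yield chunk ids whose contents are entirely binary zero (candidates for release)."""
    zero_chunk = bytes(chunk_size)
    for chunk_id in range(chunk_count):
        if chunk_reader(chunk_id) == zero_chunk:
            yield chunk_id

# Toy backing store standing in for the array (chunk 1 has been zero-filled).
store = {0: b"\x01" + bytes(7), 1: bytes(8), 2: bytes(4) + b"\xff" + bytes(3)}
print(list(reclaimable_chunks(lambda i: store[i], chunk_count=3, chunk_size=8)))
# -> [1]
```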

This feature doesn’t help however with traditional file systems that don’t overwrite deleted data with binary zeros. I’d suggest two possibilities to cure this problem:

  • Secure Defrag. As defragmentation products re-allocate blocks, they should write binary zeros to the released space. Although this is time consuming, it would ensure deleted space could be reclaimed by the array.
  • Freespace Consolidation. File system free space is usually tracked by maintaining a chain of freespace blocks. Some defragmentation tools can consolidate this chain. It would be an easy fix to simply write binary zeros over each block as it is consolidated (a crude host-side approximation of the same idea is sketched below).
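
In the absence of either, a crude host-side approximation is to fill free space with a zero-filled scratch file and then delete it, leaving all-zero chunks for the array to reclaim. A sketch (the path and file name are purely illustrative):

```python
# Crude host-side sketch: zero-fill free space so empty TP chunks become reclaimable.
# This approximates the ideas above; it is not a vendor feature.
import os

def zero_fill_free_space(mount_point, scratch_name="zerofill.tmp", block=1024 * 1024):
    path = os.path.join(mount_point, scratch_name)
    zeros = bytes(block)
    try:
        with open(path, "wb") as f:
            while True:
                f.write(zeros)          # keep writing zeros until the volume is full
    except OSError:
        pass                            # disk full - every free block is now zeroed
    finally:
        if os.path.exists(path):
            os.remove(path)             # delete the scratch file; space is logically free again

# zero_fill_free_space("/mnt/thin_volume")   # example invocation (path assumed)
```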

One alternative solution from Symantec is to use their Volume Manager software, which is now “Thin Aware”. I’m slightly sceptical about this as a solution, as it requires software or patches to be deployed on the operating system just to make storage operate efficiently. It takes me back to Iceberg and IXFP….

Summary

So in summary, Thin Provisioning can be a Good Thing; however, over time it will lose its shine. We need fixes that allow deleted blocks of data to be consolidated and returned to the storage array for re-use. Then TP will deliver on what it promises.

Footnote

Incidentally, I’m surprised HDS haven’t made more noise about Zero Page Reclaim. It’s a TP feature that to my knowledge EMC haven’t got on DMX or V-Max.

About the author

Chris Evans

Chris M Evans is an independent consultant with over 20 years' experience, specialising in storage infrastructure design and deployment.

Comments

  • Hi Chris,

    A really interesting read. I’ve done a lot of Hitachi Dynamic Provisioning and don’t see the “thin” aspect of such technologies bringing a great deal to the table.

    A couple of points though….

    1. I don’t actually think 3Par does zero page reclaim. I know a while back (several months) they listed it on their website as a feature but it had not shipped and was not GA. I was not aware that this had changed. Can you clarify?

    2. As it stands at the moment, I’m pretty sure all vendors who support defrag will advise in large bold print not to perform filesystem defrags. Online database defrags like Exchange or AD are fine though.

    3. I think this is more of a clarification. You seem to be suggesting that each NTFS 4K write will require a new 42MB HDP page/chunk. This is not the case, but it’s probably just the way the sentence reads – you might want to clarify that.

    4. I honestly don’t think the main benefit of these technologies is thin. Hence why HDS and EMC market them as Dynamic and Virtual Provisioning. I’ve deployed a lot of HDS Dynamic Provisioning and none of the installs have wanted to make use of the “thin” aspect. They want it for the flexibility, ease of management, and performance benefits. Just like with most things virtual these days.

    Nigel
