I recently had a customer ask whether it was better to use Datrium Data Reduction alone, or Datrium in combination with Microsoft Windows Deduplication. Theoretically there should be no difference between using one or both on the VMware datastore, but let’s find out for sure!
Datrium Data Reduction works at the block level, across all VMs in the datastore. Deduplication uses a variable block size and is done inline, both on the flash layer and on the centralized datastore presented by the DataNode cluster. Compression is also done inline, so there is no waiting period for data to reduce; it is “thin” as soon as it hits the disk.
Microsoft Windows Deduplication, on the other hand, is a file-based dedup that splits files into chunks. While the chunking is “variable” in size, it is not a true block-level process. Additionally, MS DDP runs as an hourly optimization job, and garbage collection runs only weekly. This means your data lands thick and is reduced later, and because garbage collection is weekly, invalid chunks can sit in the chunk store taking up space for up to 7 days. On top of that, depending on the version of MS DDP you use, there are some serious limitations.
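To make the distinction concrete, here is a toy Python sketch of inline, hash-based block dedup. This is purely my own illustration, not Datrium’s or Microsoft’s actual implementation: each block is fingerprinted as it is written, and only blocks with unseen fingerprints consume new space, which is why the data is “thin” the moment it lands.

```python
# Toy illustration only -- not Datrium's or Microsoft's actual implementation.
# Inline dedup: fingerprint each block on write; only unseen blocks take space.
# A post-process, file-chunking dedup (like MS DDP) would instead land everything
# thick first and reclaim duplicates later on a scheduled job.
import hashlib
import os

BLOCK_SIZE = 4096  # fixed size here for simplicity; real systems may use variable blocks


class InlineDedupStore:
    def __init__(self):
        self.blocks = {}        # fingerprint -> block data (unique blocks only)
        self.logical_bytes = 0  # bytes written by clients
        self.stored_bytes = 0   # bytes actually kept after dedup

    def write(self, data: bytes):
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            self.logical_bytes += len(block)
            fp = hashlib.sha256(block).hexdigest()
            if fp not in self.blocks:      # only unseen blocks consume space
                self.blocks[fp] = block
                self.stored_bytes += len(block)

    def savings(self) -> float:
        return 1 - self.stored_bytes / self.logical_bytes


store = InlineDedupStore()
payload = os.urandom(1_000_000)   # stand-in for incompressible image data
store.write(payload)
store.write(payload)              # identical second copy, as in the DFS-R test
print(f"reduction: {store.savings():.0%}")   # ~50% saved, i.e. roughly 2:1
```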
So with all of that out of the way, let’s look at the test. I built two Windows Server 2012 R2 servers to mimic my client’s environment, installed the DFS-R role, and added an identical 1TB VMDK to each. Then I used the YRC Public Database Dump to download almost 300GB of TIFF images, each with a .txt file describing it. We used TIFFs because that was the primary data type in my client’s DFS-R cluster, but they are also useful because they don’t compress very well, so the data reduction numbers we get out of DVX reflect dedup alone rather than dedup plus compression. I then made an identical copy of the images, so we should see roughly a 2:1 savings from dedup.
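For what it’s worth, here is the kind of quick sanity check you could run against the share to confirm the expected 2:1 ratio before testing. The helper and the path are hypothetical, not part of my actual setup, and it hashes whole files in memory, which is fine for a sketch but should be streamed for very large files.

```python
# Hypothetical sanity check: compare the logical size of a directory tree
# to the size of its unique file content. With a verbatim copy of the image
# set, the expected ratio is ~2:1 before any block-level savings.
import hashlib
from pathlib import Path


def expected_dedup_ratio(root: str) -> float:
    logical = 0
    unique = {}  # file-content hash -> size of that unique content
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        data = path.read_bytes()  # fine for a sketch; stream the hash for big files
        logical += len(data)
        unique.setdefault(hashlib.sha256(data).hexdigest(), len(data))
    stored = sum(unique.values())
    return logical / stored if stored else 0.0


# Example usage -- the share path is illustrative only:
# print(expected_dedup_ratio(r"E:\DFSRoots\Images"))
```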
After allowing the initial DFS-R sync to complete, this is what we saw.

Microsoft shows a 75% data reduction, or a total savings of 220GB. Not bad, but let’s see what DVX shows.

For this single server, Datrium shows a 295GB total savings. When we consider that this is also deduping the Windows partition against other 2K12R2 VMs on the same host, it’s pretty clear that Datrium is the winner here.
When we look at the second DFS-R box, we can see that only 2% of its data is unique and 98% is shared with other sources.
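As a rough illustration (not how the DVX UI actually computes it), a “unique vs. shared” figure like this can be derived by comparing one VM’s block fingerprints against every other VM’s on the datastore:

```python
# Rough illustration of a per-VM unique/shared split: a block is "shared" if
# its fingerprint also appears in at least one other VM on the datastore.
def unique_vs_shared(vm_blocks: dict[str, set[str]], vm: str) -> tuple[float, float]:
    others = set().union(*(b for name, b in vm_blocks.items() if name != vm))
    mine = vm_blocks[vm]
    shared = len(mine & others) / len(mine)
    return 1 - shared, shared  # (unique fraction, shared fraction)


# Toy example: the second DFS-R server shares most of its blocks with the first.
vms = {"dfsr-01": {"a", "b", "c", "d"}, "dfsr-02": {"a", "b", "c", "e"}}
print(unique_vs_shared(vms, "dfsr-02"))  # (0.25, 0.75) in this toy example
```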

So at the end of the day, while MS DDP may be good for some situations, you are better off leaving your servers “vanilla” and letting Datrium do the heavy lifting of data reduction.
One of the other benefits of letting DVX do the data reduction here is that you could format the DFS-R volume with ReFS, which does not support Windows Deduplication on 2012 R2. You would get all the benefits of the ReFS-formatted volume and still have full data reduction.