[UPDATE: Congratulations to Violin Memory. This past week they finally released a version of deduplication for their OS, and a pretty good one at that. You can read the post below to understand the history involved and the good news of finally getting to this point.]
[This has been updated; see the update below. For more on the new class of flash arrays that fully support deduplication and a full range of storage features, see Recommendations for All-Flash Storage Arrays; Aiming Beyond Simply IOPS and Low Latency.]
Some time ago, I wrote a post in which I pointed out that there are places where de-dup fits nicely and, yes, that it is possible to misuse de-duplication. I also pointed out that companies that don’t offer de-duplication at all (inline or not, as part of their array’s operating environment) don’t even give their customers the choice of whether to use it, or what to use it with. Recently, Violin Memory’s founder wrote an interesting post pointing to some of the issues he sees with inline de-duplication. This post will offer an alternative view.
We are seeing more interesting articles on de-duplication and data reduction generally. First, a really well-rounded article (and talk), All-Flash Storage Efficiency Is About More Than De-Duplication, from George Crump, Lead Analyst at StorageSuisse. In this article he also nicely points to other data reduction technologies. I especially like that he includes replication, a key enterprise technology that is part of the foundation for disaster recovery. Thin provisioning and compression are also mentioned. All of these are important technologies along with de-duplication. You can also learn more by listening to the Permabit/StorageSuisse video.
You could be forgiven if you thought that Violin Memory didn’t like de-duplication as a technology since they have been offering articles fearing the worst from using de-duplication –
- First, an article from CTO Jonathan Goldick (recently departed from Violin Memory), Thoughts on DDUP and Compression.
- Second, a cautionary article, Storage Myths: Dedupe for Databases, from flashdba, who according to his blog is employed by Violin Memory, though his views are clearly his own. The article attempts to debunk the storage “myth” about dedup for databases. In it he explains some of the issues with using dedup in conjunction with Oracle. He concludes that while dedup may be great for use cases like VDI, it offers limited benefits in database environments.
- Finally, the latest de-dup article comes from Jon Bennett, founder and CTO of Violin Memory, who has offered up what are, admittedly, some edge use-cases where you might not want to use inline de-duplication. Fair enough. In his article, Bennett states that always-on de-duplication used with databases is not a feature but a bug. On Twitter, Violin Memory offered the following tweet:
The unwritten subtext in these articles seems to be a subtle or not-so-subtle argument that those who don’t offer or use de-dup are somehow better off. The subtlety that not everything is a database is missed here. More interestingly, some of the ongoing developments with inline de-dup seem to have been overlooked or gone unmentioned. This suggests a company at a competitive disadvantage.
Unlike SolidFire, Pure Storage, Nimbus Data, NetApp, Tegile, HP, Hitachi Data Systems, EMC and others, Violin Memory does not offer de-duplication, inline or not, directly in their array’s vMOS operating system. So even if the application using the array is a VDI deployment where de-dup might make perfectly good sense, too bad: you can’t leverage native de-duplication, even on Violin Memory’s latest 6264 with the latest vMOS operating system. Ouch.
You would think that Jon’s points would have easily demonstrable proof-points in the real world with architectures like Pure Storage or Tegile, where inline de-duplication is being consistently used with databases. Tegile offers some database numbers in their lab tests that I found interesting and excellent (page 12).
You would also think that this ‘bug’ would kill any chance that companies like Tegile or Pure Storage would have of competing to run Oracle databases. Obviously the ‘bug’ is not a ‘bug’: Pure Storage, like other flash/SSD array vendors, has done reasonably well at winning in the database area, especially with architectures that leverage de-duplication and compression, like those that use Delphix (see here). They have also done a good job of explaining their support of inline de-duplication.
Permabit, which licenses its inline deduplication technology to vendors, has recently demonstrated an inline deduplication benchmark that crossed the one million IOPS barrier. Sound familiar? Only this benchmark is with deduplication turned on.
It’s not as if always-on or inline de-duplication is the only kind of de-dup a vendor can offer, or as if database vendors like Oracle have told you not to use it in conjunction with their database. Quite the opposite: Oracle actually offers use cases and best practices for using de-duplication around Oracle Databases and has embedded de-dup in its storage hardware. And vendors like Nimbus Data offer de-duplication today on a per-LUN basis, so you can have it on or off on a particular LUN. You can choose to have some LUNs de-duped and other LUNs not de-duped. Personally, I like this type of design because it provides flexibility.
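To make the per-LUN idea concrete, here is a minimal Python sketch of how an array could deduplicate blocks inline on one LUN while leaving another untouched. It is a toy illustration, not how Nimbus Data or any other vendor implements it: the `ToyArray` class, its method names, the LUN names and the use of SHA-256 fingerprints are all assumptions made for the example, and it ignores overwrites, garbage collection and hash-collision handling.

```python
import hashlib


class ToyArray:
    """Hypothetical in-memory array with a per-LUN dedup switch."""

    def __init__(self):
        self.physical = []   # stored blocks; list index = physical block id
        self.index = {}      # SHA-256 digest -> physical block id
        self.luns = {}       # LUN name -> {"dedup": bool, "map": {lba: block id}}

    def create_lun(self, name, dedup):
        self.luns[name] = {"dedup": dedup, "map": {}}

    def _store(self, data):
        # Append a new physical block and return its id.
        self.physical.append(data)
        return len(self.physical) - 1

    def write(self, lun_name, lba, data):
        lun = self.luns[lun_name]
        if lun["dedup"]:
            # Inline path: fingerprint the block before it is stored and
            # reuse an existing physical block when the fingerprint matches.
            digest = hashlib.sha256(data).digest()
            block_id = self.index.get(digest)
            if block_id is None:
                block_id = self._store(data)
                self.index[digest] = block_id
        else:
            # Dedup is off for this LUN: every write gets its own block.
            block_id = self._store(data)
        lun["map"][lba] = block_id

    def read(self, lun_name, lba):
        return self.physical[self.luns[lun_name]["map"][lba]]


array = ToyArray()
array.create_lun("vdi", dedup=True)       # good-fit use case: many identical blocks
array.create_lun("oracle", dedup=False)   # the admin chose to leave this one alone

block = b"\x00" * 4096
for lba in range(10):
    array.write("vdi", lba, block)
    array.write("oracle", lba, block)

print(len(array.physical))  # physical blocks consumed by 20 logical writes
```

In this sketch the ten identical writes to the de-duped LUN consume a single physical block, while the same ten writes to the other LUN consume ten. The point is only that the decision lives with the LUN, so the customer gets to make it.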
To understand the strangeness of all of this, and before we go any further, let’s step backwards in time. For some time, Violin Memory has been issuing press releases stating that de-duplication was coming to their vMOS operating system. Dizzying PR and articles around the Symantec/Violin Memory partnership can be found that suggest as much as an 80% reduction in the storage footprint. In its absence they have even developed a partnership with Atlantis for, among other things, in-line de-duplication to target VDI, which according to the press release will make use of in-line de-duplication to provide consistently fast VDI. Atlantis seems to understand the value of in-line deduplication.
So the open question, given Jon’s recent article and the antipathy toward de-duplication, is this: will there be a vMOS version of dedup that the Symantec partnership was to provide?
Summary. The very fact that two CTOs at Violin Memory have fixed their sights on the negatives of de-duplication suggests a company in a defensive posture on the issue of de-duplication in general. All their major competitors have deduplication. From my perspective, if a company thinks that always-on de-dup is such a menace to society, then simply offer a version of de-duplication that you can turn on and off. Nimbus Data’s Halo approach to de-duplication seems to me to be very reasonable – it allows you to turn de-duplication on and off on a LUN/filesystem basis. In any case, at the very least, they should offer their customers the opportunity to use deduplication on good-fit use-cases by including it in their array’s operating system.
With larger vendors like EMC, NetApp and Hitachi offering deduplication technologies, and smaller vendors like Nimbus Data, SolidFire, Tegile and others also successfully offering them, Violin Memory has been for some time, and still is, at a competitive disadvantage because of their lack of de-dup. Instead of arguing against what is a useful technology, they simply need to offer a competitive deduplication feature in vMOS (as they said they would) and let that implementation speak for itself, as all these other vendors have already done.
Update: Violin Memory introduced dedup into their line-up first with the Windows Flash Array, but it has been missing in action from their main staple OS, vMOS. Then they offered it in the Concerto 2200, a hardware appliance solution that includes dedup. However, this is a costly way to get dedup. And surprise: no longer a dedup critic, Violin Memory actually sponsored an IDC white paper, Why Inline Data Reduction Is Required For Enterprise Flash Arrays, which was ironic at the time because they didn’t offer inline dedup in their standard vMOS operating system. Most other all-flash array vendors have been and are offering dedup as part of their array’s operating systems. It’s hard to imagine that this was really their dedup answer, and it wasn’t. With the latest launch their arrays now offer a version of ‘granular’ deduplication. Visit their site to learn more.
Go to more posts on storage and flash storage at http://digitalcld.com/cld/category/storage.