Marketing, FUD and Doing What You Do Best
Rather than leave a lengthy comment on Tom Cook’s blog post from Friday, Compression and Dedupe: Business Value and Data Safety (and from a marketing perspective, Fridays are bad days to post blogs – especially in the summer), I thought I would respond here (this may get lengthy, as Tom made a number of points which I need to comment on).
The first thing I want to say is that in technical marketing, the proper strategy is to stay off defense and take an offensive approach. However, given the amount of FUD that Tom put in his latest blog post, I have to defend compression to some degree.
Now, I think we can all agree that data compression and data deduplication are two technologies that can complement one another very well. Avamar (EMC) deduplicates the data at the source and then compresses the data before sending it to the Avamar Data Store gaining tremendous efficiency in network utilization. ProtecTIER (IBM) compresses the data once it is deduplicated at the target device before it stores the data. Other solutions also combine compression and data deduplication.
I’d like to comment on some key points Tom made in his piece where he is just blatantly wrong:
1) Compression identifies redundant data across a very small window, usually 64 KB. – While this may be true for other compression technologies, this is not true for Storwize. Storwize performs compression where the initial window is not fixed in size at all; it is the resultant write that is fixed in size. This size is also specifically mapped to the I/O pattern of the data being written. The goal is that Storwize can do all the work it needs on a particular file or LUN in one I/O, and it is for this reason that Storwize has no performance penalty.
2) Compression produces data reduction rates at most 2X for most data types. – Seems Tom needs a lesson in the most common answer in IT – “IT DEPENDS”. Data compression ratios are 100% tied to the data type. For a true indication of data compression ratios see Figure 1.
Figure 1
3) Compression alters the underlying data structures and requires compression and decompression of data. – If you look up the definition of LZ Compression (in Wikipedia) you get the following: The Lempel-Ziv (LZ) compression methods are among the most popular algorithms for lossless storage. Deduplication is more of a lossy technology as it throws away pieces of data from other files in order to deduplicate data. If any piece of data cannot be reconstructed for any reason, multiple data sets are now corrupt. Additionally, in order for ANY application to work with any data it must be rehydrated; today’s applications must work with data in their native format.
4) Compression operates in the data path and impacts read/write performance as a ‘bump in the wire’ (kudos to Storwize for their work to improve performance). – So we actually do thank Tom for his kind (and accurate) words here.
5) Compression is a potential single point of failure for data retrieval. – How would it be different for data deduplication?
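As a rough illustration of why “IT DEPENDS” is the honest answer to the claim that compression tops out at 2X, here is a minimal sketch using Python’s standard zlib module. This is generic DEFLATE, not the Storwize engine, and both inputs are synthetic data I made up for the example (a repeated log-style line and seeded pseudo-random bytes), so the exact numbers mean nothing beyond showing how far apart two data types can land.

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Return original size divided by zlib-compressed size (higher is better)."""
    return len(data) / len(zlib.compress(data, 9))


# Synthetic, highly repetitive input standing in for log or text data.
log_like = b"2010-06-28 INFO request served in 12 ms\n" * 2000

# Seeded pseudo-random bytes standing in for already-compressed or encrypted data.
rng = random.Random(0)
random_like = bytes(rng.getrandbits(8) for _ in range(len(log_like)))

log_ratio = compression_ratio(log_like)
random_ratio = compression_ratio(random_like)

print(f"log-like data:    {log_ratio:.1f}x")
print(f"random-like data: {random_ratio:.2f}x")
```

The same algorithm with the same settings lands far above 2X on one input and at roughly 1X on the other, which is why a single headline ratio says little without naming the data type.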
Tom also provided a chart that had a few inaccuracies in it, so I took the liberty of fixing some of these as well. See Figure 2.
Figure 2
I also want to call Tom out regarding his comment “I am gratified by the response to Albireo by … the press by recent OEM adoption. Albireo is becoming a standard in data optimization…”
Here the term delusional comes to mind only because the links that Tom calls out are links that are focused on Permabit and not OEMs discussing the adoption of deduplication. Additionally, when Permabit announced Albireo there were no OEMs mentioned or quoted as having adopted or even tested the solution. This calls into question how the solution (which was only announced a week ago) is becoming a ‘standard’.
When it comes to providing technical information in any type of written communication, the data provided must be accurate if you want to be credible. A lot of times this requires doing research to ensure that the answers you are providing are correct. It is clear that the competitive team or technical marketing team at Permabit has let Tom down here and not provided him with the correct information with regard to how compression works in real life (at least from a Storwize perspective).
This brings me to the tail end of the title of this post. Perhaps a CEO should focus on running their company, driving the ‘OEM’ deals they speak of and doing business development rather than trying to do any type of marketing or technical marketing. I know our CEO says that he is not a blogger and that some companies have natural bloggers while others do not. I guess this is because he is too busy running the only company that delivers real-time data compression without any performance degradation and helping us to manage hundreds of customers as well as a number of OEM opportunities.

Let me start by writing that I do not endorse your posts on this topic nor Tom’s. Frankly, I think both are misleading and poorly researched. I expect much more from both of you.
I’m not sure how many times I can possibly write this, but clearly more than a dozen times on various storage blogs is not enough: Dedupe and compression are essentially two peas from the same pod.
For the most part, compression is the term used to describe data reduction applied WITHIN individual files (this includes such types as ZIP, SIT, RAR and TAR etc which DO NOT look for commonality across files), and dedupe is the term used to describe data reduction as it is applied ACROSS multiple files and types of data. I prefer to refer to the two by application: intrafile and interfile compression. It’s neat, simple and most people will quickly understand the difference. The two are complementary.
I ask that you please not refer to dedupe as “lossy”. Dedupe (as I have seen it implemented to date) is no more “lossy” than lossless compression. Today’s dedupe, like lossless compression, allows the EXACT original data to be reconstructed from the compressed data. Any opinion to the contrary is a gross misrepresentation of the facts.
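To make the intrafile versus interfile distinction and the “EXACT original data” point concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the fixed 4 KB chunk size, the SHA-256 chunk index, the file names and the helper functions are hypothetical and do not describe how Avamar, ProtecTIER, Albireo or any other product works.

```python
import hashlib
import random
import zlib

CHUNK = 4096  # hypothetical fixed chunk size, chosen only for this sketch


def dedupe(files):
    """Store each unique chunk once; keep a per-file list of chunk digests."""
    store = {}    # digest -> chunk bytes, shared ACROSS all files
    recipes = {}  # file name -> ordered list of digests
    for name, data in files.items():
        recipe = []
        for start in range(0, len(data), CHUNK):
            chunk = data[start:start + CHUNK]
            digest = hashlib.sha256(chunk).digest()
            store.setdefault(digest, chunk)
            recipe.append(digest)
        recipes[name] = recipe
    return store, recipes


def restore(store, recipe):
    """Rebuild a file byte for byte by following its list of chunk digests."""
    return b"".join(store[digest] for digest in recipe)


# Two identical files of seeded pseudo-random bytes: nothing to find WITHIN
# either file, everything to find ACROSS them.
rng = random.Random(1)
payload = bytes(rng.getrandbits(8) for _ in range(16 * CHUNK))
files = {"a.bin": payload, "b.bin": payload}

store, recipes = dedupe(files)
stored_bytes = sum(len(chunk) for chunk in store.values())
per_file_zlib = sum(len(zlib.compress(data, 9)) for data in files.values())
restored = {name: restore(store, recipe) for name, recipe in recipes.items()}

print("raw bytes:          ", sum(len(d) for d in files.values()))
print("per-file zlib bytes:", per_file_zlib)
print("deduped chunk bytes:", stored_bytes)
print("exact round trip:   ", restored == files)
```

Per-file compression cannot see that the second file repeats the first, while the shared chunk store keeps one copy, and the restore step returns the original bytes exactly. The sketch ignores the space taken by the digest lists, which a real system would have to count.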
Using the term lossy in the context of dedupe will only confuse readers. Lossy data compression allows an approximation of the original data to be reconstructed in exchange for far superior compression rates. And yes, it takes a lot more computational horsepower and time to execute lossy compression at tolerable speeds.
Your thoughts?
Joseph,
Couple of points – I actually think we are in close agreement unless I missed something. In the first line of the third paragraph, the meat of the post, I talk about the fact that the two technologies are complementary. I was trying to point out that the posts that are still trying to ‘compare’ the two are a waste of time. However, if you are going to compare, let me at least shed some light on the facts.
Your points regarding lossless and lossy are also right on. I would put them both in the same category; however, if you review Tom’s chart, to say that compression is lossy when it is not by definition is a gross misrepresentation of the facts. I was merely trying to point out this fact and ask how deduplication could be less lossy than compression.
Anyway, as I said in the beginning of the post, I hate playing defense; no one usually wins. I was just trying to set the record straight.
Thanks again, as always for reading and commenting and making the post better.
Steve
Thank you for the clarification, Steve, I appreciated it. It seems we are in agreement.
To be fair, I did post to Tom’s blog as well with some questions about his numbers and reasoning. It remains to be seen if the post will be approved, and if he’ll respond to it. Stay tuned.
Among other things, I told him that his statement “Compression alters the underlying data structures and requires compression and decompression of data.” makes no sense at all.
Thanks for the professional coaching, Steve! You know this is 2010 and in the virtual/social networked world, old axioms about “never on Friday” and “CEOs don’t blog” are a little 1980’s-esque, don’t you think?
You know, I am spending a lot of time meeting with OEM customers and prospects and my blogs (usually written while flying) generally reflect what I hear from them. They fear data lock-in and they want to do what is right for their customers in terms of data optimization and data safety while maintaining storage performance and features.
We are a supplier and partner to OEMs. They will make product announcements on their own timelines.
We are highly supportive of compression owned, operated and deployed by the storage vendor, because it makes sense and that is what OEMs tell me they are implementing. Isn’t that supportive of the future Storwize business model?
Cheers,
Tom
Tom,
First, thanks for your note. I know there is some sarcasm, and trust me, I am no CEO, and what you have done with Permabit is nothing short of great, so I am glad the comments were taken in the manner in which they were intended.
I actually like the dialog around what customers are looking for very much. I would agree with you 100% that customers don’t want vendor lock-in. The real question is what do the OEMs want? I would say they do care about the customer, BUT I would also think that behind the scenes they do want vendor lock-in because they want customers to keep buying their stuff. I also don’t think this is a bad strategy if you are a storage vendor. You need a competitive edge, and if partnering with a provider who provides a unique solution can give you that edge, that is a good model.
I’ll leave the announcement thing alone for now and just say, best of luck and I hope they can come out with solutions soon. And as you are supportive of compression, I too am very supportive of the total, overall capacity optimization strategy, AS LONG AS – the end users get everything they require (and I believe you believe the same thing as well).
Thanks again Tom!
Steve
Ahh…re: vendor lock-in.
Remember what Alexis de Tocqueville said: “America is great because she is good.” I have great faith in market forces driving “good”, customer-centric behavior.
Keep charging.
Tom
Hey Steve:
Jered posted a follow up blog to address some of the technical points you discuss above. Rather than fill up your comment section, you can view them here: http://blog.permabit.com/index.php/2010/06/compression-and-dedupe-redux/
Regards,
Mike
Mike,
I’ll approve, but in general it is poor form to put a link in someone’s comments back to your blog. Farley had posted something on his blog about good linking practices. I would be more than happy to write a response that links to you, but no sweat. And I also already added some comments on his post. Thanks for the update, Mike.
Steve
Fair enough Steve. Jered’s on another plane, so here is his post:
Yesterday The Storage Alchemist at Storwize posted a complaint about Tom’s discussion of compression and deduplication. We certainly aren’t savaging compression technologies; I think perhaps it’s clearer to consider our points not so much as a criticism of compression, but as a list of concerns regarding bump-in-the-wire optimization appliances. We absolutely agree with Steve the Alchemist that data compression and data deduplication are two technologies that complement one another well: we use both in our Enterprise Archive Value NAS and Cloud Storage offerings, and we make it possible for our partners to compress (if they so choose) when using our Albireo SDK.
I’ll comment on his technical concerns.
Compression and deduplication are very similar in that they identify and eliminate redundant data, but the scope of this duplicate identification is vastly different. Traditional compression works on a small window of data and with short duplicate segments so that the compression tables fit efficiently in a very small amount of memory. Storwize may not be using a 64 KB window, but I imagine the order of magnitude is about right… and that’s not a criticism of their technology at all. In fact, the way Storwize manages data in chunks so that they can maintain performance is very clever.
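The effect of a bounded window can be sketched with Python’s standard zlib module, whose DEFLATE format can only point back at most 32 KB. This is an illustration of a generic small-window compressor only; it assumes nothing about the window Storwize actually uses, and the helper name and block sizes are made up for the example.

```python
import random
import zlib


def repeated_block_size(block_len: int, seed: int = 0) -> int:
    """Compress a pseudo-random block followed by an exact copy of itself."""
    rng = random.Random(seed)
    block = bytes(rng.getrandbits(8) for _ in range(block_len))
    return len(zlib.compress(block + block, 9))


# The copy starts 16 KB after the original: inside DEFLATE's 32 KB window.
near = repeated_block_size(16 * 1024)

# The copy starts 100 KB after the original: outside the window.
far = repeated_block_size(100 * 1024)

print(f"16 KB block repeated:  {2 * 16 * 1024} bytes -> {near} bytes")
print(f"100 KB block repeated: {2 * 100 * 1024} bytes -> {far} bytes")
```

When the duplicate falls inside the window, the second copy costs almost nothing; when it falls outside, the compressor never sees it and stores both copies in full. Finding that second kind of redundancy is the job of an index that spans the whole data set, which is the difference in scope described above.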
Calling deduplication lossy is nonsense; both compression and dedupe replace redundant data with references to other instances of that data, just at different scales as I note above. Unlike Ocarina’s NFO, which frighteningly throws away actual content, both dedupe and traditional compression return the original bitstream. Tom’s point was that Albireo embedded dedupe leverages existing file and block system concepts to make those references so no interaction with our software is required on read, while a compression appliance modifies the data format before it reaches the storage array, which creates data lock-in. Take away the appliance, and the storage is full of uninterpretable data. That’s a concern for storage vendors and users alike.
As to the chart, when you look at this as ‘embedded dedupe’ vs. ‘appliance-mediated compression’, you can see why Tom says that appliance compression alters the data, and Albireo dedupe does not require ‘rehydration’. As for ‘optimizes block’, I haven’t yet seen Storwize’s block optimization products, so I can’t comment, but I do wonder how they make the space saved by compression available to the user. We agree that savings are absolutely data dependent. In general, deduplication alone offers more savings than compression alone, and both together give the best results by far. Perhaps we can work together to ensure Albireo and Storwize yield optimal results?
I particularly liked the “FUD” and “NON-FUD” dueling chart approach to storage marketing! We’ve been selling in-line primary storage deduplication and compression (with encryption as an included, no-cost check box item) for Windows 2008 and Linux servers to OEMs for some months. Called the BitWackr, Exar’s differentiator is the use of specialized silicon to perform hashing, compression and encryption simultaneously and at line speed. With an MSRP of less than $1,000.00 for the silicon/software combination (the silicon is supplied on a PCIe card and is the same used in leading VTLs), deduplication AND compression together have never been easier or less expensive. A Windows 2008 server, for example, can become a deduplication appliance in about 15 minutes. The product was developed under the direction of John Matze, who was just recognized by CRN as one of 2010's Storage Visionaries.