Betamax Redux
I often joke with customers that while my friends grew up dreaming of becoming professional baseball players or rock stars, I dreamed of becoming a data protection technologist. Recently I read something very profound in Chuck Hollis’s internal EMC blog. Chuck said, “Decide what you’re passionate about …and write about it… it is hard to write about stuff you don’t care about.” I am passionate about data protection. Not because data protection is “cool” or anything, but because it is one of the most important practices in the data center. It is also one of the most challenging, and it involves not just technology but people and process as well. I had an old boss once who said, “Where there is chaos, there is cash,” and given that data protection is a $10B market, I would say he was correct. I have started this blog along with my colleagues because we truly believe in what we do, who we work for, the challenges we solve and the benefits we bring to customers in the challenging world of data protection. We write because we are passionate about data protection, not because we are being paid to.
Two posts I read a while ago on Tony Assaro’s blog, Leaders Dilemma and Setting the Record Straight, really got me charged up, but I wasn’t sure how I wanted to comment. Tony, you see, writes for money (not passion), which means he has to write ‘for’ the company that is paying him and, at the same time, spend time ‘Manufacturing Confusion’ in the market. (Sorry Tony, I liked you better as an analyst, when you heard all the vendors’ product messages and formed an opinion about what was really going on in the market.) What I am referring to specifically is the comment that “EMC is the one big player going after this market in earnest with three different products (which will confuse the market and themselves)”. Quite frankly, EMC’s philosophy and message to its customers regarding data deduplication isn’t confusing at all. In fact, when I speak with our customers, they tell me we have one of the more thoughtful and consistent messages around this topic. So in an effort to educate, let me share EMC’s data deduplication philosophy and how EMC will take backup, beyond. EMC will:
- Provide deduplication as a pervasive & architecturally consistent service
- Coordinate deduplication throughout combinations of data storage and data movement
- Deduplicate at the highest level of abstraction
- Deduplicate as close to the source as practical
When these values are leveraged, the entire spectrum of data protection morphs into methods that will be used to protect data well into the future.
Back to the subject of the blog. Data Domain will continue to sell good products to customers. Data Domain will continue to enhance their existing technology to meet customers’ demands. But they will do this at the expense of genuine innovation. Remember, the hardest thing to change in IT is process, not technology. Backing data up to disk targets is nothing new, and backing data up to disk devices that perform deduplication is no longer innovative. However, the paradigm of using traditional backup software to move full files across an expensive network is beginning to evolve. It MUST evolve, and when it does, what happens to the companies whose interesting features are just one small morsel in the food chain? If you don’t own any significant IP in the extended process that is data protection, then you will be left out of the backup buffet. And as Maslow would say, “If all you have is a hammer then everything looks like a nail.”
EMC has taken a leadership position in the data deduplication space not because we offer multiple products but because of the way we look at the technology. Data deduplication is made up of different components:
- Data ‘chunking’
- Compression / Encryption
- Assign Content ID
- Store
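To make the four components above concrete, here is a minimal sketch in Python. It is an illustration only, not how any EMC product works: the fixed 4 KB chunk size, the use of SHA-256 as the content ID, the choice to hash the chunk before compressing it, and the names `chunk` and `DedupStore` are all assumptions made for the example. Encryption is left out to keep it short.

```python
import hashlib
import zlib


def chunk(data: bytes, size: int = 4096):
    """Data 'chunking': split data into fixed-size pieces (assumed scheme)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class DedupStore:
    """Hypothetical in-memory store keyed by content ID."""

    def __init__(self):
        self.chunks = {}  # content ID -> compressed chunk bytes

    def put(self, data: bytes, size: int = 4096):
        """Store data and return the list of content IDs needed to rebuild it."""
        recipe = []
        for piece in chunk(data, size):
            # Assign content ID: a hash of the chunk's contents (assumed SHA-256).
            content_id = hashlib.sha256(piece).hexdigest()
            # Store: only chunks we have not seen before take up space.
            if content_id not in self.chunks:
                # Compression: zlib stands in for whatever a real system uses.
                self.chunks[content_id] = zlib.compress(piece)
            recipe.append(content_id)
        return recipe

    def get(self, recipe):
        """Rebuild the original data from its list of content IDs."""
        return b"".join(zlib.decompress(self.chunks[cid]) for cid in recipe)


store = DedupStore()
backup = b"A" * 4096 * 3 + b"B" * 4096  # three identical chunks plus one other
recipe = store.put(backup)
print(len(recipe), "chunks referenced,", len(store.chunks), "chunks stored")
```

In the sketch, the backup references four chunks but only two unique chunks are kept, and backing up the same data a second time adds nothing new to the store. Because the content ID is computed where the data is read, the same pieces could in principle run at the source or at the target.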
The goal is to be able to leverage these components across multiple storage platforms, providing deduplication at the highest level of abstraction possible and as close to the source as practical, based on the requirements of the application. Preserve the content by deduplicating content instead of data. The objective, over time, is to provide deduplication as a pervasive and architecturally consistent service across EMC’s entire storage portfolio. When you do this, the entire paradigm of protecting information evolves, and this is why EMC is the leader in data deduplication. Not because we have 3 (or however many) products, but because of the way in which we look at data deduplication.
At the end of the day, EMC has over 2000PB of deduplicated data under protection using both source- and target-based deduplication solutions. And I would venture to estimate that if you include NetWorker, RecoverPoint, etc., EMC has exabytes of data under protection. EMC has a long history of changing with the times, listening to its customers, investing in new technologies and protecting customers’ data the way they want and need it to be protected. That is taking backup, beyond.
Posted by Steve Kenniston




Hi guys — thought I’d drop by and see how y’all are doing!
Nice blog — you’re off to a wonderful beginning!
– Chuck
Thanks Chuck and all – We appreciate the support.
Nice blog post. As someone who is also passionate about data protection, I’m glad to see you creating some focus around this important, yet oft neglected, IT discipline.
Of course, having had a long history with NetWorker, and now working for one of EMC’s competitors, we probably won’t always see eye to eye. Nonetheless, I look forward to seeing more from you on your road to recovery.
I like your post and think this blog is great. When you say, “deduplicate as close to the source as possible”, how does that get applied practically between source-based and target-based dedupe? If I understand it correctly, even the DL line has edge-to-core capability, so it could be close to the source?
Matt – Great question. The thought process is: where it makes the most sense, deduplicate as close to the source as possible. This means leveraging deduplication as early in the process as possible. For example, primary storage and source-based backup deduplication are great fits for ‘as close to the source as possible’, but an Oracle database with a high transaction rate wouldn’t be a good fit for ‘as close to the source as possible’. So we are trying to define use cases that give clear direction on where you get the most bang for your buck without impacting performance.
Thanks for the answer…one other thing I hear from sales people and industry types is that EMC is adding or selling target-based dedupe with/into the VTL customer base. It seems like a natural extension for EMC and a huge market. One industry analyst estimated (to me) that EMC owns at least 60% of the VTL market (the size of which I don’t now recall), and that would be a good way to approximate the opportunity for QTM-based dedupe with EMC, i.e. dedupe will complement or leapfrog the VTL market. This is something Data Domain just won’t really penetrate…how should I think about this?