Storage’s 2010 Hottest Technology


Each year there tends to be one technology that stands out in the storage space. In 2009 it was data deduplication. At the end of 2008 EMC acquired a source-based deduplication solution called Avamar. Later, in 2009, they announced a strategic partnership with Quantum for data deduplication at the target. Then in 2009 EMC made a bid against NetApp for Data Domain and won. In addition, NetApp made data deduplication announcements around its A-SIS technology. Quantum, FalconStor, and Symantec all had their own data deduplication stories, and a host of privately held companies such as Permabit, Sepaton, and ExaGrid were all talking about the merits of data deduplication.

As the story goes, if you haven’t put data deduplication in your backup environment yet, you’re either in an environment where there is not one iota of duplicate data, which is highly unlikely, or the company you work for has gobs of money and has no problem with:

  1. Backing up to slow tape
  2. Recovering slowly from tape
  3. Keeping massive amounts of data on unreliable tape
  4. Backing up full streams of data to disk (and wasting valuable storage space)

What I am saying is that if you haven’t implemented a data deduplication solution by now, you have been left in the technology dust. Data deduplication just makes too much sense. I know we have all heard the expression “No one ever got fired for buying X.” But has anyone ever gotten promoted because they bought X? I have to believe that the IT team that can save their company 50% or more of their storage will get promoted. Storage is a cost drain on IT. It’s the applications that make a company money. It’s time to start focusing some of those valuable IT dollars on the applications that make your company money. It’s time to be the IT Super Hero!

Real-time, Random Access Compression in 2010

In 2010 the main topic seems to be optimization for primary storage. There have been a number of industry articles in just the last 10 days that discuss primary storage optimization. The reality is that if you are going to make an impact on storage growth, you need to attack the problem where it starts: primary storage. There are a number of reasons for this. First, in the webinar put out by Storwize and IBM, John Powers of IBM states that “the industry is out of tricks” when it comes to keeping disk drives on the areal density curve that gives users 2x as much capacity in the same space. New technologies have to be used to maintain this trend (at least for the foreseeable future), and this is where compression or capacity optimization will play a key role in the areal density situation. Additionally, optimization technologies can help make the cost of SSDs more competitive.

It is important to note, as most of the articles written in the last few days point out, that primary storage optimization cannot come at the expense of performance. As W. Curtis Preston puts it in his article “Dedupe and Compression Cuts Storage Down to Size”: “The No. 1 rule when introducing a change in your primary data storage system is primum non nocere, or ‘First, do no harm.’” This means vendors who provide optimization solutions cannot compromise the fundamental reasons why end users buy storage. Jeff Byrne and Jeff Boles of the Taneja Group recently published an article in InfoStor, “Consider Compression for Primary Storage Optimization”, that also states:

“…We believe that a data reduction technology must meet the following criteria to be considered PSO-capable in the enterprise:

• Reliably and consistently reduce primary storage capacity requirements by 50% or more (depending on the data type)
• Do not degrade performance of primary storage in terms of I/O or latency, even for data streams that are fully sequential or completely random I/O
• Completely preserve the original data set
• Provide full transparency (requires no changes to existing IT infrastructure or processes)”…

Providing users with a solution that can reduce storage capacity by 50% or more has a significant impact on a company’s overall storage costs, but it cannot negatively affect the existing requirements for performance and availability or change any process. Doing so could negate the value of an optimization solution.
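
To put that 50% figure in concrete terms, here is a minimal sketch of the arithmetic that relates a reduction ratio (2:1, 4:1, and so on) to the share of capacity saved. The function names and the 100 TB example are my own illustrations, not something taken from the Taneja article or from any vendor’s tooling.

```python
def savings_fraction(reduction_ratio: float) -> float:
    """Fraction of capacity saved for a given reduction ratio (2.0 means 2:1)."""
    if reduction_ratio < 1.0:
        raise ValueError("reduction ratio must be at least 1.0")
    return 1.0 - 1.0 / reduction_ratio


def physical_tb_needed(logical_tb: float, reduction_ratio: float) -> float:
    """Physical terabytes needed to hold logical_tb of data after reduction."""
    return logical_tb / reduction_ratio


if __name__ == "__main__":
    # Hypothetical example: 100 TB of primary data at several reduction ratios.
    for ratio in (1.5, 2.0, 4.0):
        saved = savings_fraction(ratio) * 100
        needed = physical_tb_needed(100.0, ratio)
        print(f"{ratio}:1 saves {saved:.0f}% and needs {needed:.1f} TB for 100 TB of data")
```

A 2:1 ratio is exactly the 50% threshold in the criteria above. Note that the savings flatten out quickly: going from 2:1 to 4:1 only moves you from 50% to 75% saved.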

Where to Optimize?

Ocarina Networks recently published a blog post stating that they do ‘in-line’ storage optimization for primary storage. (The post did note that they prefer to operate in a post-process mode. This probably has much to do with the impact in-line operation would have on the criteria outlined in the Taneja article.) As we look at the different technologies that could fit the bill for ‘real-time’ optimization of primary storage, we turn to Dave Vellante’s piece, “Dedupe Rates Matter… Just Not as Much as You Think”. Here Vellante takes a scientific approach to showing how optimization adds real value for the end user. Dave’s CORE (Capacity Optimization Ratio Effectiveness) ties together:

  1. Optimization ratio
  2. Horsepower required to achieve this ratio
  3. Cost

Dave states that if a solution does not achieve a CORE above 1000, then there is no point trying to use it in real-time. This does not mean that real-time optimization can’t be done with the solutions he lists, but doing so would require throwing so much more horsepower at the problem that it would drive the cost of the solution too high.

And finally, Tom Trainer from Network Computing wrote a nice piece, “Storwize Focuses on Optimization Without Compromise”. Now just to be clear, I am a Storwize employee and Tom’s piece is very positive on Storwize. Interestingly though, the piece actually talks about customers who have used the Storwize technology to solve real storage challenges without negatively impacting their storage performance or processes. (There is also a Wikibon article that points to a customer, Shopzilla, who uses Storwize to save on their storage capacity without any performance degradation.)

At the end of the day, my point is that primary storage optimization is quickly becoming storage’s hottest technology of 2010. There have been a number of articles on the topic, all saying the same thing: if you’re going to deploy primary storage optimization, make sure you preserve all of the characteristics of that primary storage, including performance, availability, and transparency to all applications (including your storage functions such as snapshots), as well as all of the downstream IT processes for that storage. Vendors without real solutions are worried that the startups with great solutions will impact their disk sales. They are being very shortsighted. Primary storage optimization is too important to the customer; it adds too much value. Additionally, having been in the storage business for 20 years, I have learned that disk is elastic. Any time you provide solutions that allow customers to save storage space, end users find reasons to fill it up. Don’t be left behind on this trend. The technology is available for IT to start saving a great deal of money on storage capacity, floor space, power, and cooling, and to focus their IT budgets on solutions that make their company money. Become that IT Super Hero!

About the Author

Steve Kenniston - The Storage Alchemist.

Comments (2)

  1. Carter says:

    The Wikibon CORE idea – a formula for weighing the different elements that matter in online data reduction – is a good one. The CORE formula itself is extremely flawed. It puts a ridiculously disproportionate value on “time to compress”, which is why Storwize gets such a good score. I’ll write up a full post on this soon – but the way this formula works, if you got 1% compression but did it really fast, you’d get a great score.

    The idea of the formula is right, but you need to take more than vendors’ random claims for how well they shrink data for the Capacity Reduction score, and you need balance in the formula that represents users’ real-world priorities. Performance IS important, but is “time to compress” even the most important performance metric? Maybe time-to-decompress (which is when users feel the effect) is more important?

    I don’t mind this sort of thing, right up to the point where I feel a vendor is trying to mislead or fool customers. This is right at the edge of that. Don’t you have to at least do something useful before it matters how fast you do it?

  2. Carter,
    First, thanks for taking the time and providing a thoughtful comment. My only question is – how come none of my comments on your blog have shown up? Ocarina is not practicing the same kind of censorship as EMC are they? :) JK

    Carter – I really take issue with anyone saying CORE is flawed. The Wikibon guys took me through it as well and I actually happen to know David Floyer, their CTO, very well. His logic is on target. Remember – the ultimate value of CORE – Capacity Optimization Ratio EFFECTIVENESS is in the “E” for customers.

    I agree with your last sentence – “Don’t you have to at least do something useful before it matters how fast you do it?” The answer – of course, but you only pay for products that add value. Some people are complaining online that if you just get 1% compression and it happens really fast, then the CORE number is unreasonably high. But I would argue that it SHOULD be a high CORE number because it is “E”ffective for Customers. Now, will they pay extra for it? Not likely, but they will use it. A good example would be extra performance on a Nehalem. If it helps, the Customer uses it, but will he pay an extra adder from another vendor for that 1%? Not likely. In this case the “C”ost of the solution would be zero, and the CORE number would thus be infinity. The point is that if you add something that adds “E”ffectiveness, and if it does not cost anything from a CPU or memory resource aspect of the system as it is currently designed, that is the optimal solution. Period.

    EVERY answer in IT is ‘It Depends’. Is doing compression in ‘real-time’ with no impact to performance (fast), applications or deduplication for backups, more or less “E”ffective than post process deduplication? Ask this question to 100 customers. See what the answer is – I’ll even help sponsor a survey with you guys, done independently, to see what they would say. Time to compress is highly critical IF you are going to do ‘real-time’ data compression.

    I am sure there are a number of instances and use cases that fall outside the real-time, random access compression needs that are a huge fit for Ocarina, NTAP, EMC, etc… but if you want to do compression, in real-time, in an enterprise environment, in front of a customer’s active solution, you need to be very fast, agree? Customers buy storage for two reasons, performance and availability. Any vendor who thinks they can sit in front of that needs to preserve these characteristics. This is ultimately what CORE shows: Storwize does not impact end user performance.

    The Wikibon CORE is based on research – not a ‘vendor’ trying to ‘fool’ customers. I agree that time to decompress is also important. I would say, however, that end users would see compress and decompress (write / read) as equally important. Again, CORE and the effectiveness is important when looking at adding value throughout the entire data life cycle. For example, if I use Ocarina deduplication, but have already purchased Data Domain, don’t I need to re-hydrate the Ocarina-deduplicated primary storage data before I use Data Domain? They say you do. That means I don’t really save on my primary storage if I need the space to re-hydrate before I back it up, and it also means processing time on the array. Storwize, with random access compression, doesn’t require decompression. (See the white paper on the web site, Storwize improves data deduplication.)

    Let’s just be honest about a Product’s capabilities and allow the Product to prove itself within a customer’s environment, including things like ‘Does the customer need to change the application?’ ‘Does the solution integrate with deduplication?’ ‘Does the solution impact storage performance?’ At the end of the day there are use cases for post process deduplication, and there are use cases for real-time compression. The customers need to decide where to best apply these technologies, and CORE seems to be a useful tool to help them gravitate toward the right solution for their various problems.